Comparative Analysis of the Datasets with Multimodal Content
Last modified: 2021-10-18
Abstract
Recent work has shown that multimodal content analysis is a popular task in applications such as healthcare, security, and marketing. It comprises many subtasks, of which joint vision and language understanding is among the most actively studied. Any machine learning task requires a dataset, and many rich datasets have recently become available. In this work we compare datasets used in joint vision and language understanding tasks. We present a detailed analysis of modern datasets, compare their basic characteristics, and describe their potential use for practical tasks, especially in the context of our previous work.