MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms
CoRR(2024)
摘要
Social media platforms are hubs for multimodal information exchange,
encompassing text, images, and videos, making it challenging for machines to
comprehend the information or emotions associated with interactions in online
spaces. Multimodal Large Language Models (MLLMs) have emerged as a promising
solution to address these challenges, yet struggle with accurately interpreting
human emotions and complex contents like misinformation. This paper introduces
MM-Soc, a comprehensive benchmark designed to evaluate MLLMs' understanding of
multimodal social media content. MM-Soc compiles prominent multimodal datasets
and incorporates a novel large-scale YouTube tagging dataset, targeting a range
of tasks from misinformation detection, hate speech detection, and social
context generation. Through our exhaustive evaluation on ten size-variants of
four open-source MLLMs, we have identified significant performance disparities,
highlighting the need for advancements in models' social understanding
capabilities. Our analysis reveals that, in a zero-shot setting, various types
of MLLMs generally exhibit difficulties in handling social media tasks.
However, MLLMs demonstrate performance improvements post fine-tuning,
suggesting potential pathways for improvement.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要