SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
arXiv (2023)
Abstract
Despite recent advances in artificial intelligence, building social
intelligence remains a challenge. Among social signals, laughter is a
distinctive expression that occurs during social interactions between humans.
In this work, we tackle a new challenge for machines: understanding the
rationale behind laughter in video, which we call Video Laugh Reasoning. We
introduce this new task, which requires explaining why people laugh in a
particular video, together with a dataset for it. Our proposed dataset, SMILE,
comprises video clips and language descriptions of why people laugh. We
propose a baseline that leverages the reasoning capacity of large language
models (LLMs) using a textual representation of the video. Experiments show
that our baseline can generate plausible explanations for laughter. We
further investigate the scalability of our
baseline by probing other video understanding tasks and in-the-wild videos. We
release our dataset, code, and model checkpoints on
https://github.com/postech-ami/SMILE-Dataset.
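To make "textual video representation" concrete, the sketch below shows one way a video clip could be serialized into text and wrapped in a prompt asking an LLM why people laugh. This is a minimal illustration under stated assumptions, not the SMILE code: the `Segment` schema (speaker, transcript, visual cue, acoustic cue), the function names, and the prompt wording are all hypothetical, and no LLM call is made. The released repository is the authoritative reference for the actual representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One utterance-level slice of a video clip (hypothetical schema, not SMILE's)."""
    start: float       # start time in seconds
    end: float         # end time in seconds
    speaker: str       # speaker label, e.g. taken from subtitles
    transcript: str    # what was said in this slice
    visual_cue: str    # assumed: a short caption of facial expression or scene
    acoustic_cue: str  # assumed: a short description of tone or pitch


def video_to_text(segments: List[Segment]) -> str:
    """Serialize segments in temporal order into one text block an LLM can read."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        lines.append(
            f"[{seg.start:.1f}s-{seg.end:.1f}s] {seg.speaker}: \"{seg.transcript}\" "
            f"(visual: {seg.visual_cue}; audio: {seg.acoustic_cue})"
        )
    return "\n".join(lines)


def build_laugh_reasoning_prompt(segments: List[Segment]) -> str:
    """Wrap the textual video representation in a hypothetical instruction for the LLM."""
    return (
        "The following is a textual description of a video clip.\n"
        f"{video_to_text(segments)}\n"
        "People laugh at the end of this clip. Explain why people laugh."
    )


if __name__ == "__main__":
    clip = [
        Segment(0.0, 2.0, "A", "Can I help you?", "smiles politely", "rising pitch"),
        Segment(2.0, 3.5, "B", "No.", "smirks", "flat tone"),
    ]
    print(build_laugh_reasoning_prompt(clip))
```

Sorting by start time keeps the serialized text in temporal order even if segments arrive out of order; the resulting string would then be sent to whichever LLM serves as the reasoning backbone.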