A studyforrest extension, an annotation of spoken language in the German dubbed movie "Forrest Gump" and its audio-description.

F1000Research(2021)

引用 4|浏览0
暂无评分
摘要
Here we present an annotation of speech in the audio-visual movie "Forrest Gump" and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset ( studyforrest.org). The annotation provides information about the exact timing of each of the more than 2500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), 66,000 phonemes, and their corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech-tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embedding which represents each word in a 300-dimensional semantic space. To validate the dataset's quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation's content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.
更多
查看译文
关键词
annotation,fMRI,language,narrative,naturalistic stimulus,speech,studyforrest
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要