Linguistic Resources for Speech Parsing

LREC (2006)

Abstract

We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic structure in English conversational speech: separately annotating transcribed speech for structural metadata, or structural events (fillers, speech repairs (or edit disfluencies) and SUs, or syntactic/semantic units), and for syntactic structure (treebanking constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.

1. Motivation for the Creation of this Corpus

In order to apply to speech the language processing techniques that have traditionally been applied to text, it is important to address the inherent differences between these two types of input. Textual input typically consists of words that are grouped into sentences and clauses using punctuation, and these are further organized into chunks such as paragraphs, sections, chapters, articles, books, and so on.

Although speech is similar in many ways to text (e.g., it is composed of words that have the same meaning as in text), it also has many differences, some stemming from the fact that people use different modalities and cognitive processes when processing or producing these inputs and outputs, and others ...
Keywords
English language, natural language, metadata, computational linguistics, parsers, speech