Using Related Text Sources to Improve Classification of Transcribed Speech Data.
AMLTA (2019)
Abstract
Today’s content, including user-generated content, is increasingly found in multimedia format. It is known that speech data are sometimes incorrectly transcribed, especially when they are spoken by voices on which the transcribers have not been trained or when they contain unfamiliar words. A familiar mining task that helps in storage, indexing and retrieval is automatic classification with predefined category labels. Although state-of-the-art classifiers like neural networks, support vector machines (SVM) and logistic regression classifiers perform quite satisfactorily when categorizing written text, their performance degrades when applied to speech data transcribed by automatic speech recognition (ASR), due to transcription errors such as insertion and deletion of words, grammatical errors and words that are simply transcribed wrongly. In this paper, we show that incorporating content from related written sources in the training of the classification model is beneficial. We especially focus on and compare different representations that make this integration possible, such as representations of speech data that embed content from the written text and simple concatenation of speech and written content. In addition, we qualitatively demonstrate that these representations to a certain extent indirectly correct the transcription noise.
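One of the representations the abstract names is simple concatenation of speech and written content. The following minimal sketch (with entirely hypothetical example sentences, not data from the paper) illustrates the intuition: when a noisy ASR transcript is concatenated with a related clean written source before feature extraction, words that ASR misrecognized reappear in the feature vector, which is how such representations can indirectly correct transcription noise.

```python
# Minimal sketch (hypothetical data) of augmenting a noisy ASR transcript
# with a related written source before feature extraction.
from collections import Counter

def bag_of_words(text):
    """Tokenize on whitespace and count lowercase word frequencies."""
    return Counter(text.lower().split())

# A noisy ASR transcript: "stock" and "fell" are transcribed wrongly.
asr_transcript = "the stok market fel sharply today"
# A related, correctly written source covering the same topic.
related_text = "the stock market fell sharply amid new fears"

# Simple concatenation of speech and written content:
combined = asr_transcript + " " + related_text

features_asr = bag_of_words(asr_transcript)
features_combined = bag_of_words(combined)

# The clean tokens from the written source now appear in the combined
# representation, compensating for the misrecognized tokens.
assert "stock" not in features_asr
assert "stock" in features_combined and "fell" in features_combined
```

In practice the paper's classifiers (SVM, logistic regression, neural networks) would be trained on richer features than raw word counts, but the concatenation step itself is this simple.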
Keywords
transcribed speech data, related text sources, classification