An Annotated Speech Corpus of Rare Dialect for Recognition-Take Dali Dialect as an Example

Tian Huang, Dongqi Yang, Wanyun Qin, Shubo Zhang, Binyang Li, Yan Li

COGNITIVE COMPUTING, ICCC 2021 (2022)

Abstract
Nowadays, the widespread use of Automatic Speech Recognition (ASR) technology in AI products has made people's lives easier. Despite significant progress in recognizing Mandarin Chinese, many rare dialects still cannot be recognized because corresponding speech corpora are lacking. In this paper, we propose a method for building a rare-dialect speech corpus that uses everyday spoken words as recording content. After all dialect audio recordings are labeled, a dialect speech corpus with complete information, including speech, text, and labels, is established. Experiments on a Dali dialect speech corpus of 2,400 audio recordings demonstrate the efficacy of the proposed method. Finally, verifying annotation consistency with the Kappa value clearly improves the quality of the Dali dialect speech corpus.
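The abstract does not state which Kappa statistic was used. As a minimal sketch, assuming Cohen's kappa between two annotators who label the same audio clips, agreement is computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal frequency, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item; kappa is undefined,
        # so by convention report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical per-clip labels from two annotators checking transcriptions.
annotator_1 = ["correct", "correct", "wrong", "correct"]
annotator_2 = ["correct", "wrong", "wrong", "correct"]
print(cohen_kappa(annotator_1, annotator_2))
```

Here p_o = 0.75 and p_e = 0.5, so the sketch prints 0.5. Values near 1 indicate agreement well beyond chance; a corpus builder might re-annotate clips when kappa falls below a chosen threshold.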
Keywords
Dialect speech corpus annotation, Rare dialect, Automatic Speech Recognition