Evaluating the accuracy of forced alignment across Mandarin varieties

The Journal of the Acoustical Society of America (2022)

Abstract
Forced alignment is widely used in phonetics to align transcripts with acoustic signals. These tools are trained on specific language varieties; it is unclear if they generalize to others. Previous research on English by MacKenzie and Turton (2020) finds good agreement between automated and human alignments for varieties that differ from the training variety. Such evaluation has only been carried out for English. We evaluate the level of human-aligner agreement on four Mandarin varieties (Canto, Shanghai, Beijing, and Tianjin). For each variety, two recordings from the HUB5 Corpus (LDC 1998) were aligned manually and by the Montreal Forced Aligner [McAuliffe et al. (2017)] using acoustic models trained on Beijing, Wuhan, and Hekou Mandarin [Schultz (2002)]. We find strong agreement between human and machine-aligned phone boundaries, with 17 ms as the median onset displacement. A mixed model identifies little variation across varieties or according to speech rate, but significant interindividual variation. Notably, despite the generally close agreement between the machine and human alignments, for two of the speakers, more than 10% of the alignments are displaced by over 100 ms. In sum, the Mandarin forced-aligner yields reliable alignments for out-of-training varieties, but manual checking of the results is still crucial.
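
The two summary figures reported above (a median onset displacement and the share of boundaries displaced by more than 100 ms) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the integer-millisecond representation, and the sample onset times are all hypothetical, and the sketch assumes the human and machine boundaries are already paired one-to-one.

```python
from statistics import median


def onset_displacements(human_onsets_ms, machine_onsets_ms):
    """Absolute difference between paired phone onsets, in milliseconds.

    Assumes both lists hold onsets for the same phones in the same order
    (a hypothetical simplification; real alignments must be matched first).
    """
    if len(human_onsets_ms) != len(machine_onsets_ms):
        raise ValueError("onset lists must have the same length")
    return [abs(h - m) for h, m in zip(human_onsets_ms, machine_onsets_ms)]


def summarize(displacements_ms, threshold_ms=100):
    """Median displacement and proportion of boundaries beyond the threshold."""
    if not displacements_ms:
        raise ValueError("no displacements to summarize")
    n_over = sum(1 for d in displacements_ms if d > threshold_ms)
    return {
        "median_ms": median(displacements_ms),
        "prop_over_threshold": n_over / len(displacements_ms),
    }


# Invented onset times for five phones from one speaker (illustration only).
human = [120, 310, 550, 800, 1050]
machine = [130, 295, 570, 950, 1045]

summary = summarize(onset_displacements(human, machine))
print(summary)
```

Computing the summary per speaker rather than over the pooled data is one way to surface the kind of interindividual variation the abstract describes, since a low overall median can hide speakers with many large displacements.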