Raw Ultrasound-Based Phonetic Segments Classification Via Mask Modeling

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Ultrasound tongue imaging is widely used in clinical linguistics and phonetics. Recently, deep neural networks, especially convolutional neural networks, have been widely applied to the interpretation and analysis of ultrasound tongue images (UTI). Despite achieving satisfactory performance, deep models rely on large amounts of manually labeled data, which are often difficult to obtain in practical settings. To address this issue, this paper focuses on how to use a large amount of unlabeled UTI data to improve performance on the UTI classification task. Specifically, we explore self-supervised learning with a mask modeling strategy. By learning to predict the masked part of the input, the network is pre-trained to infer contextual information. We then fine-tune the pre-trained model with a small amount of labeled data. Compared with previous competing algorithms, our method improves classification accuracy by an average of 13.33% across four different scenarios.
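
The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the patch size, the mask ratio, the fill value, or the network, so `random_patch_mask`, `masked_mse`, `patch=16`, `mask_ratio=0.75`, and zero-filling are all assumptions chosen for illustration. The sketch hides a random subset of patches in one frame and scores a reconstruction only on the hidden region, which is the usual objective in masked auto-encoder style pre-training.

```python
import numpy as np


def random_patch_mask(frame, patch=16, mask_ratio=0.75, seed=0):
    """Hide a random subset of non-overlapping patches in a 2-D frame.

    Hypothetical helper: patch size, mask ratio, and zero-filling are
    illustrative assumptions, not values taken from the paper.
    Returns the masked frame and a boolean patch grid (True = hidden).
    """
    h, w = frame.shape
    if h % patch or w % patch:
        raise ValueError("frame dimensions must be multiples of the patch size")
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))

    # Pick which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    hidden = np.zeros(n_patches, dtype=bool)
    hidden[rng.choice(n_patches, size=n_masked, replace=False)] = True
    hidden = hidden.reshape(gh, gw)

    # Expand the patch-level grid to pixel resolution and zero the hidden pixels.
    pixel_mask = np.repeat(np.repeat(hidden, patch, axis=0), patch, axis=1)
    masked = np.where(pixel_mask, 0.0, frame)
    return masked, hidden


def masked_mse(prediction, target, hidden, patch=16):
    """Mean squared error computed only over the hidden patches."""
    pixel_mask = np.repeat(np.repeat(hidden, patch, axis=0), patch, axis=1)
    return float(np.mean((prediction[pixel_mask] - target[pixel_mask]) ** 2))
```

In a pre-training loop, a model would receive `masked` as input and be trained to minimize `masked_mse` against the original frame; the resulting encoder would then be fine-tuned on the small labeled set.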
Keywords
Self-supervised learning, Mask modeling, Masked auto-encoder, Ultrasound tongue imaging