Pathological Voice Detection and Classification Based on Multimodal Transmission Network.

Journal of Voice: Official Journal of the Voice Foundation (2022)

Abstract
OBJECTIVES: Describing pronunciation features from multiple perspectives can help doctors accurately diagnose the pathology type of a patient's voice. Using two modalities, the acoustic voice signal and the electroglottography (EGG) signal, this paper proposes a pathological voice detection and classification algorithm based on a multimodal transmission network.
METHODS: First, the short-time Fourier transform (STFT) is applied to both signals, and a Mel filterbank is designed to obtain their Mel spectrograms. Next, the multimodal transmission network extracts features from the Mel spectrograms and exchanges information between the two branches through Multimodal Transfer Modules (MMTM). Finally, a fusion layer integrates the multimodal information, and a fully connected layer diagnoses and classifies the voice pathology from the fused features.
RESULTS: The experiment used 1179 subjects from the Saarbrücken Voice Database (SVD). The average accuracy, recall, specificity and F1 score of pathological voice classification reached 98.02%, 98.23%, 97.82% and 97.95%, respectively. Compared with other algorithms, the proposed method achieves significantly higher classification accuracy.
CONCLUSIONS: The proposed model integrates information from multiple modalities to obtain more comprehensive and stable voice features, improving the accuracy of pathological voice classification. Future research will focus on reducing the model's computation time and complexity.
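The Methods pipeline can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the HTK mel formula, Hann window, 16 kHz sample rate, n_fft=512, hop=128 and 64 mel bands are all assumed, since the abstract gives none of these settings. The `mmtm` function follows the general MMTM design (squeeze each branch by global average pooling, map the concatenated vectors to a joint vector, then gate each branch's channels with 2 × sigmoid); the ReLU on the joint vector and the reduction ratio of 4 are assumptions. The helpers `init_mmtm` and `fuse_and_classify` are hypothetical; the fusion layer is assumed to be concatenation followed by one fully connected layer with two classes, and all weights, signals and feature maps are random or synthetic stand-ins rather than SVD data.

```python
import numpy as np

# --- Front end: STFT -> mel filterbank -> log-mel spectrogram -------------
def hz_to_mel(f):
    # HTK-style mel scale (assumed; the abstract does not give the formula)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft//2 + 1 STFT bins onto n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=64):
    """Hann-windowed STFT power spectrum passed through the mel filterbank; returns (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return np.log(mel + 1e-10)

# --- MMTM-style cross-modal recalibration ---------------------------------
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_mmtm(c_a, c_b, ratio=4, seed=0):
    """Hypothetical random initialisation; ratio=4 is an assumption."""
    rng = np.random.default_rng(seed)
    c_z = (c_a + c_b) // ratio
    return {
        "W_z": rng.normal(0, 0.1, (c_z, c_a + c_b)), "b_z": np.zeros(c_z),
        "W_a": rng.normal(0, 0.1, (c_a, c_z)), "b_a": np.zeros(c_a),
        "W_b": rng.normal(0, 0.1, (c_b, c_z)), "b_b": np.zeros(c_b),
    }

def mmtm(feat_a, feat_b, p):
    """Recalibrate two (C, H, W) feature maps using statistics pooled from both."""
    s_a = feat_a.mean(axis=(1, 2))  # squeeze: global average pool -> (C_a,)
    s_b = feat_b.mean(axis=(1, 2))  # -> (C_b,)
    # joint vector; the ReLU here is an assumption
    z = np.maximum(0.0, p["W_z"] @ np.concatenate([s_a, s_b]) + p["b_z"])
    gate_a = 2.0 * sigmoid(p["W_a"] @ z + p["b_a"])  # per-channel gates in (0, 2)
    gate_b = 2.0 * sigmoid(p["W_b"] @ z + p["b_b"])
    return feat_a * gate_a[:, None, None], feat_b * gate_b[:, None, None]

def fuse_and_classify(feat_a, feat_b, W, b):
    """Hypothetical fusion layer: concatenate pooled features, one fully connected layer, softmax."""
    fused = np.concatenate([feat_a.mean(axis=(1, 2)), feat_b.mean(axis=(1, 2))])
    logits = W @ fused + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

# --- Demo on synthetic stand-ins (not SVD data) ---------------------------
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 150 * t) + 0.01 * rng.normal(size=sr)  # 1 s synthetic "voice"
egg = np.sign(np.sin(2 * np.pi * 150 * t))                        # crude EGG-like square wave
mel_voice = log_mel_spectrogram(voice, sr)
mel_egg = log_mel_spectrogram(egg, sr)

# Random stand-ins for intermediate CNN feature maps of each branch, shape (C, H, W)
feat_v = rng.normal(size=(32, 16, 15))
feat_e = rng.normal(size=(24, 16, 15))
fused_v, fused_e = mmtm(feat_v, feat_e, init_mmtm(32, 24))
probs = fuse_and_classify(fused_v, fused_e, rng.normal(0, 0.1, (2, 56)), np.zeros(2))
print(mel_voice.shape, fused_v.shape, probs)
```

Because each MMTM gate lies in (0, 2), a channel in one branch can be amplified or attenuated depending on statistics pooled from both modalities; in a trained network these weights would be learned rather than random.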
Keywords
multimodal transmission network, classification