Automated Spoken Language Identification Using Convolutional Neural Networks & Spectrograms

Hari Shrawgi, Dilip Singh Sisodia, Piyush Gupta

Key Digital Trends Shaping the Future of Information and Management Science (2023)

Abstract
Automatic Language Identification (LID) is the task of identifying the spoken language from a voice signal. Automated LID has many applications, including global customer support systems and voice-based user interfaces for machines. Hundreds of languages are widely spoken around the world, and learning all of them is practically impossible for any one person. Machine learning methods have been used effectively for automated LID and translation. However, machine learning-based LID relies heavily on handcrafted feature engineering. Manual feature extraction depends on individual expertise and is prone to many deficiencies. Conventional feature extraction not only delays the development of automated LID systems significantly but also leads to inaccurate and non-scalable systems. In this paper, a deep learning-based approach using spectrograms is proposed. A Convolutional Neural Network (CNN) model is designed for the task of automatic language identification. The proposed model is trained on a VoxForge dataset containing speech in five languages, viz. German, Dutch, English, French, and Portuguese. Accuracy, precision, recall, and F1-score are used as evaluation measures. The proposed approach is compared against traditional approaches as well as existing deep learning approaches for LID. The proposed model outperforms its competitors, with an average F1-score above 0.9 and an accuracy of 91.5%.
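The evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the scores are averaged, so the unweighted (macro) average over languages used here is an assumption, and the function names `per_class_scores`, `accuracy`, and `macro_f1` are hypothetical helpers, not the authors' code. The labels in the demo are made-up toy data, not results from the paper.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this language
        else:
            fp[pred] += 1    # predicted language was wrongly claimed
            fn[truth] += 1   # true language was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of utterances whose language was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    truth = ["English", "English", "French", "French"]
    preds = ["English", "French", "French", "French"]
    result = per_class_scores(truth, preds, ["English", "French"])
    print(result, accuracy(truth, preds), macro_f1(result))
```

A macro average weights every language equally regardless of how many utterances it has, which is one reasonable choice when the per-language sample counts differ.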
Keywords
spectrograms, convolutional neural networks, neural networks, language