Human Voice Recognition for Subtitle Generators on Khmer and Vietnamese Newsletter Videos

An Cong Tran, Loc Phuoc Son, Duy Tuong Lam, Nghi Cong Tran, Hai Thanh Nguyen

2022 RIVF International Conference on Computing and Communication Technologies (RIVF) (2022)

Abstract
Synchronizing subtitles in newsletter videos and ethnic-language programs is essential because many viewers are linguistically isolated, yet it remains a problem for broadcasters. Recognizing human voice in the audio extracted from newsletter videos is an important step in subtitle synchronization: it detects where human voices occur in the newsletter and thereby determines when subtitles should appear. This study proposes an approach to detecting human voices in newsletter videos that combines pre-processing techniques such as Mel-frequency cepstral coefficients with deep learning models, namely a convolutional neural network and a hybrid network combining convolutional layers with Long Short-Term Memory units. We also examine how the choice of hop length affects human voice recognition performance. The proposed method reaches an accuracy of 0.926 in human voice recognition on datasets with Khmer and Vietnamese voices. Once trained, the model is expected to predict when subtitles should appear, efficiently supporting subtitle generators.
Keywords
Mel-frequency cepstral coefficients, Subtitle generators, Human voice recognition, Newsletters, Khmer language
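
The abstract's remarks on hop length and subtitle timing can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 16 kHz sample rate, the 400-sample frame length, and the assumption that the classifier emits one voiced/unvoiced label per analysis frame are all hypothetical and chosen only for illustration. The sketch shows how hop length sets the number of frames extracted from a clip (and so the time resolution of features such as MFCCs), and how per-frame labels could be merged into time intervals for a subtitle generator.

```python
def frame_count(n_samples, frame_length, hop_length):
    """Number of full analysis frames in a signal, assuming no padding."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length


def voiced_intervals(frame_labels, hop_length, sample_rate):
    """Merge runs of voiced frames (label 1) into (start_s, end_s) intervals.

    Simplifying assumption: frame i starts at i * hop_length samples, and a
    voiced run ends where the next unvoiced frame starts.
    """
    intervals = []
    start = None
    for i, label in enumerate(frame_labels):
        if label == 1 and start is None:
            start = i  # a voiced run begins
        elif label != 1 and start is not None:
            intervals.append((start * hop_length / sample_rate,
                              i * hop_length / sample_rate))
            start = None
    if start is not None:  # run continues to the end of the clip
        intervals.append((start * hop_length / sample_rate,
                          len(frame_labels) * hop_length / sample_rate))
    return intervals


# One second of audio at an assumed 16 kHz sample rate.
short_hop_frames = frame_count(16000, 400, 160)  # finer time resolution
long_hop_frames = frame_count(16000, 400, 512)   # coarser, fewer frames

# Hypothetical per-frame predictions: 1 = human voice, 0 = no voice.
intervals = voiced_intervals([0, 1, 1, 0, 1], hop_length=160, sample_rate=16000)
```

A smaller hop length yields more frames per second, which gives finer subtitle timing at the cost of more feature vectors to process; this is the kind of trade-off the hop-length comparison in the abstract refers to.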