Significance of Audio Quality in Speech-to-Text Translation Systems

Tonmoy Rajkhowa, Amartya Roy Chowdhury, S. R. Mahadeva Prasanna

SPEECH AND COMPUTER, SPECOM 2023, PT I (2023)

Abstract
Research on speech-to-text translation systems has shown that their performance is influenced by the size and quality of the training corpus. However, most corpus-creation efforts have treated corpus size, audio-transcription-translation alignment, and text translation quality as the measures of corpus quality, with little emphasis on audio quality, since the audio is scraped directly from web-based sources rather than recorded in a studio environment. The absence of high-quality audio may therefore make it harder for Transformer-based encoder-decoder architectures to learn the mapping between audio and its corresponding text. Even with larger corpora, direct speech-to-text translation systems have struggled to match the performance of their cascaded counterparts unless they use additional external data resources or techniques. This comparative study evaluates direct and cascaded speech translation systems without using any external data sources, directly or indirectly. It compares performance on both the original audio and high-quality audio generated by a text-to-speech synthesis system, for two distinct corpora: Prabhupadavani and MuST-C. The findings show that direct speech-to-text translation systems can perform similarly to their cascaded counterparts, provided they are trained on a larger corpus that contains high-quality audio. This suggests that, when building efficient and robust direct speech-to-text translation systems, it is crucial to consider not only the size and translation quality of the corpus but also its audio quality. By incorporating high-quality audio data into training, researchers can improve the performance of direct speech-to-text translation systems, making them a viable alternative to cascaded systems.
Keywords
Speech-to-text translation, Direct speech-to-text translation, Text-to-speech synthesis, Transformer