Integrating Prior Knowledge and Data-Driven Approaches for Improving Grapheme-to-Phoneme Conversion in Korean Language

Dezhi Cao, Yanan Zhao, Licheng Wu

Research Square (2023)

Abstract
Currently, grapheme-to-phoneme (G2P) conversion is dominated by two methodologies: knowledge-based and data-driven approaches. Knowledge-based methods are difficult to scale to large datasets, while data-driven methods depend heavily on high-quality data and require careful feature selection during model construction. To address these challenges, this study proposes an approach that integrates prior knowledge with data-driven techniques for automatic G2P conversion in Korean. We extract attributes from Korean pronunciation rules and from the phonetic transformations that occur between Korean words, and use them to construct a decision tree. The model is then trained with a data-driven approach to perform automatic phonetic transcription. The proposed model achieves more accurate alignment between input and output variables, effectively capturing phonological variation in continuous Korean speech and determining the phoneme that corresponds to each grapheme. Rigorous cross-validation shows an average G2P conversion accuracy of 94.63%, outperforming existing methods.
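The abstract does not specify which attributes feed the decision tree or how the tree is trained. The sketch below is only a rough illustration. It assumes ID3-style information-gain splits over two hypothetical attributes of a syllable's final consonant: the consonant itself and a coarse class of the following syllable's onset. A tiny hand-labelled training set covers a few well-known Korean sound changes (liaison, nasalization, lateralization, coda neutralization). The names `decompose`, `syllable_features`, `build_tree`, `predict`, `TRAINING`, and the label set are illustrative assumptions, not the authors' implementation. The toy data also says nothing about the reported accuracy.

```python
from collections import Counter
from math import log2

# Unicode encodes precomposed Hangul syllables (U+AC00..U+D7A3) as
# 0xAC00 + initial * 588 + medial * 28 + final (19 x 21 x 28 = 11172 blocks).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final) jamo."""
    idx = ord(syllable) - 0xAC00
    if not 0 <= idx < 11172:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    return INITIALS[idx // 588], MEDIALS[(idx % 588) // 28], FINALS[idx % 28]


def syllable_features(word, i):
    """Hypothetical rule-derived attributes for the final consonant of word[i]."""
    final = decompose(word[i])[2]
    if i + 1 == len(word):
        context = "end"                    # word-final coda
    else:
        onset = decompose(word[i + 1])[0]
        if onset == "ㅇ":
            context = "vowel"              # silent ㅇ: next syllable starts with a vowel
        elif onset in ("ㄴ", "ㅁ"):
            context = "nasal"
        elif onset == "ㄹ":
            context = "liquid"
        else:
            context = "obstruent"
    return {"final": final, "next": context}


# Toy hand-labelled examples (hypothetical): (word, syllable index, surface form of its final).
# "liaison" means the final is carried over as the onset of the next syllable.
TRAINING = [
    ("국물", 0, "ㅇ"), ("국어", 0, "liaison"), ("국가", 0, "ㄱ"), ("국", 0, "ㄱ"),
    ("밥물", 0, "ㅁ"), ("밥을", 0, "liaison"), ("밥", 0, "ㅂ"),
    ("옷", 0, "ㄷ"), ("옷이", 0, "liaison"), ("옷만", 0, "ㄴ"),
    ("신라", 0, "ㄹ"), ("신문", 0, "ㄴ"),
]
ATTRS = ["final", "next"]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attrs):
    """ID3: recursively split on the attribute with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def gain(attr):
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [y for r, y in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs], [y for _, y in pairs], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


TREE = build_tree([syllable_features(w, i) for w, i, _ in TRAINING],
                  [y for _, _, y in TRAINING], ATTRS)

if __name__ == "__main__":
    for word in ["학문", "집", "밥이"]:
        print(word, predict(TREE, syllable_features(word, 0)))
```

Running the script prints the predicted surface form of the first syllable's final consonant for 학문, 집, and 밥이, none of which appear in the training list. A realistic system would need many more rule-derived attributes, such as tensing and ㅎ-related changes. It would also need a large aligned corpus in place of a dozen hand-picked words.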
Keywords
Korean language, conversion, data-driven, grapheme-to-phoneme