Multilingual, Code-switching and Low-Resource NLP and ASR

Anuj Jitendra Diwan, Ashish Mittal, Shreya Khare, Samarth Bharadwaj, Tejas Dhamecha

semanticscholar (2021)

Abstract
Many advances in NLP and speech are powered by the availability of data. Only the small fraction of languages with access to large quantities of data (high-resource languages) benefits from these advances [1]. Data that captures a linguistic phenomenon like code-switching is scarcely available. Developing less data-intensive techniques for low-resource and code-switching languages thus demands radically new approaches, and such techniques are essential for making technology more inclusive for multilingual speakers of diverse languages. In this thesis, we describe the progress and outcomes of four projects to improve speech recognition for low-resource languages:

1. Transliteration-based Transfer: Developing a novel transfer learning strategy that pretrains an ASR model using speech data from a high-resource language with its text transliterated to the target low-resource language.

2. Indian Languages' Interspeech 2021 Special Session: Organizing the 'Multilingual and code-switching ASR Challenges for low resource Indian languages' Special Session at Interspeech 2021. I was one of the primary technical contributors and the first author on the challenge description paper.

3. Bridging Scripts by Grounding in Speech: Using monolingual labelled speech data in a high-resource language and a low-resource language to build a high-quality transliteration system, i.e., grounding transliteration in speech. This system can be used to improve the low-resource ASR system via high- to low-resource transfer learning.

4. Reduce and Reconstruct: Incorporating linguistic knowledge into low-resource speech recognition by developing linguistically inspired reduced vocabularies.

Parts of this thesis were submitted and accepted as papers at Interspeech 2021. These papers are [2], [3], and [4].
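The transliteration-based transfer strategy (project 1) is only described at a high level in this abstract. The Python sketch below illustrates the general idea under assumptions of my own: the transliteration routine and the pretraining and fine-tuning callables are hypothetical placeholders, not functions from the thesis or from any specific ASR toolkit.

# Hypothetical sketch of transliteration-based transfer for low-resource ASR.
# Assumption: abundant (audio, transcript) pairs exist for a high-resource
# language, and the target low-resource language uses a different script.

def transliterate(text, src_script, tgt_script):
    """Placeholder: rewrite text from src_script in tgt_script (rule-based or learned)."""
    raise NotImplementedError

def build_pretraining_corpus(high_resource_pairs, src_script, tgt_script):
    # Keep the high-resource audio, but transliterate its transcripts into the
    # target script, so pretraining already uses the low-resource symbol set.
    return [
        (audio, transliterate(text, src_script, tgt_script))
        for audio, text in high_resource_pairs
    ]

def train_low_resource_asr(high_resource_pairs, low_resource_pairs,
                           src_script, tgt_script, pretrain_asr, finetune_asr):
    # 1. Pretrain on plentiful high-resource speech with transliterated text.
    pretrain_corpus = build_pretraining_corpus(
        high_resource_pairs, src_script, tgt_script)
    model = pretrain_asr(pretrain_corpus)
    # 2. Fine-tune on the small amount of genuine low-resource data.
    return finetune_asr(model, low_resource_pairs)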
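Similarly, the Reduce and Reconstruct idea (project 4) is stated in a single sentence here. The sketch below shows one plausible way such a reduced-vocabulary pipeline could be wired together; the reduction map and the reconstruction model are illustrative assumptions, not the vocabulary design actually used in the thesis.

# Hypothetical reduce-and-reconstruct pipeline: decode with a smaller,
# linguistically motivated symbol set, then restore the full orthography.

def reduce_transcript(text, reduction_map):
    """Collapse the full character vocabulary into a reduced one."""
    return "".join(reduction_map.get(ch, ch) for ch in text)

def train_reduce_and_reconstruct(train_pairs, reduction_map,
                                 train_asr, train_reconstructor):
    # Train the ASR model on reduced transcripts: a smaller output vocabulary
    # needs less data to be covered adequately.
    reduced_pairs = [(audio, reduce_transcript(text, reduction_map))
                     for audio, text in train_pairs]
    asr_model = train_asr(reduced_pairs)
    # Train a text-to-text model that maps reduced transcripts back to full ones.
    reconstructor = train_reconstructor(
        [(reduce_transcript(text, reduction_map), text) for _, text in train_pairs])
    return asr_model, reconstructor

def transcribe(audio, asr_model, reconstructor):
    # Decode in the reduced vocabulary, then reconstruct the full transcript.
    reduced_hypothesis = asr_model(audio)
    return reconstructor(reduced_hypothesis)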