Saraiki Language Word Prediction And Spell Correction Framework

Muhammad Farjad Ali Raza, M. Asif Naeem

2022 24th International Multitopic Conference (INMIC) (2022)

Abstract
Word prediction, spelling correction, and word-similarity search are useful features for any language. Saraiki is one of the popular languages spoken in Pakistan. To the best of our knowledge, very little work in the literature addresses word prediction, spell correction, or finding similar words for Saraiki. In this paper we address this gap by presenting a novel approach to all three tasks. To vectorize the Saraiki language, we use CBOW and Skip-Gram. In our results, word prediction accuracy is 24% with word2vec and 29% with fastText. For word similarity, we obtain similarity scores of 0.35 and 0.39 for word2vec CBOW and word2vec Skip-Gram, respectively, and 0.35 and 0.41 for fastText CBOW and fastText Skip-Gram, respectively. Our spell correction results show that accuracy decreases as the number of wrong characters in a word increases. For sentence-level word prediction, we achieve an accuracy of 63% with RoBERTa and 58% with the distilled model.
Keywords
Word2vec, fastText, CBOW, Skip-Gram, RoBERTa, NLP
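
The abstract reports word-similarity scores for word2vec and fastText embeddings but does not say how those scores are computed. A common choice for comparing word vectors is cosine similarity, and the minimal sketch below assumes that measure. The function names and the three-dimensional toy vectors are hypothetical, invented for illustration, and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy embeddings (assumption: real vectors would come from a trained
# CBOW or Skip-Gram model and have far more dimensions).
toy_vectors = {
    "word_a": [1.0, 0.0, 0.0],
    "word_b": [1.0, 1.0, 0.0],
    "word_c": [0.0, 0.0, 1.0],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine."""
    target = vectors[word]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return max(scored)[1]
```

With real embeddings, the same lookup would run over the whole vocabulary, and the score of the best match would play the role of the similarity score quoted in the abstract, if cosine similarity is indeed the measure used.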