A Novel classification framework for the Thirukkural for building an efficient search system

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS(2022)

引用 1|浏览0
暂无评分
摘要
Thirukkural, a Tamil classic literature, which was written in 300 BCE is a didactic literature. Though Thirukkural comprises 1330 couplets which are organized into three sections and 133 chapters, in order to retrieve meaningful Thirukkural for a given query in search systems, a better organization of the Thirukkural is needed. This paper lays such a foundation by classifying the Thirukkural into ten new categories called superclasses that is helpful for building a better Information Retrieval (IR) system. The classifier is trained using Multinomial Naive B ayes algorithm. Each superclass is further classified into two subcategories based on the didactic information. The proposed classification framework is evaluated using precision, recall and F-score metrics and achieved an overall F-score of 82.33% and a comparison analysis has been done with the Support Vector Machine, Logistic Regression and Random Forest algorithms. An IR system is built on top of the proposed system and the performance comparison has been done with the Google search and a locally built keyword search. The proposed classification framework has achieved a mean average precision score of 89%, whereas the Google search and keyword search have yielded 59% and 68% respectively.
更多
查看译文
关键词
Natural language processing, text classification, information retrieval, multinomial naive bayes classifier, the Thirukkural, morphological analysis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要