Learning To Rank For Multi-Label Text Classification: Combining Different Sources Of Information

NATURAL LANGUAGE ENGINEERING(2021)

引用 25|浏览111
暂无评分
摘要
Efficiently exploiting all sources of information such as labeled instances, classes' representation, and relations of them has a high impact on the performance of Multi-Label Text Classification (MLTC) systems. Most of the current approaches use labeled documents as the primary source of information for MLTC. We investigate the effectiveness of different sources of information- such as the labeled training data, textual labels of classes, and taxonomy relations of classes- for MLTC. More specifically, first, for each document-class pair, different features are extracted using different sources of information. The features reflect the similarity of classes and documents. Then, MLTC is considered to be a ranking problem, and a learning to rank (LTR) approach is used for ranking classes regarding documents and selecting labels of documents. An important characteristic of many MLTC instances is that documents can belong to multiple classes and there are implicit relations between classes. We apply score propagation on top of LTR to incorporate co-occurrence patterns of classes in labeled documents. Our main findings are the following. First, using an LTR approach integrating all features, we observe significantly better performance than previous systems for MLTC. Specifically, we show that simple classification approaches fail when there is a high number of classes. Second, the analysis of feature weights reveals the relative importance of various sources of evidence, also giving insight into the underlying classification problem. Interestingly, the results indicate that the titles of documents are more informative than all other sources of information. Third, a lean-and-mean system using only four features is able to perform at 96% of the large LTR model that we propose in this paper. Fourth, using the co-occurrence information of classes helps in classifying documents more accurately. Our results show that the co-occurrence information is more helpful when the underlying classifier has a poor performance.
更多
查看译文
关键词
Multi-label text classification, Learning to rank, Sources of information
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要