TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
arxiv(2024)
摘要
Hierarchical text classification aims to categorize each document into a set
of classes in a label taxonomy. Most earlier works focus on fully or
semi-supervised methods that require a large amount of human annotated data
which is costly and time-consuming to acquire. To alleviate human efforts, in
this paper, we work on hierarchical text classification with the minimal amount
of supervision: using the sole class name of each node as the only supervision.
Recently, large language models (LLM) show competitive performance on various
tasks through zero-shot prompting, but this method performs poorly in the
hierarchical setting, because it is ineffective to include the large and
structured label space in a prompt. On the other hand, previous
weakly-supervised hierarchical text classification methods only utilize the raw
taxonomy skeleton and ignore the rich information hidden in the text corpus
that can serve as additional class-indicative features. To tackle the above
challenges, we propose TELEClass, Taxonomy Enrichment and LLM-Enhanced
weakly-supervised hierarchical text classification, which (1) automatically
enriches the label taxonomy with class-indicative topical terms mined from the
corpus to facilitate classifier training and (2) utilizes LLMs for both data
annotation and creation tailored for the hierarchical label space. Experiments
show that TELEClass can outperform previous weakly-supervised hierarchical text
classification methods and LLM-based zero-shot prompting methods on two public
datasets.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要