Exploring Hierarchical Multi-Label Text Classification Models using Attention-Based Approaches for Vietnamese language

Van Lam, Khoi Quach, Long Nguyen, Dien Dinh

PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023 (2023)

Abstract
The Hierarchical Attention-based Recurrent Neural Network (HARNN) is a model for hierarchical multi-label text classification (HMTC) that categorizes documents using both the content of the texts and the hierarchical structure of their categories. It consists of three primary components: the Document Representation Layer (DRL), which performs semantic encoding; the Hierarchical Attention-based Recurrent Layer (HARL), which models dependencies between hierarchical levels; and the Hybrid Predicting Layer (HPL), which produces the category predictions. In this research, we evaluate HARNN on a dataset of Vietnamese articles from VnExpress and compare the performance of four word embeddings (Word2Vec, FastText, PhoBERT, and multilingual BERT). Additionally, we introduce a domain-based approach for the HARNN model and compare its accuracy with that of the original approach. Experimental findings indicate that HARNN performs effectively on Vietnamese text and that our domain-based approach can be advantageous for HMTC tasks in specific domains.
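To make the role of the Hybrid Predicting Layer more concrete, the following is a minimal Python sketch of one way per-level ("local") scores and whole-hierarchy ("global") scores could be blended into final predictions. The abstract does not specify how the HPL combines its inputs, so the convex combination, the `alpha` weight, the 0.5 threshold, the function names, and the example labels and scores are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def hybrid_predict(local_scores_per_level: List[List[float]],
                   global_scores: List[float],
                   alpha: float = 0.5) -> List[float]:
    """Blend local (per-level) and global category scores.

    Assumption: a convex combination weighted by `alpha`. The abstract
    does not state the actual combination rule used by the HPL.
    """
    # Flatten the local scores level by level so they align with the global vector.
    local_flat = [s for level in local_scores_per_level for s in level]
    if len(local_flat) != len(global_scores):
        raise ValueError("local and global score vectors must have equal length")
    return [(1 - alpha) * loc + alpha * glo
            for loc, glo in zip(local_flat, global_scores)]


def predict_labels(scores: List[float], labels: List[str],
                   threshold: float = 0.5) -> List[str]:
    """Keep every label whose blended score reaches the (assumed) threshold."""
    return [label for label, score in zip(labels, scores) if score >= threshold]


# Hypothetical two-level hierarchy: two top-level and two second-level categories.
labels = ["News", "Sports", "News/World", "Sports/Football"]
local_scores = [[0.9, 0.2], [0.7, 0.1]]   # one list per hierarchy level
global_scores = [0.8, 0.3, 0.4, 0.2]      # one score per category overall

final_scores = hybrid_predict(local_scores, global_scores, alpha=0.5)
predicted = predict_labels(final_scores, labels)
print(predicted)
```

With these made-up scores, only the categories whose blended score reaches the threshold are kept, so a document can receive labels at more than one level of the hierarchy.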
Keywords
Hierarchical Attention-based Recurrent Neural Network, Word Embedding, Vietnamese articles