Hierarchical Neural Representation for Document Classification

Cognitive Computation (2019)

Abstract
Text representation, which converts text spans into real-valued vectors or matrices, is a crucial tool for machines to understand the semantics of text. Most previous work relies on classic statistical methods or on neural networks, which might suffer from data sparsity and insensitivity to text structure, respectively. To address these drawbacks, we propose a general, structure-sensitive framework: the hierarchical architecture. Specifically, we incorporate the hierarchical architecture into three existing neural network models for document representation, producing three new representation models for document classification: TextHFT, TextHRNN, and TextHCNN. Comprehensive experiments on two public datasets demonstrate the effectiveness of the hierarchical architecture. With a comparable (or substantially lower) time cost, our models obtain significant accuracy improvements over the baseline, ranging from 4.65% to 35.08%. We conclude that the hierarchical architecture enhances classification performance. In addition, we find that its benefits grow as document length increases.
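
The abstract does not describe how the levels of the hierarchy are composed, so the following is only an illustrative sketch of the general idea, not the paper's TextHFT, TextHRNN, or TextHCNN models. It assumes a word-to-sentence-to-document composition with mean pooling at both levels, and contrasts it with a flat representation that pools all words at once. The toy embedding `embed_word`, the sentence splitter, and every function name here are hypothetical and chosen for illustration.

```python
import re

DIM = 4  # toy embedding size, chosen only for illustration


def embed_word(word, dim=DIM):
    """Hypothetical toy embedding: deterministic values from character codes."""
    total = sum(ord(ch) for ch in word)
    return [(total * (i + 1)) % 97 / 97.0 for i in range(dim)]


def mean_pool(vectors, dim=DIM):
    """Average a list of equal-length vectors; return zeros for an empty list."""
    if not vectors:
        return [0.0] * dim
    count = len(vectors)
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def split_sentences(document):
    """Naive sentence splitter on '.', '!' and '?' (an assumption of this sketch)."""
    return [part.strip() for part in re.split(r"[.!?]+", document) if part.strip()]


def flat_representation(document, dim=DIM):
    """Structure-insensitive baseline: pool every word in the document at once."""
    return mean_pool([embed_word(word, dim) for word in tokenize(document)], dim)


def hierarchical_representation(document, dim=DIM):
    """Two-level composition: words -> sentence vectors -> document vector."""
    sentence_vectors = []
    for sentence in split_sentences(document):
        words = tokenize(sentence)
        if words:
            word_vectors = [embed_word(word, dim) for word in words]
            sentence_vectors.append(mean_pool(word_vectors, dim))
    return mean_pool(sentence_vectors, dim)


if __name__ == "__main__":
    doc = "Great film. The plot was slow and the ending dragged on."
    print("flat:        ", flat_representation(doc))
    print("hierarchical:", hierarchical_representation(doc))
```

In this toy setup the two representations coincide for a single-sentence document and diverge once sentences differ in length, because the hierarchical version weights each sentence equally rather than each word. A real model would replace the mean pooling with learned encoders at each level.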
Keywords
Document representation, Neural networks, Hierarchical architecture, Document classification