The Art of Natural Language Processing: Classical, Modern and Contemporary Approaches to Text Document Classification
Social Science Research Network(2020)
摘要
In this tutorial we introduce three approaches to preprocess text data with Natural Language Processing (NLP) and perform text document classification using machine learning. The first approach is based on 'bag-of-' models, the second one on word embeddings, while the third one introduces the two most popular Recurrent Neural Networks (RNNs), i.e. the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures. We apply all approaches on a case study where we classify movie reviews using Python and Tensorflow 2.0. The results of the case study show that extreme gradient boosting algorithms outperform adaptive boosting and random forests on bag-of-words and word embedding models, as well as LSTM and GRU RNNs, but at a steep computational cost. Finally, we provide the reader with comments on NLP applications for the insurance industry.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要