A New Method of Region Embedding for Text Classification

Chao Qiao
Bo Huang
Guocheng Niu
Daren Li

International Conference on Learning Representations (ICLR), 2018.

Keywords:
recurrent neural network, word order, order information, classification task, task specific
Weibo:
This paper proposes two novel architectures for text classification that learn task-specific region embeddings without hand-crafted features

Abstract:

Representing a text as a bag of properly identified “phrases” and using that representation for processing the text has proved useful. The key question is how to identify the phrases and how to represent them. The traditional method of utilizing n-grams can be regarded as an approximation of this approach; however, such a method can suffer from data sparsity, particularly when the n-gram length is large.

Introduction
  • Text classification has been studied for years and is an important task for many applications, including topic categorization, search query classification, and sentiment analysis.
  • Bag-of-words style representations do not take word order information into account, although word order has proved useful at least in some applications such as sentiment analysis (Pang et al., 2002).
  • Hand-crafted n-gram features can capture word order, but they have two drawbacks (see the sketch after this list): 1) the number of distinct n-grams increases exponentially as the n-gram length n increases, which makes it difficult to exploit large n-grams (e.g., n > 4); 2) since the number of parameters in an n-gram model is very large, parameter estimation usually suffers from the data sparsity problem.
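A minimal sketch, not taken from the paper, of the sparsity issue described above: counting distinct n-grams even on a toy corpus shows the feature space growing with n while most features occur only once, which is exactly the estimation problem that large n-grams run into.

```python
from collections import Counter
from itertools import islice

def ngram_counts(tokens, n):
    """Count the distinct n-grams of length n in a token sequence."""
    return Counter(zip(*(islice(tokens, i, None) for i in range(n))))

# Toy corpus; on real datasets (e.g., full Yelp or Amazon reviews) the effect is far larger.
corpus = ("prices are crazy high but the food is nothing remarkable "
          "but not bad either and the service is not bad at all").split()

for n in range(1, 6):
    counts = ngram_counts(corpus, n)
    singletons = sum(1 for c in counts.values() if c == 1)
    print(f"n={n}: {len(counts)} distinct n-grams, {singletons} seen only once")
```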
Highlights
  • Text classification has been studied for years and is an important task for many applications, including topic categorization, search query classification, and sentiment analysis
  • We report the n-gram and TFIDF baselines from Zhang et al. (2015), as well as the character-level convolutional model of Zhang & LeCun (2015), the character-based convolution-recurrent network of Xiao & Cho (2016), the very deep convolutional network (VDCNN) of Conneau et al. (2016), the discriminative LSTM (D-LSTM) of Yogatama et al. (2017), and the bigram FastText of Joulin et al. (2016)
  • This paper proposes two novel architectures for text classification tasks, which learn task-specific region embeddings without hand-crafted features
  • To capture the word-specific influence of each word on its context words, a local context unit is learned for each word in addition to its word embedding (see the sketch after this list)
  • Our models achieve state-of-the-art performance on six benchmark text classification datasets, and the visualization experiments show that the proposed local context unit captures semantic and syntactic information for each word
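A minimal numpy sketch of the local context unit idea highlighted above. The shapes, the element-wise (Hadamard) projection of each context word's embedding by the corresponding column of the centre word's unit, and the max-pooling composition follow our reading of the summary; the names (embed, units, region_embedding) and the final sum-then-classify step are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

V, D, C = 1000, 128, 3                        # vocab size, embedding dim, half region size (region size 2C+1)
embed = rng.normal(size=(V, D))               # word embeddings e_w
units = rng.normal(size=(V, 2 * C + 1, D))    # local context unit U_w: one column per relative position

def region_embedding(token_ids, i):
    """Region embedding for the region centred at position i.

    Each context word's embedding is element-wise multiplied by the column of
    the centre word's local context unit for that relative position, and the
    projected vectors are max-pooled into a single region vector.
    """
    w = token_ids[i]
    projected = []
    for t in range(-C, C + 1):
        j = i + t
        if 0 <= j < len(token_ids):           # skip positions falling outside the text
            projected.append(units[w, t + C] * embed[token_ids[j]])
    return np.stack(projected).max(axis=0)    # max pooling over the region

# Toy document: sum the region embeddings into a document vector for a classifier.
doc = rng.integers(0, V, size=20)
doc_vector = sum(region_embedding(doc, i) for i in range(len(doc)))
print(doc_vector.shape)                       # (128,)
```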
Methods
  • The authors focus on learning representations of small text regions that preserve the local internal structural information needed for text classification.
  • Under the influence of "high", the positive polarity of "crazy" vanishes, and the phrase "prices are crazy high" becomes negative overall.
  • Another case, "nothing remarkable, but not bad either", is more interesting.
  • Without the context unit, "remarkable" is positive, while "nothing", "not", and "bad" are each negative.
  • With the context unit, the phrase "but not bad either" becomes positive overall; a toy illustration of this gating effect follows the list.
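The numbers below are invented purely to illustrate the kind of gating the visualization describes; neither the dimensions nor the values come from the paper. A context unit column learned for "high" can suppress the dimensions in which "crazy" carries positive evidence, so the projected contribution of "crazy" turns negative inside the phrase.

```python
import numpy as np

# Toy 4-dimensional embedding: pretend the first two dimensions carry positive
# sentiment evidence and the last two carry negative evidence.
crazy = np.array([0.9, 0.7, 0.2, 0.1])        # "crazy" in isolation looks positive

# Hypothetical local context unit column of "high" acting on its left neighbour:
# it damps the positive dimensions and amplifies the negative ones.
high_on_left_neighbour = np.array([0.05, 0.05, 1.5, 2.0])

projected = high_on_left_neighbour * crazy    # element-wise (Hadamard) projection
print(projected)                              # [0.045 0.035 0.3 0.2] -- positive evidence vanishes
```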
Results
  • Accuracies are reported on the Yelp Review Full (Yelp F.), Amazon Review Full (Amz. F.), AG's News, Sogou News, and Yahoo! Answers (Yah. A.) datasets; see Table 2.
Conclusion
  • This paper proposed two novel architectures for text classification tasks, which learn task-specific region embeddings without hand-crafted features.
  • The models achieve state-of-the-art performance on six benchmark text classification datasets, and the visualization experiments show that the proposed local context unit captures semantic and syntactic information for each word.
  • Having observed the power of the local context unit in learning task-related region embeddings, the authors are interested in its potential for unsupervised and semi-supervised learning.
  • The authors are also curious whether better results can be achieved by introducing more complex upper layers, both for text classification and for other natural language processing tasks.
Tables
  • Table1: Statistics of Datasets
  • Table2: Test set accuracy [%] compared to other methods on several datasets
  • Table3: Comparative decomposition results on the Yelp Review Full dataset. For FastText (unigram), the embedding dimension is 10. For FastText (win-pool), W.C.region.emb (scalar), and W.C.region.emb (our model), the region size is 7 and the embedding dimension is 128
  • Table4: Visualization of chosen samples on the Yelp Review Polarity dataset. Green denotes a positive contribution, red a negative one. Two settings are compared: without the context unit (No C-unit) and with the context unit (With C-unit)
  • Table5: Detailed experimental records on several datasets
  • Table6: Performance variance over several repeated runs on the Yelp datasets
Related work
  • Text classification has been studied for years; traditional approaches focused on feature engineering and on different types of machine learning algorithms. For feature engineering, bag-of-words features are efficient and popular. In addition, hand-crafted n-grams or phrases are added to make use of word order in text data, which has been shown to be effective by Wang & Manning (2012). For machine learning algorithms, linear classifiers are widely used, such as naive Bayes (McCallum et al., 1998), logistic regression, and support vector machines (Joachims, 1998; Fan et al., 2008). However, these models commonly suffer from the data sparsity problem.

    Recently, several neural models have been proposed; the pre-trained word embeddings of word2vec (Mikolov et al., 2013) have been widely used as inputs to deep neural models such as recursive tensor networks (Socher et al., 2013). On the other hand, simple and efficient models that directly learn task-specific word embeddings or fine-tune pre-trained word embeddings have also been proposed, such as Deep Averaging Networks (Iyyer et al., 2015) and FastText (Joulin et al., 2016). Several neural models make use of word order information; most are based on convolutional neural networks (CNN) (Kim, 2014; Johnson & Zhang, 2014; Zhang et al., 2015) or recurrent neural networks (RNN) (Tang et al., 2015; Lai et al., 2015; Yogatama et al., 2017). More recently, the Transformer (Vaswani et al., 2017), a sequence transduction model based solely on attention mechanisms, has been proposed. Although the Transformer was not designed for text classification, it has similarities with our work. In the rest of this section, we briefly introduce FastText, CNN, and the Transformer, which are the most relevant to our work; a minimal sketch of the FastText-style baseline follows this paragraph.
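As a point of reference for the comparison above, here is a minimal sketch of a FastText-style classifier, written with plain numpy rather than the FastText library: the document representation is simply the average of the (word or hashed-bigram) embeddings, followed by a linear softmax layer. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 1000, 10, 5                          # vocab size, embedding dim, number of classes
embed = 0.1 * rng.normal(size=(V, D))          # word (or hashed bigram) embeddings
W, b = 0.1 * rng.normal(size=(D, K)), np.zeros(K)

def predict(token_ids):
    """FastText-style prediction: mean of embeddings -> linear layer -> softmax."""
    h = embed[token_ids].mean(axis=0)          # bag-of-embeddings document vector
    logits = h @ W + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

doc = rng.integers(0, V, size=30)
print(predict(doc))                            # class probabilities for the toy document
```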
Funding
  • This work is supported by the National Basic Research Program of China (973 Program, No. 2014CB340505)
Reference
  • Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann Lecun. Very deep convolutional networks for natural language processing. arXiv preprint arXiv:1606.01781, 2016.
  • Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. Journal of machine learning research, 9(Aug):1871–1874, 2008.
  • Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pp. 1681–1691, 2015.
  • Thorsten Joachims. Making large-scale SVM learning practical. Technical report, SFB 475: Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund, 1998.
  • Rie Johnson and Tong Zhang. Effective use of word order for text categorization with convolutional neural networks. arXiv preprint arXiv:1412.1058, 2014.
  • Rie Johnson and Tong Zhang. Semi-supervised convolutional neural networks for text categorization via region embedding. In Advances in neural information processing systems, pp. 919–927, 2015.
  • Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759, 2016.
  • Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
  • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao. Recurrent convolutional neural networks for text classification. In AAAI, volume 333, pp. 2267–2273, 2015.
  • Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. Visualizing and understanding neural models in nlp. arXiv preprint arXiv:1506.01066, 2015.
  • Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, pp. 41–48, Madison, WI, 1998.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119, 2013.
  • Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pp. 79–86. Association for Computational Linguistics, 2002.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing, pp. 1631–1642, 2013.
  • Duyu Tang, Bing Qin, and Ting Liu. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP, pp. 1422–1432, 2015.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 6000–6010. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
  • Sida Wang and Christopher D Manning. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pp. 90–94. Association for Computational Linguistics, 2012.
  • Yijun Xiao and Kyunghyun Cho. Efficient character-level document classification by combining convolution and recurrent layers. arXiv preprint arXiv:1602.00367, 2016.
  • Dani Yogatama, Chris Dyer, Wang Ling, and Phil Blunsom. Generative and discriminative text classification with recurrent neural networks. stat, 1050:6, 2017.
  • Xiang Zhang and Yann LeCun. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In Advances in neural information processing systems, pp. 649–657, 2015.