Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank.
- Semantic vector spaces for single words have been widely used as features (Turney and Pantel, 2010).
- The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews
- It was parsed with the Stanford parser (Klein and Manning, 2003) and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.
- We introduce the Stanford Sentiment Treebank and a powerful Recursive Neural Tensor Network that can accurately predict the compositional semantic effects present in this new corpus
- The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language
- In order to capture the compositional effects with higher accuracy, we propose a new model called the Recursive Neural Tensor Network (RNTN)
- A more direct, possibly multiplicative, interaction would allow the model to have greater interactions between the input vectors. Motivated by these ideas we ask the question: Can a single, more powerful composition function perform better and compose aggregate meaning from smaller constituents more accurately than many input specific ones? In order to answer this question, we propose a new model called the Recursive Neural Tensor Network (RNTN)
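The tensor-based composition at the heart of the RNTN can be sketched as follows. This is a minimal NumPy illustration of the composition function p = f(xᵀVx + Wx + b) with x = [a; c]; the dimensionality, initialization, and variable names here are illustrative choices, not the paper's released code, and the real model is trained with backpropagation through structure.

```python
import numpy as np

# Word vector dimensionality (a hypothetical small value for the demo;
# the paper reports best results around d = 25-35).
d = 4
rng = np.random.default_rng(0)

# Composition parameters (names are our own):
V = rng.standard_normal((2 * d, 2 * d, d)) * 0.01  # tensor: one 2d x 2d slice per output dimension
W = rng.standard_normal((d, 2 * d)) * 0.01         # standard RNN weight matrix
b = np.zeros(d)

def compose(a, c):
    """RNTN composition: p = tanh(x^T V x + W x + b), with x = [a; c]."""
    x = np.concatenate([a, c])                       # stacked child vectors, shape (2d,)
    tensor_term = np.einsum("i,ijk,j->k", x, V, x)   # x^T V[:, :, k] x for each output k
    return np.tanh(tensor_term + W @ x + b)

# Compose two (random) child vectors bottom-up, as along a parse tree.
left, right = rng.standard_normal(d), rng.standard_normal(d)
parent = compose(left, right)
```

The multiplicative term xᵀVx is what distinguishes the RNTN from a plain recursive network: every pair of input coordinates interacts directly through a tensor slice, rather than only additively through W.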
- We introduced Recursive Neural Tensor Networks and the Stanford Sentiment Treebank
- The first type of analysis includes several large quantitative evaluations on the test set.
- Optimal performance for all models was achieved at word vector sizes between 25 and 35 dimensions and batch sizes between 20 and 30.
- Performance decreased at larger or smaller vector and batch sizes.
- This indicates that the RNTN's advantage over the standard RNN is not simply due to having more parameters.
- The combination of new model and data results in a system for single sentence sentiment detection that pushes state of the art by 5.4% for positive/negative sentence classification.
- Apart from this standard setting, the dataset poses important new challenges and allows for new evaluation metrics.
- The RNTN obtains 80.7% accuracy on fine-grained sentiment prediction across all phrases and captures negation of different sentiments and scope more accurately than previous models
- Table 1: Accuracy for fine-grained (5-class) and binary predictions at the sentence level (root) and for all nodes. The left side shows the overall accuracy for fine-grained prediction at all phrase lengths and for full sentences.
- Table 2: The left side gives the accuracies over 21 positive sentences and their negation for all models. The RNTN has the highest reversal accuracy, showing its ability to structurally learn negation of positive sentences. But what if the model simply makes phrases very negative whenever negation appears in the sentence? The next experiments show that the model captures more than such a simplistic negation rule. Negation detection accuracy is measured as correct sentiment inversions for negated positives, and as increases in positive activations for negated negatives. The right side shows this accuracy: in over 81% of cases, the RNTN correctly increases the positive activations.
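The two negation metrics described above can be sketched in a few lines. This is a hypothetical illustration of the scoring logic only; the function names and toy inputs are our own, not from the paper's evaluation code.

```python
def negated_positive_accuracy(labels_before, labels_after):
    """Fraction of positive sentences whose predicted label flips to negative
    after negation (correct sentiment inversions)."""
    flips = sum(b == "positive" and a == "negative"
                for b, a in zip(labels_before, labels_after))
    return flips / len(labels_before)

def negated_negative_accuracy(pos_act_before, pos_act_after):
    """Fraction of negated negative sentences where the model's positive
    activation increases (e.g. 'not bad' should be less negative than 'bad')."""
    ups = sum(a > b for b, a in zip(pos_act_before, pos_act_after))
    return ups / len(pos_act_before)

# Toy example with two sentences each:
inv_acc = negated_positive_accuracy(["positive", "positive"], ["negative", "positive"])  # 0.5
up_acc = negated_negative_accuracy([0.2, 0.3], [0.5, 0.1])                               # 0.5
```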
- Table 3: Examples of n-grams for which the RNTN predicted the most positive and most negative responses.
- This work is connected to five different areas of NLP research, each with their own large amount of related work to which we cannot do full justice given space constraints.
Semantic Vector Spaces. The dominant approach in semantic vector spaces uses distributional similarities of single words. Often, co-occurrence statistics of a word and its context are used to describe each word (Turney and Pantel, 2010; Baroni and Lenci, 2010), such as tf-idf. Variants of this idea use more complex frequencies such as how often a word appears in a certain syntactic context (Pado and Lapata, 2007; Erk and Pado, 2008). However, distributional vectors often do not properly capture the differences in antonyms since those often have similar contexts. One possibility to remedy this is to use neural word vectors (Bengio et al., 2003). These vectors can be trained in an unsupervised fashion to capture distributional similarities (Collobert and Weston, 2008; Huang et al., 2012) but then also be fine-tuned and trained on specific tasks such as sentiment detection (Socher et al., 2011b). The models in this paper can use purely supervised word representations learned entirely on the new corpus.
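The window-based co-occurrence vectors with tf-idf-style weighting mentioned above can be sketched on a toy corpus. This is a minimal illustration under our own assumptions (window size, weighting scheme, toy sentences), not the setup of any cited paper; it also shows the antonym problem the text describes, since "great" and "terrible" share contexts and so end up with similar vectors.

```python
import numpy as np

# Toy corpus and a symmetric co-occurrence window (both hypothetical choices).
corpus = [
    "the movie was great and the acting was great",
    "the movie was terrible and the plot was dull",
]
window = 2

tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count how often each context word appears within the window of each target word.
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

# tf-idf-style reweighting: damp context words that co-occur with many targets.
df = (counts > 0).sum(axis=0)                     # with how many targets each context occurs
idf = np.log(len(vocab) / np.maximum(df, 1))
vectors = counts * idf                            # each row is one word's context vector

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Antonyms share contexts ("the movie was ..."), so their vectors are similar:
sim = cosine(vectors[idx["great"]], vectors[idx["terrible"]])
```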
- Richard is partly supported by a Microsoft Research PhD fellowship
- The authors gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) prime contract no
- M. Baroni and A. Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
- Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res., 3, March.
- D. Blakemore. 1989. Denial and contrast: A relevance theoretic analysis of ‘but’. Linguistics and Philosophy, 12:15–37.
- L. Bottou. 2011. From machine learning to machine reasoning. CoRR, abs/1102.1808.
- S. Clark and S. Pulman. 2007. Combining symbolic and distributional models of meaning. In Proceedings of the AAAI Spring Symposium on Quantum Interaction, pages 52–55.
- R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML.
- J. Duchi, E. Hazan, and Y. Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12, July.
- K. Erk and S. Pado. 2008. A structured vector space model for word meaning in context. In EMNLP.
- C. Goller and A. Kuchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In Proceedings of the International Conference on Neural Networks (ICNN-96).
- E. Grefenstette and M. Sadrzadeh. 2011. Experimental support for a categorical compositional distributional model of meaning. In EMNLP.
- E. Grefenstette, G. Dinu, Y.-Z. Zhang, M. Sadrzadeh, and M. Baroni. 2013. Multi-step regression learning for compositional distributional semantics. In IWCS.
- G. E. Hinton. 1990. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46(1-2).
- L. R. Horn. 1989. A natural history of negation, volume 960. University of Chicago Press Chicago.
- E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL.
- M. Israel. 2001. Minimizers, maximizers, and the rhetoric of scalar reasoning. Journal of Semantics, 18(4):297–331.
- R. Jenatton, N. Le Roux, A. Bordes, and G. Obozinski. 2012. A latent factor model for highly multi-relational data. In NIPS.
- D. Klein and C. D. Manning. 2003. Accurate unlexicalized parsing. In ACL.
- R. Lakoff. 1971. If’s, and’s, and but’s about conjunction. In Charles J. Fillmore and D. Terence Langendoen, editors, Studies in Linguistic Semantics, pages 114–149.
- J. Mitchell and M. Lapata. 2010. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.
- K. Moilanen and S. Pulman. 2007. Sentiment composition. In Proceedings of Recent Advances in Natural Language Processing.
- T. Nakagawa, K. Inui, and S. Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs with hidden variables. In NAACL, HLT.
- S. Pado and M. Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199.
- B. Pang and L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, pages 115–124.
- B. Pang and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.
- T. A. Plate. 1995. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623– 641.
- L. Polanyi and A. Zaenen. 2006. Contextual valence shifters. In W. Bruce Croft, James Shanahan, Yan Qu, and Janyce Wiebe, editors, Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series, chapter 1.
- J. B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46, November.
- M. Ranzato, A. Krizhevsky, and G. E. Hinton. 2010. Factored 3-way restricted Boltzmann machines for modeling natural images. In AISTATS.
- V. Rentoumi, S. Petrakis, M. Klenner, G. A. Vouros, and V. Karkaletsis. 2010. United we stand: Improving sentiment analysis by joining machine learning and rule based methods. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta.
- S. Rudolph and E. Giesbrecht. 2010. Compositional matrix-space models of language. In ACL.
- B. Snyder and R. Barzilay. 2007. Multiple aspect ranking using the Good Grief algorithm. In HLT-NAACL.
- R. Socher, C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.
- R. Socher, C. Lin, A. Y. Ng, and C.D. Manning. 2011a. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
- R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011b. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP.
- R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP.
- I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. 2009. Modelling relational data using Bayesian clustered tensor factorization. In NIPS.
- P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
- H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan. 2012. A system for real-time Twitter sentiment analysis of 2012 U.S. presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations.
- D. Widdows. 2008. Semantic vector products: Some initial investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction.
- A. Yessenalina and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis. In EMNLP.
- D. Yu, L. Deng, and F. Seide. 2012. Large vocabulary speech recognition using deep tensor neural networks. In INTERSPEECH.
- F.M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar. 2010. Estimating linear models for compositional distributional semantics. In COLING.
- L. Zettlemoyer and M. Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI.