Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

EMNLP, 2013.

Keywords:
new challenge, tensor network, NIPS, recursive neural tensor, Air Force Research Laboratory

Abstract:

Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.

Introduction
  • Semantic vector spaces for single words have been widely used as features (Turney and Pantel, 2010).
  • The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews.
  • It was parsed with the Stanford parser (Klein and Manning, 2003) and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges; a minimal sketch of this labeled-tree structure follows the list.
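
The corpus items above can be pictured as binary parse trees in which every node, from single words up to the full sentence, carries one of five sentiment labels. Below is a minimal sketch of such a labeled node; the names (Node, fine_label, phrases) are illustrative assumptions, not the released data format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a sentiment-labeled binary parse tree.

    fine_label is one of 5 classes: 0 = very negative, 1 = negative,
    2 = neutral, 3 = positive, 4 = very positive.
    """
    fine_label: int                    # label averaged over the 3 human judges
    word: Optional[str] = None         # set only at leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.word is not None

    def phrases(self):
        """Yield every labeled phrase (subtree); the full corpus has 215,154."""
        if not self.is_leaf():
            yield from self.left.phrases()
            yield from self.right.phrases()
        yield self

# e.g. the phrase "not good": both leaves and their parent carry labels
tree = Node(1, left=Node(2, word="not"), right=Node(3, word="good"))
print([(n.word, n.fine_label) for n in tree.phrases()])
```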
Highlights
  • Semantic vector spaces for single words have been widely used as features (Turney and Pantel, 2010).
  • We introduce the Stanford Sentiment Treebank and a powerful Recursive Neural Tensor Network that can accurately predict the compositional semantic effects present in this new corpus.
  • The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.
  • In order to capture the compositional effects with higher accuracy, we propose a new model called the Recursive Neural Tensor Network (RNTN).
  • A more direct, possibly multiplicative, interaction would allow the model to capture richer interactions between the input vectors. Motivated by these ideas, we ask: can a single, more powerful composition function perform better and compose aggregate meaning from smaller constituents more accurately than many input-specific ones? To answer this question, we propose the RNTN; a minimal sketch of its composition function follows this list.
  • We introduced Recursive Neural Tensor Networks and the Stanford Sentiment Treebank.
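
The RNTN's composition function, per the paper, is p = f([a; b]^T V^[1:d] [a; b] + W [a; b]) with f = tanh. A minimal numpy sketch of one composition step; the initialization scale is an illustrative assumption, and the per-node bias and softmax classifier are omitted:

```python
import numpy as np

d = 25                                   # word vector size; 25-35 worked best
rng = np.random.default_rng(0)

# Parameters of the single, tree-wide composition function.
V = rng.uniform(-0.01, 0.01, (d, 2 * d, 2 * d))  # tensor: d bilinear slices
W = rng.uniform(-0.01, 0.01, (d, 2 * d))         # standard recursive-net matrix

def compose(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """RNTN composition p = tanh(h^T V^[1:d] h + W h), with h = [a; b].

    Each slice V[k] mediates a multiplicative interaction between the two
    child vectors; the W h term alone would be an ordinary recursive net.
    """
    h = np.concatenate([a, b])                   # (2d,)
    bilinear = np.einsum("i,kij,j->k", h, V, h)  # h^T V[k] h for every k
    return np.tanh(bilinear + W @ h)             # parent vector in R^d

a, b = rng.standard_normal(d), rng.standard_normal(d)
print(compose(a, b).shape)                       # (25,), fed upward in the tree
```

The einsum line computes all d bilinear forms at once; setting V to zero recovers the standard recursive neural network, which is why the tensor is exactly the part that adds multiplicative interactions between inputs.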
Methods
  • The first type of analysis comprises several large quantitative evaluations on the test set.
  • Optimal performance for all models was achieved at word vector sizes between 25 and 35 dimensions and batch sizes between 20 and 30; a grid-search sketch over these ranges follows this list.
  • Performance decreased at larger or smaller vector and batch sizes.
  • This indicates that the RNTN does not outperform the standard RNN simply because it has more parameters.
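
Those ranges imply a small grid search over the two hyperparameters. A hedged sketch, where train_and_eval is a hypothetical stand-in for training one model and scoring it on a development split (the dummy score merely mimics the reported mid-range peak):

```python
import itertools

# Ranges bracketing the reported optima (25-35 dims, batches of 20-30).
word_vector_sizes = [20, 25, 30, 35, 40]
batch_sizes = [15, 20, 25, 30, 35]

def train_and_eval(dim: int, batch: int) -> float:
    """Hypothetical stand-in: train one model, return dev accuracy.

    The dummy score peaks mid-range, mimicking the reported trend that
    accuracy drops at both larger and smaller vector and batch sizes.
    """
    return 1.0 - abs(dim - 30) / 100.0 - abs(batch - 25) / 100.0

best_dim, best_batch = max(
    itertools.product(word_vector_sizes, batch_sizes),
    key=lambda cfg: train_and_eval(*cfg),
)
print(best_dim, best_batch)   # 30 25 under the dummy score
```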
Conclusion
  • The combination of the new model and data results in a system for single-sentence sentiment detection that pushes the state of the art by 5.4% (from 80% to 85.4%) for positive/negative sentence classification.
  • Apart from this standard setting, the dataset poses important new challenges and allows for new evaluation metrics.
  • The RNTN obtains 80.7% accuracy on fine-grained sentiment prediction across all phrases and captures the negation of different sentiments and their scope more accurately than previous models; a sketch of the all-phrases accuracy metric follows this list.
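
The 80.7% figure is accuracy computed over every labeled node, not just sentence roots. A minimal sketch of that metric, reusing the hypothetical Node and tree from the earlier sketch:

```python
def all_nodes_accuracy(trees, predict):
    """Fine-grained accuracy over all phrases: every labeled node counts
    once, whether it is a leaf, an internal phrase, or the sentence root."""
    correct = total = 0
    for t in trees:
        for node in t.phrases():      # phrases() from the Node sketch above
            correct += int(predict(node) == node.fine_label)
            total += 1
    return correct / total

# e.g. a baseline that predicts "neutral" (class 2) for every phrase:
print(all_nodes_accuracy([tree], lambda node: 2))
```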
Tables
  • Table 1: Accuracy for fine-grained (5-class) and binary predictions at the sentence level (root) and for all nodes. The left columns show the overall accuracy for fine-grained prediction at all phrase lengths and on full sentences.
  • Table 2: Accuracy of negation detection; negated positive is measured as correct sentiment inversions, negated negative as increases in positive activations (a sketch of both metrics follows this list). The left columns give the accuracies over 21 positive sentences and their negations for all models; the RNTN has the highest reversal accuracy, showing its ability to structurally learn the negation of positive sentences. Could the model simply make phrases very negative whenever negation appears in the sentence? The right columns show that it captures more than such a simplistic negation rule: in over 81% of cases, the RNTN correctly increases the positive activations.
  • Table 3: Examples of n-grams for which the RNTN predicted the most positive and most negative responses.
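
The two Table 2 metrics can be made concrete with a short sketch; predict_root (returning a 5-class root label) and pos_activation (returning the positive-class activation at the root) are hypothetical stand-ins for any trained model:

```python
def negation_metrics(pairs, predict_root, pos_activation):
    """pairs: (original_tree, negated_tree, polarity) triples.

    Negated positive: correct when the predicted root label flips from
    positive (>2) to negative (<2), e.g. "liked it" -> "didn't like it".
    Negated negative: correct when the positive activation increases,
    since "not terrible" should be less negative, not outright positive.
    """
    flips = rises = n_pos = n_neg = 0
    for orig, negated, polarity in pairs:
        if polarity == "positive":
            n_pos += 1
            flips += int(predict_root(orig) > 2 and predict_root(negated) < 2)
        else:
            n_neg += 1
            rises += int(pos_activation(negated) > pos_activation(orig))
    return flips / max(n_pos, 1), rises / max(n_neg, 1)
```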
Related work
  • This work is connected to five different areas of NLP research, each with its own large body of related work, to which we cannot do full justice given space constraints.

    Semantic Vector Spaces. The dominant approach in semantic vector spaces uses distributional similarities of single words. Often, co-occurrence statistics of a word and its context are used to describe each word (Turney and Pantel, 2010; Baroni and Lenci, 2010), for instance with tf-idf weighting. Variants of this idea use more complex frequencies, such as how often a word appears in a certain syntactic context (Pado and Lapata, 2007; Erk and Pado, 2008). However, distributional vectors often do not properly capture the differences between antonyms, since those often have similar contexts. One possibility to remedy this is to use neural word vectors (Bengio et al., 2003). These vectors can be trained in an unsupervised fashion to capture distributional similarities (Collobert and Weston, 2008; Huang et al., 2012) but can then also be fine-tuned and trained on specific tasks such as sentiment detection (Socher et al., 2011b). The models in this paper can use purely supervised word representations learned entirely on the new corpus.
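
A toy sketch of that co-occurrence/tf-idf construction (the corpus and the window size are illustrative assumptions):

```python
import numpy as np

# Toy corpus standing in for a large collection of contexts.
corpus = [["a", "good", "movie"], ["a", "bad", "movie"], ["good", "acting"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Word-context co-occurrence counts within a +/-1 token window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                counts[idx[w], idx[sent[j]]] += 1

# tf-idf-style reweighting: down-weight contexts shared by many words.
df = (counts > 0).sum(axis=0)             # how many words each context co-occurs with
tfidf = counts * np.log(len(vocab) / np.maximum(df, 1))
print(tfidf[idx["good"]])                 # this row is the vector for "good"
```

Note how the rows for "good" and "bad" come out similar because they share the contexts "a" and "movie"; this is exactly the antonym problem the paragraph describes.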
Funding
  • Richard is partly supported by a Microsoft Research PhD fellowship.
  • The authors gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) prime contract no
Reference
  • M. Baroni and A. Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
  • Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res., 3, March.
  • D. Blakemore. 1989. Denial and contrast: A relevance theoretic analysis of ‘but’. Linguistics and Philosophy, 12:15–37.
  • L. Bottou. 2011. From machine learning to machine reasoning. CoRR, abs/1102.1808.
  • S. Clark and S. Pulman. 2007. Combining symbolic and distributional models of meaning. In Proceedings of the AAAI Spring Symposium on Quantum Interaction, pages 52–55.
  • R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML.
  • J. Duchi, E. Hazan, and Y. Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12, July.
  • K. Erk and S. Pado. 2008. A structured vector space model for word meaning in context. In EMNLP.
  • C. Goller and A. Kuchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In Proceedings of the International Conference on Neural Networks (ICNN-96).
  • E. Grefenstette and M. Sadrzadeh. 2011. Experimental support for a categorical compositional distributional model of meaning. In EMNLP.
  • E. Grefenstette, G. Dinu, Y.-Z. Zhang, M. Sadrzadeh, and M. Baroni. 2013. Multi-step regression learning for compositional distributional semantics. In IWCS.
  • G. E. Hinton. 1990. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46(1-2).
  • L. R. Horn. 1989. A natural history of negation, volume 960. University of Chicago Press, Chicago.
  • E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL.
  • M. Israel. 2001. Minimizers, maximizers, and the rhetoric of scalar reasoning. Journal of Semantics, 18(4):297–331.
  • R. Jenatton, N. Le Roux, A. Bordes, and G. Obozinski. 2012. A latent factor model for highly multi-relational data. In NIPS.
  • D. Klein and C. D. Manning. 2003. Accurate unlexicalized parsing. In ACL.
  • R. Lakoff. 1971. If’s, and’s, and but’s about conjunction. In Charles J. Fillmore and D. Terence Langendoen, editors, Studies in Linguistic Semantics, pages 114–149.
  • J. Mitchell and M. Lapata. 2010. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.
  • K. Moilanen and S. Pulman. 2007. Sentiment composition. In Proceedings of Recent Advances in Natural Language Processing.
  • T. Nakagawa, K. Inui, and S. Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs with hidden variables. In NAACL-HLT.
  • S. Pado and M. Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199.
  • B. Pang and L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, pages 115–124.
  • B. Pang and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135.
  • T. A. Plate. 1995. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623–641.
  • L. Polanyi and A. Zaenen. 2006. Contextual valence shifters. In W. Bruce Croft, James Shanahan, Yan Qu, and Janyce Wiebe, editors, Computing Attitude and Affect in Text: Theory and Applications, volume 20 of The Information Retrieval Series, chapter 1.
  • J. B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46, November.
  • M. Ranzato, A. Krizhevsky, and G. E. Hinton. 2010. Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images. In AISTATS.
  • V. Rentoumi, S. Petrakis, M. Klenner, G. A. Vouros, and V. Karkaletsis. 2010. United we stand: Improving sentiment analysis by joining machine learning and rule based methods. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta.
  • S. Rudolph and E. Giesbrecht. 2010. Compositional matrix-space models of language. In ACL.
  • B. Snyder and R. Barzilay. 2007. Multiple aspect ranking using the Good Grief algorithm. In HLT-NAACL.
  • R. Socher, C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.
  • R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011a. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
  • R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011b. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP.
  • R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP.
  • I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. 2009. Modelling relational data using Bayesian clustered tensor factorization. In NIPS.
  • P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
  • H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan. 2012. A system for real-time Twitter sentiment analysis of the 2012 U.S. presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations.
  • D. Widdows. 2008. Semantic vector products: Some initial investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction.
  • A. Yessenalina and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis. In EMNLP.
  • D. Yu, L. Deng, and F. Seide. 2012. Large vocabulary speech recognition using deep tensor neural networks. In INTERSPEECH.
  • F. M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar. 2010. Estimating linear models for compositional distributional semantics. In COLING.
  • L. Zettlemoyer and M. Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI.