Semantic compositionality through recursive matrix-vector spaces

EMNLP-CoNLL, pp. 1201-1211, 2012

Cited by: 1480 | Views: 287

Abstract

Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs, classifying sentiment labels of movie reviews, and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
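To make the composition concrete, the following is a minimal NumPy sketch of the paper's composition step, in which every constituent carries a vector (its meaning) and a matrix (how it modifies neighbors). The dimensionality, random initialization, and tanh nonlinearity here are illustrative stand-ins for parameters the paper learns jointly.

```python
import numpy as np

n = 10  # word/phrase vector dimensionality (illustrative choice)
rng = np.random.default_rng(0)

# Illustrative initializations; in the paper W, W_M, all word vectors
# and all word matrices are learned jointly by backpropagation
# through the parse tree.
W = rng.normal(scale=0.1, size=(n, 2 * n))    # combines the two modified vectors
W_M = rng.normal(scale=0.1, size=(n, 2 * n))  # combines the two matrices

def compose(a, A, b, B):
    """One MV-RNN composition: children (a, A) and (b, B) yield a parent
    vector p = g(W [Ba; Ab]) and a parent matrix P = W_M [A; B]."""
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # each child's matrix modifies the other child's vector
    P = W_M @ np.vstack([A, B])                      # stack the two n x n matrices, project back to n x n
    return p, P

# Toy usage: compose two words into a phrase node.
a, b = rng.normal(size=n), rng.normal(size=n)
A, B = np.eye(n), np.eye(n)  # identity matrices: words that do not modify their neighbors
p, P = compose(a, A, b, B)
```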

Introduction
  • Semantic word vector spaces are at the core of many useful natural language applications such as search query expansion (Jones et al., 2006), fact extraction for information retrieval (Pasca et al., 2006) and automatic annotation of text with disambiguated Wikipedia links (Ratinov et al., 2011), among many others (Turney and Pantel, 2010).
  • In these models the meaning of a word is encoded as a vector computed from co-occurrence statistics of a word and its neighboring words (a minimal sketch of such counting follows this list).
  • The authors extend these approaches with a more general and powerful model of semantic composition.
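As a toy illustration of that co-occurrence encoding (my example, not from the paper), one can count how often each word appears within a small window of every other word:

```python
from collections import Counter

corpus = [["dogs", "chase", "cats"], ["cats", "chase", "mice"]]
window = 1  # symmetric context window size (illustrative)

# cooc[(w, c)] = number of times context word c occurs within the
# window around target word w.
cooc = Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                cooc[(w, sent[j])] += 1

print(cooc[("chase", "cats")])  # 2: "cats" neighbors "chase" in both sentences
```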
Highlights
  • Semantic word vector spaces are at the core of many useful natural language applications such as search query expansion (Jones et al., 2006), fact extraction for information retrieval (Pasca et al., 2006) and automatic annotation of text with disambiguated Wikipedia links (Ratinov et al., 2011), among many others (Turney and Pantel, 2010).
  • The results show that the MV-RNN operators are powerful enough to capture the operational meanings of various types of adverbs.
  • In our last experiment we show that the matrix-vector recursive neural network (MV-RNN) can learn how a syntactic context composes an aggregate meaning of the semantic relationships between words.
  • We introduced a new model towards a complete treatment of compositionality in word vector spaces.
  • By adding WordNet hypernyms, POS and NER tags, our model outperforms the state of the art, which uses significantly more resources.
  • The main novelty of our model is the combination of matrix-vector representations with a recursive neural network.
Methods
  • [Garbled table and plot residue. Recoverable content: a comparison of composition methods – Uniform and mean-of-training-distribution baselines, p = 0.5(a + b), p = a ⊗ b, p = [a; b], p = Ab, and the RNN, Linear MVR and MV-RNN models – scored by average KL divergence between predicted and true sentiment distributions of adverb-adjective pairs (examples shown: "fairly annoying", "not annoying"). The visible scores range from 0.327 down through 0.103, 0.101, 0.103, 0.093 and 0.092 to 0.091, with the MV-RNN lowest.]
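For orientation, the simple composition functions recoverable from that table are one-liners; this NumPy restatement uses function names of my own choosing:

```python
import numpy as np

def mean_compose(a, b):
    return 0.5 * (a + b)           # p = 0.5(a + b): vector averaging

def mult_compose(a, b):
    return a * b                   # p = a (x) b: element-wise product

def concat_compose(a, b):
    return np.concatenate([a, b])  # p = [a; b]: concatenation

def matvec_compose(A, b):
    return A @ b                   # p = Ab: matrix-vector composition (Baroni and Zamparelli, 2010)
```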
Results
  • By adding WordNet hypernyms, POS and NER tags, the model outperforms the state of the art, which uses significantly more resources (a possible feature wiring is sketched after this list).
  • The state-of-the-art recursive autoencoder model of Socher et al. (2011c) obtained 77.7% accuracy.
  • Features were computed using the code of Ciaramita and Altun (2006).
  • With these features, the performance improved over the state-of-the-art system.
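A plausible wiring for those external features, assuming binary indicator vectors appended to the learned representation before the softmax classifier (the paper's exact encoding may differ):

```python
import numpy as np

def with_external_features(phrase_vec, hypernym_ind, pos_ind, ner_ind):
    """Concatenate the MV-RNN phrase vector with binary indicator
    features for WordNet hypernyms, POS tags and NER tags; the
    result feeds the final softmax classifier."""
    return np.concatenate([phrase_vec, hypernym_ind, pos_ind, ner_ind])
```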
Conclusion
  • We introduced a new model towards a complete treatment of compositionality in word vector spaces.
  • The main novelty of the model is the combination of matrix-vector representations with a recursive neural network.
  • It can learn both the meaning vector of a word and how that word modifies its neighbors.
  • The MV-RNN combines attractive theoretical properties with good performance on large, noisy datasets.
  • It generalizes several models in the literature, can learn propositional logic, accurately predicts sentiment and can be used to classify semantic relationships between nouns in a sentence.
Tables
  • Table 1: Accuracy of classification on full length movie review polarity (MR)
  • Table 2: Hard movie review examples of positive (1) and negative (0) sentiment (S.) that of all methods only the MV-RNN predicted correctly (C: √) or could not classify correctly either (C: x)
  • Table 3: Examples of correct classifications of ordered, semantic relations between nouns by the MV-RNN. Note that the final classifier is a recursive, compositional function of all the words in the syntactic path between the bracketed words. The paths vary in length and the words vary in type
  • Table 4: Learning methods, their feature sets and F1 results for predicting semantic relations between nouns. The MV-RNN outperforms all but one method without any additional feature sets. By adding three such features, it obtains state-of-the-art performance
Related work
  • Distributional approaches have become omnipresent for the recognition of semantic similarity between words, and the treatment of compositionality has seen much progress in recent years; hence, we cannot do justice to the large amount of literature. Commonly, single words are represented as vectors of distributional characteristics – e.g., their frequencies in specific syntactic relations or their co-occurrences with given context words (Padó and Lapata, 2007; Baroni and Lenci, 2010; Turney and Pantel, 2010). These representations have proven very effective in sense discrimination and disambiguation (Schütze, 1998), automatic thesaurus extraction (Lin, 1998; Curran, 2004) and selectional preferences.

    There are several sophisticated ideas for compositionality in vector spaces. Mitchell and Lapata (2010) present an overview of the most important compositional models, from simple vector addition and component-wise multiplication to tensor products and convolution (Metcalfe, 1990). They measured the similarity between word pairs such as compound nouns or verb-object pairs and compared these with human similarity judgments. Simple vector averaging or multiplication performed best, hence our focus on related baselines above.
Funding
  • The authors gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no.
Study subjects and analysis
This objective function (see Sec. 2.4) is different from all previously published work except that of Socher et al. (2011c). We cross-validated all models over regularization parameters for the word vectors, the softmax classifier, the RNN parameter W and the word operators (10^−4, 10^−3) and over word vector sizes (n = 6, 8, 10, 12, 15, 20). All models performed best at vector sizes below 12.
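Read literally, that model selection sweeps a small grid; the sketch below enumerates it, assuming (my assumption, not stated in the text) that the four regularizers vary independently:

```python
from itertools import product

reg_grid = [1e-4, 1e-3]             # regularization strengths tried
size_grid = [6, 8, 10, 12, 15, 20]  # word vector sizes n

# One entry per (word-vector, softmax, W, word-operator) regularizer
# combination and vector size; each would be trained and scored by
# cross-validation, keeping the best.
configs = [
    {"reg_vec": rv, "reg_clf": rc, "reg_W": rw, "reg_op": ro, "n": n}
    for rv, rc, rw, ro in product(reg_grid, repeat=4)
    for n in size_grid
]
print(len(configs))  # 2**4 * 6 = 96 candidate settings
```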

Reference
  • M. Baroni and A. Lenci. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
  • M. Baroni and R. Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In EMNLP.
  • L. Bottou. 2011. From machine learning to machine reasoning. CoRR, abs/1102.1808.
  • M. Ciaramita and Y. Altun. 2006. Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In EMNLP.
  • S. Clark and S. Pulman. 2007. Combining symbolic and distributional models of meaning. In Proceedings of the AAAI Spring Symposium on Quantum Interaction, pages 52–55.
  • R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML.
  • J. Curran. 2004. From Distributional to Semantic Similarity. Ph.D. thesis, University of Edinburgh.
  • J. L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2-3).
  • G. Frege. 1892. Über Sinn und Bedeutung. In Zeitschrift für Philosophie und philosophische Kritik, 100.
  • D. Garrette, K. Erk, and R. Mooney. 2011. Integrating logical representations with probabilistic information using Markov logic. In Proceedings of the International Conference on Computational Semantics.
  • C. Goller and A. Küchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In Proceedings of the International Conference on Neural Networks (ICNN-96).
  • E. Grefenstette and M. Sadrzadeh. 2011. Experimental support for a categorical compositional distributional model of meaning. In EMNLP.
  • T. L. Griffiths, J. B. Tenenbaum, and M. Steyvers. 2007. Topics in semantic representation. Psychological Review, 114.
  • I. Hendrickx, S. N. Kim, Z. Kozareva, P. Nakov, D. Ó Séaghdha, S. Padó, M. Pennacchiotti, L. Romano, and S. Szpakowicz. 2010. SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation.
  • G. E. Hinton. 1990. Mapping part-whole hierarchies into connectionist networks. Artificial Intelligence, 46(1-2).
  • R. Jones, B. Rey, O. Madani, and W. Greiner. 2006. Generating query substitutions. In Proceedings of the 15th International Conference on World Wide Web.
  • D. Klein and C. D. Manning. 2003. Accurate unlexicalized parsing. In ACL.
  • D. Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL, pages 768–774.
  • E. J. Metcalfe. 1990. A compositive holographic associative recall model. Psychological Review, 88:627–661.
  • J. Mitchell and M. Lapata. 2010. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.
  • R. Montague. 1974. English as a formal language. Linguaggi nella Società e nella Tecnica, pages 189–224.
  • T. Nakagawa, K. Inui, and S. Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs with hidden variables. In NAACL-HLT.
  • M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain. 2006. Names and similarities on the web: fact extraction in the fast lane. In ACL.
  • S. Padó and M. Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199.
  • B. Pang and L. Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, pages 115–124.
  • T. A. Plate. 1995. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623–641.
  • J. B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46, November.
  • C. Potts. 2010. On the negativity of negation. In David Lutz and Nan Li, editors, Proceedings of Semantics and Linguistic Theory 20. CLC Publications, Ithaca, NY.
  • L. Ratinov, D. Roth, D. Downey, and M. Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In ACL.
  • B. Rink and S. Harabagiu. 2010. UTD: Classifying semantic relations by combining lexical and semantic resources. In Proceedings of the 5th International Workshop on Semantic Evaluation.
  • S. Rudolph and E. Giesbrecht. 2010. Compositional matrix-space models of language. In ACL.
  • H. Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24:97–124.
  • R. Socher, C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.
  • R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning. 2011a. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In NIPS. MIT Press.
  • R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011b. Parsing natural scenes and natural language with recursive neural networks. In ICML.
  • R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011c. Semi-supervised recursive autoencoders for predicting sentiment distributions. In EMNLP.
  • P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
  • D. Widdows. 2008. Semantic vector products: Some initial investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction.
  • A. Yessenalina and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis. In EMNLP.
  • F. M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar. 2010. Estimating linear models for compositional distributional semantics. In COLING.