SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics

ACL, pp. 3695-3706, 2020.

Keywords:
tree structured, sentiment classification, pre-training, Stanford Sentiment Treebank, neural networks

Abstract:

We propose SentiBERT, a variant of BERT that effectively captures compositional sentiment semantics. The model incorporates contextualized representation with binary constituency parse trees to capture semantic composition. Comprehensive experiments demonstrate that SentiBERT achieves competitive performance on phrase-level sentiment classification…
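To make the composition idea concrete, here is a minimal sketch (not the authors' implementation): contextualized token representations are combined bottom-up along the binary constituency tree to produce one vector per phrase node. The two-layer feed-forward composition function, the SELU choice, and the tree encoding are illustrative assumptions; the paper's actual module is attention-based.

```python
import torch
import torch.nn as nn

class PhraseComposer(nn.Module):
    """Illustrative stand-in for SentiBERT's composition module: builds a
    vector for each internal node of a binary constituency tree from its
    two children. The real model uses an attention-based network."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Hypothetical composition function: a two-layer feed-forward net
        # over the concatenated child vectors.
        self.compose = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.SELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, token_states: torch.Tensor, tree):
        """token_states: (seq_len, hidden) contextualized BERT outputs.
        tree: (left, right) child-index pairs in bottom-up order; indices
        below seq_len refer to tokens, larger ones to earlier phrases."""
        nodes = list(token_states)            # leaf (token) vectors
        for left, right in tree:
            pair = torch.cat([nodes[left], nodes[right]], dim=-1)
            nodes.append(self.compose(pair))  # new phrase vector
        return torch.stack(nodes)             # tokens first, then phrases
```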
Introduction
Highlights
  • Sentiment analysis is an important language processing task (Pang et al, 2002, 2008; Liu, 2012)
  • As phrase-level sentiment labels are expensive to obtain, we further explore if the compositional sentiment semantics learned from one task can be transferred to others
  • We demonstrate that the syntactic structure can be combined with contextualized representation such that the semantic compositionality can be better captured
  • We introduce SentiBERT, a model that captures compositional sentiment semantics based on constituency structures of sentences
  • We find that on the Stanford Sentiment Treebank, the confidence of about 40%-50% of phrase nodes is above 0.9, and the accuracy of predicting these phrases is above 90% (a confidence sketch follows this list)
  • We proposed SentiBERT, an architecture designed to better capture compositional sentiment semantics
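The confidence analysis in the highlight above can be sketched with max-softmax-probability confidence (in the spirit of Hendrycks and Gimpel, 2017, cited below); the function name and the assumption that per-node logits and gold labels are available are ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def high_confidence_accuracy(logits: torch.Tensor, labels: torch.Tensor,
                             threshold: float = 0.9):
    """Sketch of the confidence analysis: take the max softmax probability
    per phrase node as its confidence, then report what fraction of nodes
    clears the threshold and how accurate predictions are on that subset."""
    probs = F.softmax(logits, dim=-1)            # (num_nodes, num_classes)
    confidence, predictions = probs.max(dim=-1)
    confident = confidence > threshold
    coverage = confident.float().mean().item()   # ~40%-50% on SST per the paper
    accuracy = (predictions[confident] == labels[confident]).float().mean().item()
    return coverage, accuracy                    # accuracy > 90% per the paper
```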
Methods
  • The authors evaluate SentiBERT on the SST dataset, and further evaluate it in a transfer learning setting to demonstrate that the compositional sentiment semantics learned on SST transfer to other related tasks.

    4.1 Experimental Settings

    The authors evaluate how effectively SentiBERT captures compositional sentiment semantics on the SST dataset (Socher et al, 2013).

    The SST dataset has several variants.

    SST-phrase is a 5-class classification task that requires predicting the sentiment of every phrase node on a binary constituency tree.
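For concreteness, a hypothetical SST-phrase evaluation loop might look as follows; the model interface (tokens and tree in, per-node logits out) is an illustrative assumption, not the authors' code.

```python
def sst_phrase_accuracy(model, dataset):
    """Hypothetical SST-phrase evaluation: the model must assign one of
    five sentiment classes to every phrase node of each sentence's binary
    constituency tree; accuracy is counted over all nodes."""
    correct = total = 0
    for tokens, tree, node_labels in dataset:    # tensors per sentence
        node_logits = model(tokens, tree)        # (num_nodes, 5)
        predictions = node_logits.argmax(dim=-1)
        correct += (predictions == node_labels).sum().item()
        total += node_labels.numel()
    return correct / total
```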
Results
  • The authors observe that the results drop slightly after incorporating token-level sentiment information (Table 9).
  • This may be because phrase sentiment is compositional, whereas token sentiment depends mainly on the meaning of the lexical item itself rather than on semantic composition.
  • The inconsistency between the two training objectives may explain the performance drop.
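A minimal sketch of how token-node prediction could be added as an auxiliary objective alongside phrase-node prediction (the setting probed in Table 9); the separate classifier heads and the loss weight are assumptions, not details from the paper.

```python
import torch.nn.functional as F

def joint_loss(phrase_logits, phrase_labels, token_logits, token_labels,
               token_weight: float = 1.0):
    """Sketch of the joint objective probed in Table 9: phrase-node
    cross-entropy plus an auxiliary token-node cross-entropy. The weight
    is hypothetical; the paper reports a small drop when the token term
    is added."""
    phrase_term = F.cross_entropy(phrase_logits, phrase_labels)
    token_term = F.cross_entropy(token_logits, token_labels)
    return phrase_term + token_weight * token_term
```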
Conclusion
  • The authors proposed SentiBERT, an architecture designed to better capture compositional sentiment semantics.
  • SentiBERT considers the necessity of contextual information and explicit syntactic guidelines for modeling semantic composition.
  • Experiments demonstrate the effectiveness and transferability of SentiBERT.
  • [Figure: accuracy on (a) SST-5 and average recall on (b) SST-3 for SentiBERT vs. XLNet as the percentage of phrase-level labels varies from 0 to 100; also an example constituency tree with Positive/Neutral/Negative node labels.]
Summary
  • Introduction:

    Sentiment analysis is an important language processing task (Pang et al, 2002, 2008; Liu, 2012).
  • For example, the word “not” changes the sentiment of “really funny”.
  • These types of negation and contrast are often difficult to handle when the sentences are complex (Socher et al, 2013; Tay et al, 2018; Xu et al, 2019)
Tables
  • Table1: The averaged accuracies on SST-phrase and SST-5 tasks (%) for 5 runs. For the vanilla BERT and RoBERTa baselines, we mean-pool the top-layer token representations to obtain phrase and sentence representations (a pooling sketch follows this list)
  • Table2: The averaged results on sentence-level sentiment classification (%) for 5 runs. For SST-2 and SST-3, the metric is accuracy; for Twitter Sentiment Analysis, we use the averaged recall value
  • Table3: The averaged results on several emotion classification tasks (%) for 5 runs. For the Emotion Intensity Classification task, the metric is the averaged Pearson Correlation value of the four subtasks; for EmoContext, we follow the standard metrics used in Chatterjee et al. (2019) and use F1 score as the evaluation metric
  • Table4: Evaluation for contrastive relation (%). We show the accuracy for triplets (‘X but Y’, ‘X’, ‘Y’). X and Y must be phrases in our experiments
  • Table5: Statistics of benchmarks
  • Table6: The distribution of nodes in terms of local difficulty
  • Table7: The distribution of nodes in terms of global difficulty
  • Table8: The distribution of nodes in terms of negation words
  • Table9: The results after incorporating token node prediction. ‘Token’ denotes token node prediction
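As referenced in the Table 1 caption, the baseline represents a phrase or sentence by mean-pooling top-layer token vectors over its span; a minimal sketch, with a hypothetical function name:

```python
import torch

def mean_pool_span(token_states: torch.Tensor, start: int, end: int) -> torch.Tensor:
    """Baseline from the Table 1 caption: a phrase (or sentence) vector is
    the mean of the top-layer token representations inside the span
    (end exclusive); for sentences the span covers the whole sequence."""
    return token_states[start:end].mean(dim=0)

# e.g., for a phrase spanning tokens 3..7 of a (seq_len, 768) top layer:
# phrase_vec = mean_pool_span(top_layer, 3, 7)
```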
Related work
  • Sentiment Analysis Various approaches have been applied to build sentiment classifiers, including feature-based methods (Hu and Liu, 2004; Pang and Lee, 2004), recursive neural networks (Socher et al, 2012, 2013; Tai et al, 2015), convolutional neural networks (Kim, 2014) and recurrent neural networks (Liu et al, 2015). Recently, pretrained language models such as ELMo (Peters et al, 2018), BERT (Devlin et al, 2019) and SentiLR (Ke et al, 2019) have achieved high performance in sentiment analysis by constructing contextualized representations. Inspired by these prior studies, we design a transformer-based neural network model to capture compositional sentiment semantics by leveraging binary constituency parse trees.

    Semantic Compositionality Semantic composition (Pelletier, 1994) has been widely studied in the NLP literature. For example, Mitchell and Lapata (2008) introduce operations such as addition or element-wise product to model compositional semantics. The idea of modeling semantic composition has been applied to various areas such as sentiment analysis (Socher et al, 2013; Zhu et al, 2016), semantic relatedness (Marelli et al, 2014) and capturing sememe knowledge (Qi et al, 2019). In this paper, we demonstrate that syntactic structure can be combined with contextualized representation so that semantic compositionality is better captured. Our approach resembles a few recent attempts (Harer et al, 2019; Wang et al, 2019) to integrate tree structures into self-attention; however, our design is tailored to semantic composition in sentiment analysis.
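As a toy illustration of the additive and element-wise-product composition models of Mitchell and Lapata (2008), with made-up word vectors:

```python
import numpy as np

# Toy vectors for "really" and "funny"; the values are made up.
really = np.array([0.2, 0.9, 0.1])
funny = np.array([0.8, 0.7, 0.3])

additive = really + funny        # addition model
multiplicative = really * funny  # element-wise product model
```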
Funding
  • This material is based upon work supported in part by a gift grant from Taboola
References
  • Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer Normalization. arXiv preprint arXiv:1607.06450.
  • Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. In LREC, volume 10, pages 2200–2204.
  • Ankush Chatterjee, Kedhar Nath Narahari, Meghana Joshi, and Puneet Agrawal. 2019. SemEval-2019 Task 3: EmoContext Contextual Emotion Detection in Text. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 39–48.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
  • Jacob Harer, Chris Reale, and Peter Chin. 2019. TreeTransformer: A Transformer-Based Method for Correction of Tree-Structured Data. arXiv preprint arXiv:1908.00449.
  • Dan Hendrycks and Kevin Gimpel. 2017. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. Proceedings of International Conference on Learning Representations.
  • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation, 9(8):1735–1780.
  • Minqing Hu and Bing Liu. 2004. Mining and Summarizing Customer Reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168– 177. ACM.
  • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Transactions of the Association for Computational Linguistics, 8:64–77.
  • Pei Ke, Haozhe Ji, Siyang Liu, Xiaoyan Zhu, and Minlie Huang. 2019. SentiLR: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis. arXiv preprint arXiv:1911.02493.
  • Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.
  • Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
  • Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. Self-Normalizing Neural Networks. In Advances in Neural Information Processing Systems, pages 971–980.
  • Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies, 5(1):1–167.
  • Pengfei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, and Xuanjing Huang. 2015. Multi-timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2326–2335.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
  • Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.
  • Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. 2014. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 1–8.
  • Jeff Mitchell and Mirella Lapata. 2008. Vector-based Models of Semantic Composition. In Proceedings of ACL-08: HLT, pages 236–244.
  • Saif Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 1–17.
  • Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 271. Association for Computational Linguistics.
  • Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs Up?: Sentiment Classification Using Machine Learning Techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, pages 79–86. Association for Computational Linguistics.
  • Bo Pang, Lillian Lee, et al. 2008. Opinion Mining and Sentiment Analysis. Foundations and Trends® in Information Retrieval, 2(1–2):1–135.
  • Francis Jeffry Pelletier. 1994. The Principle of Semantic Compositionality. Topoi, 13(1):11–24.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227– 2237.
  • Fanchao Qi, Junjie Huang, Chenghao Yang, Zhiyuan Liu, Xiao Chen, Qun Liu, and Maosong Sun. 2019. Modeling Semantic Compositionality with Sememe Knowledge. In ACL.
  • Sara Rosenthal, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 Task 4: Sentiment Analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518.
  • Richard Socher, Brody Huval, Christopher D Manning, and Andrew Y Ng. 2012. Semantic Compositionality through Recursive Matrix-Vector Spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1201–1211. Association for Computational Linguistics.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Ng, and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.
  • Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1556–1566, Beijing, China. Association for Computational Linguistics.
  • Yi Tay, Anh Tuan Luu, Siu Cheung Hui, and Jian Su. 2018. Attentive Gated Lexicon Reader with Contrastive Contextual Co-attention for Sentiment Classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3443–3453.
  • Yaushian Wang, Hung-Yi Lee, and Yun-Nung Chen. 2019. Tree Transformer: Integrating Tree Structures into Self-Attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1061–1070, Hong Kong, China. Association for Computational Linguistics.
  • Hu Xu, Bing Liu, Lei Shu, and Philip S Yu. 2019. A Failure of Aspect Sentiment Classifiers and an Adaptive Re-weighting Solution. arXiv preprint arXiv:1911.01460.
  • Xiaodan Zhu, Parinaz Sobhani, and Hongyu Guo. 2016. DAG-Structured Long Short-Term Memory for Semantic Compositionality. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 917–926, San Diego, California. Association for Computational Linguistics.
  • [Appendix residue: a fragment of Eq. (3), a two-layer feed-forward network with SeLU activation (Klambauer et al., 2017) and α = 4, interleaved with EmoInt class statistics: sad 1533/975, angry 1701/1001, fear 2252/986, joy 1616/1105.]