Unsupervised Opinion Summarization as Copycat-Review Generation

ACL, pp. 5151–5169, 2020.

Keywords:
unsupervised abstractive summarization, Amazon Mechanical Turk, neural network, text summarization, opinion summarization

Abstract:

Opinion summarization is the task of automatically creating summaries that reflect subjective information expressed in multiple documents, such as product reviews. While the majority of previous work has focused on the extractive setting, i.e., selecting fragments from input reviews to produce a summary, we let the model generate novel sentences.
Introduction
  • Summarization of user opinions expressed in online resources, such as blogs, reviews, social media, or internet forums, has drawn much attention due to its potential for various information access applications, such as creating digests, search, and report generation.
Highlights
  • Summarization of user opinions expressed in online resources, such as blogs, reviews, social media, or internet forums, has drawn much attention due to its potential for various information access applications, such as creating digests, search, and report generation
  • We evaluate our approach on two datasets, Amazon product reviews and Yelp reviews of businesses
  • We introduce a simple end-to-end approach to unsupervised abstractive summarization
  • The full architecture used to produce the latent codes c and zi is shown in Figure 2
  • As we argued in the introduction and will revisit in the experiments, a summary, or summarizing review, should be generated relying on the mean of the reviews’ latent code
Methods
  • A GRU encoder (Cho et al., 2014) embeds review words w to obtain hidden states h.
  • The authors set the prior over group latent codes to the standard normal distribution, p(c) = N(c; 0, I).
  • In order to compute the approximate posterior qφ(c|r1:N), the authors first predict the contribution (‘importance’) of each word t in each review i to the code of the group: αit = exp(f(hit)) / Σ_{k=1..N} Σ_l exp(f(hkl)), i.e., a softmax over all words of all N reviews, where f is a learned scoring function of the encoder hidden state hit.
  • As in Kingma and Welling (2013), the authors use separate linear projections (LPs) to compute the means and diagonal log-covariances; a minimal code sketch follows this list.
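The two bullets above describe the inference network for the group code c: word-level importance scores pooled into a single vector, followed by separate projections for the Gaussian mean and log-covariance. Below is a minimal PyTorch sketch of such an encoder; the layer names (score, to_mean, to_logvar) and dimensions are illustrative assumptions rather than the authors' implementation, and padding masks are omitted.

    import torch
    import torch.nn as nn

    class GroupCodeEncoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=256, hid_dim=512, code_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.score = nn.Linear(hid_dim, 1)             # per-word 'importance' score
            self.to_mean = nn.Linear(hid_dim, code_dim)    # separate LPs for the mean
            self.to_logvar = nn.Linear(hid_dim, code_dim)  # ... and diagonal log-covariance

        def forward(self, reviews):
            # reviews: (N, T) token ids for the N reviews of one group
            h, _ = self.gru(self.embed(reviews))           # hidden states, (N, T, hid_dim)
            # Softmax over ALL words of ALL reviews, as in the alpha_it equation above.
            alpha = torch.softmax(self.score(h).view(-1), dim=0)
            pooled = (alpha.unsqueeze(1) * h.reshape(-1, h.size(-1))).sum(dim=0)
            mu, logvar = self.to_mean(pooled), self.to_logvar(pooled)
            # Reparameterization trick (Kingma and Welling, 2013): c = mu + sigma * eps.
            c = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return c, mu, logvar

During training, c is sampled as above; at summarization time the model instead relies on the posterior mean (see Results).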
Results
  • Opinosis is a graph-based abstractive summarizer (Ganesan et al., 2010) designed to generate short opinions based on highly redundant texts.
  • Although it is referred to as abstractive, it can only select words from the input reviews.
  • LexRank is an unsupervised algorithm which selects sentences to appear in the summary based on graph centrality.
  • MeanSum is the unsupervised abstractive summarization model (Chu and Liu, 2019) discussed in the introduction.
  • When generating a summary for r1, ..., rN, the authors averaged the means of qφ and used the result as the latent code from which the summary is decoded (cf. Table 9); a short sketch follows this list.
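A minimal sketch of this generation step, assuming hypothetical helpers posterior_mean(r), which returns the mean of a review's approximate posterior, and decode(code), which greedily generates text conditioned on a latent code (neither is the authors' actual API):

    import torch

    def summarize(posterior_mean, decode, reviews):
        # One posterior mean per input review; their average is used as the
        # latent code from which the summary is decoded (no sampling noise).
        means = torch.stack([posterior_mean(r) for r in reviews])  # (N, code_dim)
        return decode(means.mean(dim=0))

Using means rather than samples removes stochasticity, which, per the authors' argument in the introduction, keeps the summary close to the content common to the reviews (cf. Table 9).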
Conclusion
  • The authors presented an abstractive summarizer of opinions, which does not use any summaries in training and is trained end-to-end on a large collection of reviews.
  • The model compares favorably to the competitors, especially to the only other unsupervised abstractive multi-review summarization system.
  • Human evaluation of the generated summaries shows that those created by the model better reflect the content of the input.
Tables
  • Table 1: A summary produced by our model; colors encode its alignment to the input reviews. The reviews are truncated and delimited with the symbol ‘||’
  • Table 2: Data statistics after pre-processing. The format in the cells is Businesses/Reviews and Products/Reviews for Yelp and Amazon, respectively
  • Table 3: ROUGE scores on the Yelp test set
  • Table 4: ROUGE scores on the Amazon test set
  • Table 5: Human evaluation results in terms of Best-Worst Scaling on the Yelp dataset
  • Table 6: Human evaluation results in terms of Best-Worst Scaling on the Amazon dataset
  • Table 7: Content support on the Yelp and Amazon datasets (percentages)
  • Table 8: Ablations, ROUGE scores on Amazon
  • Table 9: Amazon summaries of the full model with sampled and mean assignments to z. The assignment to c was fixed to the mean of the approximate posterior qφ(c|r1, ..., rN)
  • Table 10: Yelp summaries produced by different models
  • Table 11
  • Table 12
  • Table 13: Amazon summaries produced by different models
Related work
  • Extractive weakly-supervised opinion summarization has been an active area of research. A recent example is Angelidis and Lapata (2018). First, they learn to assign sentiment polarity to review segments in a weakly-supervised fashion. Then, they induce aspect labels for segments relying on a small sample of gold summaries. Finally, they use a heuristic to construct a summary of segments. Opinosis (Ganesan et al., 2010) does not use any supervision. The model relies on redundancies in opinionated text and PoS tags in order to generate short opinions. This approach is not well suited for the generation of coherent long summaries, and although it can recombine fragments of input text, it cannot generate novel words and phrases. LexRank (Erkan and Radev, 2004) is an unsupervised extractive approach which builds a graph in order to determine the importance of sentences, and then selects the most representative ones as a summary (a simplified sketch follows after this paragraph). Isonuma et al. (2019) introduce an unsupervised approach for single-review summarization, where they rely on latent discourse trees. Other earlier approaches (Gerani et al., 2014; Di Fabbrizio et al., 2014) relied on text planners and templates, while our approach does not require rules and can produce fluent and varied text. Finally, conceptually related methods were applied to unsupervised single-sentence compression (West et al., 2019; Baziotis et al., 2019; Miao and Blunsom, 2016). The most related approach to ours is MeanSum (Chu and Liu, 2019), which treats a summary as a discrete latent state of an autoencoder. In contrast, we define a hierarchical model of a review collection and use continuous latent codes.
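For concreteness, here is a simplified sketch of the LexRank idea: sentences become nodes in a cosine-similarity graph over TF-IDF vectors, and centrality is computed by PageRank-style power iteration. The threshold, damping factor, and iteration count are illustrative assumptions, not the configuration of the baseline used in the paper.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
        # Cosine similarities between sentences (TF-IDF rows are L2-normalized).
        tfidf = TfidfVectorizer().fit_transform(sentences)
        sim = (tfidf @ tfidf.T).toarray()
        # Keep only sufficiently similar pairs; row-normalize into a transition matrix.
        adj = (sim >= threshold).astype(float)
        adj /= adj.sum(axis=1, keepdims=True)
        # PageRank-style power iteration for sentence centrality.
        n = len(sentences)
        scores = np.full(n, 1.0 / n)
        for _ in range(iters):
            scores = (1 - damping) / n + damping * adj.T @ scores
        return sorted(range(n), key=lambda i: -scores[i])  # most central first

Taking the top-ranked sentences up to a length budget then yields the extractive summary.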
Funding
  • We gratefully acknowledge the support of the European Research Council (Titov: ERC StG BroadSem 678254; Lapata: ERC CoG TransModal 681760) and the Dutch National Science Foundation (NWO VIDI 639.022.518)
Reference
  • Stefanos Angelidis and Mirella Lapata. 2018. Summarizing opinions: Aspect extraction meets sentiment prediction and they are both weakly supervised. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3675–3686.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of International Conference on Learning Representations (ICLR).
  • Regina Barzilay, Kathleen R McKeown, and Michael Elhadad. 1999. Information fusion in the context of multi-document summarization. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics, pages 550–557.
  • Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, and Alexandros Potamianos. 2019. SEQ^3: Differentiable sequence-to-sequence-to-sequence autoencoder for unsupervised abstractive sentence compression. In Proceedings of the Association for Computational Linguistics, pages 673–681.
  • Julian Besag. 1975. Statistical analysis of non-lattice data. Journal of the Royal Statistical Society: Series D (The Statistician), 24(3):179–195.
  • John Blitzer, Mark Dredze, and Fernando Pereira. 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447.
  • Samuel Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. 2016. Generating sentences from a continuous space. In Proceedings of the Twentieth Conference on Computational Natural Language Learning (CoNLL).
  • Giuseppe Carenini and Jackie Chi Kit Cheung. 2008. Extractive vs. NLG-based abstractive summarization of evaluative text: The effect of corpus controversiality. In Proceedings of the Fifth International Natural Language Generation Conference, pages 33–41. Association for Computational Linguistics.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734.
  • Eric Chu and Peter Liu. 2019. MeanSum: A neural model for unsupervised multi-document abstractive summarization. In Proceedings of International Conference on Machine Learning (ICML), pages 1223–1232.
  • Hoa Trang Dang. 2005. Overview of DUC 2005. In Proceedings of the Document Understanding Conference, volume 2005, pages 1–12.
  • Giuseppe Di Fabbrizio, Amanda Stent, and Robert Gaizauskas. 2014. A hybrid approach to multi-document summarization of opinions in reviews. pages 54–63.
  • Gunes Erkan and Dragomir R Radev. 2004. LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22:457–479.
  • Tobias Falke, Leonardo FR Ribeiro, Prasetya Ajie Utama, Ido Dagan, and Iryna Gurevych. 2019. Ranking generated summaries by correctness: An interesting but challenging application for natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2214–2220.
  • Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, and Lawrence Carin. 2019. Cyclical annealing schedule: A simple approach to mitigating KL vanishing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, pages 240–250.
  • Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 340–348.
  • Shima Gerani, Yashar Mehdad, Giuseppe Carenini, Raymond T Ng, and Bita Nejat. 2014. Abstractive summarization of product reviews using discourse structure. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1602–1613.
  • Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256.
  • Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th international conference on world wide web, pages 507–517. International World Wide Web Conferences Steering Committee.
  • Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  • Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177. ACM.
  • Masaru Isonuma, Toru Fujino, Junichiro Mori, Yutaka Matsuo, and Ichiro Sakata. 2017. Extractive summarization using multi-task learning with document classification. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2101–2110.
  • Masaru Isonuma, Junichiro Mori, and Ichiro Sakata. 2019. Unsupervised neural single-document summarization of reviews via learning latent discourse structure and its ranking. In Proceedings of ACL.
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Diederik P Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
  • Svetlana Kiritchenko and Saif M Mohammad. 2016. Capturing reliable fine-grained sentiment associations by crowdsourcing and best–worst scaling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 811–817.
  • Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in neural information processing systems, pages 3294–3302.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions, pages 177–180.
  • Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1–167.
  • Peter J Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating wikipedia by summarizing long sequences. In Proceedings of International Conference on Learning Representations (ICLR).
  • Jordan J Louviere, Terry N Flynn, and Anthony Alfred John Marley. 2015. Best-worst scaling: Theory, methods and applications. Cambridge University Press.
  • Jordan J Louviere and George G Woodworth. 1991. Best-worst scaling: A model for the largest difference judgments. University of Alberta: Working Paper.
  • Walaa Medhat, Ahmed Hassan, and Hoda Korashy. 2014. Sentiment analysis algorithms and applications: A survey. Ain Shams engineering journal, 5(4):1093–1113.
  • Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, and ChengXiang Zhai. 2007. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th international conference on World Wide Web, pages 171–180. ACM.
  • Yishu Miao and Phil Blunsom. 2016. Language as a latent variable: Discrete generative models for sentence compression. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 319–328.
  • Frederic Morin and Yoshua Bengio. 2005. Hierarchical probabilistic neural network language model. AISTATS, 5:246–252.
  • Ramesh Nallapati, Bowen Zhou, Cicero dos Santos, Caglar Gulcehre, and Bing Xiang. 2016. Abstractive text summarization using sequence-to-sequence rnns and beyond. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 280–290.
  • Bryan Orme. 2009. MaxDiff analysis: Simple counting, individual-level logit, and HB. Sequim, WA: Sawtooth Software.
  • Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
  • Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
  • Ofir Press and Lior Wolf. 2017. Using the output embedding to improve language models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 157–163.
  • Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389.
  • Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of Association for Computational Linguistics (ACL).
  • Ivan Titov and Ryan McDonald. 2008. Modeling online reviews with multi-grain topic models. In Proceedings of the 17th international conference on World Wide Web, pages 111–120. ACM.
  • Peter West, Ari Holtzman, Jan Buys, and Yejin Choi. 2019. BottleSum: Unsupervised and self-supervised sentence summarization using the information bottleneck principle. arXiv preprint arXiv:1909.07405.
  • First, we sampled 15 products from each of the Amazon review categories: Electronics; Clothing, Shoes and Jewelry; Home and Kitchen; Health and Personal Care. Then, we selected 8 reviews from each product to be summarized. We used the same requirements for workers as for the human evaluation in A.4. We assigned 3 workers to each product, and instructed them to read the reviews and produce a summary text. We based our instructions on those provided in Chu and Liu (2019).
  • The decoder essentially becomes an unconditional language model, for which beam search was shown to lead to repetitive generations (Holtzman et al., 2019).