
A Neural Attention Model for Abstractive Sentence Summarization

Conference on Empirical Methods in Natural Language Processing, 2015: 379-389

Cited by: 2241 | Views: 531 | EI-indexed

Abstract

Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence.
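The "local attention-based model" in the abstract is the paper's attention-based contextual encoder. Below is a minimal NumPy sketch of that idea; the parameter names (F, G, P), the shapes, and the smoothing window Q are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_encoder(x_ids, yc_ids, F, G, P, Q=2):
    """Attention-based contextual encoder (sketch).

    x_ids  : input word indices (length M)
    yc_ids : the C most recently generated word indices (the context)
    F      : input embedding matrix,   shape (V, H)
    G      : context embedding matrix, shape (V, H)
    P      : attention parameters,     shape (H, C*H)
    Q      : local smoothing window size
    """
    x_tilde = F[x_ids]                  # (M, H) input word embeddings
    y_tilde = G[yc_ids].reshape(-1)     # (C*H,) flattened context embedding
    # Soft alignment: a distribution over input positions given the context.
    p = softmax(x_tilde @ P @ y_tilde)  # (M,)
    # Smooth the input embeddings over a local window of size Q.
    M = len(x_ids)
    x_bar = np.stack([x_tilde[max(0, i - Q):min(M, i + Q + 1)].mean(axis=0)
                      for i in range(M)])
    # Encoder output: attention-weighted average of the smoothed embeddings.
    return p @ x_bar                    # (H,)
```

The attention weights p act as a soft alignment between the partially generated summary and the input, and the local averaging makes that alignment smoother.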

Introduction
  • Summarization is an important challenge of natural language understanding. The aim is to produce a condensed representation of an input text that captures the core meaning of the original.
  • Formally, the input is a sequence of M words x1, ..., xM drawn from a fixed vocabulary V, with each word represented as an indicator vector xi ∈ {0,1}^V, a sentence as a sequence of such indicators, and X as the set of possible inputs; the goal is to produce a condensed summary.
  • The authors assume that the words in the summary come from the same vocabulary V and that the output is a sequence y1, ..., yN.
  • Note that, in contrast to related tasks like machine translation, the authors assume that the output length N is fixed and that the system knows the length of the summary before generation (a sketch of the resulting scoring setup follows this list).
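A minimal sketch of this setup, assuming a hypothetical black-box `log_prob(next_id, x_ids, context_ids)` that returns the neural model's conditional log-probability; the context window size C and all names are illustrative.

```python
def summary_score(x_ids, y_ids, log_prob, C=5, start_id=0):
    """Score a fixed-length candidate summary y for input x (sketch).

    The score decomposes over output positions: each word is conditioned on
    the full input and only the C previous output words (a Markov context).
    log_prob(next_id, x_ids, context_ids) is a hypothetical callable that
    returns the neural model's conditional log-probability.
    """
    padded = [start_id] * C + list(y_ids)  # pad so the first words have a context
    score = 0.0
    for i, next_id in enumerate(y_ids):
        context = padded[i:i + C]          # the C words preceding position i+1
        score += log_prob(next_id, x_ids, context)
    return score
```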
Highlights
  • Summarization is an important challenge of natural language understanding
  • The aim is to produce a condensed representation of an input text that captures the core meaning of the original
  • Abstractive summarization attempts to produce a bottom-up summary, aspects of which may not appear as part of the original
  • Our main results are presented in Table 1
  • We have presented a neural attention-based model for abstractive summarization, based on recent developments in neural machine translation
Methods
  • The authors experiment with the attention-based sentence summarization model on the task of headline generation.
  • The standard sentence summarization evaluation set is associated with the DUC-2003 and DUC-2004 shared tasks (Over et al., 2007).
  • The data for this task consists of 500 news articles from the New York Times and Associated Press Wire services, each paired with 4 different human-generated reference summaries capped at 75 bytes (see the byte-capping sketch after this list).
  • The full data set is available by request at http://duc.nist.gov/data.html
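A minimal sketch of the 75-byte cap applied before scoring, assuming UTF-8 text; this illustrates the evaluation convention rather than any official scoring script.

```python
def cap_to_bytes(summary: str, limit: int = 75) -> str:
    """Truncate a summary to at most `limit` bytes, as in DUC-2004 scoring."""
    cut = summary.encode("utf-8")[:limit]
    # Drop any partial multi-byte character produced by the byte-level cut.
    return cut.decode("utf-8", errors="ignore")

print(cap_to_bytes("detained iranian-american academic released from prison after hefty bail"))
```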
Results
  • The authors' main results are presented in Table 1
  • The authors run experiments both using the DUC-2004 evaluation data set (500 sentences, 4 references, 75 bytes) with all systems and a randomly held-out subset of Gigaword.
  • The PREFIX baseline performs surprisingly well on ROUGE-1, which makes sense given the earlier observed overlap between article and summary (a sketch of this overlap measure follows this list).
  • Both ABS and MOSES+ perform better than TOPIARY on ROUGE-2 and ROUGE-L in DUC.
  • Note that the additional extractive features bias the system towards retaining more input words, which is useful for the underlying metric
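The overlap mentioned above is what Table 1 reports as Ext %, the percentage of summary tokens that also appear in the input. A minimal sketch of that measure, using the example from the Conclusion section as illustration:

```python
def ext_percent(summary_tokens, input_tokens):
    """Percentage of summary tokens that also occur in the input (Ext %)."""
    if not summary_tokens:
        return 0.0
    input_vocab = set(input_tokens)
    hits = sum(1 for tok in summary_tokens if tok in input_vocab)
    return 100.0 * hits / len(summary_tokens)

# Example using the system output shown in the Conclusion section:
inp = ("a detained iranian-american academic accused of acting against national "
       "security has been released from a tehran prison after a hefty bail was "
       "posted , a top judiciary official said tuesday .").split()
out = "detained iranian-american academic released from prison after hefty bail".split()
print(ext_percent(out, inp))  # 100.0: every output token also appears in the input
```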
Conclusion
  • The authors have presented a neural attention-based model for abstractive summarization, based on recent developments in neural machine translation
  • The authors combine this probabilistic model with a generation algorithm which produces accurate abstractive summaries (see the beam-search sketch after the example below).
  • As a next step, the authors would like to further improve the grammaticality of the summaries in a data-driven way, as well as scale this system to generate paragraph-level summaries.
  • Example input I(1): a detained iranian-american academic accused of acting against national security has been released from a tehran prison after a hefty bail was posted , a top judiciary official said tuesday.
  • G (reference): iranian-american academic held in tehran released on bail
  • A (ABS output): detained iranian-american academic released from jail after posting bail
  • A+ (ABS+ output): detained iranian-american academic released from prison after hefty bail
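Table 3 (below) compares greedy and beam-search inference. The following is a minimal beam-search sketch over the same hypothetical `log_prob` model assumed earlier; the beam width, the padding scheme, and the fixed candidate vocabulary are illustrative, not the authors' exact decoder.

```python
import heapq

def beam_search(x_ids, log_prob, vocab, N, beam=5, C=5, start_id=0):
    """Generate a fixed-length summary of N words with beam search (sketch).

    Keeps the `beam` highest-scoring partial summaries at each step; each
    candidate word is scored with the same windowed conditional model
    assumed in summary_score above.
    """
    hyps = [(0.0, [])]  # (cumulative log-probability, words so far)
    for _ in range(N):
        candidates = []
        for score, words in hyps:
            context = ([start_id] * C + words)[-C:]
            for w in vocab:
                candidates.append((score + log_prob(w, x_ids, context), words + [w]))
        # Prune to the `beam` best partial hypotheses.
        hyps = heapq.nlargest(beam, candidates, key=lambda h: h[0])
    return max(hyps, key=lambda h: h[0])[1]
```

Greedy decoding is the special case beam=1; a wider beam trades decoding time for a better approximation of the highest-scoring summary.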
Tables
  • Table 1: Experimental results on the main summary tasks on various ROUGE metrics. Baseline models are described in detail in Section 7.2. We report the percentage of tokens in the summary that also appear in the input for Gigaword as Ext %.
  • Table 2: Perplexity results on the Gigaword validation set comparing various language models with C=5 and end-to-end summarization models. The encoders are defined in Section 3.
  • Table 3: ROUGE scores on DUC-2003 development data for various versions of inference. Greedy and Beam are described in Section 4. Ext. is a purely extractive version of the system (Eq. 2).
Related work
  • Abstractive sentence summarization has traditionally been connected to the task of headline generation. Our work is similar to the early work of Banko et al. (2000), who developed a statistical machine translation-inspired approach for this task using a corpus of headline-article pairs. We extend this approach by: (1) using a neural summarization model as opposed to a count-based noisy-channel model, (2) training the model on a much larger scale (4 million articles compared to 25K), (3) and allowing fully abstractive decoding.

    This task was standardized around the DUC-2003 and DUC-2004 competitions (Over et al., 2007). The TOPIARY system (Zajic et al., 2004) performed the best in this task and is described in detail in the next section. We point interested readers to the DUC web page (http://duc.nist.gov/) for the full list of systems entered in this shared task.
Contributions
  • Proposes a fully data-driven approach to abstractive sentence summarization
  • Returns to the question of generation for factored models, and in Section 5 introduces a modified factored scoring function (a sketch of such a factored score follows below)
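The "modified factored scoring function" referred to above combines the neural model's log-probability with extractive indicator features under tuned weights. The feature set and weight handling below are illustrative assumptions, not the authors' exact formulation.

```python
def extractive_features(next_id, x_ids, context):
    """Indicator features: does emitting next_id copy an input unigram/bigram/trigram?"""
    bigrams = set(zip(x_ids, x_ids[1:]))
    trigrams = set(zip(x_ids, x_ids[1:], x_ids[2:]))
    unigram = next_id in x_ids
    bigram = len(context) >= 1 and (context[-1], next_id) in bigrams
    trigram = len(context) >= 2 and (context[-2], context[-1], next_id) in trigrams
    return [float(unigram), float(bigram), float(trigram)]

def factored_score(next_id, x_ids, context, log_prob, weights):
    """Weighted sum of the neural score and extractive features (sketch).

    weights[0] scales the model log-probability and weights[1:] scale the
    indicator features; in the paper such weights are tuned on development
    data (e.g. with MERT).  The feature choice and weights here are placeholders.
    """
    feats = [log_prob(next_id, x_ids, context)]
    feats += extractive_features(next_id, x_ids, context)
    return sum(w * f for w, f in zip(weights, feats))
```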
References
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Michele Banko, Vibhu O Mittal, and Michael J Witbrock. 2000. Headline generation based on statistical translation. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, pages 318–325. Association for Computational Linguistics.
  • Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155.
  • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
  • James Clarke and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, pages 399–429.
  • Trevor Cohn and Mirella Lapata. 2008. Sentence compression beyond word deletion. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, pages 137–144. Association for Computational Linguistics.
  • Hal Daume III and Daniel Marcu. 2002. A noisychannel model for document compression. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 449–456. Association for Computational Linguistics.
  • Bonnie Dorr, David Zajic, and Richard Schwartz. 2003. Hedge trimmer: A parse-and-trim approach to headline generation. In Proceedings of the HLT-NAACL 03 Text Summarization Workshop - Volume 5. Association for Computational Linguistics.
  • Katja Filippova and Yasemin Altun. 2013. Overcoming the lack of parallel data in sentence compression. In EMNLP, pages 1481–1491.
  • David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2003. English gigaword. Linguistic Data Consortium, Philadelphia.
  • Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. Improving neural networks by preventing coadaptation of feature detectors. arXiv preprint arXiv:1207.0580.
  • Hongyan Jing. 2002. Using hidden markov modeling to decompose human-written summaries. Computational linguistics, 28(4):527–543.
  • Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In EMNLP, pages 1700–1709.
  • Kevin Knight and Daniel Marcu. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1):91–107.
  • Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180. Association for Computational Linguistics.
  • Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74–81.
  • Thang Luong, Ilya Sutskever, Quoc V Le, Oriol Vinyals, and Wojciech Zaremba. 2014. Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206.
  • Christopher D Manning, Prabhakar Raghavan, and Hinrich Schutze. 2008. Introduction to information retrieval, volume 1. Cambridge university press Cambridge.
  • Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J Bethard, and David McClosky. 2014. The stanford corenlp natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.
  • Courtney Napoles, Matthew Gormley, and Benjamin Van Durme. 2012. Annotated gigaword. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, pages 95–100. Association for Computational Linguistics.
  • Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pages 160–167. Association for Computational Linguistics.
  • Paul Over, Hoa Dang, and Donna Harman. 2007. Duc in context. Information Processing & Management, 43(6):1506–1520.
  • Ilya Sutskever, Oriol Vinyals, and Quoc VV Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112.
  • Kristian Woodsend, Yansong Feng, and Mirella Lapata. 2010. Generation with quasi-synchronous grammar. In Proceedings of the 2010 conference on empirical methods in natural language processing, pages 513– 523. Association for Computational Linguistics.
  • Sander Wubben, Antal Van Den Bosch, and Emiel Krahmer. 2012. Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 1015–1024. Association for Computational Linguistics.
  • Omar Zaidan. 2009. Z-mert: A fully configurable open source tool for minimum error rate training of machine translation systems. The Prague Bulletin of Mathematical Linguistics, 91:79–88.
  • David Zajic, Bonnie Dorr, and Richard Schwartz. 2004. Bbn/umd at duc-2004: Topiary. In Proceedings of the HLT-NAACL 2004 Document Understanding Workshop, Boston, pages 112–119.