Recent Advances on Neural Headline Generation

J. Comput. Sci. Technol., Volume 32, Issue 4, 2017, Pages 768-784.

Keywords:
neural network, headline generation, data analysis

Abstract:

Recently, neural models have been proposed for headline generation by learning to map documents to headlines with recurrent neural networks. In this work, we give a detailed introduction and comparison of existing work and recent improvements in neural headline generation, with particular attention to how encoders, decoders and neural mode...

Introduction
  • Automatic text summarization is the process of creating a coherent, informative and brief summary of a document.
  • Extractive summarization systems usually select a subset of sentences from the original documents as the summary.
  • Generative summarization builds a semantic representation of a document and creates a summary with sentences that are not necessarily present in the original document.
  • Generative summarization needs to accurately understand and represent the semantics of the original document, and to generate an informative summary from that representation.
  • Most previous studies rely heavily on modeling latent linguistic structures of input documents via syntactic or semantic parsing, which often introduces errors and degrades summarization quality.
Highlights
  • Automatic text summarization is the process of creating a coherent, informative and brief summary of a document
  • We discuss the performance in detail to gain more insight into what works for neural headline generation systems
  • The differences include: 1) recurrent neural network (RNN) context(W) and COPYNET(W) are word-based, and the others are character-based; 2) only COPYNET(W) incorporates the copy mechanism; 3) only MRT(C) is trained with the minimum risk training algorithm, and the others are trained with the maximum likelihood estimation algorithm; 4) RNN context(C) and our MLE system differ in the decoding method used at test time
  • The MRT training strategy can significantly improve the performance of neural headline generation systems, and our implemented MRT system achieves state-of-the-art results among existing neural headline generation systems (see the training-objective sketch after this highlights list)
  • 1) We gave a broad overview of existing approaches to neural headline generation, with particular focus on how encoders, decoders and training strategies influence the overall performance of neural headline generation systems
  • 2) We presented a quantitative analysis of recent neural headline generation systems and explored which factors benefit this task
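
The following is a minimal sketch of the idea behind the MRT training objective mentioned above, contrasted with MLE. The candidate headlines, their log-probabilities, the smoothing hyper-parameter `alpha`, and the unigram-overlap risk are all illustrative stand-ins; in the paper's setting the candidates are sampled from the decoder and the risk is based on the ROUGE score (see Shen et al.'s minimum risk training paper in the reference list).

```python
# A minimal sketch of the minimum risk training (MRT) objective, assuming
# candidate headlines and their model log-probabilities are already available
# (in the real systems they come from sampling the decoder). The risk here is
# 1 - unigram overlap with the reference, a stand-in for the negative ROUGE
# score used in the surveyed experiments.
import numpy as np

def unigram_overlap(candidate, reference):
    """Fraction of reference unigrams covered by the candidate (ROUGE-1-like recall)."""
    cand, ref = candidate.split(), reference.split()
    if not ref:
        return 0.0
    matched = sum(min(cand.count(w), ref.count(w)) for w in set(ref))
    return matched / len(ref)

def mrt_expected_risk(log_probs, candidates, reference, alpha=5e-3):
    """Expected risk over a sampled candidate set.

    Q(y'|x) is the model distribution renormalized over the candidate set,
    smoothed by alpha; the loss is sum over candidates of Q(y'|x) * risk(y', y).
    """
    scaled = alpha * np.asarray(log_probs, dtype=float)
    q = np.exp(scaled - scaled.max())
    q /= q.sum()                                  # renormalize over the candidate set
    risks = np.array([1.0 - unigram_overlap(c, reference) for c in candidates])
    return float(np.dot(q, risks))

# Toy usage: three sampled headlines for one article, with made-up log-probabilities.
reference = "china stocks close higher"
candidates = ["china stocks close higher",
              "chinese shares end up",
              "weather forecast for beijing"]
log_probs = [-2.1, -2.5, -6.0]

mle_loss = -log_probs[0]                          # MLE: negative log-likelihood of the reference
mrt_loss = mrt_expected_risk(log_probs, candidates, reference)
print(f"MLE loss {mle_loss:.2f}  vs  MRT expected risk {mrt_loss:.3f}")
```

Unlike the word-level MLE loss, the MRT loss directly rewards whole candidate headlines that overlap more with the reference, which is why it is described above as a sentence-level training strategy.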
Methods
  • The authors introduce the experiments they conducted to better understand neural headline generation systems.
Results
  • The authors report the overall evaluation results on both the English and Chinese test sets.
  • Systems compared (see Tables 3 and 4): ABS[7], ABS+[7], Luong-NMT[31], words-lvt2k-1sent[3], ABS+AMR[4], RAS-LSTM[2], RAS-Elman[2], LenEmb[11], ASC+FSC[12], MLE, and MRT.
  • The differences include: 1) RNN context(W) and COPYNET(W) are word-based, and the others are character-based; 2) only COPYNET(W) incorporates the copy mechanism; 3) only MRT(C) is trained with the minimum risk training algorithm, and the others are trained with the maximum likelihood estimation algorithm; 4) RNN context(C) and the MLE system differ in the decoding method used at test time (a beam-search decoding sketch follows this list)
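
As a companion to point 4 above, the sketch below illustrates beam-search decoding, a common choice for the decoding method at test time in these systems (greedy decoding is the special case beam_size=1). The toy `next_log_probs` table is a hypothetical stand-in for a trained decoder; the survey does not prescribe this exact procedure for every system.

```python
# A minimal beam-search decoder sketch. The model is assumed to expose a
# function returning log-probabilities over the next token given the prefix;
# here it is a hard-coded toy table purely for illustration.
import math

VOCAB = ["<eos>", "china", "stocks", "close", "higher", "lower"]

def next_log_probs(prefix):
    """Toy conditional distribution over the next token (hypothetical)."""
    table = {
        (): {"china": 0.7, "stocks": 0.3},
        ("china",): {"stocks": 0.9, "close": 0.1},
        ("china", "stocks"): {"close": 0.8, "higher": 0.2},
        ("china", "stocks", "close"): {"higher": 0.6, "lower": 0.3, "<eos>": 0.1},
    }
    probs = table.get(tuple(prefix), {"<eos>": 1.0})
    return {w: math.log(probs.get(w, 1e-9)) for w in VOCAB}

def beam_search(beam_size=3, max_len=6):
    beams = [([], 0.0)]                       # (token prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        expansions = []
        for prefix, score in beams:
            for word, lp in next_log_probs(prefix).items():
                if word == "<eos>":
                    finished.append((prefix, score + lp))
                else:
                    expansions.append((prefix + [word], score + lp))
        # keep only the best `beam_size` partial hypotheses
        beams = sorted(expansions, key=lambda b: b[1], reverse=True)[:beam_size]
        if not beams:
            break
    finished.extend(beams)
    best = max(finished, key=lambda b: b[1])
    return " ".join(best[0]), best[1]

print(beam_search(beam_size=3))   # -> ('china stocks close higher', ...)
```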
Conclusion
  • The authors' contributions are as follows. 1) The authors gave a broad overview of existing approaches to neural headline generation, with particular focus on how encoders, decoders and training strategies influence the overall performance of neural headline generation systems. 2) The authors presented a quantitative analysis of recent neural headline generation systems and explored which factors benefit this task. 3) The authors performed a detailed error analysis of typical models and datasets to explore the capability of neural headline generation systems.

    To summarize, the authors observed several key factors that affect the performance of headline generation systems. 1) Adding more linguistic features helps capture more complex information from input articles. 2) Bi-directional recurrent neural networks are better at modeling input articles. 3) The attention mechanism consistently benefits neural headline generation systems. 4) The copy mechanism is a promising way to extend the limited target vocabulary with words from the source input. 5) As a sentence-level training strategy, MRT significantly outperforms word-level training (a sketch of the attention and copy mechanisms follows this paragraph).
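
The sketch below illustrates, in simplified numpy form, the attention and copy mechanisms referred to in points 3 and 4. The shapes, the random parameters, and the sigmoid copy gate are assumptions for illustration only; COPYNET and the pointer network cited in the survey use their own, more elaborate parameterizations.

```python
# A minimal numpy sketch of attention over encoder states and a copy mechanism
# that mixes the decoder's vocabulary distribution with a distribution over
# source words. Everything here is a toy stand-in for a trained model.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Source article (already mapped to vocabulary ids) and toy encoder states.
vocab_size, hidden = 20, 8
source_ids = np.array([3, 7, 7, 12, 5])                  # length-5 input
enc_states = rng.normal(size=(len(source_ids), hidden))  # e.g. bi-RNN outputs
dec_state = rng.normal(size=hidden)                      # current decoder state

# 1) Attention: score each encoder state against the decoder state (dot-product form).
attn_weights = softmax(enc_states @ dec_state)           # shape (src_len,)
context = attn_weights @ enc_states                      # weighted sum of encoder states

# 2) Generation distribution over the fixed target vocabulary.
W_out = rng.normal(size=(vocab_size, 2 * hidden))
p_vocab = softmax(W_out @ np.concatenate([dec_state, context]))

# 3) Copy: reuse the attention weights as a distribution over source *words*,
#    then gate between generating and copying.
w_gate = rng.normal(size=2 * hidden)
p_gen = 1.0 / (1.0 + np.exp(-w_gate @ np.concatenate([dec_state, context])))
p_final = p_gen * p_vocab
np.add.at(p_final, source_ids, (1.0 - p_gen) * attn_weights)  # scatter copy mass onto source ids

print("attention over source positions:", np.round(attn_weights, 3))
print("most likely next word id:", int(p_final.argmax()))
```

The copy term is what lets a system emit rare source words (names, numbers) that fall outside the limited target vocabulary, which is exactly the benefit noted in point 4.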

    While the neural headline generation system proved superior to the other systems, the analysis pointed out aspects of neural headline generation that deserve further work, such as handling missing salient information, repeated words, and extraneous words.
Tables
  • Table 1: Data Statistics of the DUC Datasets
  • Table 2: Data Statistics of English Gigaword and LCSTS Datasets
  • Table 3: ROUGE Scores on DUC2004 and Gigaword English Test Sets
  • Table 4: Model Architectures Corresponding to Table 3
  • Table 5: Effect of Different Encoders on DUC2004
  • Table 6: Effect of Different Decoders on DUC2004
  • Table 7: Effect of Using Different Distance Measures in MRT on DUC2004
  • Table 8: ROUGE Scores on Chinese Test Set
  • Table 9: Per-Category Performance of MLE and MRT
  • Table 10: Performance of Domain Knowledge and Weak Clue Cases with Regard to Original References and Our Manual References
  • Table 11: Example from Domain Knowledge Cases
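
Since Tables 3 and 5-8 report ROUGE scores, the following is a simplified sketch of the ROUGE-N recall computation behind those numbers. The published results come from the official ROUGE toolkit (Lin, 2004), which additionally handles stemming, multiple references, and the precision/F-measure variants; this sketch shows only the core n-gram counting step.

```python
# Simplified ROUGE-N recall: n-gram overlap between a system headline and a
# reference headline, normalized by the number of reference n-grams.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(system, reference, n):
    sys_counts = ngrams(system.split(), n)
    ref_counts = ngrams(reference.split(), n)
    if not ref_counts:
        return 0.0
    overlap = sum(min(c, sys_counts[g]) for g, c in ref_counts.items())
    return overlap / sum(ref_counts.values())

# Toy usage with a hypothetical system output and reference.
reference = "china stocks close higher on strong economic data"
system = "chinese stocks close higher"
print("ROUGE-1 recall:", round(rouge_n_recall(system, reference, 1), 3))
print("ROUGE-2 recall:", round(rouge_n_recall(system, reference, 2), 3))
```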
Related work
  • Headline generation is a well-defined task standardized in DUC2003 and DUC2004. Various approaches have been proposed for headline generation: rule-based, statistical, and neural.

    The rule-based models create a headline for a news article using handcrafted and linguistically motivated rules to guide the choice of a potential headline. Hedge trimmer[1] is a representative example of this approach, which creates a headline by removing constituents from the parse tree of the first sentence until a specified length limit is reached. Statistical methods make use of large-scale training data to learn correlations between words in headlines and articles[28]. The best system on DUC2004, TOPIARY[32], combines both linguistic and
Funding
  • This work is supported by the National Basic Research 973 Program of China under Grant No. 2014CB340501, the National Natural Science Foundation of China under Grant Nos. 61572273 and 61532010, and Microsoft Research Asia under Grant No. FY17-RESTHEME-017
Reference
  • Dorr B, Zajic D, Schwartz R. Hedge trimmer: A parse-andtrim approach to headline generation. In Proc. the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics on Text summarization workshop, Volume 5, May 2003, pp.1-8.
  • Chopra S, Auli M, Rush A M. Abstractive sentence summarization with attentive recurrent neural networks. In Proc. the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2016, pp.93-98.
  • Nallapati R, Zhou B, Santos C. Abstractive text summarization using sequence-to-sequence RNNs and beyond. http://aclweb.org/anthology/K/K16/K16-1028.pdf, May 2017.
  • Takase S, Suzuki J, Okazaki N, Hirao T, Nagata M. Neural headline generation on abstract meaning representation. In Proc. the Conference on Empirical Methods in Natural Language Processing, November 2016, pp.1054-1059.
  • Hu B, Chen Q, Zhu F. LCSTS: A large scale Chinese short text summarization dataset. In Proc. the Conference on Empirical Methods in Natural Language Processing, September 2015, pp.1967-1972.
  • Gu J, Lu Z, Li H, Li V O. Incorporating copying mechanism in sequence-to-sequence learning. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, August 2016, pp.1631-1640.
  • Rush A M, Chopra S, Weston J. A neural attention model for abstractive sentence summarization. In Proc. the Conference on Empirical Methods in Natural Language Processing, September 2015, pp.379-389.
  • Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
  • Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. the Conference on Empirical Methods in Natural Language Processing, October 2014, pp.1724-1734.
  • Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735-1780.
  • Kikuchi Y, Neubig G, Sasano R, Takamura H, Okumura M. Controlling output length in neural encoder-decoders. In Proc. the Conference on Empirical Methods in Natural Language Processing, November 2016, pp.1328-1338.
  • Miao Y, Blunsom P. Language as a latent variable: Discrete generative models for sentence compression. In Proc. the Conference on Empirical Methods in Natural Language Processing, November 2016, pp.319-328.
  • Schuster M, Paliwal K K. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
  • Bengio Y, Ducharme R, Vincent P, Jauvin C. A neural probabilistic language model. The Journal of Machine Learning Research, 2003, 3: 1137-1155.
  • Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In Proc. ICLR, May 2015.
  • Shen S, Cheng Y, He Z, He W, Wu H, Sun M, Liu Y. Minimum risk training for neural machine translation. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, August 2016, pp.1683-1692.
  • Ranzato M, Chopra S, Auli M, Zaremba W. Sequence level training with recurrent neural networks. In Proc. ICLR, May 2016.
  • Och F J. Minimum error rate training in statistical machine translation. In Proc. the 41st Annual Meeting on Association for Computational Linguistics, July 2003, pp.160-167.
  • Smith D A, Eisner J. Minimum risk annealing for training log-linear models. In Proc. the COLING/ACL Main Conference Poster Sessions, July 2006, pp.787-794.
  • Gao J, He X, Yih W, Deng L. Learning continuous phrase representations for translation modeling. In Proc. the 52nd Annual Meeting of the Association for Computational Linguistics, June 2014.
  • Lin C Y. ROUGE: A package for automatic evaluation of summaries. In Proc. the Workshop on Text Summarization Branches Out, July 2004.
  • Gulcehre C, Ahn S, Nallapati R, Zhou B, Bengio Y. Pointing the unknown words. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, August 2016, pp.140-149.
  • Vinyals O, Fortunato M, Jaitly N. Pointer networks. In Proc. Advances in Neural Information Processing Systems, Dec. 2015, pp.2692-2700.
  • Jean S, Cho K, Memisevic R, Bengio Y. On using very large target vocabulary for neural machine translation. In Proc. the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, July 2015, pp.1-10.
  • Vogel S, Ney H, Tillmann C. HMM-based word alignment in statistical translation. In Proc. the 16th Conference on Computational Linguistics, Aug. 1996, pp.836-841.
  • Tillmann C, Vogel S, Ney H, Zubiaga A. A DP-based search using monotone alignments in statistical translation. In Proc. the 35th Annual Meeting of the Association for Computational Linguistics, July 1997, pp.289-296.
  • Yu L, Buys J, Blunsom P. Online segment to segment neural transduction. In Proc. the Conference on Empirical Methods in Natural Language Processing, November 2016, pp.1307-1316.
  • Banko M, Mittal V O, Witbrock M J. Headline generation based on statistical translation. In Proc. the 38th Annual Meeting of the Association for Computational Linguistics, Oct. 2000, pp.318-325.
  • Napoles C, Gormley M, van Durme B. Annotated Gigaword. In Proc. the Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction, June 2012, pp.95-100.
  • Zeiler M D. ADADELTA: An adaptive learning rate method. arXiv:1212.5701, 2012. https://arxiv.org/abs/1212.5701, May 2017.
  • Luong T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation. In Proc. the Conference on Empirical Methods in Natural Language Processing, September 2015, pp.1412-1421.
  • Zajic D, Dorr B, Schwartz R. BBN/UMD at DUC-2004: Topiary. In Proc. the HLT-NAACL Document Understanding Workshop, Jan. 2004, pp.112-119.
  • Cheng J, Lapata M. Neural summarization by extracting sentences and words. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, Aug. 2016.
  • Cao Z, Li W, Li S, Wei F, Li Y. AttSum: Joint learning of focusing and summarization with neural attention. In Proc. the 26th International Conference on Computational Linguistics, December 2016, pp.547-556.
  • Allamanis M, Peng H, Sutton C. A convolutional attention network for extreme summarization of source code. In Proc. the 33rd International Conference on Machine Learning, June 2016, pp.2091-2100.
Authors
  • Ayana is a Ph.D. student of the Department of Computer Science and Technology, Tsinghua University, Beijing. She got her B.E. degree in computer science from the College of Computer Science and Technology, Inner Mongolia University, Hohhot, in 2006, and her M.E. degree from the same college in 2009. Her research interest is document summarization.
  • Shi-Qi Shen is a Ph.D. student of the Department of Computer Science and Technology, Tsinghua University, Beijing. He got his B.E. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, Beijing, in 2012. His research interests are in the area of machine translation and deep learning for natural language processing.
  • Yan-Kai Lin is a Ph.D. student of the Department of Computer Science and Technology, Tsinghua University, Beijing. He got his B.E. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, Beijing, in 2014. His research interest is knowledge graphs.
  • Cun-Chao Tu is a Ph.D. student of the Department of Computer Science and Technology, Tsinghua University, Beijing. He got his B.E. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, Beijing, in 2013. His research interests are user representation and social computation.
  • Yu Zhao is now working at IBM China Research Lab. He got his B.E. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, Beijing, in 2010, and his Ph.D. degree in computer science from the same department in 2016. His main research interests include semantic analysis and representation learning. He has experience in the fields of text classification, compositional semantics, and entity linking.
  • Zhi-Yuan Liu is an assistant researcher of the Department of Computer Science and Technology, Tsinghua University, Beijing. He got his B.E. degree and Ph.D. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, Beijing, in 2006 and 2011 respectively. His research interests are natural language processing and social computation. He has published over 40 papers in international journals and conferences including ACM Transactions, IJCAI, AAAI, ACL and EMNLP. He received the Tsinghua Excellent Doctoral Dissertation Award in 2011, the Excellent Doctoral Dissertation Award of the Chinese Association for Artificial Intelligence in 2012, and the Excellent Post-Doctoral Fellow Award of Tsinghua University in 2013.
  • Mao-Song Sun is a professor at the Department of Computer Science and Technology in Tsinghua University, Beijing. He received his Ph.D. degree in computational linguistics from City University of Hong Kong, Hong Kong, in 2004. His research interests include natural language processing, Web intelligence, and machine learning.