DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization

AAAI 2019 (arXiv:1811.02394).

Keywords:
Summarization, Attention, reinforcement learning, extractive document summarization
Weibo:
We propose DeepChannel, consisting of a deep neural network-based channel model and an iterative extraction strategy, for extractive document summarization

Abstract:

We propose DeepChannel, a robust, data-efficient, and interpretable neural model for extractive document summarization. Given any document-summary pair, we estimate a salience score, which is modeled using an attention-based deep neural network, to represent the salience degree of the summary for yielding the document. We devise a contrastive training strategy to learn the salience estimation network, and the learned salience score then guides an iterative strategy that extracts the most salient sentences from the document as the summary.

Introduction
  • Automatic document summarization is a challenging task in natural language understanding, aiming to compress a textual document to a shorter highlight that contains the most representative information of the original text.
  • SummaRuNNer (Nallapati, Zhai, and Zhou 2017) uses a Recurrent Neural Network (RNN) based sequence model for extractive summarization, while Refresh (Narayan, Cohen, and Lapata 2018) assigns each document sentence a score to indicate its probability of being included in the summary.
  • An example document D (cf. Table 1): Rutgers University has banned fraternity and sorority house parties at its main campus in New Brunswick, New Jersey, for the rest of the spring semester after several alcohol-related problems this school year, including the death of a student.
Highlights
  • Automatic document summarization is a challenging task in natural language understanding, aiming to compress a textual document to a shorter highlight that contains the most representative information of the original text
  • Extractive summarization methods, on which this paper focuses, aim to select salient snippets, sentences or passages directly from the input document, while abstractive summarization generates summaries that may have words or phrases not present in the input
  • We propose DeepChannel, an extractive summarization approach consisting of a deep neural network for salience estimation and a salience-guided greedy extraction strategy (a toy sketch of both components follows this list);
  • To compare the robustness of models, we conducted out-of-domain experiments by training models on the CNN/Daily Mail training set while evaluating on the DUC 2007 dataset
  • We propose DeepChannel, consisting of a deep neural network-based channel model and an iterative extraction strategy, for extractive document summarization
  • Experiments on CNN/Daily Mail demonstrate that our model performs on par with state-of-the-art summarization systems
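
As a concrete illustration of the salience estimation and salience-guided extraction referenced above, the following is a minimal, self-contained PyTorch sketch: a margin-based contrastive loss pushes the salience score of a gold summary above that of a corrupted one, and a greedy loop extracts the document sentences that maximize the learned score. All names and shapes here (SalienceNet, contrastive_loss, greedy_extract) are hypothetical placeholders for exposition, not the authors' released implementation.

    # Hedged sketch of DeepChannel-style contrastive salience training and
    # salience-guided greedy extraction; names and shapes are illustrative only.
    import torch
    import torch.nn as nn

    class SalienceNet(nn.Module):
        """Toy stand-in for the attention-based salience estimator P(D|S)."""
        def __init__(self, dim):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=1, batch_first=True)
            self.score = nn.Linear(dim, 1)

        def forward(self, doc_sents, summ_sents):
            # doc_sents:  (1, n_doc, dim) sentence embeddings of the document D
            # summ_sents: (1, n_sum, dim) sentence embeddings of a candidate summary S
            attended, _ = self.attn(doc_sents, summ_sents, summ_sents)
            return self.score(attended.mean(dim=1)).squeeze(-1)  # scalar salience score

    def contrastive_loss(model, doc, gold_summary, negative_summary, margin=1.0):
        """Hinge loss pushing score(D, S+) above score(D, S-) by a margin."""
        pos = model(doc, gold_summary)
        neg = model(doc, negative_summary)
        return torch.clamp(margin - (pos - neg), min=0.0).mean()

    def greedy_extract(model, doc, k=3):
        """Iteratively add the document sentence that most increases salience."""
        chosen = []
        with torch.no_grad():
            for _ in range(k):
                best, best_score = None, float("-inf")
                for i in range(doc.size(1)):
                    if i in chosen:
                        continue
                    s = model(doc, doc[:, chosen + [i], :]).item()
                    if s > best_score:
                        best, best_score = i, s
                chosen.append(best)
        return chosen  # indices of extracted sentences

    if __name__ == "__main__":
        dim = 32
        model = SalienceNet(dim)
        doc = torch.randn(1, 10, dim)   # 10 document sentences (random placeholders)
        gold = torch.randn(1, 3, dim)   # gold summary sentences
        neg = torch.randn(1, 3, dim)    # corrupted / negative summary
        contrastive_loss(model, doc, gold, neg).backward()
        print("extracted sentence indices:", greedy_extract(model, doc))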
Methods
  • Table 2 compares extractive systems (SummaRuNNer, Refresh, SWAP-NET, rnn-ext + RL, DeepChannel) against abstractive systems (PointerGenerator, ML+RL+intra-attention, controlled, inconsistency loss) using Rouge-1, Rouge-2, and Rouge-L.
Results
  • Table 2 shows the performance comparison between DeepChannel and state-of-the-art baselines on the CNN/Daily Mail dataset using full-length Rouge F-1 as the metric (a minimal Rouge evaluation sketch follows this list).
  • The authors can see that DeepChannel obtains a Rouge-1 score of 19.53 at 75 bytes and 28.85 at 275 bytes on DUC 2007, stably and significantly better than the other three baselines, demonstrating the strong robustness of the model.
  • The Rouge scores of SummaRuNNer, Refresh, and especially PointerGenerator all suffer a drastic drop on the reduced training set.
  • Thanks to its salience estimation, DeepChannel has strong generalization ability: it can learn from a very small training set and largely avoid overfitting.
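
For the Rouge evaluation sketch referenced above: full-length Rouge F-1 can be approximated with Google's open-source rouge-score package. The strings below are placeholders, and the paper's published numbers rely on the standard ROUGE toolkit, so exact values may differ.

    # Minimal Rouge F-1 sketch using the rouge-score package (pip install rouge-score).
    # Reference/hypothesis strings are placeholders, not examples from the dataset.
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    reference = "rutgers bans fraternity and sorority house parties for the rest of the semester ."
    hypothesis = "rutgers university has banned fraternity and sorority house parties at its main campus ."
    scores = scorer.score(reference, hypothesis)
    for name, s in scores.items():
        print(name, round(s.fmeasure, 4))  # the F-1 variant, as reported in Table 2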
Conclusion
  • The authors propose DeepChannel, consisting of a deep neural network-based channel model and an iterative extraction strategy, for extractive document summarization.
  • Experiments on CNN/Daily Mail demonstrate that the model performs on par with state-of-the-art summarization systems.
  • DeepChannel has three significant advantages: 1) strong robustness to domain variations; 2) high data efficiency; 3) high interpretability.
  • In future work, the authors will take the language model P(S) into account to reflect the influence and coherence between adjacent sentences (sketched below)
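
Read as a noisy-channel formulation (an illustrative decomposition consistent with the "channel model" terminology, not an equation quoted from the paper), the planned language-model term would enter as

    P(S \mid D) \;\propto\; \underbrace{P(D \mid S)}_{\text{channel model (salience)}} \cdot \underbrace{P(S)}_{\text{language model, future work}},
    \qquad
    S^{*} = \arg\max_{S \subseteq D} P(D \mid S)\, P(S),

where the current model scores only the channel term P(D|S); adding P(S) would additionally reward fluent and coherent summaries.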
Tables
  • Table1: Examples of different degrees of salience. We consider P(D|S1) > P(D|S2) because S1 contains more important information compared with S2 and thus is more salient for yielding D
  • Table2: Performance on CNN/Daily Mail test set using the full length Rouge F-1 score
  • Table3: Performance on the DUC 2007 dataset using the limited-length recall variants of Rouge. The upper section reports results at 75 bytes, and the lower section reports results at 275 bytes. DeepChannel outperforms the other baselines stably, indicating that it is more robust for out-of-domain application
  • Table4: Performance when training on the reduced CNN/Daily Mail training set. The full-length Rouge F-1 scores on the CNN/Daily Mail test set are reported. The two sections report results with 1/10 and 1/100 of the training data, respectively. Our model obtains high scores even with only 1/100 of the training samples, while other baselines, especially the seq2seq-based PointerGenerator, suffer significant performance degradation on the reduced training set
  • Table5: Example documents and gold summaries from CNN/Daily Mail test set. The sentences chosen by DeepChannel for extractive summarization are highlighted in bold, and the corresponding summary sentences which have equivalent semantics are underlined
  • Table6: Performance on CNN/Daily Mail test set with different weights of the penalization term
Related work
  • Traditional summarization methods usually depend on manual rules and expert knowledge, such as the expanding rules of noisy-channel models (Daume III and Marcu 2002; Knight and Marcu 2002), objectives and constraints of Integer Linear Programming (ILP) models (Woodsend and Lapata 2012; Parveen, Ramsl, and Strube 2015; Bing et al. 2015), human-engineered features of some sequence classification methods (Shen et al. 2007), and so on.

    Deep learning models can learn continuous features automatically and have made substantial progress in multiple NLP areas. Many deep learning-based summarization models have been proposed recently for both extractive and abstractive summarization tasks.

    Extractive. (Nallapati, Zhai, and Zhou 2017) considers the extraction as a sequence classification task and proposes SummaRuNNer, a simple RNN based model that decides whether or not to include a sentence in the summary. (Wu and Hu 2018) takes the coherence of summaries into account and designs a reinforcement learning (RL) method to maximize the combined ROUGE (Lin 2004) and coherence reward. (Narayan, Cohen, and Lapata 2018) conceptualizes extractive summarization as a sentence ranking task and optimizes the ROUGE evaluation metric through an RL objective. (Jadhav and Rajan 2018) models the interaction of keywords and salient sentences using a two-level pointer network and combines them to generate the extractive summary.

    Abstractive. A vast majority of abstractive summarizers are built based on the encoder-decoder structure. (See, Liu, and Manning 2017) incorporates a pointing mechanism into the encoder-decoder, such that their model can directly copy words from the source text while decoding summaries. (Paulus, Xiong, and Socher 2017) combines the standard cross-entropy loss and RL objectives to maximize the ROUGE metric at the same time of sequence prediction training. (Chen and Bansal 2018) proposes a fast summarization model that first selects salient sentences and then rewrites them abstractively to generate a concise overall summary. Their hybrid approach jointly learns an extractor and a rewriter, capable of both extractive and abstractive summarization. (Hsu et al. 2018) also combines extraction and abstraction, but they implement it by unifying a sentence-level attention and a word-level attention and guiding these two parts with an inconsistency loss.
Funding
  • The work is supported by NSFC key projects (U1736204, 61533018, 61661146007), Ministry of Education and China Mobile Research Fund (No. 20181770250), and THUNUS NExT Co-Lab
Reference
  • Bing, L.; Li, P.; Liao, Y.; Lam, W.; Guo, W.; and Passonneau, R. 2015. Abstractive multi-document summarization via phrase selection and merging. In ACL-IJCNLP.
  • Chen, Y.-C., and Bansal, M. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. arXiv preprint arXiv:1805.11080.
  • Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • Daume III, H., and Marcu, D. 2002. A noisy-channel model for document compression. In ACL.
  • Erhan, D.; Bengio, Y.; Courville, A.; Manzagol, P.-A.; Vincent, P.; and Bengio, S. 2010. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research.
  • Fan, A.; Grangier, D.; and Auli, M. 2017. Controllable abstractive summarization. arXiv preprint arXiv:1711.05217.
  • Hermann, K. M.; Kocisky, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; and Blunsom, P. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, 1693–1701.
  • Hsu, W.-T.; Lin, C.-K.; Lee, M.-Y.; Min, K.; Tang, J.; and Sun, M. 2018. A unified model for extractive and abstractive summarization using inconsistency loss. arXiv preprint arXiv:1805.06266.
  • Iyyer, M.; Boyd-Graber, J.; Claudino, L.; Socher, R.; and Daume III, H. 2014. A neural network for factoid question answering over paragraphs. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 633–644.
  • Jadhav, A., and Rajan, V. 2018. Extractive summarization with SWAP-NET: Sentences and words from alternating pointer networks. In ACL.
  • Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Knight, K., and Marcu, D. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence.
  • Lin, Z.; Feng, M.; Santos, C. N. d.; Yu, M.; Xiang, B.; Zhou, B.; and Bengio, Y. 2017. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130.
  • Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.
  • Logeswaran, L., and Lee, H. 2018. An efficient framework for learning sentence representations. In ICLR.
  • Luong, M.-T.; Pham, H.; and Manning, C. D. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
  • Mnih, A., and Hinton, G. E. 2009. A scalable hierarchical distributed language model. In NIPS.
  • Nallapati, R.; Zhou, B.; dos Santos, C.; Gulcehre, C.; and Xiang, B. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In CoNLL.
  • Nallapati, R.; Zhai, F.; and Zhou, B. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In AAAI.
  • Narayan, S.; Cohen, S. B.; and Lapata, M. 2018. Ranking sentences for extractive summarization with reinforcement learning. In NAACL.
  • Parveen, D.; Ramsl, H.-M.; and Strube, M. 2015. Topical coherence for graph-based extractive summarization. In EMNLP.
  • Paulus, R.; Xiong, C.; and Socher, R. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.
  • Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global vectors for word representation. In EMNLP.
  • Peyrard, M., and Eckle-Kohler, J. 2017. Supervised learning of automatic pyramid for optimization-based multi-document summarization. In ACL.
  • See, A.; Liu, P. J.; and Manning, C. D. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.
  • Shen, D.; Sun, J.-T.; Li, H.; Yang, Q.; and Chen, Z. 2007. Document summarization using conditional random fields. In IJCAI.
  • Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR.
  • Woodsend, K., and Lapata, M. 2012. Multiple aspect summarization using integer linear programming. In EMNLP-CoNLL.
  • Wu, Y., and Hu, B. 2018. Learning to extract coherent summary via deep reinforcement learning. arXiv preprint arXiv:1804.07036.