Zero-Shot Cross-Lingual Neural Headline Generation

IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp. 2319-2327, 2018.

Keywords:
Data models, Training, Training data, Speech processing, Neural networks
Weibo:
Due to the lack of parallel corpora that directly pair source-language articles with target-language headlines, we propose to address cross-lingual neural headline generation under a zero-shot scenario.

Abstract:

Neural headline generation (NHG) has recently been shown to be effective at producing fully abstractive headlines. Existing NHG systems, however, are only capable of producing a headline in the same language as the original document. Cross-lingual headline generation is an important task, since it provides an efficient way to understand the key poi...

Introduction
  • A headline provides an efficient and effective way for people to obtain the subject of a document before reading it through.
  • To quickly understand the main idea of the text, people usually take a glimpse at the headline first.
  • The j-th word y_j of a headline is generated in an encoder-decoder framework: Pr(y_j | x, y_{<j}) = g(y_{j-1}, s_j, c_j),
  • where y_{j-1} is the headline word generated at the previous timestep, s_j is the j-th hidden state computed by the decoder, c_j is the j-th context vector used to generate y_j, and g(·) is a non-linear function (a minimal sketch of this step follows).
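To make the notation concrete, below is a minimal PyTorch-style sketch of one attention-based decoding step computing Pr(y_j | x, y_{<j}) = g(y_{j-1}, s_j, c_j). This is not the authors' code; the module choices (GRU cell, simplified additive attention) and the dimensions are illustrative assumptions.

```python
# Minimal sketch of one attention-based decoding step (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, EMB, VOCAB = 256, 128, 10000
embed = nn.Embedding(VOCAB, EMB)
gru_cell = nn.GRUCell(EMB + HIDDEN, HIDDEN)   # decoder recurrence producing s_j
attn_score = nn.Linear(HIDDEN * 2, 1)         # simplified additive attention
readout = nn.Linear(EMB + HIDDEN * 2, VOCAB)  # the non-linear read-out g(.)

def decode_step(y_prev_id, s_prev, enc_states):
    """One step: enc_states (src_len, HIDDEN) are the encoder annotations of article x."""
    y_prev = embed(y_prev_id)                                   # embedding of y_{j-1}
    # Attention weights over source positions give the context vector c_j.
    scores = attn_score(
        torch.cat([enc_states, s_prev.expand_as(enc_states)], dim=-1)).squeeze(-1)
    c_j = F.softmax(scores, dim=-1) @ enc_states                # (HIDDEN,)
    # New decoder state s_j, then the word distribution Pr(y_j | x, y_<j).
    s_j = gru_cell(torch.cat([y_prev, c_j]).unsqueeze(0), s_prev.unsqueeze(0)).squeeze(0)
    probs = F.softmax(readout(torch.cat([y_prev, s_j, c_j])), dim=-1)
    return probs, s_j

# Example usage with random encoder states and a zero initial decoder state.
probs, s1 = decode_step(torch.tensor(3), torch.zeros(HIDDEN), torch.randn(7, HIDDEN))
```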
Highlights
  • A headline provides an efficient and effective way for people to obtain the subject of a document before reading it through
  • To address the model discrepancy and error propagation problems of pipeline methods, we propose a direct source-to-target cross-lingual neural headline generation (CNHG) model and deploy it on English-Chinese headline generation, building on existing parallel corpora: English headline generation, Chinese headline generation, and English-Chinese translation corpora
  • We propose to train the intended English-Chinese CNHG model within a teacher-student framework, with no direct training data
  • As both the neural machine translation (NMT) model and the neural headline generation (NHG) model can serve as teachers for the student model, we further investigate their combined “teaching” ability, as shown in Fig. 5
  • We propose a direct end-to-end CNHG model that addresses the training-data discrepancy and error propagation problems of pipeline methods under a zero-shot scenario
  • Experimental results on English-to-Chinese cross-lingual headline generation demonstrate that our proposed method significantly outperforms the baseline models
  • We introduce three methods to guide the learning process of the CNHG student model; a minimal sketch of one plausible word-level guidance loss follows this list
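As an illustration of what such guidance can look like, here is a minimal sketch of a word-level teacher-guided loss. It is one plausible instantiation under the teacher-student framing, not the paper's exact objective; the tensor shapes and the use of a frozen teacher distribution are assumptions.

```python
# Sketch of word-level teacher guidance: the CNHG student, reading the English article,
# is trained to match the per-position word distributions of a frozen pre-trained teacher
# (e.g., the Chinese NHG model reading the parallel Chinese article). Illustrative only.
import torch
import torch.nn.functional as F

def word_level_distillation_loss(student_logits, teacher_probs, target_mask):
    """
    student_logits: (batch, tgt_len, vocab) scores from the CNHG student.
    teacher_probs:  (batch, tgt_len, vocab) distributions from the frozen teacher.
    target_mask:    (batch, tgt_len) with 1 for real headline positions, 0 for padding.
    """
    log_p_student = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student) per position; the constant teacher-entropy term is omitted,
    # so this reduces to a teacher-weighted cross-entropy.
    ce = -(teacher_probs * log_p_student).sum(dim=-1)          # (batch, tgt_len)
    return (ce * target_mask).sum() / target_mask.sum()
```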
Methods
  • Teacher Model Setup: The authors evaluate the approaches on the English-to-Chinese headline generation task.
  • Note that these approaches involve two pre-trained teacher models, i.e., the English-Chinese NMT model and the Chinese NHG model.
  • For the English-Chinese NMT model, the training set2 consists of 1.25M sentence pairs with 27.9M Chinese words and 34.5M English words.
  • The authors use the NIST 2002 dataset as the development set to select model parameters.
  • The evaluation metric is BLEU [20], calculated with the multi-bleu.perl script; an illustrative Python approximation is sketched below.
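For reference, the following is a rough Python approximation of the BLEU evaluation using sacrebleu. It is not the multi-bleu.perl script used in the paper, the toy sentences are invented, and sacrebleu's tokenization can yield slightly different scores.

```python
# Illustrative corpus-level BLEU on (toy) Chinese translations; not the paper's script.
import sacrebleu

hypotheses = ["奥巴马 访问 中国", "会谈 在 北京 举行"]            # system outputs (invented examples)
references = [["奥巴马 访问 中国", "双方 会谈 在 北京 举行"]]       # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references, tokenize="zh")
print(f"BLEU = {bleu.score:.2f}")
```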
Results
  • The evaluation metric the authors use for the headline generation tasks is ROUGE [21], which reports recall, precision, and F1 scores.
  • For the Chinese headline generation task, the authors follow common practice and report full-length F1 scores [10], [16].
  • The recall scores of ROUGE are sensitive to length, as they favor longer headlines.
  • The F1 scores, on the other hand, provide fairer results by penalizing longer headlines that are noisy [14].
  • The authors use full-length F1 scores from ROUGE-1, ROUGE-2, and ROUGE-L to evaluate the systems for fair comparison; an illustrative scoring snippet follows.
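Below is an illustrative way to obtain full-length ROUGE-1/2/L F1 with the rouge-score package. This is not necessarily the toolkit used in the paper, the example strings are invented, and Chinese system output would first need word segmentation plus a tokenizer that handles Chinese characters (this package's default tokenizer keeps only alphanumeric tokens).

```python
# Illustrative full-length ROUGE-1/2/L F1 scoring with Google's rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "obama visits china for trade talks"        # invented reference headline
hypothesis = "obama arrives in china for talks"         # invented system headline

scores = scorer.score(reference, hypothesis)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```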
Conclusion
  • Let the CNHG model be the student model; the authors assume its generation probabilities will be close to those of the pre-trained NMT and NHG teacher models.
  • Based on this assumption, the authors introduce three methods to guide the learning process of the CNHG student model.
  • The authors will also investigate the possibility of integrating plain monolingual texts to enhance CNHG through semi-supervised learning; a plausible sequence-level variant of the teacher guidance is sketched below
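One plausible way such plain English articles could be exploited, in the spirit of sequence-level knowledge distillation [19] (an assumption for illustration, not necessarily one of the paper's three methods), is to let the teachers produce pseudo Chinese headlines and train the student on them. The teacher method names below are hypothetical placeholders.

```python
# Sketch: generate pseudo English-article / Chinese-headline pairs with the teachers,
# then train the CNHG student on them with ordinary cross-entropy. Illustrative only;
# nmt_teacher.translate and nhg_teacher.summarize are hypothetical placeholder APIs.
import torch
import torch.nn.functional as F

def make_pseudo_pairs(english_articles, nmt_teacher, nhg_teacher):
    """Translate each plain English article, then headline it with the Chinese NHG teacher."""
    pairs = []
    for article in english_articles:
        zh_article = nmt_teacher.translate(article)      # hypothetical teacher API
        zh_headline = nhg_teacher.summarize(zh_article)  # hypothetical teacher API
        pairs.append((article, zh_headline))
    return pairs

def student_nll_loss(student_logits, pseudo_target_ids, pad_id=0):
    """student_logits: (batch, tgt_len, vocab); pseudo_target_ids: (batch, tgt_len)."""
    return F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        pseudo_target_ids.reshape(-1),
        ignore_index=pad_id,
    )
```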
Summary
  • Objectives:

    The authors' goal is to find a set of parameters that minimizes the training objective.
Tables
  • Table1: NOTATION TABLE
  • Table2: EFFECT OF HYPER-PARAMETER α ON DUC2003 DEVELOPMENT DATASET
  • Table3: DATA STATISTICS OF THE DUC DATASETS
  • Table4: EXPERIMENTAL RESULTS ON DUC DATASETS
  • Table5: EXAMPLE HEADLINES FROM EACH SYSTEM; THE CORRESPONDING ENGLISH TRANSLATIONS ARE GIVEN IN PARENTHESES, IN ITALICS
  • Table6: EFFECT OF USING DIFFERENT APPROXIMATION METHODS ON DUC2003
  • Table7: “UNK” STATISTICS IN THE PIPELINE METHODS
Related work
  • A. Neural Headline Generation

    End-to-end neural headline generation (NHG) has attracted increasing attention in recent years. Researchers have attempted to improve NHG performance from different aspects, including source article representation methods [12], [14], [24], encoder choices [9], [12], [25], decoder designs [9], [12], solutions to the limited-vocabulary problem [10], [11], output length control [25], and training strategies [26].

    B. Cross-Language Summarization
Funding
  • This work was supported in part by the Natural Science Foundation of China (NSFC) and the German Research Foundation (DFG) through the joint project Crossmodal Learning, NSFC Grant 61621136008 / DFG TRR-169; in part by Microsoft Research Asia under Grant FY17-RES-THEME-017; and in part by the China Association for Science and Technology under Grant 2016QNRC001.
Reference
  • X. Wan, H. Li, and J. Xiao, “Cross-language document summarization based on machine translation quality prediction,” in Proc. 48th Annu. Meet. Assoc. Comput. Linguistics, 2010, pp. 917–926.
  • X. Wan, “Using bilingual information for cross-language document summarization,” in Proc. 49th Annu. Meet. Assoc. Comput. Linguistics: Human Lang. Technol., 2011, pp. 1546–1555.
  • J.-G. Yao, X. Wan, and J. Xiao, “Phrase-based compressive cross-language summarization,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 118–127.
  • J. Zhang, Y. Zhou, and C. Zong, “Abstractive cross-language summarization via translation model enhanced predicate argument structure fusing,” IEEE/ACM Trans. Audio Speech, Lang. Process., vol. 24, no. 10, pp. 1842–1853, Oct. 2016.
  • O. Vinyals, Ł. Kaiser, T. Koo, S. Petrov, I. Sutskever, and G. Hinton, “Grammar as a foreign language,” in Proc. 28th Int. Conf. Neural Inf. Process. Syst., 2015, pp. 2773–2781.
  • D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proc. ICLR, 2015.
  • H. Chen, M. Sun, C. Tu, Y. Lin, and Z. Liu, “Neural sentiment classification with user and product attention,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1650–1659.
  • Y. Lin, S. Shen, Z. Liu, H. Luan, and M. Sun, “Neural relation extraction with selective attention over instances,” in Proc. 54th Annu. Meet. Assoc. Comput. Linguistics, 2016, pp. 2124–2133.
  • A. M. Rush, S. Chopra, and J. Weston, “A neural attention model for abstractive sentence summarization,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 379–389.
  • J. Gu, Z. Lu, H. Li, and V. O. Li, “Incorporating copying mechanism in sequence-to-sequence learning,” in Proc. 54th Annu. Meet. Assoc. Comput. Linguistics, 2016, pp. 1631–1640.
  • C. Gulcehre, S. Ahn, R. Nallapati, B. Zhou, and Y. Bengio, “Pointing the unknown words,” in Proc. 54th Annu. Meet. Assoc. Comput. Linguistics, 2016, pp. 140–149.
  • S. Chopra, M. Auli, and A. M. Rush, “Abstractive sentence summarization with attentive recurrent neural networks,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., 2016, pp. 93–98.
  • L. Yu, J. Buys, and P. Blunsom, “Online segment to segment neural transduction,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1307–1316.
  • R. Nallapati, B. Zhou, and C. dos Santos, “Abstractive text summarization using sequence-to-sequence rnns and beyond,” in Proc. 20th SIGNLL Conf. Comput. Natural Lang. Learn., 2016, pp. 280–290.
  • C. Napoles, M. Gormley, and B. Van Durme, “Annotated gigaword,” in Proc. Joint Workshop Autom. Knowl. Base Construction Web-scale Knowl. Extraction, 2012, pp. 95–100.
  • B. Hu, Q. Chen, and F. Zhu, “Lcsts: A large scale chinese short text summarization dataset,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 1967–1972.
  • S. Shen et al., “Minimum risk training for neural machine translation,” in Proc. 54th Annu. Meet. Assoc. Comput. Linguistics, 2016, pp. 1683–1692.
  • T. Kocisky et al., “Semantic parsing with semi-supervised sequential autoencoders,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1078–1087.
  • Y. Kim and A. M. Rush, “Sequence-level knowledge distillation,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1317–1327.
  • K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: A method for automatic evaluation of machine translation,” in Proc. 40th Annu. Meet. Assoc. Comput. Linguistics, 2002, pp. 311–318.
  • C.-Y. Lin, “Rouge: A package for automatic evaluation of summaries,” in Proc. ACL-04 Workshop, 2004.
  • K. Cho et al., “Learning phrase representations using rnn encoder–decoder for statistical machine translation,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 1724–1734.
  • M. D. Zeiler, “Adadelta: An adaptive learning rate method,” arXiv preprint arXiv:1212.5701, 2012.
  • S. Takase, J. Suzuki, N. Okazaki, T. Hirao, and M. Nagata, “Neural headline generation on abstract meaning representation,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1054–1059.
  • Y. Kikuchi, G. Neubig, R. Sasano, H. Takamura, and M. Okumura, “Controlling output length in neural encoder-decoders,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1328–1338.
  • Ayana et al., “Recent advances on neural headline generation,” J. Comput. Sci. Technol., vol. 32, no. 4, pp. 768–784, 2017.
  • R. Funaki and H. Nakayama, “Image-mediated learning for zero-shot cross-lingual document retrieval,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 585–590.
  • M. Yazdani and J. Henderson, “A model of zero-shot learning of spoken language understanding,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 244–249.
  • M. Johnson et al., “Google’s multilingual neural machine translation system: Enabling zero-shot translation,” Trans. Assoc. Comput. Linguistics, vol. 5, pp. 339–351, 2017.
  • H. Nakayama and N. Nishida, “Zero-resource machine translation by multimodal encoder–decoder network with multimedia pivot,” Mach. Transl., vol. 31, no. 1/2, pp. 49–64, 2016.
  • O. Firat, B. Sankaran, Y. Al-Onaizan, F. T. Y. Vural, and K. Cho, “Zero-resource translation with multi-lingual neural machine translation,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 268–277.
  • Y. Chen, Y. Liu, Y. Cheng, and O. V. Li, “A teacher-student framework for zero-resource neural machine translation,” in Proc. 55th Annu. Meet. Assoc. Comput. Linguistics, 2017.
  • O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in Proc. Adv. Neural Inf. Process. Syst., 2015.
Authors
  • Ayana received the M.E. degree in 2009 from the College of Computer Science and Technology, Inner Mongolia University, Hohhot, China. She is currently working toward the Ph.D. degree with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. Her research interest is document summarization.
  • Shi-qi Shen received the Ph.D. degree in computer science from the Department of Computer Science and Technology, Tsinghua University, Beijing, China, in 2017. He is a Senior Researcher with WeChat, Tencent. His research interests include machine translation and deep learning for natural language processing.
  • Yun Chen received the bachelor's degree from Tsinghua University, Beijing, China, in 2013. She has been working toward the Ph.D. degree with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, since 2014, under the supervision of Prof. Victor O. K. Li. Her research interests include machine learning approaches that are both linguistically motivated and tailored to natural language processing, especially neural machine translation.
  • Cheng Yang received the B.E. degree from Tsinghua University, Beijing, China, in 2014. He is currently working toward the Ph.D. degree with the Department of Computer Science and Technology, Tsinghua University. His research interests include natural language processing and network representation learning.
  • Zhi-yuan Liu received the Ph.D. degree from the Department of Computer Science and Technology, Tsinghua University, in 2011. He is currently an Associate Professor with the Department of Computer Science and Technology, Tsinghua University. His research interests include natural language processing, knowledge graph and social computation.
  • Mao-song Sun received the Ph.D. degree in computational linguistics from City University of Hong Kong, Hong Kong, in 2004. He is currently a Professor with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. His research interests include natural language processing, Web intelligence, and machine learning.