POINTER: Constrained Text Generation via Insertion-based Generative Pre-training

EMNLP 2020.

Keywords:
inner-layer beam search, generative pre-training, large scale, dynamic programming, language model

Abstract:

Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation. However, these models cannot be directly employed to generate text under specified lexical constraints. To address this challenge, we present POINTER, a simple yet novel ...

Introduction
  • Real-world editorial assistant applications must often generate text under specified lexical constraints, for example, convert a meeting note with key phrases into a concrete meeting summary, recast a user-input search query as a fluent sentence, generate a conversational response using grounding facts (Mou et al, 2016), or create a story using a pre-specified set of keywords (Fan et al, 2018).

    Generating text under specific lexical constraints is challenging.
  • While soft-constrained models are easy to design, keywords are apt to be lost during generation, especially when multiple keywords must be included or the keywords are only loosely correlated
  • Soft-enforcing algorithms such as attention and copy mechanisms (Bahdanau et al, 2015; Gu et al, 2016; Chen et al, 2019a) can help preserve keywords, but do not guarantee that every constraint appears in the output sentence (see the sketch below)
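    As an aside, the following minimal sketch (our own illustration, not part of POINTER) shows the kind of keyword-coverage check that a soft-constrained output can fail; the keywords and example outputs are invented for the illustration.

```python
# Minimal illustration (not part of POINTER) of why soft constraints are not
# enough: keyword coverage must be checked explicitly, whereas an
# insertion-based method keeps the constraint words in the output by
# construction.

def satisfies_constraints(output, keywords):
    """Return True only if every constraint keyword appears in the output."""
    tokens = output.lower().split()
    return all(kw.lower() in tokens for kw in keywords)


constraints = ["sushi", "fresh", "recommend"]
soft_output = "the fish was fresh and i would definitely come back"
hard_output = "the sushi was fresh and i would recommend this place"

print(satisfies_constraints(soft_output, constraints))  # False: "sushi", "recommend" missing
print(satisfies_constraints(hard_output, constraints))  # True: all keywords preserved
```
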
Highlights
  • Real-world editorial assistant applications must often generate text under specified lexical constraints, for example, convert a meeting note with key phrases into a concrete meeting summary, recast a user-input search query as a fluent sentence, generate a conversational response using grounding facts (Mou et al, 2016), or create a story using a pre-specified set of keywords (Fan et al, 2018)
  • The main contributions of this paper are summarized as follows. (i) We present POINTER, a novel insertion-based Transformer model for hard-constrained text generation. (ii) Large-scale pre-training and a novel beam search algorithm (ILBS) are proposed to further boost performance. (iii) Experiments are conducted on several datasets across different domains, demonstrating the superiority of POINTER over strong baselines
  • Because Constrained Sentence Generation by Metropolis-Hastings Sampling (CGMH) is sampling-based by nature, it achieves the highest Dist-n scores
  • We have presented POINTER, a simple yet powerful approach to generating text from a given set of lexical constraints in a non-autoregressive manner
  • The proposed method leverages a large-scale pre-trained model to generate text in a progressive manner using an insertion-based Transformer. Both automatic and human evaluation demonstrate the effectiveness of POINTER and its potential in constrained text generation
Methods
  • Denote the lexical constraints as X0 = X. The generation procedure of the method can then be understood as a sequence of K stages, S = {X0, X1, · · · , XK−1, XK}, such that for each k ∈ {1, · · · , K}, Xk−1 is a sub-sequence of Xk; the final stage XK yields the complete sentence (a minimal sketch of this stage-wise loop is given at the end of this section).
  • The reported metrics include NIST (N-2, N-4), BLEU (B-2, B-4), METEOR, Entropy (E-4), PPL and average generation length. The authors provide the full evaluation results, including Wikipedia zero-shot learning results, in Table 8 and Table 9.
  • In most of the experiments, λ is set at 0.5
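    The stage-wise procedure above can be sketched as a simple fixed-point loop. The code below is illustrative only, not the authors' implementation: propose_insertions is a hypothetical stand-in for the insertion-based Transformer, backed by a toy lookup table so the loop runs end to end; a real model would score every slot and decide whether and what to insert.

```python
# Illustrative sketch of the stage-wise generation loop X0 -> X1 -> ... -> XK.
# `propose_insertions` is a hypothetical stand-in for the insertion-based
# Transformer; generation stops when a stage makes no new insertions.

def propose_insertions(tokens):
    """Toy stand-in: insert at most one token into each slot via a fixed lookup.

    A real model would predict, for every slot between adjacent tokens,
    whether to insert and which token is most likely.
    """
    toy_rules = {
        ("food", "tasty"): "was",
        ("tasty", "service"): "and",
        ("service", "friendly"): "was",
    }
    out = []
    for left, right in zip(tokens, tokens[1:] + [None]):
        out.append(left)
        if (left, right) in toy_rules:
            out.append(toy_rules[(left, right)])
    return out


def generate(constraints, max_stages=10):
    """Run stages until Xk equals Xk-1 (a fixed point)."""
    x = list(constraints)
    for _ in range(max_stages):
        x_next = propose_insertions(x)
        if x_next == x:  # no new insertions: generation is finished
            break
        x = x_next
    return x


print(" ".join(generate(["food", "tasty", "service", "friendly"])))
# -> "food was tasty and service was friendly"
```

    The stopping condition mirrors the behavior described in Table 1, where generation ends once a stage produces no new tokens (X4 equals X3).
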
Results
  • News Generation: The authors first conduct experiments on the News dataset to generate sentences from 4 lexical constraints.
  • Quantitative results are summarized in Table 2, and some qualitative examples are provided in Table 4 and Appendix A.
  • POINTER is able to take full advantage of BERT initialization and Wiki pre-training to improve relevance scores (NIST, BLEU and METEOR).
  • Leveraging the ILBS further improves performance.
  • The authors observe that the generated sentences from CGMH are relatively short; CGMH may yield less fluent generation when the constraints are more disconnected (a brief sketch of the BLEU and Dist-n metrics used in these comparisons follows below)
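    For concreteness, the snippet below illustrates two of the automatic metrics referenced in this comparison, BLEU (via NLTK) and Dist-n, the distinct n-gram ratio on which the sampling-based CGMH scores highest. It is a toy illustration on invented token lists, not the authors' evaluation script; tokenization and BLEU settings are assumptions.

```python
# Toy illustration of two automatic metrics used in the comparison:
# corpus BLEU (relevance to references) and Dist-n (n-gram diversity).

from nltk.translate.bleu_score import corpus_bleu


def dist_n(sentences, n):
    """Fraction of n-grams that are distinct across all generated sentences."""
    ngrams = [
        tuple(toks[i:i + n])
        for toks in sentences
        for i in range(len(toks) - n + 1)
    ]
    return len(set(ngrams)) / max(len(ngrams), 1)


hypotheses = [["the", "food", "was", "tasty", "and", "the", "service", "friendly"]]
references = [[["the", "food", "was", "great", "and", "the", "service", "was", "friendly"]]]

print("BLEU-2:", corpus_bleu(references, hypotheses, weights=(0.5, 0.5)))
print("Dist-2:", dist_n(hypotheses, 2))
```
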
Conclusion
  • The authors have presented POINTER, a simple yet powerful approach to generating text from a given set of lexical constraints in a non-autoregressive manner.
  • The proposed method leverages a large-scale pre-trained model to generate text in a progressive manner using an insertion-based Transformer.
  • Both automatic and human evaluation demonstrate the effectiveness of POINTER and its potential in constrained text generation.
  • The authors' model can be extended to allow inflected/variant forms and arbitrary ordering of given lexical constraints.
Summary
  • Objectives:

    While the MLM objective in BERT only predicts the token at a masked placeholder, the objective here differs in comprising both (i) the likelihood of an insertion indicator for each slot, and (ii) the likelihood of each new token conditioned on the activated slot.
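    A minimal PyTorch-style sketch of such a two-part objective is given below. It is one possible formulation consistent with the description above, not the authors' released code; the tensor shapes and the use of a separate binary indicator head are assumptions made for the illustration.

```python
# Sketch of a two-part insertion objective: (i) a binary "insert here?"
# indicator per slot, plus (ii) token prediction for the activated slots.
# This is an illustrative formulation, not the authors' implementation.

import torch
import torch.nn.functional as F


def insertion_objective(slot_logits, token_logits, insert_labels, token_labels):
    """
    slot_logits:   (batch, num_slots)          insertion-indicator scores
    token_logits:  (batch, num_slots, vocab)   token scores for each slot
    insert_labels: (batch, num_slots) in {0,1} 1 = a token should be inserted
    token_labels:  (batch, num_slots)          token ids (used only where inserted)
    """
    # (i) likelihood of the insertion indicator for every slot
    indicator_loss = F.binary_cross_entropy_with_logits(
        slot_logits, insert_labels.float())

    # (ii) likelihood of each new token, conditioned on the activated slots
    active = insert_labels.bool()
    if active.any():
        token_loss = F.cross_entropy(token_logits[active], token_labels[active])
    else:
        token_loss = slot_logits.new_zeros(())

    return indicator_loss + token_loss


if __name__ == "__main__":
    B, S, V = 2, 5, 100  # toy batch, slot, and vocabulary sizes
    loss = insertion_objective(
        torch.randn(B, S),
        torch.randn(B, S, V),
        torch.randint(0, 2, (B, S)),
        torch.randint(0, V, (B, S)),
    )
    print(loss.item())
```
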
Tables
  • Table1: Generated example of the progressive generation process with multiple stages from the proposed POINTER model. Words in blue indicate newly generated words at the current stage. Xi denotes the generated partial sentence at Stage i. Five stages are considered in this example; X4 and X3 are the same, which indicates the end of the generation process. Interestingly, our method allows informative words (e.g., company, change) to be generated first, while non-informative words (e.g., the, to) are generated at the end
  • Table2: Results on the News dataset. ILBS denotes beam search. “+Wiki” denotes fine-tuning on the Wiki-pretrained model. “Human” represents the held-out human reference
  • Table3: Results on the Yelp dataset. ILBS denotes beam search. “+Wiki” denotes fine-tuning on the Wiki-pretrained model. “Human” represents the held-out human reference
  • Table4: Generated examples from the News dataset
  • Table5: Generated examples from the Yelp dataset
  • Table6: Results of Human Evaluation on the News and Yelp datasets for semantic consistency, fluency and informativeness, showing preferences (%) for our POINTER model vis-a-vis baselines and real human responses. Numbers in bold indicate the most preferred systems. Differences in mean preferences are statistically significant at p ≤ 0.00001
  • Table7: Speed comparison among different methods. “toks/s” represents tokens per second. Inference time is computed on 1000 test examples
  • Table8: Additional evaluation results on the News dataset. ILBS denotes beam search. “+Wiki” denotes fine-tuning on the Wiki-pretrained model. “Human” represents the held-out human reference. “Wiki zero-shot” represents zero-shot generation from the pre-trained model
  • Table9: Additional evaluation results on the Yelp dataset. ILBS denotes beam search. “+Wiki” denotes fine-tuning on the Wiki-pretrained model. “Human” represents the held-out human reference. “Wiki zero-shot” represents zero-shot generation from the pre-trained model
Related work
  • Language Model Pre-training Large-scale pretrained language models, such as BERT (Devlin et al, 2019), RoBERTa (Liu et al, 2019), XLNet (Yang et al, 2019), Text-to-text Transformer (Raffel et al, 2019) and ELECTRA (Clark et al, 2020), have achieved great success on natural language understanding benchmarks. GPT-2 (Radford et al, 2018) first demonstrates great potential for leveraging Transformer models in generating realistic text with given prompts. MASS (Song et al, 2019) and BART (Lewis et al, 2019) propose methods for sequence-to-sequence pre-training. UniLM (Dong et al, 2019) unifies the generation and understanding tasks within a single pre-training scheme.

    However, these models cannot be directly adopted for generating text under specified lexical constraints.

    DialoGPT (Zhang et al, 2020) and MEENA (Adiwardana et al, 2020) focus on open-domain conversations, while SC-GPT (Peng et al, 2020) focuses on task-oriented dialog; these models demonstrate potential for human-like response generation. Controllable pre-trained language generation models have also been proposed. For example, CTRL (Keskar et al, 2019) and Grover (Zellers et al, 2019) guide text generation with pre-defined control codes, while Optimus (Li et al, 2020) guides text generation with abstract-level latent codes. Complementary to this, PPLM (Dathathri et al, 2020) introduces a controllable scheme in the text decoding stage. In addition, recent work has also investigated how to leverage BERT for conditional text generation (Chen et al, 2019b; Mansimov et al, 2019). With massive training data, these models exhibit strong capacity for generating realistic chunks of text.
Reference
  • Daniel Adiwardana, Minh-Thang Luong, David R So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, et al. 2020. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Richard Bellman. 1954. The theory of dynamic programming. Technical report, Rand corp santa monica ca.
  • Jon Bentley. 1984. Programming pearls: algorithm design techniques. Communications of the ACM, 27(9):865–873.
  • Ricardo Campos, Vítor Mangaravite, Arian Pasquali, Alípio Jorge, Célia Nunes, and Adam Jatowt. 2020. YAKE! Keyword extraction from single documents using multiple local features. Information Sciences.
  • Ricardo Campos, Vítor Mangaravite, Arian Pasquali, Alípio Mário Jorge, Célia Nunes, and Adam Jatowt. 2018. YAKE! Collection-independent automatic keyword extractor. In European Conference on Information Retrieval.
  • William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, and Jakob Uszkoreit. 2019. Kermit: Generative insertion-based modeling for sequences. arXiv preprint arXiv:1906.01604.
  • Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, and Lawrence Carin. 2019a. Improving sequence-to-sequence learning via optimal transport. In ICLR.
  • Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, and Jingjing Liu. 2019b. Distilling the knowledge of bert for text generation. arXiv preprint arXiv:1911.03829.
  • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Michel Galley, Chris Brockett, Mengdi Wang, and Jianfeng Gao. 2018. Towards coherent and cohesive long-form text generation. arXiv preprint arXiv:1811.00511.
  • Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pretraining text encoders as discriminators rather than generators. In ICLR.
  • Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text generation. In ICLR.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
  • George Doddington. 2002. Automatic evaluation of machine translation quality using n-gram cooccurrence statistics. In Proceedings of the second international conference on Human Language Technology Research.
  • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In NeurIPS.
  • Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. In ACL.
  • Michel Galley, Chris Brockett, Xiang Gao, Jianfeng Gao, and Bill Dolan. 2019. Grounded response generation task at dstc7. In AAAI Dialog System Technology Challenges Workshop.
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-predict: Parallel decoding of conditional masked language models. In EMNLP.
  • David Gries. 1982. A note on a standard strategy for developing loop invariants and loops. Science of Computer Programming.
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor OK Li, and Richard Socher. 2018. Non-autoregressive neural machine translation. In ICLR.
  • Jiatao Gu, Qi Liu, and Kyunghyun Cho. 2019. Insertion-based decoding with automatically inferred generation order. TACL.
  • Jiatao Gu, Zhengdong Lu, Hang Li, and Victor OK Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
  • Chris Hokamp and Qun Liu. 2017. Lexically constrained decoding for sequence generation using grid beam search. In ACL.
  • J Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. 2019. Improved lexically constrained decoding for translation and monolingual rewriting. In NAACL.
  • Jungo Kasai, James Cross, Marjan Ghazvininejad, and Jiatao Gu. 2020. Parallel machine translation with disentangled context transformer. arXiv preprint arXiv:2001.05136.
  • Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL - A Conditional Transformer Language Model for Controllable Generation. arXiv preprint arXiv:1909.05858.
  • D. Kingma and J. Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
  • Alon Lavie and Abhaya Agarwal. 2007. Meteor: An automatic metric for mt evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation.
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. Deterministic non-autoregressive neural sequence modeling by iterative refinement. EMNLP.
  • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
  • Chunyuan Li, Xiang Gao, Yuan Li, Xiujun Li, Baolin Peng, Yizhe Zhang, and Jianfeng Gao. 2020. Optimus: Organizing sentences via pretrained modeling of a latent space. arXiv preprint arXiv:2004.04092.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  • Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard Hovy. 2019. Flowseq: Nonautoregressive conditional sequence generation with generative flow. arXiv preprint arXiv:1909.02480.
  • Elman Mansimov, Alex Wang, and Kyunghyun Cho. 2019. A generalized framework of sequence generation with application to undirected sequence models. arXiv preprint arXiv:1905.12790.
  • Ning Miao, Hao Zhou, Lili Mou, Rui Yan, and Lei Li. 2019. Cgmh: Constrained sentence generation by metropolis-hastings sampling. In AAAI.
  • Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, and Zhi Jin. 2016. Sequence to backward and forward sequences: A content-introducing approach to generative short-text conversation. arXiv preprint arXiv:1607.00970.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In ACL.
  • Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, and Jianfeng Gao. 2020. Few-shot natural language generation for task-oriented dialog. arXiv preprint arXiv:2002.12328.
  • Matt Post and David Vilar. 2018. Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In NAACL.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In NeurIPS.
  • Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In NeurIPS.
  • Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, and Jianfeng Gao. 2019. Conversing by reading: Contentful neural conversation with on-demand machine reading. In ACL.
  • Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, and Bill Dolan. 2018. Generating informative and diverse conversational responses via adversarial information maximization. In NeurIPS.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. 2018. Language models are unsupervised multitask learners. Technical report, OpenAI.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Arthur Richards and Jonathan P How. 2002. Aircraft trajectory planning with collision avoidance using mixed integer linear programming. In Proceedings of American Control Conference.
  • Yizhe Zhang, Dinghan Shen, Guoyin Wang, Zhe Gan, Ricardo Henao, and Lawrence Carin. 2017. Deconvolutional paragraph representation learning. In NeurIPS.
  • Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2020. Dialogpt: Large-scale generative pre-training for conversational response generation. In ACL (system demonstration).
  • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and TieYan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450.
  • Mitchell Stern, William Chan, Jamie Kiros, and Jakob Uszkoreit. 2019. Insertion transformer: Flexible sequence generation via insertion operations. arXiv preprint arXiv:1902.03249.
  • Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, and Zhihong Deng. 2019. Fast structured decoding for sequence models. In NeurIPS.
  • Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, and Zhiting Hu. 2019. Target-guided open-domain conversation. In ACL.
  • Sean Welleck, Kiante Brantley, Hal Daume III, and Kyunghyun Cho. 2019. Non-monotonic sequential text generation. arXiv preprint arXiv:1902.02192.
  • Felix Wu, Angela Fan, Alexei Baevski, Yann N Dauphin, and Michael Auli. 2019. Pay less attention with lightweight and dynamic convolutions. arXiv preprint arXiv:1901.10430.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.