A Probabilistic Formulation of Unsupervised Text Style Transfer

ICLR, 2020.

Keywords: unsupervised text style transfer; deep latent sequence model

Abstract:

We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns…
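As a reading aid (a sketch with assumed notation, not an equation quoted from the paper): if x is an observed sentence in one domain, y its hypothesized latent counterpart in the other domain, q(y|x) an amortized inference network that doubles as the transfer model, and p_{D2}(y) a language-model prior over the target domain, the resulting variational bound takes the familiar form

    \log p_{\mathcal{D}_1}(x) \;\ge\; \mathbb{E}_{q(y \mid x)}\big[\log p(x \mid y)\big] \;-\; \mathrm{KL}\big(q(y \mid x) \,\|\, p_{\mathcal{D}_2}(y)\big)

The first term mirrors back-translation-style reconstruction, and the KL term against the language-model prior plays a role analogous to the NLL penalty in the BT+NLL baseline reported below.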

Introduction
Highlights
  • Text sequence transduction systems convert a given text sequence from one domain to another
  • We focus on a standard suite of style transfer tasks, including formality transfer (Rao & Tetreault, 2018), author imitation (Xu et al., 2012), word decipherment (Shen et al., 2017), sentiment transfer (Shen et al., 2017), and related language translation (Pourdamghani & Knight, 2017)
  • In experiments across a suite of unsupervised text style transfer tasks, we find that the natural objective of our model outperforms all manually defined unsupervised objectives from past work, supporting the notion that probabilistic principles can be a useful guide even in deep neural systems
  • We propose a probabilistic generative formulation that unites past work on unsupervised text style transfer
  • We show that this probabilistic formulation provides a different way to reason about unsupervised objectives in this domain
  • Our model leads to substantial improvements on five text style transfer tasks, yielding bigger gains when the styles considered are more difficult to distinguish
Methods
  • [Example sentences from the author imitation task (see Table 3): "Not to his father’s." / "Not to his father’s house." / "Not to his brother"; "Send thy man away." / "Send an excellent word." / "Send your man away"]
Results
  • [Flattened header of Table 2: BLEU of Shen et al. (2017), Yang et al. (2018), UNMT, BT+NLL, and Ours on decipherment, Sr-Bs, Bs-Sr, En-De, and De-En]
Conclusion
  • The authors propose a probabilistic generative formulation that unites past work on unsupervised text style transfer.
  • The authors show that this probabilistic formulation provides a different way to reason about unsupervised objectives in this domain.
  • The authors' model leads to substantial improvements on five text style transfer tasks, yielding bigger gains when the styles considered are more difficult to distinguish
Summary
  • Introduction:

    Text sequence transduction systems convert a given text sequence from one domain to another.
  • Unsupervised sequence transduction methods that require only non-parallel data are appealing and have been receiving growing attention (Bannard & Callison-Burch, 2005; Ravi & Knight, 2011; Mizukami et al., 2015; Shen et al., 2017; Lample et al., 2018; 2019)
  • This trend is most pronounced in the space of text style transfer tasks, where parallel data is challenging to obtain (Hu et al., 2017; Shen et al., 2017; Yang et al., 2018).
  • General unsupervised translation has not typically been considered style transfer, but the authors evaluate on this task for the purpose of comparison (Lample et al., 2017)
Tables
  • Table 1: Results on the sentiment transfer, author imitation, and formality transfer tasks. We list the PPL of pretrained LMs on the test sets of both domains. We only report Self-BLEU on the sentiment task to compare with existing work
  • Table 2: BLEU for decipherment, related language translation (Sr-Bs), and general unsupervised translation (En-De)
  • Table 3: Examples for the author imitation task
  • Table 4: Comparison of gradient approximation methods on the sentiment transfer task (see the sketch after this list)
  • Table 5: Comparison of gradient propagation methods on the sentiment transfer task
  • Table 6: Random sentiment transfer examples
  • Table 7: Repetitive examples of the BT+NLL baseline on formality transfer
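Tables 4 and 5 compare ways of approximating and propagating gradients through the discrete transferred sequences. For context, one standard estimator for this kind of discrete sampling is the straight-through Gumbel-softmax (Jang et al., 2017, cited in the references); the PyTorch sketch below is a generic illustration under assumed shapes and names, not the authors' implementation.

```python
# Generic straight-through Gumbel-softmax sketch (Jang et al., 2017); shapes and
# names are assumptions for illustration, not the authors' code.
import torch
import torch.nn.functional as F

def sample_latent_tokens(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Sample one-hot token relaxations over the vocabulary.

    With hard=True the forward pass is discrete (one-hot) while the backward
    pass uses the soft relaxation, so a downstream reconstruction loss can
    send gradients back into the network that produced `logits`.
    """
    return F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)

batch, seq_len, vocab, emb_dim = 4, 7, 1000, 32
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
embedding = torch.nn.Embedding(vocab, emb_dim)

one_hot = sample_latent_tokens(logits, tau=0.5)   # (4, 7, 1000), discrete forward
token_embs = one_hot @ embedding.weight           # (4, 7, 32), differentiable lookup
token_embs.sum().backward()                       # gradients reach `logits`
print(logits.grad.shape)                          # torch.Size([4, 7, 1000])
```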
Funding
  • The work of Junxian He and Xinyi Wang is supported by the DARPA GAILA project (award HR00111990063) and the Tang Family Foundation respectively
References
  • Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. Unsupervised neural machine translation. In Proceedings of ICLR, 2018.
  • Mikel Artetxe, Gorka Labaka, and Eneko Agirre. An effective approach to unsupervised machine translation. arXiv preprint arXiv:1902.01313, 2019.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Proceedings of ICLR, 2015.
  • Colin Bannard and Chris Callison-Burch. Paraphrasing with bilingual parallel corpora. In Proceedings of ACL, 2005.
  • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. In Proceedings of CoNLL, 2016.
  • Qing Dou and Kevin Knight. Dependency-based decipherment for resource-limited machine translation. In Proceedings of EMNLP, 2012.
  • Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual learning for machine translation. In Proceedings of NeurIPS, 2016.
  • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. Toward controlled generation of text. In Proceedings of ICML, 2017.
  • Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-softmax. In Proceedings of ICLR, 2017.
  • Harsh Jhamtani, Varun Gangal, Edward Hovy, and Eric Nyberg. Shakespearizing modern language using copy-enriched sequence-to-sequence models. In Proceedings of EMNLP, 2017.
  • Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, et al. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 2017.
  • Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of EMNLP, 2014.
  • Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Kevin Knight, Anish Nair, Nishit Rathod, and Kenji Yamada. Unsupervised analysis for decipherment problems. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pp. 499–506, 2006.
  • Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. Unsupervised machine translation using monolingual corpora only. arXiv preprint arXiv:1711.00043, 2017.
  • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. Phrase-based & neural unsupervised machine translation. arXiv preprint arXiv:1804.07755, 2018.
  • Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc’Aurelio Ranzato, and Y-Lan Boureau. Multiple-attribute text rewriting. In Proceedings of ICLR, 2019.
  • Juncen Li, Robin Jia, He He, and Percy Liang. Delete, retrieve, generate: A simple approach to sentiment and style transfer. arXiv preprint arXiv:1804.06437, 2018.
  • Yishu Miao and Phil Blunsom. Language as a latent variable: Discrete generative models for sentence compression. In Proceedings of EMNLP, 2016.
  • Remi Mir, Bjarke Felbo, Nick Obradovich, and Iyad Rahwan. Evaluating style transfer for text. In Proceedings of NAACL, 2019.
  • Masahiro Mizukami, Graham Neubig, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. Linguistic individuality transformation for spoken language. In Natural Language Dialog Systems and Intelligent Assistants, 2015.
  • Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, and Xinyi Wang. compare-mt: A tool for holistic comparison of language generation systems. In Proceedings of NAACL (Demo Track), Minneapolis, USA, June 2019. URL http://arxiv.org/abs/1903.07926.
  • Nima Pourdamghani and Kevin Knight. Deciphering related languages. In Proceedings of EMNLP, 2017.
  • Sudha Rao and Joel Tetreault. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. arXiv preprint arXiv:1803.06535, 2018.
  • Sujith Ravi and Kevin Knight. Deciphering foreign language. In Proceedings of ACL, 2011.
  • Alexander M. Rush, Sumit Chopra, and Jason Weston. A neural attention model for abstractive sentence summarization. In Proceedings of EMNLP, 2015.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In Proceedings of ACL, 2016.
  • Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.
  • Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. Style transfer from non-parallel text by cross-alignment. In Proceedings of NIPS, 2017.
  • Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of NeurIPS, 2000.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of NeurIPS, 2017.
  • Wei Xu, Alan Ritter, William B. Dolan, Ralph Grishman, and Colin Cherry. Paraphrasing for style. In Proceedings of COLING, 2012.
  • Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, and Taylor Berg-Kirkpatrick. Unsupervised text style transfer using language models as discriminators. In Proceedings of NeurIPS, 2018.
  • Pengcheng Yin, Chunting Zhou, Junxian He, and Graham Neubig. StructVAE: Tree-structured latent variable models for semi-supervised semantic parsing. In Proceedings of ACL, 2018.
  • Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proceedings of ACL, 2017.
Appendix A.2: Hyperparameter Tuning
  • We vary the pooling window size over {1, 5} and the decay patience hyperparameter k for the self-reconstruction loss (Eq. 4) over {1, 2, 3}. For the baselines UNMT and BT+NLL, we also try not annealing the self-reconstruction loss at all, as in the unsupervised machine translation setting (Lample et al., 2018). We vary the weight λ for the NLL term (BT+NLL) or the KL term (our method) over {0.001, 0.01, 0.03, 0.05, 0.1}.
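To make the size of this search concrete, below is a minimal sketch of the grid (enumeration only; the training and evaluation code is not part of this page, and the dictionary keys are placeholder names).

```python
# Minimal sketch of the hyperparameter grid described above.
from itertools import product

pool_windows = [1, 5]                      # pooling window size
patience_ks = [1, 2, 3]                    # decay patience k for the self-reconstruction loss (Eq. 4)
lambdas = [0.001, 0.01, 0.03, 0.05, 0.1]   # weight of the NLL term (BT+NLL) or the KL term (our method)

configs = [
    {"pool_window": w, "patience_k": k, "lambda": lam}
    for w, k, lam in product(pool_windows, patience_ks, lambdas)
]
# For the UNMT and BT+NLL baselines, one extra setting disables the
# self-reconstruction annealing entirely (Lample et al., 2018).
print(len(configs), "configurations per task")   # 2 * 3 * 5 = 30
```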