Self-Adversarial Learning with Comparative Discrimination for Text Generation

ICLR, 2020.

Keywords:
adversarial learning, text generation
Weibo:
Although its NLL_gen is worse than that of maximum likelihood estimation (MLE), which directly optimizes the metric, SAL yields a better quality-diversity trade-off than MLE training, which has not been achieved by the previous GANs for text generation.

Abstract:

Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples. To address the issues, we propose a novel self-adversarial learning (SAL) paradigm for improving GANs' performance in text generation. In contrast to s...


Introduction
  • Generative Adversarial Networks (GANs) (Goodfellow et al, 2014) have achieved tremendous success in image generation and have received much attention in computer vision.
  • Adversarial text generation has drawn much attention in recent years due to its advantages (e.g., sequence-level guidance without the exposure bias issue (Bengio et al, 2015)) over maximum likelihood estimation (MLE) for natural language generation
  • It formulates the learning process as a minimax game between a generator Gθ parameterized by θ and a discriminator Dφ parameterized by φ: the discriminator is trained to distinguish between the samples drawn from the real data distribution pdata and the samples generated by the generator; while the generator is trained to generate samples that can “fool” the discriminator.
  • Here x is a sample drawn from the real data distribution, and Gθ(z) is a sample produced by the generator from a noise input z drawn from the noise distribution pz (the standard objective is written out below).
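For reference, the standard GAN minimax objective implied by these symbol definitions, given here in its textbook form (Goodfellow et al, 2014) rather than copied verbatim from the paper:

```latex
\min_{\theta}\max_{\phi}\;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D_{\phi}(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D_{\phi}\!\left(G_{\theta}(z)\right)\right)\right]
```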
Highlights
  • Generative Adversarial Networks (GANs) (Goodfellow et al, 2014) have achieved tremendous success in image generation and have received much attention in computer vision.
  • Following the experimental settings in previous work (Lin et al, 2017; Guo et al, 2018; Shi et al, 2018; Zhu et al, 2018; Nie et al, 2018), we evaluate our approach on both synthetic and real datasets using Texygen (Zhu et al, 2018), a benchmark platform for evaluating adversarial text generation models.
  • Although its NLL_gen is worse than that of maximum likelihood estimation (MLE), which directly optimizes the metric, SAL yields a better quality-diversity trade-off than MLE training, which has not been achieved by the previous GANs. This is shown by the fact that the NLL_oracle + NLL_gen sum for SAL is lower than that yielded by MLE, while the other GANs obtain the same sum score as MLE, indicating that they fail to improve the quality-diversity trade-off after pretraining.
  • We find that training with self-adversarial learning is more stable than with other text GANs.
  • We present a self-adversarial learning (SAL) paradigm for adversarial text generation
  • Through the self-improvement reward mechanism, the problems of reward sparsity and mode collapse are alleviated and the training of text GANs becomes more stable, resulting in better performance on text generation benchmarks in terms of both quality and diversity, with lower variance (a minimal sketch of this reward mechanism follows).
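To make the self-improvement reward concrete, below is a minimal sketch of how such a reward could be computed from a pairwise (comparative) discriminator. The function name, the three-class layout (first better / second better / indistinguishable), and the reward form P(better) − P(worse) are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def self_improvement_reward(comparative_disc, new_samples, reference_samples):
    """Hedged sketch: reward the generator when the comparative discriminator
    judges its new samples to be better than its own previously generated
    reference samples. Assumes `comparative_disc` returns logits over three
    classes: 0 = first sample better, 1 = second better, 2 = indistinguishable."""
    with torch.no_grad():
        logits = comparative_disc(new_samples, reference_samples)
        probs = torch.softmax(logits, dim=-1)
    # Positive reward only when the new sample is judged an improvement
    # over the generator's own earlier output.
    return probs[:, 0] - probs[:, 1]
```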
Methods
  • We use the best NLL_oracle + NLL_gen obtained during training to evaluate the quality-diversity trade-off (both metrics are written out after this list).
  • For the real-data experiments, we follow previous work and use the commonly-used BLEU scores (Papineni et al, 2002) (BLEU(F)) and the perplexity of generated samples evaluated by an open-sourced pretrained language model (Jozefowicz et al, 2016) as quality metrics, since NLL_oracle cannot be evaluated without an oracle language model.
  • We provide results of the combination of LeakGAN with SAL in the Appendix
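Under the common Texygen definitions (the paper may state them slightly differently), NLL_oracle scores generated samples under the oracle language model (quality), while NLL_gen scores real data under the generator (diversity/coverage):

```latex
\mathrm{NLL}_{\mathrm{oracle}} = -\,\mathbb{E}_{x \sim G_{\theta}}\!\left[\log P_{\mathrm{oracle}}(x)\right],
\qquad
\mathrm{NLL}_{\mathrm{gen}} = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log G_{\theta}(x)\right]
```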
Results
  • 4.2.1 RESULTS IN SYNTHETIC DATA

    Table 2 shows the results on the synthetic dataset. We observe that SAL largely outperforms the previous GANs on all metrics for both sequence lengths (20 and 40).
  • As in the synthetic data, we observe that our SAL consistently yields better results on all metrics with stable performance compared to the previous GANs. According to Table 18 and Figure 4, SeqGAN and our SAL improve over MLE on the quality metrics (i.e., BLEU(F) and Perplexity, sketched below), whereas MaliGAN and RankGAN do not.
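For concreteness, a hedged sketch of the forward BLEU (BLEU(F)) quality metric, in which each generated sentence is scored against the set of real test sentences as references. Whitespace tokenization and the smoothing choice here are assumptions rather than the exact Texygen configuration used in the paper.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def forward_bleu(generated, test_sentences, max_n=4):
    """BLEU(F) sketch: score generated sentences against the shared set of
    real test sentences. `generated` and `test_sentences` are lists of strings."""
    references = [s.split() for s in test_sentences]     # shared reference set
    list_of_references = [references] * len(generated)   # same references for every hypothesis
    hypotheses = [g.split() for g in generated]
    weights = tuple(1.0 / max_n for _ in range(max_n))   # e.g. BLEU-4 -> (0.25,) * 4
    return corpus_bleu(list_of_references, hypotheses, weights=weights,
                       smoothing_function=SmoothingFunction().method1)
```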
Conclusion
  • To better understand SAL, we perform multiple ablation tests in both the synthetic and the real data.
  • It is notable that the proposed comparative discriminator alone (i.e., CAL) can yield good performance, demonstrating the effectiveness of learning by comparison.
  • We present a self-adversarial learning (SAL) paradigm for adversarial text generation.
  • Through the self-improvement reward mechanism, the problems of reward sparsity and mode collapse are alleviated and the training of text GANs becomes more stable, resulting in better performance on text generation benchmarks in terms of both quality and diversity, with lower variance.
  • Generated samples are presented in the Appendix together with other details, including human evaluation details and qualitative analysis of the proposed SAL
Tables
  • Table1: Description of the datasets used for evaluation
  • Table2: Performance comparison of different models in synthetic tests where sequence length is set to 20 and 40 respectively. For all the metrics presented, the lower, the better
  • Table3: Performance comparison of different models in the COCO caption generation task. Metrics from top to bottom represent respectively the generation quality, the generation diversity, and the divergence between real and generated data. For all the BLEU metrics, the higher, the better; for NLLgen and FD, the lower, the better
  • Table4: Performance comparison of different models in the EMNLP2017 WMT news generation task. Metrics from top to bottom represent respectively the generation quality, the generation diversity, and the divergence between real and generated data. For all the BLEU metrics, the higher, the better. For NLLgen and FD, the lower, the better
  • Table5: Human evaluation results of different models in both datasets. Scores range from 1 to 5; a higher score indicates better quality
  • Table6: Results of the ablation tests in the synthetic data and the COCO dataset
  • Table7: Samples generated by SAL in Image COCO dataset a picture of a person ’s umbrella in a cell phone . a man stands in a green field . a young boy riding a truck . a man on a motorcycle is flying on a grassy field . a girl on a motorcycle parked on a city street . a motorcycle parked in a city street . a group of bikers riding bikes on a city street . a kitchen with a cat on the hood and a street . a bathroom containing a toilet and a sink . a young woman in a kitchen with a smiley face . a jet plane on the side of a street . a dish is sitting on a sidewalk next to a baby giraffe . a dog on a large green bike parked outside of the motor bike . a person on a kawasaki bike on a race track . a commercial aircraft is parked in front of a kitchen
  • Table8: Samples generated by CAL in Image COCO dataset a man is on a towel on a table outside of a real kitchen . a group of lambs at a tall building . a young boy riding a truck . a man on a motorcycle is flying on a grassy field . a man with a computer desk next to a white car . a cat is on the walls of a cat . a plane on a runway with a plane . an elegant , dilapidated plane are standing in front of a parking bag . the woman is riding a bike on their way . a man wearing an old bathroom with a banana . a plane is taking off from the ground . a man holding a man in front of herself . a woman is walking across the road . a kitchen with an island in green tiles . a clean kitchen with two small appliances
  • Table9: Samples generated by SeqGAN in Image COCO dataset a large image of a herd of racing train . man and woman on horse . a plane on a runway with a plane . a man preparing a table with wood lid . a view , tiled floors and a man prepares food . a man wearing an old bathroom with a banana . a man is is with a camera . two people are parked on a street . a white and white black kitten eating on a table . a toilet is lit on the walls . a kitchen is taking off from a window . a man is wearing glasses wearing scarf . a kitchen with graffiti hanging off from an open plain . two women playing with the orange . a kitchen with an island in a clear glass
  • Table10: Samples generated by MLE in Image COCO dataset a jet airplane flies flying through front from an airplane . a furry tub and overhead pot . a man in a kitchen filled with dark lights green side , .. a cross baby field dressed making cardboard a bathroom with a small tub and oven . a man above a bathroom with an oven room . a jet airliner flying through the sky . a kitchen with a dishwasher , and plenty of pots , pans . a person holding onto two red era arena sits on the street . a bathroom with a toilet and a bath tub . a cat perched on the phone and a baseball cap . the view of the street filled with really parked at the gates on the road . a large hairy dog on a high bike with a cake . a man is riding a white back bench . a narrow bed and white spotted dark tiled walls
  • Table11: Samples generated by SAL in EMNLP2017 WMT dataset (1) it ’ s likely to be egyptian and many of the canadian refugees , but for a decade . (2) the ministry spokesperson also said it now significant connected to the mountain. (3) it is the time they can more competitive , where we have another $ 99 . 100 per cent , and completely on the alternative , and that ’ s being affected . (4) we expect $ 200 and 0 . 3 percent for all you form other , and , which then well , it ’ s done . (5) so we wouldn ’ t feel very large in the game , but you fail to fund , and and the paper that ’ s like its start . (6) other countries made a playoff cut with pages by mrs . trump ’ s eighth consecutive season as a president
  • Table12: Samples generated by CAL in EMNLP2017 WMT dataset (1) i didn ’ t put relatively quiet , we have , ’ his work right in the particular heat rate , take steps traditionally clean . (2) why the u . s . then the table is our cabinet to do getting an vital company for the correct review . (3) those had trained for that , but no thin percentage of the nhs about being warned about the palestinian election before obama is not connected in israel . (4) in course , voters - obama said : “ torture is the outcome , the most powerful trade popularity is happening in it as a success . (5) “ in 2012 , it is nice to remain - no trump actor established this night - scoring three films . (6) we kind of not listen to knowing my most one , only , for a really good vote , and where things fun , you know
  • Table13: Samples generated by SeqGAN in EMNLP2017 WMT dataset (1) his missed 4 , 000 the first 95 really 69 - year - olds . (2) but just things , you want to thank it as my playing side has begun meeting with “ and “ the score had to train up , so he was tied for 11 years . (3) and when he got back doing fresh ties with his election , he will now step in january , back. (4) when you ’ t know if i saw her task to find himself more responsibility ago . (5) his hold over - up to a nine hike in 2015 , 13 percent of recently under suspects dead day , 24 , and to the city . (6) “ i look up on by the city ’ s vehicle on the day in a meeting in november
  • Table14: Samples generated by MLE in EMNLP2017 WMT dataset (1) you know that that is great for our ability to make thinking about how you know and you ? (2) when it ’ s a real thing possible , is if you the first time in a time here and get . (3) u . s , now government spending at the second half of four years , a country where the law will join the region to leave japan in germany . (4) deputy president , the issue of government and geneva probe threats and not - backed trump , but well - changing violence for their islamic state militants were innocent people . (5) he suggested in a presidential primary source and comment on its size following protests conducted by 18 , some in 2012 will be looked at tech energy hub . (6) “ it ’ s growing heavy hard , ” mr . romney said , he says matters that can ’ t again become the asian player
  • Table15: Case study of comparative discrimination and self-adversarial learning
  • Table16: The human evaluation scale from 1 to 5 with corresponding criteria and example sentences
  • Table17: Performance comparison of different models in synthetic tests where sequence length is set to 20 and 40 respectively. For all metrics presented, lower value is better
  • Table18: Performance comparison of different models in the COCO caption generation task. Metrics from top to bottom represent respectively the generation quality, the generation diversity, and the divergence between real and generated data. For all BLEU metrics, higher is better; for NLLgen and FD, lower is better
Related work
  • Many variants of GANs (including TextGAN (Zhang et al, 2017), GSGAN (Kusner & Hernández-Lobato, 2016), SeqGAN (Yu et al, 2017), MaliGAN (Che et al, 2017), RankGAN (Lin et al, 2017), FMGAN (Chen et al, 2018), LeakGAN (Guo et al, 2018), and RelGAN (Nie et al, 2018)) have been proposed for text generation as adversarial training has received increasing attention in recent years. Typically, they address the non-differentiable issue by making continuous approximations or using reinforcement learning. These approaches introduce several different architectures and optimization objectives for both the generator and the discriminator in adversarial text generation. Among the previous studies on adversarial text generation, the work most related to ours is RankGAN (Lin et al, 2017), which proposes a ranker to replace the conventional binary classifier as its discriminator, allowing the discrimination process to involve richer information. Another work whose idea is similar to ours is the relativistic discriminator (Jolicoeur-Martineau, 2018) (RGAN). It compares the binary scores assigned to generated samples and real samples by subtraction as the learning signal, to implicitly represent the inductive bias that half of the samples received by the discriminator are fake. In contrast, our comparative discriminator directly encodes this inductive bias and assesses generated sentences by comparison with a pairwise classifier (a hedged sketch of this contrast follows), which provides more informative learning signals than subtraction in RGAN (Jolicoeur-Martineau, 2018) and normalized feature similarity in RankGAN (Lin et al, 2017). Our work is also related to the concurrent work (Zhou & Xu, 2020) that learns a comparative evaluator to evaluate open-domain natural language generation models.
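To illustrate the distinction drawn above, here is a minimal sketch of a pairwise comparative discriminator (a single classifier over a concatenated pair of sentence encodings), as opposed to RGAN-style subtraction of two independently computed scalar scores. The encoder choice, dimensions, and three-class layout are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ComparativeDiscriminator(nn.Module):
    """Hedged sketch of a pairwise classifier: given two sentences, predict
    whether the first is better than, worse than, or indistinguishable from
    the second (3 classes), instead of scoring each sentence independently
    and subtracting the scores as in the relativistic discriminator (RGAN)."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def encode(self, tokens):
        # tokens: (batch, seq_len) integer ids -> (batch, hidden_dim) encoding
        _, h = self.encoder(self.embed(tokens))
        return h.squeeze(0)

    def forward(self, tokens_a, tokens_b):
        # Classify the concatenated pair encoding; returns (batch, 3) logits.
        pair = torch.cat([self.encode(tokens_a), self.encode(tokens_b)], dim=-1)
        return self.classifier(pair)
```

The logits produced by such a pairwise classifier are what the self-improvement reward sketch in the Highlights section would consume.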
Contributions
  • Proposes a novel self-adversarial learning paradigm for improving GANs’ performance in text generation
  • Proposes a novel self-adversarial learning paradigm for improving adversarial text generation
  • Evaluates the proposed self-adversarial learning paradigm in both synthetic data and real data on the text generation benchmark platform
  • Provides a more detailed qualitative analysis of why the proposed self-adversarial learning paradigm can alleviate these problems in Appendix
Reference
  • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pp. 1171–1179, 2015.
  • Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu Song, and Yoshua Bengio. Maximum-likelihood augmented discrete generative adversarial networks. arXiv preprint arXiv:1702.07983, 2017.
  • Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, and Lawrence Carin. Adversarial text generation via feature-mover's distance. In Advances in Neural Information Processing Systems, pp. 4666–4677, 2018.
  • Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.
  • Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
  • Ian J Goodfellow. On distinguishability criteria for estimating generative models. arXiv preprint arXiv:1412.6515, 2014.
  • Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.
  • Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734, 2018.
  • Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.
  • Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
  • Matt J Kusner and José Miguel Hernández-Lobato. GANs for sequences of discrete elements with the Gumbel-softmax distribution. arXiv preprint arXiv:1611.04051, 2016.
  • John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Proceedings of the 20th International Conference on Neural Information Processing Systems, pp. 817–824.
  • Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, and Dan Jurafsky. Adversarial learning for neural dialogue generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2157–2169, 2017.
  • Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
  • Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, and Ming-Ting Sun. Adversarial ranking for language generation. In Advances in Neural Information Processing Systems, pp. 3155–3165, 2017.
  • Weili Nie, Nina Narodytska, and Ankit Patel. RelGAN: Relational generative adversarial networks for text generation. 2018.
  • Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318, 2002.
  • Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. Self-critical sequence training for image captioning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. doi: 10.1109/CVPR.2017.131.
  • Zhan Shi, Xinchi Chen, Xipeng Qiu, and Xuanjing Huang. Towards diverse text generation with inverse reinforcement learning. arXiv preprint arXiv:1804.11258, 2018.
  • David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354, 2017.
  • Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pp. 1057–1063, 2000.
  • Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen, and Lawrence Carin. Adversarial feature matching for text generation. In Proceedings of the 34th International Conference on Machine Learning, pp. 4006–4015, 2017.
  • Wangchunshu Zhou and Ke Xu. Learning to compare for better training and evaluation of open domain text generation models. In Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.
  • Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1097–1100, 2018.