Multimodal Storytelling via Generative Adversarial Imitation Learning

IJCAI, pp. 3967-3973, 2017.

Keywords:
imitation learning, generative adversarial imitation, event storyline, single modality, critical challenge
Weibo:
We proposed a multimodal imitation learning approach for generating storylines on unseen events

Abstract:

Deriving event storylines is an effective summarization method to succinctly organize extensive information, which can significantly alleviate the pain of information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumpti...

Introduction
  • As the Internet becomes more pervasive, information overload becomes increasingly severe.
  • Even with the help of search engines such as Google and Yahoo, people struggle to piece together a coherent series of news events.
  • A person who wants to learn about the 2016 Presidential Election needs to search iteratively through several keywords and review numerous news documents in order to build a cohesive picture, e.g., a knowledge graph.
  • By inferring the connections between entity nodes, the original documents can be represented as a knowledge graph that consists of a set of storylines.
  • Modeling users' preferred stories requires understanding their evolution patterns, not merely maintaining strong coherence.
Highlights
  • Storytelling is an efficient way to solve this issue of information overload
  • Designing a multimodal model integrated with GAN-based imitation learning: inspired by humans' ability to link multiple entities through visual similarity, we propose a multimodal method that spans the textual and visual modalities with imitation learning
  • We proposed a multimodal imitation learning approach for generating storylines on unseen events
Results
  • Evaluation via user study: evaluating storytelling is difficult because there is no established gold standard, and even ground truth is hard to elicit.
  • A third-party user study was conducted via Amazon Mechanical Turk (AMT), since it allows the authors to obtain reliable statistics from a large sample of users.
  • It aims to test whether the generated storyline matches users' interests.
  • For each unit task in AMT, workers were asked to choose the best storyline among the generated candidates, given event background knowledge such as Wikipedia or key news articles.
Conclusion
  • The authors proposed a multimodal imitation learning approach for generating storylines on unseen events.
  • To avoid hand-designing a reward function, GAN-based imitation learning is introduced to learn the latent policy from users' demonstrations.
  • To bridge the information gap between text and image, the model integrates generative adversarial nets and multimodal learning via deterministic policy gradient.
  • By taking a multimodal perspective, the model captures latent patterns across modalities and produces storylines that better match users' interests.
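
The idea of replacing a hand-designed reward with a discriminator can be illustrated with a toy sketch in the spirit of generative adversarial imitation learning [Ho and Ermon, 2016]. This is not the authors' model: the logistic discriminator, the two-dimensional feature vectors, the learning rate, and the `log D` surrogate reward are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Discriminator:
    """Logistic model D(x) = P(x comes from the user demonstrations)."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def step(self, expert, generated, lr=0.5):
        # One gradient-ascent step on
        #   mean[ log D(expert) ] + mean[ log(1 - D(generated)) ].
        samples = [(x, 1.0) for x in expert] + [(x, 0.0) for x in generated]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, label in samples:
            err = label - self.prob(x)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(samples)
        self.w = [wi + lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += lr * grad_b / n

    def reward(self, x):
        # Surrogate reward handed to the policy: high when x looks like
        # a demonstration, so no explicit storyline metric is needed.
        return math.log(self.prob(x) + 1e-8)


# Hypothetical features of (state, action) pairs: demonstrations cluster
# around (1, 1), the current policy's outputs around (-1, -1).
rng = random.Random(0)
expert = [[1 + rng.gauss(0, 0.1), 1 + rng.gauss(0, 0.1)] for _ in range(20)]
generated = [[-1 + rng.gauss(0, 0.1), -1 + rng.gauss(0, 0.1)] for _ in range(20)]

disc = Discriminator(dim=2)
for _ in range(200):
    disc.step(expert, generated)

print(disc.reward([1.0, 1.0]), disc.reward([-1.0, -1.0]))
```

In a full training loop the policy would be updated to increase this surrogate reward while the discriminator is retrained on the policy's new outputs; only the discriminator half is shown here.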
Tables
  • Table1: Similarity performance (T./I. denotes Text/Image respectively, T.I. means the combination of T. and I.)
  • Table2: Preference statistics from AMT
Related work
  • Storytelling: The storyline generation problem was first studied by Kumar et al. [Kumar et al., 2008] as a generic redescription mining technique, which discovers a series of redescriptions between given disjoint and dissimilar object sets and their corresponding subsets. Storytelling is an efficient way to address information overload: by extracting critical, connected entities, the original document is structurally summarized. Existing work falls into two categories: Textual Storytelling [Kumar et al., 2008; Hossain et al., 2012; Fang et al., 2011; Voskarides et al., 2015; Lee et al., 2012; Shahaf et al., 2012a; 2012b; Lin et al., 2012] and Visual Storytelling [Kim et al., 2014; Park and Kim, 2015; Wang et al., 2012]. Few works extract storylines from both text and images. Current methods often assume that a good storyline corresponds to an explicit metric, such as the average similarity or the weakest similarity across all neighboring nodes. However, these assumptions limit the generation of meaningful stories, since each user may have a unique notion of a good storyline. A few researchers employ Latent Dirichlet Allocation (LDA) [Zhou et al., 2015; Huang and Huang, 2013] to extract stories in an unsupervised fashion. However, it is difficult for LDA to accurately model sequential data.
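
The two explicit metrics mentioned above, average similarity and weakest similarity between neighboring nodes, can be made concrete with a small sketch. The use of cosine similarity and the toy two-dimensional node embeddings are assumptions for illustration; the cited works may define similarity differently.

```python
import math


def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbor_similarities(storyline):
    # Similarity of each consecutive pair of nodes along the storyline.
    if len(storyline) < 2:
        raise ValueError("a storyline needs at least two nodes")
    return [cosine(a, b) for a, b in zip(storyline, storyline[1:])]


def average_similarity(storyline):
    sims = neighbor_similarities(storyline)
    return sum(sims) / len(sims)


def weakest_similarity(storyline):
    # The "weakest link": the least similar pair of neighbors.
    return min(neighbor_similarities(storyline))


# Hypothetical node embeddings: the first hop is perfectly coherent,
# the second hop jumps to an unrelated node.
storyline = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(average_similarity(storyline), weakest_similarity(storyline))
```

The example shows why the choice of metric matters: the same storyline looks moderately coherent on average but has a completely broken weakest link, and neither number says whether a particular user would prefer it.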
Contributions
  • Proposes a method, multimodal imitation learning via generative adversarial networks(MIL-GAN), to directly model users’ interests as reflected by various data
  • Focuses on directly imitating user-provided storylines rather than designing any indirect measures
  • Argues that two similar storylines share the same structure in a certain embedding space
  • Introduces imitation learning, in the vein of Inverse Reinforcement Learning (IRL), to learn the latent policy
  • Proposes a multimodal method across textual and visual modality with imitation learning
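
One simple way to compare nodes across the textual and visual modalities is to place them in a shared vector space. The sketch below assumes fusion by concatenating L2-normalized text and image embeddings; this page does not say how the paper combines the two, so the `fuse` helper and the toy vectors are purely illustrative.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so neither modality dominates.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse(text_vec, image_vec):
    # Assumed fusion: concatenate the two unit-length embeddings.
    return l2_normalize(text_vec) + l2_normalize(image_vec)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical embeddings for two entities: their text is identical in
# direction, while their images are unrelated.
text_a, image_a = [1.0, 0.0], [0.0, 2.0]
text_b, image_b = [3.0, 0.0], [4.0, 0.0]

text_sim = cosine(text_a, text_b)
image_sim = cosine(image_a, image_b)
fused_sim = cosine(fuse(text_a, image_a), fuse(text_b, image_b))
print(text_sim, image_sim, fused_sim)
```

With this particular fusion, the similarity in the joint space is the mean of the per-modality cosine similarities, so a pair that agrees only in text lands halfway between a full match and no match.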
Reference
  • [Arjovsky et al., 2017] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. ArXiv e-prints, jan 2017.
  • [Bengio et al., 2015] Samy Bengio, Oriol Vinyals, et al. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, pages 1171–1179, 2015.
  • [Fang et al., 2011] Lujun Fang, Anish Das Sarma, Cong Yu, and Philip Bohannon. Rex: explaining relationships between entity pairs. VLDB, 5(3):241–252, 2011.
  • [Goodfellow et al., 2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Yoshua Bengio, et al. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
  • [He et al., 2016] Ji He, Mari Ostendorf, et al. Deep reinforcement learning with a combinatorial action space for predicting popular reddit threads. EMNLP, 2016.
  • [Ho and Ermon, 2016] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In NIPS, pages 4565–4573, 2016.
  • [Hossain et al., 2012] M Shahriar Hossain, Patrick Butler, Arnold P Boedihardjo, and Naren Ramakrishnan. Storytelling in entity networks to support intelligence analysts. In SIGKDD, pages 1375–1383. ACM, 2012.
  • [Huang and Huang, 2013] Lifu Huang and Lian’en Huang. Optimized event storyline generation based on mixture-event-aspect model. In EMNLP, pages 726–735, 2013.
  • [Kim et al., 2014] Gunhee Kim, Leonid Sigal, et al. Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In CVPR, pages 4225–4232, 2014.
  • [Kiros et al., 2015] Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
  • [Kumar et al., 2008] Deept Kumar, Naren Ramakrishnan, Richard F Helm, and Malcolm Potts. Algorithms for storytelling. KDD, 20(6):736–751, 2008.
  • [Lample and Chaplot, 2016] Guillaume Lample and Devendra Singh Chaplot. Playing fps games with deep reinforcement learning. AAAI, 2016.
  • [Lee et al., 2012] Heeyoung Lee, Marta Recasens, et al. Joint entity and event coreference resolution across documents. In EMNLP-CONLL, pages 489–500. Association for Computational Linguistics, 2012.
  • [Lin et al., 2012] Chen Lin, Chun Lin, et al. Generating event storylines from microblogs. In CIKM, pages 175–184. ACM, 2012.
  • [Mnih et al., 2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  • [Narasimhan et al., 2016] Karthik Narasimhan, Adam Yala, et al. Improving information extraction by acquiring external evidence with reinforcement learning. EMNLP, 2016.
  • [Ng et al., 2000] Andrew Y Ng, Stuart J Russell, et al. Algorithms for inverse reinforcement learning. In ICML, pages 663–670, 2000.
  • [Ngiam et al., 2011] Jiquan Ngiam, Aditya Khosla, et al. Multimodal deep learning. In ICML, pages 689–696, 2011.
  • [Oh et al., 2015] Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, et al. Action-conditional video prediction using deep networks in atari games. In NIPS, pages 2863–2871, 2015.
  • [Park and Kim, 2015] Cesc C Park and Gunhee Kim. Expressing an image stream with a sequence of natural sentences. In NIPS, pages 73–81, 2015.
  • [Pomerleau, 1991] Dean A Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3(1):88–97, 1991.
  • [Russell, 1998] Stuart Russell. Learning agents for uncertain environments. In Proceedings of the eleventh annual conference on Computational learning theory, pages 101–103. ACM, 1998.
  • [Shahaf and Guestrin, 2010] Dafna Shahaf and Carlos Guestrin. Connecting the dots between news articles. In SIGKDD, pages 623–632. ACM, 2010.
  • [Shahaf et al., 2012a] Dafna Shahaf, Carlos Guestrin, and Eric Horvitz. Metro maps of science. In SIGKDD, pages 1122–1130. ACM, 2012.
  • [Shahaf et al., 2012b] Dafna Shahaf, Carlos Guestrin, and Eric Horvitz. Trains of thought: Generating information maps. In WWW, pages 899–908. ACM, 2012.
  • [Silver et al., 2016] David Silver, Aja Huang, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • [Srivastava and Salakhutdinov, 2012] Nitish Srivastava and Ruslan R Salakhutdinov. Multimodal learning with deep boltzmann machines. In NIPS, pages 2222–2230, 2012.
  • [Voskarides et al., 2015] Nikos Voskarides, Edgar Meij, et al. Learning to explain entity relationships in knowledge graphs. 2015.
  • [Wang et al., 2012] Dingding Wang, Tao Li, and Mitsunori Ogihara. Generating pictorial storylines via minimum-weight connected dominating set approximation in multiview graphs. In AAAI. Citeseer, 2012.
  • [Williams, 1992] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
  • [Yu et al., 2017] L Yu, W Zhang, et al. SeqGAN: sequence generative adversarial nets with policy gradient. In AAAI, 2017.
  • [Zhang and LeCun, 2015] Xiang Zhang and Yann LeCun. Text understanding from scratch. arXiv preprint arXiv:1502.01710, 2015.
  • [Zhou et al., 2015] Deyu Zhou, Liangyu Chen, et al. An unsupervised framework of exploring events on twitter: Filtering, extraction and categorization. In AAAI, 2015.