Image Enhanced Event Detection in News Articles
National Conference on Artificial Intelligence (AAAI), 2020.
Keywords:
Dual Recurrent Multimodal Model; event trigger; natural language inference; event detection; BERT+image obviously surpasses text-only method
Abstract:
Event detection is a crucial and challenging sub-task of event extraction, which suffers from a severe ambiguity issue of trigger words. Existing works mainly focus on using textual context information, while there naturally exist many images accompanied by news articles that are yet to be explored. We believe that images not only reflect...
Introduction
- In Automatic Content Extraction (ACE), Event Detection (ED) aims to identify event triggers from sentences.
- An event trigger is the word that most clearly expresses the occurrence of an event (Doddington et al. 2004).
- In the left example in Figure 1, since "confront" indicates the occurrence of a Meet event, it should be labeled as the event trigger of Meet.
- A single word can trigger different events, and the surrounding contexts are often not informative enough to disambiguate them.
- In Figure 1, the trigger word
Highlights
- In Automatic Content Extraction (ACE), Event Detection (ED) aims to identify event triggers from sentences
- We propose to utilize the accompanying images in news articles to enhance Event Detection
- We contribute a supplementary image dataset for the Event Detection benchmark ACE2005, which can be further analyzed in related tasks such as event extraction
- For image enhanced Event Detection, we propose a novel fusion method, the Dual Recurrent Multimodal Model (DRMM), which builds a deeper connection between the two modalities and enables event-level interaction (a minimal sketch follows this list)
- We verify the quality of the image dataset that supplements ACE2005, and conduct a series of experiments on it
- The results, compared against six baseline methods, demonstrate the effectiveness of the Dual Recurrent Multimodal Model
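The page does not include code, so the following is a minimal PyTorch sketch of the idea described above: token features from a text encoder such as BERT and region features from an image encoder are fused by attention passes that alternate between the two modalities, and the fused token representations are classified into event types. This is an illustrative reconstruction under assumed dimensions, module names, and number of alternation steps, not the authors' DRMM implementation.

```python
# Illustrative sketch only -- not the authors' DRMM implementation.
# Dimensions, module names, and the number of alternation steps are assumptions.
import torch
import torch.nn as nn

class AlternatingDualAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4, steps=2, num_event_types=34):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.steps = steps
        self.classifier = nn.Linear(dim, num_event_types)  # per-token trigger label

    def forward(self, text_feats, image_feats):
        # text_feats:  (batch, num_tokens, dim),  e.g. projected BERT outputs
        # image_feats: (batch, num_regions, dim), e.g. projected CNN region features
        t, v = text_feats, image_feats
        for _ in range(self.steps):
            # text attends to image regions, then image attends to the updated text
            t = t + self.text_to_image(t, v, v)[0]
            v = v + self.image_to_text(v, t, t)[0]
        return self.classifier(t)  # (batch, num_tokens, num_event_types)

# Toy usage with random features
model = AlternatingDualAttentionFusion()
logits = model(torch.randn(2, 20, 256), torch.randn(2, 5, 256))
print(logits.shape)  # torch.Size([2, 20, 34])
```

The residual connections keep each modality's original signal available while the alternating passes repeatedly condition one modality on the other, which is one simple way to realize the "deeper connection" and event-level interaction mentioned above.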
Methods
- DRMM outperforms previous state-of-the-art models, showing the effectiveness of introducing the image modality into ED and the superiority of the proposed alternating dual attention.
- Compared with VAD, which also incorporates images, the method improves the F1 score by over 7%.
- The authors' approach clearly outperforms AD-DMBERT, which implies the superiority of multi-modal resources.
- Another interesting observation is that the proposed method generally achieves the highest recall.
- Knowledge from the image modality makes an event trigger's distributional semantics more similar to those of other training examples, so the model successfully retrieves more events.
Results
- Results in Table 6 indicate that DRMM outperforms all of the common fusion approaches by over 1.5% (two of these baselines are sketched after this list).
- Modality attention and co-attention are inferior to DRMM because they ignore the importance of contextual information, which is emphasized by several approaches in the fusion process (Atrey, Kankanhalli, and Jain 2006).
- Table 4 gives example cases about how image modality knowledge affects predictions of ED.
- The image modality knowledge "soldier, battlefield, explosion" helps disambiguate the
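For context on the comparison above, here is a minimal sketch, under assumed shapes and names, of two common fusion baselines of the kind DRMM is compared against: plain feature concatenation and a simple per-token modality-attention gate. It is illustrative only, not the code that was evaluated.

```python
# Illustrative sketches of two common multimodal fusion baselines (not the paper's code).
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Concatenate a pooled image vector onto every token representation."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, text_feats, image_vec):
        # text_feats: (batch, tokens, dim); image_vec: (batch, dim)
        img = image_vec.unsqueeze(1).expand(-1, text_feats.size(1), -1)
        return self.proj(torch.cat([text_feats, img], dim=-1))

class ModalityAttentionFusion(nn.Module):
    """Mix the two modalities per token with learned attention weights."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, text_feats, image_vec):
        img = image_vec.unsqueeze(1).expand_as(text_feats)
        w = torch.softmax(self.gate(torch.cat([text_feats, img], dim=-1)), dim=-1)
        return w[..., :1] * text_feats + w[..., 1:] * img
```

Both baselines treat the image as a single static vector, which fits the observation above that they underuse contextual information compared with DRMM's alternating attention.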
Conclusion
- The authors propose to utilize the accompanying images in news articles to enhance Event Detection.
- The authors contribute a supplementary image dataset for the ED benchmark ACE2005, which can be further analyzed in related tasks such as event extraction.
- For image enhanced ED, the authors propose a novel fusion method, DRMM, which builds a deeper connection between the two modalities and enables event-level interaction.
- The authors verify the quality of the image dataset that supplements ACE2005, and conduct a series of experiments on it.
- The results, compared against six baseline methods, demonstrate the effectiveness of DRMM.
Tables
- Table 1: Overall performance on the ACE2005 dataset (%)
- Table 2: Statistics of our image dataset
- Table 3: The performance of the language model with and without integration of images
- Table 4: Error analysis: when does the image modality knowledge improve ED? GT is the ground truth and event triggers are underlined. For interpretability, we describe images in terms of people, background and action instead of showing the actual image feature vector
- Table 5: The evaluation of the image modality
- Table 6: Effectiveness of multimodal fusion in DRMM (columns: Fusion Methods, Precision, Recall, F1)
Related work
- Event Detection (ED)
In Automatic Content Extraction (ACE), event detection (ED) aims to detect event triggers (usually verbs or nouns) from unstructured news reports, and has a long history of research (Ahn 2006; Nguyen and Grishman 2018). ED serves as a fundamental task in information extraction, alongside NER (Cao et al. 2019) and entity linking (Cao et al. 2017; 2018). Due to the flexibility and diversity of natural language, event triggers can be very ambiguous (Hogenboom et al. 2011): the same word can trigger different events in different contexts. Previous methods have shown lexical and sentence-level information to be quite helpful for event detection (Ahn 2006; Nguyen and Grishman 2015).
Several researchers further incorporate document-level information to disambiguate the event (Duan, He, and Zhao 2017; Chen et al. 2018; Liu et al. 2018b). Others use multiple linguistic resources to enhance event semantic understanding. Liu et al. (2018a) propose a gated attention mechanism to dynamically integrate parallel training corpora from different languages. In addition, open-domain lexical databases (WordNet, FrameNet) are adopted as extra auxiliary resources (Lu and Nguyen 2018; Liu et al. 2016) or as extra training data (Liu et al. 2016; Wang et al. 2019) to improve event detection performance (a minimal gated-fusion sketch follows).
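As a rough illustration of the gated integration of an auxiliary resource described above, the following sketch blends a primary sentence representation with an auxiliary one (for example, a representation of a parallel-language sentence or of lexical-resource features) through a learned sigmoid gate. The module name, the shapes, and the assumption that the two representations are already aligned are all illustrative; the cited works define their own, more elaborate architectures.

```python
# Minimal sketch of gated integration of an auxiliary representation
# (e.g. a parallel-language sentence or lexical-resource features).
# Purely illustrative; not the architecture of any cited paper.
import torch
import torch.nn as nn

class GatedAuxiliaryFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, primary, auxiliary):
        # primary, auxiliary: (batch, tokens, dim), assumed already aligned
        g = self.gate(torch.cat([primary, auxiliary], dim=-1))
        # g close to 1 keeps the primary signal; close to 0 trusts the auxiliary one
        return g * primary + (1 - g) * auxiliary
```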
Funding
- This work is supported by the National Key Research and Development Program of China (2018YFB1005100 and 2018YFB1005101), NSFC key projects (U1736204, 61533018, 61661146007)
- This research is part of NExT research which is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its IRC@SG Funding Initiative
Reference
- Ahn, D. 2006. The stages of event extraction. In Proceedings of the Workshop on Annotating and Reasoning about Time and Events, 1–8.
- Atrey, P. K.; Kankanhalli, M. S.; and Jain, R. 2006. Information assimilation framework for event detection in multimedia surveillance systems. Multimedia Systems 12(3):239–253.
- Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; and Schneider, N. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, 178–186.
- Cao, Y.; Huang, L.; Ji, H.; Chen, X.; and Li, J. 2017. Bridge text and knowledge by learning multi-prototype entity mention embedding. In ACL, 1623–1633.
- Cao, Y.; Hou, L.; Li, J.; and Liu, Z. 2018. Neural collective entity linking. In COLING, 675–686.
- Cao, Y.; Hu, Z.; Chua, T.-s.; Liu, Z.; and Ji, H. 2019. Low-resource name tagging learned with weakly labeled data. In EMNLP-IJCNLP, 261–270.
- Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; and Zhao, J. 2015. Event extraction via dynamic multi-pooling convolutional neural networks. In IJCNLP, volume 1, 167–176.
- Chen, Y.; Yang, H.; Liu, K.; Zhao, J.; and Jia, Y. 2018. Collective event detection via a hierarchical and bias tagging networks with gated multi-level attention mechanisms. In EMNLP, 1267–1276.
- Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Doddington, G. R.; Mitchell, A.; Przybocki, M. A.; Ramshaw, L. A.; Strassel, S. M.; and Weischedel, R. M. 2004. The automatic content extraction (ACE) program - tasks, data, and evaluation. In LREC, volume 2, 1.
- Duan, S.; He, R.; and Zhao, W. 2017. Exploiting document level information to improve event detection via recurrent neural networks. In IJCNLP, 352–361.
- Elliott, D.; Frank, S.; and Hasler, E. 2015. Multi-language image description with neural sequence models. CoRR, abs/1510.04709.
- Feng, X.; Qin, B.; and Liu, T. 2018. A language-independent neural network for event detection. Science China Information Sciences 61(9):092106.
- He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
- Heo, Y.; Kang, S.; and Yoo, D. 2019. Multimodal neural machine translation with weakly labeled images. IEEE Access.
- Hogenboom, F.; Frasincar, F.; Kaymak, U.; and De Jong, F. 2011. An overview of event extraction from text. In ISWC, volume 779, 48–57. Citeseer.
- Liu, S.; Chen, Y.; He, S.; Liu, K.; and Zhao, J. 2016. Leveraging framenet to improve automatic event detection. In ACL, volume 1, 2134–2143.
- Liu, J.; Chen, Y.; Liu, K.; and Zhao, J. 2018a. Event detection via gated multilingual attention mechanism. In AAAI.
- Liu, S.; Cheng, R.; Yu, X.; and Cheng, X. 2018b. Exploiting contextual information via dynamic memory network for event detection. arXiv preprint arXiv:1810.03449.
- Lu, W., and Nguyen, T. H. 2018. Similar but not the same: Word sense disambiguation improves event detection via neural representation matching. In EMNLP, 4822–4828.
- Moon, S.; Neves, L.; and Carvalho, V. 2018. Multimodal named entity recognition for short social media posts. arXiv preprint arXiv:1802.07862.
- Nguyen, T. H., and Grishman, R. 2015. Event detection and domain adaptation with convolutional neural networks. In IJCNLP, volume 2, 365–371.
- Nguyen, T. H., and Grishman, R. 2018. Graph convolutional networks with argument-aware pooling for event detection. In AAAI.
- Qian, C.; Zhu, X.; Ling, Z.-H.; Inkpen, D.; and Wei, S. 2017. Neural natural language inference models enhanced with external knowledge. arXiv preprint arXiv:1711.04289.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In NIPS, 5998–6008.
- Wang, X.; Han, X.; Liu, Z.; Sun, M.; and Li, P. 2019. Adversarial training for weakly supervised event detection. In NAACL.
- Wang, L.; Li, Y.; and Lazebnik, S. 2016. Learning deep structure-preserving image-text embeddings. In CVPR, 5005–5013.
- Zhang, T.; Whitehead, S.; Zhang, H.; Li, H.; Ellis, J.; Huang, L.; Liu, W.; Ji, H.; and Chang, S.-F. 2017. Improving event extraction via multimodal integration. In MM, 270–278. ACM.
- Zhang, K.; Lv, G.; Wu, L.; Chen, E.; Liu, Q.; Wu, H.; and Wu, F. 2018. Image-enhanced multi-level sentence representation net for natural language inference. In ICDM, 747–756. IEEE.