Image Enhanced Event Detection in News Articles

National Conference on Artificial Intelligence (AAAI), 2020.

Keywords:
Dual Recurrent Multimodal Model, event trigger, natural language inference, event detection, BERT+image obviously surpasses text-only method

Abstract:

Event detection is a crucial and challenging sub-task of event extraction, which suffers from a severe ambiguity issue of trigger words. Existing works mainly focus on using textual context information, while there naturally exist many images accompanied by news articles that are yet to be explored. We believe that images not only reflect...

Introduction
  • In Automatic Content Extraction (ACE), Event Detection (ED) aims to identify event triggers from sentences.
  • An event trigger is the word that most clearly expresses the occurrence of an event (Doddington et al. 2004).
  • In the left example in Figure 1, since "confront" indicates the occurrence of a Meet event, it should be labeled as the event trigger of Meet.
  • A single word can trigger different events, and the surrounding contexts are often not informative enough to disambiguate them.
  • In Figure 1, the trigger word
Highlights
  • In Automatic Content Extraction (ACE), Event Detection (ED) aims to identify event triggers from sentences
  • We propose to utilize accompanying images in news articles to enhance Event Detection
  • We contribute a supplementary image dataset for the Event Detection benchmark ACE2005, which can be further analyzed in related tasks such as event extraction
  • For image enhanced Event Detection, we propose a novel fusion method, the Dual Recurrent Multimodal Model, which builds a deeper connection between the two modalities and enables event-level interaction (an illustrative fusion sketch follows this list)
  • We verify the quality of the image dataset supplementing ACE2005, and conduct a series of experiments on it
  • The results, compared against six baseline methods, demonstrate the effectiveness of the Dual Recurrent Multimodal Model
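As context for the fusion method above, the following is a minimal, illustrative PyTorch sketch of how BERT token features and ResNet image-region features could be fused with dual cross-modal attention for per-token trigger classification. It is a sketch under assumed dimensions and layer choices, not the authors' exact DRMM architecture, which is not detailed on this page.

```python
import torch
import torch.nn as nn


class DualAttentionFusion(nn.Module):
    """Illustrative dual cross-modal attention for trigger labeling.

    Assumptions (not from the paper): precomputed BERT token embeddings of
    shape (batch, seq_len, 768) and ResNet region features of shape
    (batch, num_regions, 2048); 33 ACE event types plus a "not a trigger" label.
    """

    def __init__(self, text_dim=768, img_dim=2048, hidden=512, num_labels=34):
        super().__init__()
        self.txt_proj = nn.Linear(text_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        # tokens attend to image regions, and regions attend back to tokens
        self.txt2img = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.img2txt = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_emb, image_feat):
        t = self.txt_proj(token_emb)          # (B, L, H)
        v = self.img_proj(image_feat)         # (B, R, H)
        t_vis, _ = self.txt2img(t, v, v)      # image-aware token representations
        v_txt, _ = self.img2txt(v, t, t)      # text-aware region representations
        # summarize the text-aware visual context and attach it to every token
        v_summary = v_txt.mean(dim=1, keepdim=True).expand(-1, t.size(1), -1)
        fused = torch.cat([t_vis, v_summary], dim=-1)
        return self.classifier(fused)         # per-token event-type logits


if __name__ == "__main__":
    model = DualAttentionFusion()
    logits = model(torch.randn(2, 30, 768), torch.randn(2, 49, 2048))
    print(logits.shape)  # torch.Size([2, 30, 34])
```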
Methods
  • DRMM outperforms previous state-of-the-art models, showing the effectiveness of introducing the image modality into ED and the superiority of the proposed alternating dual attention.
  • Compared with VAD, which also incorporates image information, the method improves the F1 score by over 7%.
  • The authors' approach clearly outperforms ADDMBERT, which implies the superiority of multi-modality resources.
  • Another interesting phenomenon is that the proposed method generally achieves the highest recall.
  • Knowledge from the image modality relates the event trigger's distributional semantics to those of other training examples, so the model successfully retrieves more events.
Results
  • The results in Table 6 indicate that DRMM outperforms all of the common fusion approaches by over 1.5%.
  • Modality attention and co-attention are inferior to DRMM because they ignore the importance of contextual information, which is emphasized by several approaches during fusion (Atrey, Kankanhalli, and Jain 2006); a minimal modality-attention sketch follows this list.
  • Table 4 gives example cases of how image modality knowledge affects the predictions of ED.
  • The image modality knowledge "soldier, battlefield, explosion" helps disambiguate the
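For reference on the "common fusion approaches" compared above, here is a minimal, hypothetical sketch of a modality-attention baseline in the spirit of Moon, Neves, and Carvalho (2018): a learned gate weights the text and image representations before per-token classification. The layer sizes and pooling are assumptions for illustration, not the implementation evaluated in Table 6.

```python
import torch
import torch.nn as nn


class ModalityAttentionFusion(nn.Module):
    """Gate-based modality attention: weight text vs. image features per token.

    Illustrative baseline only; dimensions and label count are assumptions.
    """

    def __init__(self, text_dim=768, img_dim=2048, hidden=512, num_labels=34):
        super().__init__()
        self.txt_proj = nn.Linear(text_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.gate = nn.Linear(2 * hidden, 2)        # one weight per modality
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, token_emb, image_feat):
        t = self.txt_proj(token_emb)                     # (B, L, H)
        v = self.img_proj(image_feat).mean(dim=1)        # pool regions -> (B, H)
        v = v.unsqueeze(1).expand(-1, t.size(1), -1)     # broadcast to tokens
        weights = torch.softmax(self.gate(torch.cat([t, v], dim=-1)), dim=-1)
        fused = weights[..., 0:1] * t + weights[..., 1:2] * v
        return self.classifier(fused)                    # per-token logits
```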
Conclusion
  • The authors propose to utilize accompanying images in news articles to enhance Event Detection.
  • The authors contribute a supplementary image dataset for the ED benchmark ACE2005, which can be further analyzed in related tasks such as event extraction.
  • For image enhanced ED, the authors propose a novel fusion method, DRMM, which builds a deeper connection between the two modalities and enables event-level interaction.
  • The authors verify the quality of the image dataset supplementing ACE2005, and conduct a series of experiments on it.
  • The results, compared against six baseline methods, demonstrate the effectiveness of DRMM
Tables
  • Table 1: Overall performance on the ACE2005 dataset (%)
  • Table 2: Statistics of our image dataset
  • Table 3: The performance of the language model with and without integration of images
  • Table 4: Error analysis: When does image modality knowledge improve ED? GT is the ground truth, and event triggers are underlined. For interpretability, we describe images in terms of people, background, and action instead of showing the actual image vector
  • Table 5: The evaluation of the image modality
  • Table 6: Effectiveness of multimodal fusion in DRMM (columns: Fusion Methods, Precision, Recall, F1)
Related work
  • Event Detection (ED)

    In Automatic Content Extraction (ACE), event detection (ED) aims to detect event triggers (usually verbs or nouns) from unstructured news reports, and it has a long history of research (Ahn 2006; Nguyen and Grishman 2018). ED serves as a fundamental task in information extraction, alongside NER (Cao et al. 2019) and entity linking (Cao et al. 2017; 2018). Due to the flexibility and diversity of natural language, event triggers can be very ambiguous (Hogenboom et al. 2011): the same word can trigger different events in different contexts (a toy example of this ambiguity follows these paragraphs). Previous methods have proven lexical and sentence-level information quite helpful for event detection (Ahn 2006; Nguyen and Grishman 2015).

    Several researchers further incorporate document-level information to disambiguate events (Duan, He, and Zhao 2017; Chen et al. 2018; Liu et al. 2018b). Others use multiple linguistic resources to enhance event semantic understanding. Liu et al. (2018a) propose a gated attention mechanism to dynamically integrate parallel training corpora from different languages. In addition, open-domain lexical databases (WordNet, FrameNet) have been adopted as extra auxiliary resources (Lu and Nguyen 2018; Liu et al. 2016) or as extra training data (Liu et al. 2016; Wang et al. 2019) to improve event detection performance.
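To make the trigger ambiguity concrete, here is a toy, hand-written illustration (not taken from the paper or from ACE2005 data): the same word "fire" acts as an End-Position trigger in one sentence and an Attack trigger in another, using the usual one-label-per-token scheme for trigger classification.

```python
# Toy illustration of trigger ambiguity; sentences and labels are hand-written
# examples, not ACE2005 annotations.
examples = [
    (["The", "board", "voted", "to", "fire", "the", "CEO", "."],
     ["O", "O", "O", "O", "End-Position", "O", "O", "O"]),
    (["Rebels", "opened", "fire", "on", "the", "convoy", "."],
     ["O", "O", "Attack", "O", "O", "O", "O"]),
]

for tokens, labels in examples:
    triggers = [(tok, lab) for tok, lab in zip(tokens, labels) if lab != "O"]
    print(" ".join(tokens), "->", triggers)
```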
Funding
  • This work is supported by the National Key Research and Development Program of China (2018YFB1005100 and 2018YFB1005101), NSFC key projects (U1736204, 61533018, 61661146007)
  • This research is part of NExT research which is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its IRC@SG Funding Initiative
Reference
  • Ahn, D. 2006. The stages of event extraction. In Proceedings of the Workshop on Annotating and Reasoning about Time and Events, 1–8.
  • Atrey, P. K.; Kankanhalli, M. S.; and Jain, R. 2006. Information assimilation framework for event detection in multimedia surveillance systems. Multimedia Systems 12(3):239–253.
  • Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; and Schneider, N. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, 178–186.
  • Cao, Y.; Huang, L.; Ji, H.; Chen, X.; and Li, J. 2017. Bridge text and knowledge by learning multi-prototype entity mention embedding. In ACL, 1623–1633.
  • Cao, Y.; Hou, L.; Li, J.; and Liu, Z. 2018. Neural collective entity linking. In COLING, 675–686.
  • Cao, Y.; Hu, Z.; Chua, T.-S.; Liu, Z.; and Ji, H. 2019. Low-resource name tagging learned with weakly labeled data. In EMNLP-IJCNLP, 261–270.
  • Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; and Zhao, J. 2015. Event extraction via dynamic multi-pooling convolutional neural networks. In IJCNLP, volume 1, 167–176.
  • Chen, Y.; Yang, H.; Liu, K.; Zhao, J.; and Jia, Y. 2018. Collective event detection via a hierarchical and bias tagging networks with gated multi-level attention mechanisms. In EMNLP, 1267–1276.
  • Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Doddington, G. R.; Mitchell, A.; Przybocki, M. A.; Ramshaw, L. A.; Strassel, S. M.; and Weischedel, R. M. 2004. The Automatic Content Extraction (ACE) program - tasks, data, and evaluation. In LREC, volume 2, 1.
  • Duan, S.; He, R.; and Zhao, W. 2017. Exploiting document level information to improve event detection via recurrent neural networks. In IJCNLP, 352–361.
  • Elliott, D.; Frank, S.; and Hasler, E. 2015. Multi-language image description with neural sequence models. CoRR, abs/1510.04709.
  • Feng, X.; Qin, B.; and Liu, T. 2018. A language-independent neural network for event detection. Science China Information Sciences 61(9):092106.
  • He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
  • Heo, Y.; Kang, S.; and Yoo, D. 2019. Multimodal neural machine translation with weakly labeled images. IEEE Access.
  • Hogenboom, F.; Frasincar, F.; Kaymak, U.; and De Jong, F. 2011. An overview of event extraction from text. In ISWC, volume 779, 48–57.
  • Liu, S.; Chen, Y.; He, S.; Liu, K.; and Zhao, J. 2016. Leveraging FrameNet to improve automatic event detection. In ACL, volume 1, 2134–2143.
  • Liu, J.; Chen, Y.; Liu, K.; and Zhao, J. 2018a. Event detection via gated multilingual attention mechanism. In AAAI.
  • Liu, S.; Cheng, R.; Yu, X.; and Cheng, X. 2018b. Exploiting contextual information via dynamic memory network for event detection. arXiv preprint arXiv:1810.03449.
  • Lu, W., and Nguyen, T. H. 2018. Similar but not the same: Word sense disambiguation improves event detection via neural representation matching. In EMNLP, 4822–4828.
  • Moon, S.; Neves, L.; and Carvalho, V. 2018. Multimodal named entity recognition for short social media posts. arXiv preprint arXiv:1802.07862.
  • Nguyen, T. H., and Grishman, R. 2015. Event detection and domain adaptation with convolutional neural networks. In IJCNLP, volume 2, 365–371.
  • Nguyen, T. H., and Grishman, R. 2018. Graph convolutional networks with argument-aware pooling for event detection. In AAAI.
  • Qian, C.; Zhu, X.; Ling, Z.-H.; Inkpen, D.; and Wei, S. 2017. Neural natural language inference models enhanced with external knowledge. arXiv preprint arXiv:1711.04289.
  • Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In NIPS, 5998–6008.
  • Wang, X.; Han, X.; Liu, Z.; Sun, M.; and Li, P. 2019. Adversarial training for weakly supervised event detection. In NAACL.
  • Wang, L.; Li, Y.; and Lazebnik, S. 2016. Learning deep structure-preserving image-text embeddings. In CVPR, 5005–5013.
  • Zhang, T.; Whitehead, S.; Zhang, H.; Li, H.; Ellis, J.; Huang, L.; Liu, W.; Ji, H.; and Chang, S.-F. 2017. Improving event extraction via multimodal integration. In MM, 270–278. ACM.
  • Zhang, K.; Lv, G.; Wu, L.; Chen, E.; Liu, Q.; Wu, H.; and Wu, F. 2018. Image-enhanced multi-level sentence representation net for natural language inference. In ICDM, 747–756. IEEE.