
CHIP 2023 Task Overview: Complex Information and Relation Extraction of Drug-Related Materials

Qian Chen, Jia Wang, Jia Xu, Bingxiang Ji

Health Information Processing Evaluation Track Papers, CHIP 2023 (2024)

Sinopharm Group Digital Technology (Shanghai) Co., Ltd.

Abstract
Drug labels, or package inserts, are legal documents that contain significant and highly valuable information. However, they mix structured and unstructured information, which makes extraction challenging. We construct a drug package insert information extraction dataset consisting of 1,000 electronic files, with a total of 17,000 structured fields and 24,580 entity relations annotated. The dataset is used in the CHIP 2023 "Complex Information and Relation Extraction of Drug-related Materials" evaluation competition to promote the development of printed-material recognition and entity relation extraction technology. Participants are required to recognize structured fields from the documents and to extract entity relations from specified fields. Finally, we provide a concise overview of the participating teams' methods and discuss the potential value of the dataset for further study.
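The abstract describes two subtasks: recognizing structured fields from the printed insert and extracting entity relations from specified fields. This page does not give the annotation schema or the official scoring rule, so the sketch below is purely illustrative: the record layout, every key name (doc_id, fields, relations, head, tail), and the exact-match micro-F1 scorer are assumptions chosen because they are common in relation-extraction shared tasks, not the competition's actual format.

```python
# A rough sketch, NOT the official CHIP 2023 format or metric. All key names
# below are hypothetical; real package-insert text would be in Chinese.

# One annotated package-insert record covering both subtasks.
record = {
    "doc_id": "insert_0001",  # hypothetical document identifier
    # Subtask 1: structured fields recognized from the printed insert.
    "fields": {
        "drug_name": "Amoxicillin Capsules",
        "specification": "0.25 g",
        "indications": "Infections caused by susceptible strains ...",
    },
    # Subtask 2: entity relations extracted from specified fields.
    "relations": [
        {"head": "Amoxicillin", "relation": "treats",
         "tail": "infections caused by susceptible strains"},
    ],
}

def relation_f1(gold, pred):
    """Micro-F1 over exact-match (head, relation, tail) triples -- a common
    choice for relation-extraction evaluations, assumed here for illustration."""
    gold_set = {(r["head"], r["relation"], r["tail"]) for r in gold}
    pred_set = {(r["head"], r["relation"], r["tail"]) for r in pred}
    tp = len(gold_set & pred_set)
    if tp == 0 or not gold_set or not pred_set:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# A perfect prediction on the toy record scores 1.0.
print(relation_f1(record["relations"], record["relations"]))
```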
Key words
Visual Document Understanding, Document Information Extraction, Optical Character Recognition, Relation Extraction