The Radiation Oncology NLP Database
CoRR(2024)
摘要
We present the Radiation Oncology NLP Database (ROND), the first dedicated
Natural Language Processing (NLP) dataset for radiation oncology, an important
medical specialty that has received limited attention from the NLP community in
the past. With the advent of Artificial General Intelligence (AGI), there is an
increasing need for specialized datasets and benchmarks to facilitate research
and development. ROND is specifically designed to address this gap in the
domain of radiation oncology, a field that offers many opportunities for NLP
exploration. It encompasses various NLP tasks including Logic Reasoning, Text
Classification, Named Entity Recognition (NER), Question Answering (QA), Text
Summarization, and Patient-Clinician Conversations, each with a distinct focus
on radiation oncology concepts and application cases. In addition, we have
developed an instruction-tuning dataset consisting of over 20k instruction
pairs (based on ROND) and trained a large language model, CancerChat. This
serves to demonstrate the potential of instruction-tuning large language models
within a highly-specialized medical domain. The evaluation results in this
study could serve as baseline results for future research. ROND aims to
stimulate advancements in radiation oncology and clinical NLP by offering a
platform for testing and improving algorithms and models in a domain-specific
context. The ROND dataset is a joint effort of multiple U.S. health
institutions. The data is available at
https://github.com/zl-liu/Radiation-Oncology-NLP-Database.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要