Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models

HEALTHINF: Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - Vol. 5: HEALTHINF (2021)

Abstract
Research has shown that using generic language models - specifically, BERT models - in specialized domains may be suboptimal due to domain differences in language use and vocabulary. Several techniques for developing domain-specific language models leverage existing generic language models, including continued and domain-adaptive pretraining with in-domain data. Here, we investigate a strategy based on using a domain-specific vocabulary, while leveraging a generic language model for initialization. The results demonstrate that domain-adaptive pretraining, in combination with a domain-specific vocabulary - as opposed to a general-domain vocabulary - yields improvements on two downstream clinical NLP tasks for Swedish. The results highlight the value of domain-adaptive pretraining when developing specialized language models and indicate that it is beneficial to adapt the vocabulary of the language model to the target domain prior to continued, domain-adaptive pretraining of a generic language model.
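The strategy the abstract describes can be illustrated with a minimal sketch: train a domain-specific WordPiece vocabulary on in-domain text, initialize the embeddings of tokens shared with the generic vocabulary from a generic BERT checkpoint, and then continue pretraining with masked language modeling. The sketch below assumes the Hugging Face tokenizers, transformers, and datasets libraries; the corpus file clinical_notes.txt, the checkpoint name generic-bert, the vocabulary size, and the training hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of domain-adaptive pretraining with a domain-specific
# vocabulary. Paths, checkpoint names, and hyperparameters are illustrative.
import os

import torch
from datasets import load_dataset
from tokenizers import BertWordPieceTokenizer
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# 1. Train a domain-specific WordPiece vocabulary on in-domain text.
#    "clinical_notes.txt" is a placeholder for the clinical corpus.
os.makedirs("clinical-vocab", exist_ok=True)
wp = BertWordPieceTokenizer()
wp.train(files=["clinical_notes.txt"], vocab_size=30_000)
wp.save_model("clinical-vocab")  # writes clinical-vocab/vocab.txt

# 2. Load a generic BERT checkpoint ("generic-bert" is a placeholder)
#    together with its original, general-domain tokenizer.
model = BertForMaskedLM.from_pretrained("generic-bert")
old_tok = BertTokenizerFast.from_pretrained("generic-bert")
new_tok = BertTokenizerFast.from_pretrained("clinical-vocab")

# 3. Swap in the domain-specific vocabulary. Tokens shared with the
#    generic vocabulary keep their pretrained embeddings; tokens new to
#    the domain vocabulary are initialized randomly, so pretraining
#    still starts from the generic model wherever possible.
old_emb = model.get_input_embeddings().weight.data
new_emb = torch.empty(len(new_tok), old_emb.size(1))
new_emb.normal_(mean=0.0, std=model.config.initializer_range)
old_vocab = old_tok.get_vocab()
for token, new_id in new_tok.get_vocab().items():
    if token in old_vocab:
        new_emb[new_id] = old_emb[old_vocab[token]]
model.resize_token_embeddings(len(new_tok))  # also resizes the MLM head
model.get_input_embeddings().weight.data = new_emb

# 4. Continued, domain-adaptive pretraining with masked language modeling.
dataset = load_dataset("text", data_files="clinical_notes.txt")["train"]
dataset = dataset.map(
    lambda b: new_tok(b["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=new_tok,
                                           mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clinical-bert", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

The random initialization of unseen domain tokens in step 3 is the simplest option; one could instead initialize each new token from, say, the average of its generic-tokenizer subword embeddings. Either way, the key point from the abstract holds: the vocabulary is adapted to the target domain before, not after, continued pretraining.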
Keywords
Natural Language Processing, Language Models, Domain-adaptive Pretraining, Clinical Text, Swedish