DATAMAFIA at WNUT-2020 Task 2: A Study of Pre-trained Language Models along with Regularization Techniques for Downstream Tasks

W-NUT @ EMNLP 2020

Abstract
This paper describes the system developed by team datamafia for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. It presents a thorough study of pre-trained language models on a downstream binary classification task over noisy, user-generated Twitter data. The solution submitted to the final test leaderboard is a fine-tuned RoBERTa model, which achieves F1 scores of 90.8% and 89.4% on the dev and test data, respectively. In the latter part, we explore several techniques for explicitly injecting regularization into language models to help predictions generalize over noisy data. Our experiments show that adding regularization to the pre-trained RoBERTa model makes it notably more robust to data and annotation noise and improves overall performance by more than 1.2%.
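As a rough illustration of the setup described above, the following is a minimal sketch (not the authors' code) of fine-tuning a pre-trained RoBERTa model for binary tweet classification with explicit regularization knobs: increased dropout and weight decay. The checkpoint name, hyperparameter values, and the toy tweet are illustrative assumptions, not values reported in the paper.

# Minimal sketch: RoBERTa fine-tuning for binary (informative vs. not) tweet
# classification with explicit regularization. All concrete values below are
# assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumed checkpoint; the paper only states "RoBERTa"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,                        # INFORMATIVE vs. UNINFORMATIVE
    hidden_dropout_prob=0.2,             # explicit dropout regularization (assumed value)
    attention_probs_dropout_prob=0.2,    # assumed value
)

# Weight decay acts as L2 regularization on top of dropout.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Toy training step on a single hypothetical tweet.
batch = tokenizer(
    ["Official update: 120 new COVID-19 cases reported in the city today."],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
labels = torch.tensor([1])  # 1 = informative

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

In practice the same loop would run over the full labeled tweet dataset for several epochs, with the dev-set F1 used to pick the best checkpoint.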
Keywords
Neural Machine Translation,Language Modeling,Topic Modeling,Speaker Diarization,Machine Translation