Few-shot learning for automated content analysis: Efficient coding of arguments and claims in the debate on arms deliveries to Ukraine
CoRR (2023)
Abstract
Pre-trained language models (PLMs) based on transformer neural networks
developed in the field of natural language processing (NLP) offer great
opportunities to improve automatic content analysis in communication science,
especially for the coding of complex semantic categories in large datasets via
supervised machine learning. However, three characteristics have so far impeded the
widespread adoption of these methods in the disciplines that apply them: the dominance
of English language models in NLP research, the necessary computing resources,
and the effort required to produce training data to fine-tune PLMs. In this
study, we address these challenges by combining a multilingual transformer model
with the adapter extension to transformers and with few-shot learning
methods. We test our approach on a realistic use case from communication
science to automatically detect claims and arguments together with their stance
in the German news debate on arms deliveries to Ukraine. In three experiments,
we evaluate (1) data preprocessing strategies and model variants for this task,
(2) the performance of different few-shot learning methods, and (3) how well
the best setup performs on varying training set sizes in terms of validity,
reliability, replicability and reproducibility of the results. We find that our
proposed combination of transformer adapters with pattern-exploiting training
(PET) provides a parameter-efficient and easily shareable alternative to fully
fine-tuning PLMs. It performs on par in terms of validity while, overall,
offering better properties for application in communication studies. The
results also show that pre-fine-tuning for a task on a near-domain dataset
leads to substantial improvement, in particular in the few-shot setting.
Further, the results indicate that it is useful to bias the dataset away from
the viewpoints of specific prominent individuals.
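Pattern-exploiting training (PET), one of the few-shot methods named above, reformulates classification as a cloze task: a pattern rewrites the input so that it contains a mask slot, and a verbalizer maps each label to a word the masked language model can predict there. The following is a minimal sketch of that inference step only, under stated assumptions. The pattern text, the `pro`/`contra` labels, the verbalizer words and the `toy_mask_score` stand-in for a real masked language model are all hypothetical. They are shown in English for readability and are not the patterns used in the study. PET's training on the cloze objective and its ensembling and distillation stages are not shown.

```python
from typing import Callable, Dict

MASK = "[MASK]"

# Hypothetical verbalizer: maps each stance label to one vocabulary word.
VERBALIZER: Dict[str, str] = {"pro": "Yes", "contra": "No"}


def pattern(sentence: str) -> str:
    """Hypothetical pattern: turn the input into a cloze question with one mask slot."""
    return f'"{sentence}" Does the speaker support arms deliveries? {MASK}.'


def pet_classify(sentence: str, mask_score: Callable[[str, str], float]) -> str:
    """Return the label whose verbalized word scores highest at the mask position.

    `mask_score(cloze_text, word)` stands in for a masked language model that
    scores `word` as the filler of the mask (e.g. its log-probability).
    """
    cloze = pattern(sentence)
    scores = {label: mask_score(cloze, word) for label, word in VERBALIZER.items()}
    return max(scores, key=scores.get)


def toy_mask_score(cloze: str, word: str) -> float:
    """Toy keyword scorer (hypothetical) so the sketch runs without a real model."""
    text = cloze.lower()
    if word == "Yes":
        return 1.0 if "should deliver" in text else 0.0
    return 1.0 if "must not" in text else 0.0


if __name__ == "__main__":
    print(pet_classify("Germany should deliver tanks to Ukraine.", toy_mask_score))  # pro
    print(pet_classify("Germany must not send weapons.", toy_mask_score))  # contra
```

In a real setup, `mask_score` would query a multilingual masked language model at the mask position. Because the label words are predicted by the pre-trained language-modelling head rather than by a newly initialised classifier, this framing is what lets PET work from only a few labelled examples.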
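The claim that adapters are parameter-efficient and easily shareable can be made concrete by counting what is actually trained. The following is a minimal sketch, assuming the common bottleneck adapter design: a down-projection, a non-linearity, an up-projection and a residual connection, inserted into the layers of a frozen encoder. The hidden size (768), layer count (12), bottleneck width (48) and one-adapter-per-layer placement in the example are illustrative assumptions, not the configuration reported in the paper.

```python
def bottleneck_adapter_params(hidden: int, bottleneck: int) -> int:
    """Trainable parameters of one bottleneck adapter computing h + W_up f(W_down h).

    W_down maps hidden -> bottleneck and W_up maps bottleneck -> hidden;
    both projections carry a bias vector.
    """
    down = hidden * bottleneck + bottleneck
    up = bottleneck * hidden + hidden
    return down + up


def total_adapter_params(hidden: int, bottleneck: int, layers: int, adapters_per_layer: int) -> int:
    """Parameters added across the whole encoder; the frozen backbone is not counted."""
    return layers * adapters_per_layer * bottleneck_adapter_params(hidden, bottleneck)


if __name__ == "__main__":
    # Assumed example: 12 layers, hidden size 768, bottleneck 48, one adapter per layer.
    print(bottleneck_adapter_params(768, 48))  # 74544
    print(total_adapter_params(768, 48, 12, 1))  # 894528
```

Under these assumptions, fewer than a million parameters are trained and need to be distributed. By contrast, a base-size multilingual encoder typically has well over a hundred million parameters, and it stays frozen and can be reused across tasks.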