Actor Identification in Discourse: A Challenge for LLMs?
CoRR (2024)
Abstract
The identification of political actors who put forward claims in public
debate is a crucial step in the construction of discourse networks, which are
helpful to analyze societal debates. Actor identification is, however, rather
challenging: Often, the locally mentioned speaker of a claim is only a pronoun
("He proposed that [claim]"), so recovering the canonical actor name requires
discourse understanding. We compare a traditional pipeline of dedicated NLP
components (similar to those applied to the related task of coreference) with an
LLM, which appears to be a good match for this generation task. Evaluating on a
corpus of German actors in newspaper reports, we find, surprisingly, that the LLM
performs worse. Further analysis reveals that the LLM is very good at
identifying the right reference, but struggles to generate the correct
canonical form. This points to an underlying issue in LLMs with controlling
generated output. Indeed, a hybrid model combining the LLM with a classifier to
normalize its output substantially outperforms both initial models.
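
To make the normalization step concrete, here is a minimal sketch of mapping a free-form LLM answer onto a closed list of canonical actor names. It is not the paper's classifier: plain string similarity from Python's standard `difflib` module stands in for it, and the actor list, the function name `normalize_actor`, and the `cutoff` value are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical canonical actor inventory (German placeholder names),
# not taken from the paper's corpus.
CANONICAL_ACTORS = ["Erika Mustermann", "Otto Normalverbraucher", "Lieschen Müller"]


def normalize_actor(llm_output, canonical=CANONICAL_ACTORS, cutoff=0.5):
    """Map a free-form LLM answer to the closest canonical name, or None.

    String similarity is a stand-in (an assumption) for the learned
    classifier mentioned in the abstract.
    """
    # Compare case-insensitively, but return the canonical spelling.
    lowered = {name.lower(): name for name in canonical}
    query = llm_output.strip().lower()
    matches = get_close_matches(query, list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    for answer in ["erika mustermann.", "Frau Mustermann", "He"]:
        print(repr(answer), "->", normalize_actor(answer))
```

Under these assumptions, a near-miss surface form such as "Frau Mustermann" resolves to a canonical entry, while an unresolved pronoun such as "He" falls below the cutoff and yields `None`, which signals that the reference itself was not recovered.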