Two Approaches to Diachronic Normalization of Polish Texts
CoRR (2024)
Abstract
This paper discusses two approaches to the diachronic normalization of Polish
texts: a rule-based solution that relies on a set of handcrafted patterns, and
a neural normalization model based on the text-to-text transfer transformer
architecture. The training and evaluation data prepared for the task are
discussed in detail, along with experiments conducted to compare the proposed
normalization solutions. Both quantitative and qualitative analyses are presented. It is
shown that at the current stage of inquiry into the problem, the rule-based
solution outperforms the neural one on 3 out of 4 variants of the prepared
dataset, although in practice both approaches have distinct advantages and
disadvantages.