Authorship Attribution of Literary Texts Using Named Entity Masking and MaxLogit-Based Sequence Classification for Varying Text Lengths.

ICAISC (1) (2023)

Abstract
This paper explores the problem of identifying an author from text passages of varying length, ranging from 100 to 2,000 words. The study builds on previous research on authorship attribution of Polish literary texts, which found that TF-IDF with a multilayer perceptron outperforms other techniques. It investigates whether the issue with BERT in authorship attribution can be mitigated by removing named entities from the input data and by replacing posterior probabilities with logits in sequence classification. The results demonstrate that machine learning methods are capable of almost perfect authorship attribution on short texts, and that the proposed MaxLogit approach significantly improves results. However, except for short passages of up to 400 words, TF-IDF yields better results than BERT. The study concludes with a discussion of the results and suggestions for future research.
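The abstract does not define the MaxLogit rule, so the following is only a minimal sketch of one plausible reading: a long passage is split into chunks, a classifier produces one raw logit vector per chunk, and the predicted author is the class with the largest logit seen in any chunk. The function names and the example logits are hypothetical, and the mean-of-softmax baseline is an assumed comparison point rather than the paper's stated one.

```python
import math


def softmax(logits):
    """Convert one chunk's raw logits to posterior probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_mean_prob(chunk_logits):
    """Assumed baseline: average posterior probabilities over chunks, then argmax."""
    n_classes = len(chunk_logits[0])
    probs = [softmax(row) for row in chunk_logits]
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def predict_max_logit(chunk_logits):
    """Assumed MaxLogit reading: per-class maximum raw logit over chunks, then argmax."""
    n_classes = len(chunk_logits[0])
    best = [max(row[c] for row in chunk_logits) for c in range(n_classes)]
    return max(range(n_classes), key=best.__getitem__)


# Hypothetical logits: 3 chunks of one passage, 3 candidate authors.
example_logits = [
    [5.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [0.0, 6.0, 0.0],
]

mean_prob_author = predict_mean_prob(example_logits)
max_logit_author = predict_max_logit(example_logits)
```

On this toy input the two rules disagree: softmax saturates, so two moderately confident chunks outvote one strongly confident chunk under probability averaging, while the raw logits preserve the stronger evidence. This illustrates only why logits and posterior probabilities can lead to different decisions, not which rule is more accurate.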
Keywords
authorship attribution, literary texts, named entity masking, MaxLogit-based