An Interpretable Authorship Attribution Algorithm Based on Distance-Related Characterizations of Tokens

Victor Lomas, Michelle Reyes, Antonio Neme

ADVANCES IN SOFT COMPUTING, MICAI 2023, PT II (2024)


Abstract

Natural Language Processing has focused its efforts on sentiment analysis, token categorization, topic identification, translation, authorship attribution, and many other relevant and useful tasks. In this contribution, we describe an algorithm that characterizes texts in a feature space from which, among other tasks, the authorship attribution problem can be tackled. Although several deep learning architectures have shown good results, the solutions they offer are usually hard to interpret, and the explanation behind an attribution is opaque. In our algorithm, each token is characterized by two quantities: the number of tokens that separate it from its previous appearance within the text, and the number of words that separate it from the last novel token. A novel token is an element that appears for the first time in the context of the analyzed writing. Following the proposed approach, we analyzed hundreds of texts from dozens of writers. The embeddings created by our proposal allow classifiers to correctly attribute the authorship of a previously unseen text, as shown by several tests. Our proposal achieves identification metrics similar to, and in some cases better than, those of state-of-the-art models. Equally relevant, our method is interpretable and has a far lower computational complexity than deep learning architectures such as Large Language Models.
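
The two per-token quantities described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `characterize_tokens`, the whitespace tokenization, the choice to count only the tokens strictly in between two positions, and the use of `None` where a quantity is undefined (a token's first appearance, or the very first token of the text) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Feature = Tuple[str, Optional[int], Optional[int]]


def characterize_tokens(tokens: List[str]) -> List[Feature]:
    """Return (token, recurrence_gap, novelty_gap) for each token.

    recurrence_gap: tokens strictly between this token and its previous
        appearance (None on first appearance). Assumed convention.
    novelty_gap: tokens strictly between this token and the most recent
        earlier novel token (None for the first token). Assumed convention.
    """
    last_seen = {}      # token -> index of its latest appearance
    last_novel = None   # index of the most recent novel token so far
    features = []

    for i, tok in enumerate(tokens):
        is_novel = tok not in last_seen

        recurrence_gap = None if is_novel else i - last_seen[tok] - 1
        novelty_gap = None if last_novel is None else i - last_novel - 1
        features.append((tok, recurrence_gap, novelty_gap))

        # Update the bookkeeping only after both gaps are computed,
        # so a novel token is measured against the previous novel one.
        last_seen[tok] = i
        if is_novel:
            last_novel = i

    return features


# Hypothetical example text, tokenized on whitespace.
example = characterize_tokens("the cat saw the dog and the cat".split())
```

Under these conventions, the final `cat` in the example text gets a recurrence gap of 5 (five tokens lie between its two appearances) and a novelty gap of 1 (one token separates it from `and`, the last novel token). How such per-token pairs are aggregated into the text-level embeddings used for classification is not specified in the abstract and is left out of this sketch.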


Keywords

authorship attribution, unsupervised learning, embeddings
