Unifying Lexical, Syntactic, and Structural Representations of Written Language for Authorship Attribution

SN Comput. Sci.（2021）

引用 1|浏览0

暂无评分

摘要

Writing style in written language is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. The recent work in neural network based style analysis mainly lacks the multi-level modeling of writing style. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more in capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature. Additionally, We adopt a transfer learning approach and use deep contextualized word representation (ELMo) in our model to measure the impact of lower level linguistic representations versus higher level linguistic representations of ELMo in the task of authorship attribution. According to our experimental results, lower level linguistic representations which mainly carry syntactic information demonstrate better performance in authorship attribution task when compared to higher level linguistic representations which mainly carry semantic information.

查看译文

关键词

Document analysis,Natural language processing,Authorship attribution,Syntax encoding,Deep neural networks

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要