Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs
CoRR (2024)
Abstract
With the advent of large language models (LLMs), the line between
human-crafted and machine-generated texts has become increasingly blurred. This
paper investigates whether texts written by humans have discernible and unique
linguistic properties, focusing on the discourse structures that underlie
their surface form.
Introducing a novel methodology, we leverage hierarchical parse trees and
recursive hypergraphs to unveil distinctive discourse patterns in texts
produced by both LLMs and humans. Empirical findings demonstrate that, although
both LLMs and humans generate distinct discourse patterns influenced by
specific domains, human-written texts exhibit more structural variability,
reflecting the nuanced nature of human writing in different domains. Notably,
incorporating hierarchical discourse features enhances binary classifiers'
overall performance in distinguishing between human-written and
machine-generated texts, even on out-of-distribution and paraphrased samples.
This underscores the significance of incorporating hierarchical discourse
features in the analysis of text patterns. The code and dataset will be
available at [TBA].
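
As a rough illustration of what a hierarchical discourse feature could look like, the sketch below counts small parent-child patterns ("motifs") in a toy discourse tree. It is not the authors' implementation: the tree encoding, the relation labels, the `EDU` leaf label, and the helper names `label`, `motifs`, and `motif_counts` are all assumptions made for this example, and the paper's recursive hypergraph representation is not reproduced here.

```python
from collections import Counter

# Hypothetical toy discourse tree (an assumption, not data from the paper):
# an internal node is (relation_label, [children]); a leaf is a plain string
# standing for one elementary text span.
tree = ("Elaboration", [
    ("Contrast", ["span 1", "span 2"]),
    ("Elaboration", [
        "span 3",
        ("Cause", ["span 4", "span 5"]),
    ]),
])


def label(node):
    """Return the relation label of an internal node, or 'EDU' for a leaf."""
    return node[0] if isinstance(node, tuple) else "EDU"


def motifs(node):
    """Yield one (parent label, sorted child labels) motif per internal node."""
    if not isinstance(node, tuple):
        return  # leaves contribute no motif of their own
    relation, children = node
    yield (relation, tuple(sorted(label(child) for child in children)))
    for child in children:
        yield from motifs(child)


def motif_counts(root):
    """Count how often each depth-one motif occurs in the whole tree."""
    return Counter(motifs(root))


counts = motif_counts(tree)
for motif, n in sorted(counts.items()):
    print(motif, n)
```

Counts of this kind, computed per document, could be appended to a classifier's input features; how the paper actually defines, weights, and selects its motifs is not stated in the abstract.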