Exploiting conversation structure in unsupervised topic segmentation for emails

EMNLP (2010)

Abstract
This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how existing topic segmentation models that rely solely on lexical information, namely the Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA), can be applied to emails. By pointing out where these methods fail and what a suitable model should take into account, we propose two novel extensions of these models that use not only lexical information but also finer-level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters, and that incorporating conversation structure into these models improves performance significantly.
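The abstract does not spell out how a purely lexical segmenter works, so the following is only an illustrative sketch under stated assumptions. It is a simplified TextTiling-style lexical-cohesion scorer, not LCSeg itself (LCSeg weights lexical chains rather than comparing raw word counts) and not the paper's conversation-aware extension. It treats a thread as a flat sequence of sentences, scores cohesion across each gap as the cosine similarity between bag-of-words windows, and proposes a boundary at local minima below a threshold. The function names, the `window` and `threshold` parameters, and the example thread are all hypothetical. Because it ignores who replies to or quotes whom, it exhibits exactly the kind of limitation the abstract says conversation structure should address.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic tokens only, no stopword removal.
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion_scores(sentences, window=2):
    # Score each gap (between sentence i-1 and i) by comparing the `window`
    # sentences before it with the `window` sentences after it.
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(sentences)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def boundaries(sentences, window=2, threshold=0.3):
    # Return indices i such that a topic boundary is placed before sentence i:
    # gaps whose cohesion is a local minimum and falls below `threshold`.
    scores = cohesion_scores(sentences, window)
    result = []
    for k, score in enumerate(scores):
        prev_score = scores[k - 1] if k > 0 else float("inf")
        next_score = scores[k + 1] if k + 1 < len(scores) else float("inf")
        if score <= prev_score and score <= next_score and score < threshold:
            result.append(k + 1)
    return result


if __name__ == "__main__":
    # Hypothetical flattened thread with two topics: a meeting, then a server failure.
    thread = [
        "Can we move the budget meeting to Friday?",
        "Friday works for the budget meeting.",
        "The budget meeting is confirmed for Friday.",
        "Also, the server crashed again last night.",
        "The server logs show a disk error.",
        "I will replace the server disk today.",
    ]
    print(boundaries(thread))  # [3]
```

On this toy thread the lowest cohesion falls at the gap before the fourth sentence, where the vocabulary shifts from the meeting to the server. In real email threads, quoted text and interleaved replies break this flat sequential assumption, which is why a fixed window over message order is a weak proxy for topical structure.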
Keywords
email thread, lexical information, email conversation, email corpus, better model, conversation structure, existing topic segmentation model, finer level conversation structure, work concerns automatic topic, Latent Dirichlet Allocation, Exploiting conversation structure, unsupervised topic segmentation