Topic Segmentation for Dialogue Stream

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (2019)

Abstract
Topic segmentation, which aims to divide a document into topic blocks, is a fundamental task in natural language processing. Most previous research focuses on written text rather than dialogue text. However, dialogue text has its own characteristics and is more challenging to segment by topic. Existing neural models for topic segmentation are usually built on RNNs or CNNs, which are competent on written text but perform poorly on dialogue text. We argue that a better segmentation result for dialogue text requires a better semantic representation of sentences. In this paper, we formulate topic segmentation as a sequence labeling task and propose a model based on BERT and TCN (Temporal Convolutional Network) to accomplish the task. We also present three datasets, two dialogue datasets and a news dataset, to evaluate the model's performance. Compared to the previous best model, our model shows an absolute improvement of 8% to 17% in F1 score. Moreover, we explore the impact of adding speaker information to dialogue text segmentation; the experimental results show that the additional speaker information can effectively improve segmentation performance.
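To make the sequence labeling formulation concrete, here is a minimal Python sketch of how topic blocks could map to per-sentence labels and back. The abstract does not specify the labeling scheme, so the convention here (1 marks the first sentence of a topic block, 0 marks a continuation) is an assumption, as are the function names and the toy dialogue. The BERT and TCN components are not reproduced; this only illustrates the input and output of the labeling task.

```python
from typing import List


def blocks_to_labels(blocks: List[List[str]]) -> List[int]:
    """Flatten topic blocks into one label per sentence.

    Assumed convention (not stated in the abstract): 1 marks the first
    sentence of a topic block, 0 marks a sentence that continues the block.
    """
    labels: List[int] = []
    for block in blocks:
        for position, _sentence in enumerate(block):
            labels.append(1 if position == 0 else 0)
    return labels


def labels_to_blocks(sentences: List[str], labels: List[int]) -> List[List[str]]:
    """Rebuild topic blocks from sentences and their predicted labels."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    blocks: List[List[str]] = []
    for sentence, label in zip(sentences, labels):
        # Open a new block on a boundary label, or if no block exists yet.
        if label == 1 or not blocks:
            blocks.append([])
        blocks[-1].append(sentence)
    return blocks


# Hypothetical toy dialogue with two topic blocks.
dialogue = [
    ["A: Did you book the flight?", "B: Yes, it leaves at nine."],
    ["A: What should we have for dinner?", "B: Noodles.", "A: Sounds good."],
]
sentences = [s for block in dialogue for s in block]
labels = blocks_to_labels(dialogue)
restored = labels_to_blocks(sentences, labels)
```

Under this convention a segmentation model only needs to predict one binary label per sentence, and the topic blocks are recovered by cutting the sequence at each predicted 1.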
Keywords
topic segmentation, written text, dialogue datasets, dialogue text segmentation, dialogue stream, topic blocks, RNN, CNN, BERT, TCN, Temporal Convolutional Network