Combining co-clustering with noise detection for theme-based summarization

TSLP (2014)

Abstract
Sentences are short and their content is limited. To overcome this, we treat words as independent text objects rather than as features of sentences in sentence clustering, and we develop two co-clustering frameworks, integrated clustering and interactive clustering, that cluster sentences and words simultaneously. Because real-world datasets always contain noise, we incorporate noise detection and removal to improve the clustering of sentences and words. We also explore a semi-supervised approach that incorporates the query information (and the sentence information in early document sets) into theme-based summarization. Thorough experimental studies are conducted. When evaluated on the DUC 2005-2007 and TAC 2008-2009 datasets, the two noise-detecting co-clustering approaches perform comparably to the top three systems. The results also demonstrate that the interactive algorithm with noise detection is more effective than the noise-detecting integrated algorithm.
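The idea of treating words as objects to be clustered alongside sentences, with noise removed first, can be illustrated with a minimal sketch. This is not the paper's integrated or interactive framework: it swaps in a much simpler technique, connected components on the bipartite sentence-word graph, and uses an assumed frequency rule for noise. The function name `co_cluster`, the `max_word_share` threshold, and the toy sentences are all hypothetical.

```python
from collections import defaultdict

def co_cluster(sentences, max_word_share=0.5):
    """Group sentences and words together via bipartite connected components.

    Returns (clusters, noise_words, noise_sentences). Simplified stand-in,
    not the integrated or interactive algorithm from the paper.
    """
    n = len(sentences)
    # Count the number of sentences each word occurs in.
    doc_freq = defaultdict(int)
    for tokens in sentences:
        for w in set(tokens):
            doc_freq[w] += 1
    # Assumed noise rule: a word spread over too many sentences is noise.
    noise_words = {w for w, c in doc_freq.items() if c / n > max_word_share}

    # Union-find over nodes ("s", index) for sentences and ("w", word) for words.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, tokens in enumerate(sentences):
        find(("s", i))  # register the sentence even if all its words are noise
        for w in tokens:
            if w not in noise_words:
                union(("s", i), ("w", w))

    # Each component holds both sentences and words: one co-cluster.
    groups = defaultdict(lambda: {"sentences": [], "words": []})
    for node in list(parent):
        kind, value = node
        groups[find(node)]["sentences" if kind == "s" else "words"].append(value)

    clusters, noise_sentences = [], []
    for g in groups.values():
        if len(g["sentences"]) < 2:
            # Assumed noise rule: a sentence linked to no other sentence is noise.
            noise_sentences.extend(g["sentences"])
        else:
            clusters.append({"sentences": sorted(g["sentences"]),
                             "words": sorted(g["words"])})
    clusters.sort(key=lambda c: c["sentences"])
    return clusters, sorted(noise_words), sorted(noise_sentences)


# Toy input (hypothetical): two themes plus one unrelated sentence.
sentences = [
    ["the", "storm", "flooded", "the", "coast"],
    ["the", "storm", "damaged", "homes"],
    ["the", "vaccine", "trial", "started"],
    ["the", "trial", "results", "the", "vaccine"],
    ["the", "weather"],
]
clusters, noise_words, noise_sentences = co_cluster(sentences)
for c in clusters:
    print(c["sentences"], c["words"])
print("noise words:", noise_words, "noise sentences:", noise_sentences)
```

On this toy input, the word "the" is flagged as noise because it occurs in every sentence, and the last sentence is flagged because nothing links it to another sentence once "the" is removed. A realistic system would replace both rules and the component step with the weighted, iterative clustering the abstract describes.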
Keywords
noise-detecting co-clustering approach, real-world datasets, interactive clustering, noise-detecting integrated algorithm, theme-based summarization, combining co-clustering, noise detection algorithm, cluster sentence, noise detection, sentence clustering, integrated clustering, co-clustering framework