Multilingual Wikipedia, Summarization, and Information Trustworthiness

msra

Citations: 23 | Views: 25
Abstract
Wikipedia is used as a corpus for a variety of text processing applications. It is especially popular for information selection tasks, such as summarization feature identification and answer generation/verification. Many Wikipedia entries (about people, events, locations, etc.) have descriptions in several languages, and descriptions of the same entry written in different languages often differ in length and content. In this paper we show that the pattern of information overlap across the descriptions written in different languages for the same Wikipedia entry fits the pyramid summary framework well: some facts are covered in the entry's descriptions in many languages, while others are covered in only a handful of descriptions. This phenomenon leads to a natural summarization algorithm, which we present in this paper. According to our evaluation, the generated summaries achieve a high level of user satisfaction. Moreover, the discovered pyramid structure of Wikipedia entry descriptions can be used to verify the trustworthiness of Wikipedia information.
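The abstract does not spell out the algorithm, so the following is only a minimal sketch of the pyramid idea, not the authors' method. It assumes each sentence has already been mapped to language-independent fact IDs (for example through machine translation and alignment, which is the hard part and is not shown). A fact's weight is the number of language descriptions that mention it; a summary greedily favors sentences covering high-weight facts, and facts supported by only one language are flagged as candidates for trustworthiness checks. All function names (`pyramid_weights`, `summarize`, `low_support_facts`) and the toy data are hypothetical.

```python
from collections import Counter

# Hypothetical representation: a sentence is (text, fact IDs it expresses).
# Fact IDs are ASSUMED to be already aligned across languages.
Sentence = tuple[str, frozenset[str]]


def pyramid_weights(descriptions: dict[str, list[Sentence]]) -> Counter:
    """Pyramid-style weight: number of language descriptions mentioning each fact."""
    weights: Counter = Counter()
    for sentences in descriptions.values():
        # Union over the description so each fact counts at most once per language.
        weights.update(set().union(*(facts for _, facts in sentences)))
    return weights


def summarize(sentences: list[Sentence], weights: Counter, max_sentences: int) -> list[str]:
    """Greedily pick the sentence that adds the most not-yet-covered fact weight."""
    chosen: list[str] = []
    covered: set[str] = set()
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        gains = [sum(weights[f] for f in facts - covered) for _, facts in remaining]
        best = max(range(len(remaining)), key=gains.__getitem__)
        if gains[best] == 0:
            break  # no remaining sentence contributes new facts
        text, facts = remaining.pop(best)
        chosen.append(text)
        covered |= facts
    return chosen


def low_support_facts(weights: Counter, max_languages: int = 1) -> list[str]:
    """Facts mentioned in at most `max_languages` descriptions (verification candidates)."""
    return sorted(f for f, w in weights.items() if w <= max_languages)


# Toy, made-up example: three language descriptions of one entry.
descriptions = {
    "en": [("S1", frozenset({"birth_date", "occupation"})),
           ("S2", frozenset({"award"})),
           ("S3", frozenset({"hobby"}))],
    "de": [("D1", frozenset({"birth_date", "occupation", "award"}))],
    "fr": [("F1", frozenset({"birth_date"})),
           ("F2", frozenset({"occupation", "award"}))],
}
w = pyramid_weights(descriptions)
print(summarize(descriptions["en"], w, max_sentences=2))  # ['S1', 'S2']
print(low_support_facts(w))  # ['hobby']
```

The greedy loop is a standard weighted-coverage heuristic chosen here for brevity; a practical system would also need sentence ordering and a word-based length budget rather than a sentence count.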
Keywords
multilinguality, summarization, Wikipedia