Querying, exploring and mining the extended document (2011)

Abstract
The evolution of the Web into an interactive medium that encourages active user engagement has ignited a huge increase in the amount, complexity and diversity of available textual data. This evolution forces us to reevaluate our view of documents as simple pieces of text and of document collections as immutable and isolated. Extended documents published in the context of blogs, micro-blogs, on-line social networks and customer feedback portals can be associated with a wealth of meta-data in addition to their textual component: tags, links, sentiment, entities mentioned in text, etc. Collections of user-generated documents grow, evolve, co-exist and interact: they are dynamic and integrated. These unique characteristics of modern documents and document collections present us with exciting opportunities for improving the way we interact with them. At the same time, this additional complexity, combined with the vast amounts of available textual data, presents us with formidable computational challenges. In this context, we introduce, study and extensively evaluate an array of effective and efficient solutions for querying, exploring and mining extended documents and dynamic, integrated document collections. For collections of socially annotated extended documents, we present an improved probabilistic search and ranking approach based on our growing understanding of the dynamics of the social annotation process. For extended documents, such as blog posts, associated with entities extracted from text and categorical attributes, we enable their interactive exploration through the efficient computation of strong entity associations. Associated entities are computed for all possible attribute value restrictions of the document collection. For extended documents, such as user reviews, annotated with a numerical rating, we introduce a keyword-query refinement approach. The solution enables the interactive navigation and exploration of large result sets.
We extend the skyline query to document streams, such as news articles, associated with categorical attributes and partially-ordered domains. The technique incrementally maintains a small set of recent, uniquely interesting extended documents from the stream. Finally, we introduce a solution for the scalable integration of structured data sources into Web search. Queries are analyzed in order to determine what structured data, if any, should be used to augment Web search results.
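The skyline idea mentioned above can be illustrated with a small sketch. It is not the algorithm from the thesis, only a minimal illustration under stated assumptions: the class `StreamSkyline`, the helper `transitive_closure`, and the way preferences are encoded as (better, worse) pairs are all hypothetical, "recent" is taken to mean a count-based sliding window, and each attribute's preference pairs are assumed to form a valid partial order. One document dominates another if it is equal or better on every attribute and strictly better on at least one; values that are unrelated in the partial order are incomparable.

```python
from collections import deque
from itertools import product


def transitive_closure(pairs):
    """Return all (better, worse) pairs implied by the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


class StreamSkyline:
    """Skyline over the last `window` documents (illustrative sketch only)."""

    def __init__(self, preferences, window):
        # preferences: {attribute: iterable of (better, worse) value pairs}
        self.better = {attr: transitive_closure(p)
                       for attr, p in preferences.items()}
        self.window = window
        self.buffer = deque()  # (arrival_index, doc_id, attrs), oldest first
        self.clock = 0

    def dominates(self, a, b):
        """True if a is equal or better everywhere and strictly better once."""
        strictly_better = False
        for attr, order in self.better.items():
            if a[attr] == b[attr]:
                continue
            if (a[attr], b[attr]) in order:
                strictly_better = True
            else:
                return False  # worse or incomparable on this attribute
        return strictly_better

    def add(self, doc_id, attrs):
        self.clock += 1
        # Expire documents that fell out of the sliding window.
        while self.buffer and self.buffer[0][0] <= self.clock - self.window:
            self.buffer.popleft()
        # An older document dominated by the newcomer can never return to
        # the skyline, because the newcomer outlives it in the window.
        self.buffer = deque(e for e in self.buffer
                            if not self.dominates(attrs, e[2]))
        self.buffer.append((self.clock, doc_id, attrs))

    def skyline(self):
        """Documents in the buffer that no other buffered document dominates."""
        entries = list(self.buffer)
        return [doc_id for _, doc_id, attrs in entries
                if not any(self.dominates(other, attrs)
                           for _, _, other in entries)]
```

Discarding documents dominated by a newer arrival keeps the buffer small while leaving the window's skyline unchanged, since dominance is transitive when each attribute's preferences form a partial order.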
Keywords
interesting extended document, categorical attribute, structured data source, available textual data, associated entity, modern document, integrated document collection, extended document, document collection, user-generated document