Discovering OLAP dimensions in semi-structured data

CIKM'12: 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, November 2012

Abstract
With standard OLAP technology, cubes are constructed from the input data based on the available data fields and the known relationships between them. Structuring the data into a set of numeric measures distributed along a set of uniformly structured dimensions may be unrealistic for applications dealing with semi-structured data. We propose to extend the capabilities of OLAP via content-driven discovery of measures and dimensional characteristics in the original dataset. New structural elements are discovered by means of data mining and other techniques and are therefore subject to change as the underlying dataset evolves. In this work we focus on the challenge of generating, maintaining, and querying such discovered elements of the cube. We demonstrate the benefits of our approach by providing OLAP over the public stream of user-generated content from the popular microblogging service Twitter. We were able to enrich the original dataset by discovering dynamic characteristics such as user activity, popularity, and messaging behavior, as well as by classifying messages by topic, impact, origin, method of generation, etc. The application of knowledge discovery techniques, coupled with human expertise, enables structural enrichment of the original data beyond the scope of existing methods for generating multidimensional models from relational or semi-structured data.
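To make the idea of a content-driven dimension concrete, the following is a minimal Python sketch, not the authors' method. It assumes toy `(user, topic)` records standing in for semi-structured messages, a hypothetical `activity_level` function with made-up thresholds, and full recomputation of the discovered dimension rather than any incremental maintenance scheme. It shows a "user activity" dimension being derived from the data itself and a message-count measure being rolled up along it.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (user, topic) pairs standing in for messages.
messages = [
    ("alice", "sports"), ("alice", "politics"), ("alice", "sports"),
    ("bob", "sports"),
    ("carol", "music"), ("carol", "music"),
]

def activity_level(n_posts, thresholds=(1, 2)):
    """Map a post count to an activity member; thresholds are assumed, not from the paper."""
    low, mid = thresholds
    if n_posts <= low:
        return "low"
    if n_posts <= mid:
        return "medium"
    return "high"

def discover_activity_dimension(msgs):
    """Derive a user -> activity-level mapping from the content of the data."""
    counts = Counter(user for user, _ in msgs)
    return {user: activity_level(n) for user, n in counts.items()}

def rollup(msgs, dimension):
    """Aggregate a message-count measure along (activity level, topic)."""
    cube = defaultdict(int)
    for user, topic in msgs:
        cube[(dimension[user], topic)] += 1
    return dict(cube)

activity = discover_activity_dimension(messages)
cube = rollup(messages, activity)

# The discovered dimension depends on the data: new messages can move
# a user to a different member, so it must be recomputed (or maintained).
updated = messages + [("bob", "music")]
activity2 = discover_activity_dimension(updated)
```

Here `activity` assigns alice to `"high"`, bob to `"low"`, and carol to `"medium"`; after bob posts again, `activity2` places him in `"medium"`, which illustrates why discovered elements are prone to change as the dataset evolves.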
Keywords
discovering olap dimension,original data,standard olap technology,original dataset,data mining,input data,semi-structured data,original set,knowledge discovery technique,available data field,content-driven discovery,semi structured data,olap