Tracking Set Correlations At Large Scale

SIGMOD/PODS'14: International Conference on Management of Data, Snowbird, Utah, USA, June 2014

Abstract
In this work, we consider the continuous computation of correlations between co-occurring tags that appear in messages published in social media streams. The volume and rate at which these messages are created make it necessary to parallelise the computation of correlations across the nodes of a computing cluster. The main challenge is to ensure that each node computes only a subset of the coefficients while every coefficient is computed by some node. The core task is therefore to continuously create and maintain partitions of the tags and to forward incoming messages according to them. We propose and evaluate several algorithms that partition the tags across the nodes while minimising the replication of tags to nodes and balancing the load among them. The proposed framework is implemented in Java on the Storm stream processing platform. We evaluate the partitioning algorithms and validate the feasibility of our approach through a thorough experimental study on real data.
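The abstract names neither the correlation measure nor the partitioning algorithms, so the following is only a minimal batch sketch under stated assumptions. It assumes the Jaccard coefficient |A∩B| / |A∪B| as the set correlation (A being the set of messages tagged a), and it uses a hypothetical greedy heuristic, not one of the paper's algorithms: each co-occurring tag pair is owned by one node, a node receives every message restricted to the tags it holds, and the node that owns a pair reports its coefficient. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def jaccard_all(messages):
    """Centralised baseline (assumed measure): Jaccard coefficient for every
    co-occurring tag pair, using per-tag and per-pair message counts."""
    single, pair = Counter(), Counter()
    for tags in messages:
        tags = sorted(set(tags))
        single.update(tags)
        pair.update(combinations(tags, 2))
    # |A ∪ B| = |A| + |B| - |A ∩ B|, which is >= 1 for any co-occurring pair.
    return {(a, b): n_ab / (single[a] + single[b] - n_ab)
            for (a, b), n_ab in pair.items()}


def partition_pairs(messages, num_nodes):
    """Hypothetical greedy heuristic: give each pair to the node that needs the
    fewest new tag replicas, breaking ties by the lowest current load."""
    pair_freq = Counter()
    for tags in messages:
        pair_freq.update(combinations(sorted(set(tags)), 2))
    node_tags = [set() for _ in range(num_nodes)]
    node_load = [0] * num_nodes
    owner = {}
    # Place the most frequent pairs first.
    for (a, b), freq in sorted(pair_freq.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(range(num_nodes),
                   key=lambda n: (len({a, b} - node_tags[n]), node_load[n], n))
        owner[(a, b)] = best
        node_tags[best].update((a, b))
        node_load[best] += freq
    return owner, node_tags


def replication_factor(node_tags):
    """Average number of nodes each assigned tag is replicated to."""
    distinct = set().union(*node_tags)
    return sum(len(s) for s in node_tags) / len(distinct) if distinct else 0.0


def distributed_jaccard(messages, num_nodes):
    owner, node_tags = partition_pairs(messages, num_nodes)
    results = {}
    for n in range(num_nodes):
        # Forward each message to node n with only the tags n is responsible for.
        local = []
        for tags in messages:
            kept = set(tags) & node_tags[n]
            if kept:
                local.append(kept)
        # Each node reports only the pairs it owns, so no pair is reported twice.
        for p, j in jaccard_all(local).items():
            if owner.get(p) == n:
                results[p] = j
    return results
```

Because a node holds both tags of every pair it owns, and every message carrying either tag is forwarded to it, its local counts for those tags equal the global counts, so the distributed result matches the centralised one. For example, with `messages = [{"storm", "java"}, {"storm", "bigdata"}, {"java", "jvm"}, {"storm", "java", "bigdata"}]`, `distributed_jaccard(messages, 2) == jaccard_all(messages)` holds. The greedy rule strongly favours low replication over balance; a streaming system such as the one described would also need to rebalance as tag frequencies drift.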
Keywords
Distributed Stream Processing, Partitioning, Set Correlations