How Does the Data Sampling Strategy Impact the Discovery of Information Diffusion in Social Media?

ICWSM (2010)

Abstract
Platforms such as Twitter have provided researchers with ample opportunities to analytically study social phenomena. There are, however, significant computational challenges due to the enormous rate of production of new information: researchers are therefore often forced to analyze a judiciously selected "sample" of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute- and topology-based sampling strategies on the discovery of an important social media phenomenon: information diffusion. We examine several widely adopted sampling methods that select nodes based on attribute (random, location, and activity) and topology (forest fire), and we study the impact of attribute-based seed selection on topology-based sampling. We then develop a series of metrics for evaluating the quality of the sample, based on user activity (e.g., volume, number of seeds), topological characteristics (e.g., reach, spread), and temporal characteristics (e.g., rate). We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g., location, activity) can improve on naïve methods by a significant margin of approximately 15-20%.
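To make the combination of attribute-based seed selection and topology-based sampling concrete, the following is a minimal Python sketch of forest fire sampling started from the most active users. It is an illustration under stated assumptions, not the paper's implementation: the function names (`top_activity_seeds`, `forest_fire_sample`), the forward-burning probability `p_forward=0.7`, and the dictionary-based graph representation are all hypothetical choices, since the abstract does not give these details.

```python
import random
from collections import deque


def top_activity_seeds(activity, k):
    """Return the k most active users (hypothetical attribute-based seed rule)."""
    return sorted(activity, key=activity.get, reverse=True)[:k]


def forest_fire_sample(adj, target_size, seeds=None, p_forward=0.7, rng=None):
    """Sample nodes by 'burning' outward from seed nodes.

    adj         -- dict mapping each node to an iterable of its neighbours
    target_size -- number of nodes wanted in the sample
    seeds       -- optional preferred start nodes (e.g. chosen by attribute)
    p_forward   -- assumed forward-burning probability (not from the paper)
    """
    rng = rng or random.Random(0)
    nodes = list(adj)
    target_size = min(target_size, len(nodes))
    sampled = set()
    pending_seeds = deque(seeds or [])

    while len(sampled) < target_size:
        # Prefer the supplied seeds; fall back to a uniformly random node.
        start = None
        while pending_seeds:
            candidate = pending_seeds.popleft()
            if candidate in adj and candidate not in sampled:
                start = candidate
                break
        if start is None:
            start = rng.choice([n for n in nodes if n not in sampled])

        sampled.add(start)
        frontier = deque([start])
        while frontier and len(sampled) < target_size:
            node = frontier.popleft()
            unburned = [n for n in adj.get(node, ()) if n not in sampled]
            rng.shuffle(unburned)
            # Burn a geometrically distributed number of neighbours:
            # keep going while each coin flip with probability p_forward succeeds.
            burn_count = 0
            while burn_count < len(unburned) and rng.random() < p_forward:
                burn_count += 1
            for neighbour in unburned[:burn_count]:
                if len(sampled) >= target_size:
                    break
                sampled.add(neighbour)
                frontier.append(neighbour)
    return sampled


if __name__ == "__main__":
    # Toy follower graph and per-user activity counts (made-up data).
    graph = {i: [(i + 1) % 10, (i + 3) % 10] for i in range(10)}
    activity = {i: i for i in range(10)}
    seeds = top_activity_seeds(activity, 2)
    print(forest_fire_sample(graph, target_size=3, seeds=seeds))
```

Passing `seeds=None` reduces the sketch to plain forest fire sampling with random starting points, which is one way to compare a purely topological sample against one that also uses user context.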
Keywords
social media, sampling methods