Predicting lifespans of popular tweets in microblog

SIGIR, pp. 1129-1130, 2012.

Keywords:
post time; time series; effective approach; first-hour retweeting information; predicting lifespans

Abstract:

In microblogs such as Twitter, popular tweets are usually retweeted by many users. Lifespans (i.e., how long a tweet stays popular) vary from tweet to tweet. This paper presents a simple yet effective approach to predicting the lifespans of popular tweets based on their static characteristics and dynamic retweeting patterns. For a poten...

Introduction
  • Microblogging has become one of the most popular social network services, used by billions of people all over the world.
  • After a tweet is posted, the authors collect its retweeting amount within the first hour.
  • If the amount is greater than a certain threshold, the tweet is regarded as popular, and the authors arrange its retweeting amounts at different time intervals of the first hour into a time series.
  • The authors identify a set of historic tweets with the same author and post time, and from that set select the top-k tweets with the most similar retweeting patterns.
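The steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `HistoricTweet` record, the Euclidean distance, the hour-of-day matching, and the default values of `k` and `threshold` are all assumptions made for the example. Only the overall flow (threshold check, filtering by author and post time, top-k similar series, averaging) comes from the summary.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence


@dataclass
class HistoricTweet:
    # Hypothetical record layout; the paper does not specify one.
    author: str
    post_hour: int                # hour of day the tweet was posted (0-23)
    first_hour_series: List[int]  # retweeting amounts per interval of the first hour
    lifespan_hours: float         # observed lifespan of the tweet


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity measure; the summary does not name the metric used.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_lifespan(author: str, post_hour: int, series: Sequence[int],
                     history: Sequence[HistoricTweet],
                     k: int = 3, threshold: int = 100) -> Optional[float]:
    # Only tweets whose first-hour retweeting amount reaches the threshold
    # are treated as popular (the threshold value here is illustrative).
    if sum(series) < threshold:
        return None
    # Static characteristics: same author and same post time.
    candidates = [t for t in history
                  if t.author == author and t.post_hour == post_hour]
    if not candidates:
        return None
    # Dynamic characteristic: top-k most similar first-hour time series.
    nearest = sorted(candidates,
                     key=lambda t: euclidean(series, t.first_hour_series))[:k]
    # Averaging the neighbours' lifespans gives the prediction.
    return sum(t.lifespan_hours for t in nearest) / len(nearest)
```

Returning `None` for unpopular tweets or an empty candidate set keeps the "no prediction" case explicit instead of falling back to a guess.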
Highlights
  • Microblogging has become one of the most popular social network services, used by billions of people all over the world.
  • We explore the use of historic tweets with similar static and dynamic characteristics for prediction.
  • If the first-hour retweeting amount is greater than a certain threshold, the tweet is regarded as popular, and we arrange its retweeting amounts at different time intervals of the first hour into a time series.
  • We predict its lifespan based on historic tweets with similar characteristics.
  • We evaluated our lifespan prediction approach on a real data set from Tencent Microblog.
  • Because of the limited number of historic tweets and the use of averaging, the predicted lifespan falls below the real one for tweets whose lifespan exceeds 50 hours in the data set; we will address this in future work.
Results
  • The authors evaluate the method on the real data set from Tencent Microblog, which has the largest number of users in China.
  • After a tweet is posted, its retweeting amount in the first few hours indicates whether the tweet will be popular over the next few days.
  • Almost all tweets are rarely retweeted more than 72 hours after they are posted.
  • The retweeting amount of a tweet nearly reaches its maximum within the first 72 hours after posting.
  • The lifespan of a tweet is the number of hours that have passed when its retweeting amount reaches 95% of its total retweeting amount.
  • The authors predict its lifespan based on historic tweets with similar characteristics.
  • The time when a tweet is posted impacts its retweeting pattern.
  • The retweeting pattern of a tweet posted at 5 am is different from that
  • The authors gather historic tweets with the same author and post time as the one to be predicted.
  • Top-k historic tweets are found from the candidate tweet set based on the similar dynamic retweeting pattern.
  • The authors evaluated the lifespan prediction approach on the real data set from Tencent Microblog.
  • Let X be the total number of tweets for prediction, and let $LifeSpan_p(T_i)$ and $LifeSpan_t(T_i)$ be the predicted and real lifespans of the i-th tweet.
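The lifespan definition used in this section (the hours elapsed when the retweeting amount reaches 95% of the total) can be illustrated with a short sketch. The function name `lifespan_hours` and the hourly-count input format are assumptions for the example; the paper summary does not say how the counts are binned.

```python
from typing import Optional, Sequence


def lifespan_hours(hourly_retweets: Sequence[int],
                   fraction: float = 0.95) -> Optional[int]:
    """Return the first hour at which cumulative retweets reach
    `fraction` of the total, or None if the tweet was never retweeted."""
    total = sum(hourly_retweets)
    if total == 0:
        return None
    running = 0
    # Hours are counted from 1, so index 0 is the first hour after posting.
    for hour, count in enumerate(hourly_retweets, start=1):
        running += count
        if running >= fraction * total:
            return hour
    # Only reached if fraction > 1; fall back to the observed window length.
    return len(hourly_retweets)
```

For example, with hourly counts `[60, 30, 8, 2]` the cumulative amount first reaches 95% of the total in hour 3.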
Conclusion
  • fhrc is the retweeting amount of a tweet during the first hour after it is posted.
  • Because of the limited number of historic tweets and the use of averaging, the predicted lifespan falls below the real one for tweets whose lifespan exceeds 50 hours in the data set; this will be addressed in future work.
  • 1) The proposed ATR-kNN approach outperforms the baseline algorithms, achieving the best performance under different fhrc values.
  • 2) The two static characteristics emphasized in the approach do contribute to the prediction performance, and the author requirement plays a more important role than the post time of the tweets.
Tables
  • Table 1: Performance comparison (RMSE) for tweets with fhrc >= 100, fhrc >= 500, and fhrc >= 1000
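The table reports RMSE. The page does not reproduce the formula, so the following is the standard root-mean-square error written with the paper's symbols, where X is the number of tweets for prediction and the two lifespan terms are the predicted and real lifespans of the i-th tweet. Treat it as an assumed reconstruction rather than a quotation from the paper.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{X} \sum_{i=1}^{X}
  \bigl( LifeSpan_p(T_i) - LifeSpan_t(T_i) \bigr)^{2}}
```

A lower value means the predicted lifespans are, on average, closer to the real ones.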
Funding
  • The work is supported by National Natural Science Foundation of China (60773156, 61073004), Chinese Major State Basic Research Development 973 Program (2011CB3022032), Important National Science & Technology Specific Program (2011ZX01042-001-002-2), and the research fund of Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology