Predicting lifespans of popular tweets in microblog

    SIGIR, pp. 1129-1130, 2012.

    Keywords:
    post time, time series, effective approach, first-hour retweeting information, predicting lifespans

    Abstract:

    In microblogs like Twitter, popular tweets are usually retweeted by many users. For different tweets, their lifespans (i.e., how long they will stay popular) vary. This paper presents a simple yet effective approach to predict the lifespans of popular tweets based on their static characteristics and dynamic retweeting patterns. For a poten...

    Introduction
    • Microblogging has become one of the most popular social network services, used by billions of people all over the world.
    • After a tweet is posted, the authors collect its retweet count within the first hour.
    • If the count is greater than a certain threshold, the tweet is regarded as popular, and the authors arrange its retweet counts over successive time intervals of the first hour into a time series.
    • The authors identify a set of historic tweets with the same author and post time, and select from them the top-k tweets with the most similar retweeting patterns.
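
The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Tweet` record, the hour-of-day reading of "same post time", Euclidean distance as the similarity measure, and the default values of `threshold` and `k` are all assumptions made for the example. Averaging the neighbours' lifespans is likewise an assumption, based on the source's mention of "the average method".

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Sequence


@dataclass
class Tweet:
    """Hypothetical record; field names are illustrative only."""
    author: str
    post_hour: int                          # hour of day when posted (0-23)
    first_hour_series: Sequence[int]        # retweets per interval in the first hour
    lifespan_hours: Optional[float] = None  # known only for historic tweets


def predict_lifespan(target: Tweet, history: List[Tweet],
                     threshold: int = 100, k: int = 3) -> Optional[float]:
    """Predict the lifespan of a popular tweet from similar historic tweets."""
    # Step 1: only tweets whose first-hour retweet count exceeds the
    # threshold are treated as popular and predicted.
    if sum(target.first_hour_series) <= threshold:
        return None

    # Step 2: static filter, same author and same post time (assumed: same hour).
    candidates = [
        t for t in history
        if t.author == target.author
        and t.post_hour == target.post_hour
        and t.lifespan_hours is not None
    ]
    if not candidates:
        return None

    # Step 3: dynamic filter, the k candidates whose first-hour time series
    # are closest to the target's (Euclidean distance is an assumption).
    candidates.sort(
        key=lambda t: dist(t.first_hour_series, target.first_hour_series)
    )
    nearest = candidates[:k]

    # Step 4: average the neighbours' known lifespans (assumed aggregation).
    return sum(t.lifespan_hours for t in nearest) / len(nearest)
```

All time series passed in must have the same number of intervals; a tweet with no matching historic candidates yields `None` rather than a guess.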
    Highlights
    • Microblogging has become one of the most popular social network services, used by billions of people all over the world
    • We explore the use of historic tweets with similar static and dynamic characteristics for prediction
    • If the retweet count within the first hour is greater than a certain threshold, the tweet is regarded as popular, and we arrange its retweet counts over successive time intervals of the first hour into a time series
    • We predict its lifespan based on historic tweets with similar characteristics
    • We evaluated our lifespan prediction approach on a real data set from Tencent Microblog
    • Owing to the insufficient number of historic tweets and the averaging method, the predicted lifespan is lower than the real one for tweets with lifespans over 50 hours in the data set; we will address this in future work
    Results
    • The authors evaluate the method on a real data set from Tencent Microblog, which has the largest number of users in China.
    • After a tweet is posted, its retweet count in the first few hours reflects whether the tweet will be popular over the following few days.
    • Almost 100% of tweets are rarely retweeted once 72 hours have passed since posting.
    • The retweet count of a tweet nearly reaches its maximum within the first 72 hours after posting.
    • A tweet's lifespan is the number of hours elapsed when its retweet count reaches 95% of its total retweet count.
    • The authors predict its lifespan based on historic tweets with similar characteristics.
    • The time when a tweet is posted impacts its retweeting pattern.
    • The retweeting pattern of a tweet posted at 5 am is different from that
    • The authors gather historic tweets with the same author and post time as the one to be predicted.
    • The top-k historic tweets are selected from the candidate tweet set based on the similarity of their dynamic retweeting patterns.
    • The authors evaluated the lifespan prediction approach on the real data set from Tencent Microblog.
    • Let X be the total number of tweets for prediction, and let LifeSpan_p(T_i) and LifeSpan_t(T_i) be the predicted and real lifespans of the i-th tweet.
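
The 95% lifespan definition and the evaluation quantities above can be illustrated with a short sketch. It assumes retweets are bucketed per hour (the bucketing is not stated in the source) and that the error measure is the standard root-mean-square error over the X predicted tweets, which matches the RMSE label on the results table; the function names are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def lifespan_hours(hourly_retweets: Sequence[int],
                   fraction: float = 0.95) -> Optional[int]:
    """Hours elapsed when cumulative retweets reach `fraction` of the total.

    `hourly_retweets[h - 1]` is the retweet count during hour h after posting
    (hourly bucketing is an assumption of this sketch).
    """
    total = sum(hourly_retweets)
    if total == 0:
        return None  # a tweet that was never retweeted has no defined lifespan
    cumulative = 0
    for hour, count in enumerate(hourly_retweets, start=1):
        cumulative += count
        if cumulative >= fraction * total:
            return hour
    # Fallback for floating-point edge cases; normally the loop returns first.
    return len(hourly_retweets)


def rmse(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Standard RMSE between predicted and real lifespans of X tweets."""
    if len(predicted) != len(real) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = ((p - r) ** 2 for p, r in zip(predicted, real))
    return sqrt(sum(squared_errors) / len(predicted))
```

For example, a tweet with hourly retweet counts 60, 30, 8, 2 passes 95% of its 100 total retweets during the third hour, so its lifespan under this sketch is 3 hours.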
    Conclusion
    • fhrc is the retweet count of a tweet during the first hour after it is posted.
    • Owing to the insufficient number of historic tweets and the averaging method, the predicted lifespan is lower than the real one for tweets with lifespans over 50 hours in the data set; this will be addressed in future work.
    • 1) The proposed ATR-kNN approach outperforms the baseline algorithms, achieving the best performance under different fhrc values.
    • 2) The two static characteristics emphasized in the approach do contribute to the prediction performance, and the same-author requirement plays a more important role than the post time of the tweets.
    Tables
    • Table 1: Performance comparison (RMSE) for tweets with fhrc>=100, fhrc>=500, and fhrc>=1000
    Funding
    • The work is supported by the National Natural Science Foundation of China (60773156, 61073004), the Chinese Major State Basic Research Development 973 Program (2011CB3022032), the Important National Science & Technology Specific Program (2011ZX01042-001-002-2), and the research fund of the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.