# Predicting lifespans of popular tweets in microblog

SIGIR, pp. 1129-1130, 2012.

Abstract:

In microblogs like Twitter, popular tweets are usually retweeted by many users. The lifespans of different tweets (i.e., how long they stay popular) vary. This paper presents a simple yet effective approach to predicting the lifespans of popular tweets based on their static characteristics and dynamic retweeting patterns. For a poten...

Introduction

- Microblogging has become one of the most popular social network services, used by billions of people all over the world.
- After a tweet is posted, the authors collect its retweeting amount (number of retweets) within the first hour.
- If this amount exceeds a certain threshold, the tweet is regarded as popular, and the authors formulate its retweeting amounts over successive time intervals of the first hour as a time series.
- The authors then identify a set of historic tweets with the same author and post time, and select from them the top-k tweets whose retweeting patterns are most similar.
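The pipeline in the bullets above can be sketched in Python as follows. This is not the authors' ATR-kNN implementation: the popularity threshold (100), the six 10-minute intervals, the reading of "same post time" as the same hour of day, the Euclidean distance between first-hour series, and the names `Tweet`, `first_hour_series`, `is_popular`, and `predict_lifespan` are all hypothetical choices made for illustration. The final step averages the neighbours' lifespans, following the "average method" that this summary mentions later.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical parameters: the summary does not state the actual values.
POPULARITY_THRESHOLD = 100  # assumed minimum first-hour retweet count
NUM_INTERVALS = 6           # assumed: six 10-minute intervals in the first hour


@dataclass
class Tweet:
    author: str
    post_hour: int                    # hour of day (0-23) when the tweet was posted
    retweet_minutes: list[float]      # minutes after posting at which each retweet happened
    lifespan: Optional[float] = None  # known only for historic tweets


def first_hour_series(tweet: Tweet) -> list[int]:
    """Count retweets in each equal-width interval of the first hour."""
    width = 60 / NUM_INTERVALS
    series = [0] * NUM_INTERVALS
    for minute in tweet.retweet_minutes:
        if 0 <= minute < 60:  # retweets outside the first hour are ignored
            series[int(minute // width)] += 1
    return series


def is_popular(tweet: Tweet) -> bool:
    """A tweet is popular if its first-hour retweet count exceeds the threshold."""
    return sum(first_hour_series(tweet)) > POPULARITY_THRESHOLD


def predict_lifespan(target: Tweet, history: list[Tweet], k: int = 5) -> float:
    """Average the lifespans of the k most similar historic tweets."""
    # Static filter: same author and same post hour (assumed meaning of "same post time").
    candidates = [
        h for h in history
        if h.author == target.author
        and h.post_hour == target.post_hour
        and h.lifespan is not None
    ]
    if not candidates:
        raise ValueError("no historic tweets with the same author and post time")
    # Dynamic filter: rank by Euclidean distance between first-hour series (assumed metric).
    target_series = first_hour_series(target)
    candidates.sort(key=lambda h: math.dist(first_hour_series(h), target_series))
    neighbours = candidates[:k]
    return sum(h.lifespan for h in neighbours) / len(neighbours)
```

Because the prediction is a plain average, it can never exceed the longest lifespan among the k neighbours; with few historic tweets per author and post time, this is one plausible reason for the underestimation of long-lived tweets that the summary reports.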

Highlights

- Nowadays, microblogging has become one of the most popular social network services, used by billions of people all over the world
- We explore the use of historic tweets with similar static and dynamic characteristics for prediction
- If the amount is greater than a certain threshold, the tweet is regarded as popular, and we formulate its retweeting amounts over successive time intervals of the first hour as a time series
- We predict its lifespan based on historic tweets with similar characteristics
- We evaluated our lifespan prediction approach on the real data set from Tencent Microblog
- Due to the insufficient number of historic tweets and the averaging method, the predicted lifespan is shorter than the real one for tweets with lifespans over 50 hours in the data set, which we will address in future work

Results

- The authors evaluate the method on the real data set from Tencent Microblog, which has the largest number of users in China.
- After a tweet is posted, its retweeting amount in the first few hours indicates whether it will stay popular over the following few days.
- Almost all tweets are rarely retweeted more than 72 hours after they are posted.
- A tweet's retweeting amount nearly reaches its maximum within the first 72 hours after posting.
- A tweet's lifespan is defined as the number of hours elapsed when its cumulative retweeting amount reaches 95% of its total retweeting amount.
- The authors predict a popular tweet's lifespan based on historic tweets with similar characteristics.
- The time when a tweet is posted impacts its retweeting pattern.
- The retweeting pattern of a tweet posted at 5 am is different from that
- The authors gather historic tweets with the same author and post time as the tweet to be predicted.
- From this candidate set, the top-k historic tweets with the most similar dynamic retweeting patterns are selected.
- The authors evaluated the lifespan prediction approach on the real data set from Tencent Microblog.
- Let $X$ be the total number of tweets to be predicted, and let $\mathit{LifeSpan}_p(T_i)$ and $\mathit{LifeSpan}_t(T_i)$ be the predicted and real lifespans of the $i$-th tweet $T_i$.
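The 95% lifespan definition in the Results above can be computed directly from a tweet's hourly retweet counts. A minimal sketch, assuming hour-level granularity and that the counts cover the whole observation window (for example the first 72 hours, after which almost no retweets occur); `lifespan_hours` is a hypothetical helper name.

```python
from __future__ import annotations


def lifespan_hours(hourly_counts: list[int], ratio: float = 0.95) -> int:
    """Hours elapsed until the cumulative retweet count first reaches `ratio` of the total.

    hourly_counts[h] is the number of retweets received during hour h + 1 after posting.
    Hour granularity and the length of the observation window are assumptions.
    """
    total = sum(hourly_counts)
    if total == 0:
        raise ValueError("tweet has no retweets")
    target = ratio * total
    cumulative = 0
    for hour, count in enumerate(hourly_counts, start=1):
        cumulative += count
        if cumulative >= target:
            return hour
    return len(hourly_counts)  # safeguard; not reached for ratio <= 1


print(lifespan_hours([50, 30, 15, 4, 1]))  # 3: 50 + 30 + 15 = 95 of 100 retweets
```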
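Table 1 reports the root mean squared error (RMSE). Assuming the standard definition with the symbols $X$, $\mathit{LifeSpan}_p(T_i)$, and $\mathit{LifeSpan}_t(T_i)$ introduced in the Results, the error over the $X$ predicted tweets would read:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{X}\sum_{i=1}^{X}\bigl(\mathit{LifeSpan}_p(T_i) - \mathit{LifeSpan}_t(T_i)\bigr)^{2}}
```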

Conclusion

- fhrc denotes the retweeting amount of a tweet during the first hour after it is posted.
- Due to the insufficient number of historic tweets and the averaging method, the predicted lifespan is shorter than the real one for tweets whose lifespans exceed 50 hours in the data set; this will be addressed in future work.
- 1) The proposed ATR-kNN approach outperforms the baseline algorithms, achieving the best performance at each fhrc threshold evaluated.
- 2) Both static characteristics used in the approach (same author and same post time) contribute to prediction performance, and the same-author requirement plays a more important role than the post-time requirement.
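Table 1 compares methods by RMSE on three nested subsets of test tweets selected by fhrc. The sketch below shows how such a per-threshold comparison could be computed; the thresholds (100, 500, 1000) come from Table 1, while the function names `rmse` and `rmse_by_fhrc` and the `(fhrc, predicted, real)` record layout are hypothetical.

```python
from __future__ import annotations

import math

FHRC_THRESHOLDS = (100, 500, 1000)  # the three subsets reported in Table 1


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error between predicted and real lifespans."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def rmse_by_fhrc(records: list[tuple[int, float, float]],
                 thresholds: tuple[int, ...] = FHRC_THRESHOLDS) -> dict[int, float]:
    """RMSE over each subset of tweets whose fhrc is at least the given threshold.

    Each record is (fhrc, predicted_lifespan, real_lifespan).
    """
    results: dict[int, float] = {}
    for threshold in thresholds:
        subset = [(p, a) for fhrc, p, a in records if fhrc >= threshold]
        if subset:  # skip thresholds with no qualifying tweets
            predicted, actual = zip(*subset)
            results[threshold] = rmse(list(predicted), list(actual))
    return results
```

Because the subsets are nested, a tweet with fhrc of 1200 counts toward all three columns; any input records must come from your own predictions, not from the paper.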

Summary

- Table 1: Performance comparison (RMSE) on tweets with fhrc ≥ 100, tweets with fhrc ≥ 500, and tweets with fhrc ≥ 1000.

Funding

- This work is supported by the National Natural Science Foundation of China (60773156, 61073004), the Chinese Major State Basic Research Development 973 Program (2011CB3022032), the Important National Science & Technology Specific Program (2011ZX01042-001-002-2), and the research fund of the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
