Automatic Twitter Topic Summarization With Speech Acts

IEEE Transactions on Audio, Speech, and Language Processing (2013)

Citations: 71 | Views: 65
Abstract
With the growth of Twitter as a social media service, automatic summarization of Twitter messages (tweets) is urgently needed to process the massive volume of tweeted information efficiently. Unlike multi-document summarization in general, Twitter topic summarization must handle tweets that are numerous, short, dissimilar, and noisy. To address this challenge, we propose a novel speech act-guided summarization approach in this work. Speech acts characterize tweeters' communicative behavior and provide an organized view of their messages. Speech act recognition is a multi-class classification problem, which we solve by using word-based and symbol-based features that capture both the linguistic features of speech acts and the particularities of Twitter text. The recognized speech acts in tweets are then used to direct the extraction of key words and phrases to fill in templates designed for speech acts. Leveraging high-ranking words and phrases as well as topic information for major speech acts, we propose a round-robin algorithm to generate template-based summaries. Unlike the extractive methods adopted in most previous work, our summarization method is abstractive. Evaluated on two 100-topic datasets, the summaries generated by our method outperform two kinds of representative extractive summaries and rival human-written summaries in terms of explanatoriness and informativeness.
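The round-robin, template-filling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the speech act labels, the template wording, the function name `round_robin_summary`, and the sentence budget are all hypothetical, and the ranked phrases are assumed to come from an earlier extraction step that is not shown.

```python
from collections import deque

# Hypothetical templates keyed by speech act. The paper's actual
# templates and speech act inventory are not given in the abstract.
TEMPLATES = {
    "statement": "People state that {phrase}.",
    "question": "People ask about {phrase}.",
    "suggestion": "People suggest {phrase}.",
}


def round_robin_summary(ranked, templates, max_sentences):
    """Visit speech acts in turn, taking the next-best phrase from each,
    until the sentence budget is spent or all phrases are used.

    ranked maps a speech act to its phrases, best first (assumed input).
    """
    # One queue entry per speech act that has a template and phrases.
    queues = deque(
        (act, deque(phrases))
        for act, phrases in ranked.items()
        if act in templates and phrases
    )
    sentences = []
    while queues and len(sentences) < max_sentences:
        act, phrases = queues.popleft()
        sentences.append(templates[act].format(phrase=phrases.popleft()))
        if phrases:
            # Send this speech act to the back so the others get a turn.
            queues.append((act, phrases))
    return sentences


if __name__ == "__main__":
    # Made-up ranked phrases, for illustration only.
    example = {
        "statement": ["the launch was delayed", "tickets sold out"],
        "question": ["the new release date"],
        "suggestion": ["waiting for official news"],
    }
    for sentence in round_robin_summary(example, TEMPLATES, 4):
        print(sentence)
```

Cycling through the speech acts, rather than exhausting one before moving to the next, keeps a short summary from being dominated by whichever speech act has the most phrases. That is one plausible reading of why a round-robin order is used here.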
Keywords
word-based features, automatic twitter topic summarization, twitter text, speech processing, extractive method, key word/phrase extraction, symbol-based features, pattern classification, speech act recognition, template-based summaries, twitter messages, abstractive summarization, human-written summaries, speech act-guided summarization approach, multidocument summarization, high-ranking words, twitter, social media service, social networking (online), document handling, communicative behavior, round-robin algorithm, speech act, multiclass classification problem, noise measurement, media, speech, pragmatics, round robin algorithm, speech recognition