NivaDuck - A Scalable Pipeline to Build a Database of Political Twitter Handles for India and the United States

SMSociety'20: International Conference on Social Media and Society Toronto ON Canada July, 2020(2020)

引用 15|浏览7
暂无评分
摘要
We present a scalable methodology to identify Twitter handles of politicians in a given region and test our framework in the context of Indian and US politics. The main contribution of our work is the list of the curated Twitter handles of 18500 Indian and 8000 US politicians. Our work leveraged machine learning-based classification and human verification to build a data set of Indian politicians on Twitter. We built NivaDuck, a highly precise, two-staged classification pipeline that leverages Twitter description text and tweet content to identify politicians. For India, we tested NivaDuck’s recall using Twitter handles of the members of the Indian parliament while for the US we used state and local level politicians in California state and San Diego county respectively. We found that while NivaDuck has lower recall scores, it produces large, diverse sets of politicians with precision exceeding 90 percent for the US dataset. We discuss the need for an ML-based, scalable method to compile such a dataset and its myriad use cases for the research community and its wide-ranging utilities for research in political communication on social media.
更多
查看译文
关键词
twitter, politics, archive, india, united states
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要