Assessing Candidate Preference through Web Browsing History

KDD, pp. 158-167, 2018.

Keywords:
social media, user population, machine learning, ground truth, Twitter user (and others)

Abstract:

Predicting election outcomes is of considerable interest to candidates, political scientists, and the public at large. We propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has potential to overcome a number of the drawbacks of election polls. However, there are a number of challenges…

Introduction
  • Understanding the candidate preference of voters leading up to a major election such as the 2016 U.S. presidential election is an important but difficult task.
  • In light of the challenges presented by traditional polling, in this paper the authors undertake an exploration of an alternative approach for assessing candidate preference in the electorate.
  • The authors' study examines the issues and potential benefits of approaching this problem by using passively collected records of user activity on the Web. In particular, the authors undertake the first study to look at the relationship between Web browsing behavior and election candidate preference.
  • State-level prediction is important in the U.S. due to the winner-take-all nature of the Electoral College system at the state level; a minimal illustration of this point follows.
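The sketch below is not from the paper: the predicted vote shares are hypothetical (the electoral-vote counts are the 2016 allocations), and it only illustrates why state-level accuracy matters. Under winner-take-all allocation, the candidate with the larger predicted share in a state receives all of that state's electoral votes, so small per-state errors can move whole blocks of votes.

```python
# Minimal sketch: winner-take-all aggregation of per-state predictions.
# Predicted shares below are illustrative, not results from the paper.
predicted_share_A = {"FL": 0.49, "PA": 0.51, "OH": 0.47}  # candidate A's predicted two-party share
electoral_votes = {"FL": 29, "PA": 20, "OH": 18}          # 2016 electoral-vote allocations

totals = {"A": 0, "B": 0}
for state, share in predicted_share_A.items():
    winner = "A" if share > 0.5 else "B"   # the state-level winner takes every electoral vote
    totals[winner] += electoral_votes[state]

print(totals)  # {'A': 20, 'B': 47}
```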
Highlights
  • Understanding the candidate preference of voters leading up to a major election such as the 2016 U.S. presidential election is an important but difficult task
  • Our contributions are twofold: first, we elucidate the challenges involved in using browsing behavior to assess candidate preference and we present methods that can be used to overcome those challenges
  • We show that, using Web browsing behavior, it is possible to predict candidate preference with accuracy equivalent to state-of-the-art polling, with the additional advantage that predictions can be made at a fine grain both spatially and temporally
  • In Section 5.2, we discuss how the learning algorithms presented in Section 4 perform and how our state-level predictions compare to polls and actual election results
  • We observed that “social referrals”, i.e., visits to sites that originate from social media, are more important for inferring candidate preference than visits originating from other sources, such as search engines and URLs typed directly into the browser
  • We show the power of using Web browsing behavior for assessing candidate preference, in terms of day-to-day and state-to-state level predictions that elucidate the impact of exogenous effects such as the ‘Comey letter.’
Methods
  • The authors' first challenge is to extract a set of features that effectively captures user preference from the user browsing logs described in Section 3.
  • The frequency with which the user visited each of the top 500 sites in the US, according to Alexa.
  • The authors considered this the baseline feature vector.
  • The frequency with which each user visited sites when referred by social media sites.
  • Visits to the top 100 most visited sites in the dataset were considered for these social-referral features; a feature-construction sketch follows this list.
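A minimal sketch of how this kind of per-user feature vector might be assembled from raw browsing logs is shown below. The site lists, the (site, referrer) log format, and the feature_vector helper are illustrative assumptions, not the paper's actual schema; the paper's features are visit frequencies over the Alexa top-500 US sites plus socially referred visit frequencies over the dataset's top-100 sites.

```python
from collections import Counter

# Hypothetical site lists (placeholders): the paper uses the Alexa top-500 US
# sites for the baseline features and the dataset's top-100 most-visited sites
# for the social-referral features.
TOP_SITES = ["google.com", "facebook.com", "cnn.com"]
SOCIAL_REFERRERS = {"facebook.com", "twitter.com", "reddit.com"}
REFERRAL_SITES = ["cnn.com", "nytimes.com", "foxnews.com"]

def feature_vector(visits):
    """visits: list of (site, referrer) pairs from one user's browsing log."""
    n = max(len(visits), 1)
    site_counts = Counter(site for site, _ in visits)
    social_counts = Counter(site for site, ref in visits if ref in SOCIAL_REFERRERS)
    # Baseline features: relative visit frequency for each top site.
    baseline = [site_counts[s] / n for s in TOP_SITES]
    # Social-referral features: relative frequency of socially referred visits.
    social = [social_counts[s] / n for s in REFERRAL_SITES]
    return baseline + social

print(feature_vector([("cnn.com", "facebook.com"), ("google.com", None)]))
# -> [0.5, 0.0, 0.5, 0.5, 0.0, 0.0]
```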
Results
  • The authors show the performance of the method in practice, starting with how they split the data in order to avoid overfitting (Section 5.1).
  • In Section 5.2, the authors discuss how the learning algorithms presented in Section 4 perform and how the state-level predictions compare to polls and actual election results.
  • Figure 4 shows how the authors divided the dataset for purposes of training and testing.
  • The authors retained data from the first week for training.
  • To train the model, the authors used FiveThirtyEight polls from September 10, both for estimating B(r) and for tuning the per-state logistic regression thresholds; one possible calibration sketch follows this list
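The summary does not spell out how the per-state thresholds were tuned against the polls. The sketch below shows one plausible calibration scheme, offered only as an assumption: each state's threshold is chosen so that the fraction of panelists scored above it matches the poll-reported support. The function name, the simulated scores, and the 48% figure are all hypothetical.

```python
import numpy as np

def tune_state_threshold(scores, poll_support):
    """Choose a decision threshold so the fraction of users scored above it
    matches the poll-reported support for one candidate in that state.

    scores: per-user classifier scores (e.g., logistic-regression probabilities)
            for users located in the state.
    poll_support: poll-based fraction of the state supporting the candidate.
    """
    # Thresholding at the (1 - support) quantile labels the top poll_support
    # fraction of users as supporters of the candidate.
    return np.quantile(scores, 1.0 - poll_support)

# Hypothetical usage: 1,000 simulated scores and a 48% poll number.
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
threshold = tune_state_threshold(scores, poll_support=0.48)
print(threshold, (scores >= threshold).mean())  # fraction above threshold ~ 0.48
```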
Conclusion
  • In this paper the authors have presented the first method that uses the history of user visits to Web sites to assess individual preference for political candidates.
  • The authors pinpoint the challenges to be overcome in realizing this goal, chief among them dealing with temporal and regional heterogeneity in user populations, as well as overcoming the lack of individual-level ground truth labels for training.
  • With respect to the latter, the authors develop a new method allowing them to train a user-level classifier using only aggregate data (a generic sketch of this idea follows the list).
  • The authors show the power of using Web browsing behavior for assessing candidate preference, in terms of day-to-day and state-to-state level predictions that elucidate the impact of exogenous effects such as the ‘Comey letter.’
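Training a user-level classifier from only aggregate labels is closely related to "learning from label proportions" (e.g., Quadrianto et al. [42]). The sketch below is a generic illustration of that family of methods under simple assumptions, not the paper's specific algorithm: logistic-regression weights are fit so that the mean predicted probability within each group (such as a state) matches that group's known aggregate proportion. All data and names are illustrative.

```python
import numpy as np

def train_from_proportions(X_groups, proportions, lr=0.1, epochs=500):
    """Generic learning-from-label-proportions sketch (not the paper's exact
    method): fit logistic-regression weights so that the mean predicted
    probability in each group matches that group's known label proportion.

    X_groups: list of (n_g, d) feature matrices, one per group (e.g., state).
    proportions: known fraction of positive labels in each group.
    """
    d = X_groups[0].shape[1]
    w = np.zeros(d)
    for _ in range(epochs):
        grad = np.zeros(d)
        for X, p in zip(X_groups, proportions):
            pred = 1.0 / (1.0 + np.exp(-X @ w))          # per-user probabilities
            # Gradient of (mean(pred) - p)^2 with respect to w.
            grad += 2.0 * (pred.mean() - p) * ((pred * (1 - pred)) @ X) / len(X)
        w -= lr * grad
    return w

# Hypothetical usage: two groups whose features differ and whose aggregate
# positive-label proportions (0.6 and 0.4) are known.
rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.5, size=(100, 3))
X2 = rng.normal(loc=-0.5, size=(80, 3))
w = train_from_proportions([X1, X2], proportions=[0.6, 0.4])
print(w)
```

In the paper's setting, the groups would be geographic regions and the aggregate proportions would come from polls or vote totals; the exact objective and optimization used by the authors are described in the paper itself.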
Tables
  • Table1: Overview of the Web browsing data that is the basis for our study. The data was provided by US participants in comScore’s global desktop user panel and was collected from September 9, 2016 to November 3, 2016
Related work
  • 2.1 Public Opinion Research Methods

    Predicting elections requires insights into the citizenry’s future behaviors, which in turn depend on understanding its political attitudes and preferences. Surveys, or polls, have become the most important and prevalent method of gauging public opinion and behaviors since their emergence in the 1930s. In a recent review of public opinion surveys, Berinsky [3] notes early observations of the power of polls in the minds of both the public and elites [10, 50], but also the inherent difficulty of the task [4, 15, 22, 30].

    Indeed, in the last 75 years, researchers have come to understand the strengths and weaknesses of surveys, as well as the myriad considerations necessary to properly use these instruments to understand political attitudes. Weisberg [51] summarizes the potential for error in surveys. Sources of error include response accuracy (interviewer effects [e.g., 18, 53], question wording [see also 21], questionnaire issues [e.g., 11, 32, 41, 45, 49], and question nonresponse [e.g., 47]), respondent selection (unit nonresponse [e.g., 1, 14], sampling frames and error [e.g., 31, 36]), and survey administration (data editing, sensitive topics, and comparability effects).
Funding
  • This material is based upon work supported by NSF grants IIS-1421759, CNS-1618207, and CNS-1703592, and by AFRL grant FA8750-12-2-0328.
References
  • [1] J. Scott Armstrong and Terry S. Overton. 1977. Estimating nonresponse bias in mail surveys. Journal of Marketing Research (1977), 396–402.
  • [2] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 6239 (2015), 1130–1132.
  • [3] Adam J. Berinsky. 2017. Measuring public opinion with surveys. Annual Review of Political Science 20 (2017), 309–329.
  • [4] Herbert Blumer. 1954. What is wrong with social theory? American Sociological Review 19, 1 (1954), 3–10.
  • [5] James E. Campbell, Helmut Norpoth, Alan I. Abramowitz, Michael S. Lewis-Beck, Charles Tien, Robert S. Erikson, Christopher Wlezien, Brad Lockerbie, Thomas M. Holbrook, Bruno Jerôme, et al. 2017. A recap of the 2016 election forecasts. PS: Political Science & Politics 50, 2 (2017), 331–338.
  • [6] Jonathan Chang, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber, and David M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems. 288–296.
  • [7] Raviv Cohen and Derek Ruths. 2013. Classifying political orientation on Twitter: It’s not easy! In ICWSM.
  • [8] comScore, Inc. Panelist Privacy Statement. http://www.comscore.com/About-comScore/Privacy.
  • [9] Michael D. Conover, Bruno Gonçalves, Jacob Ratkiewicz, Alessandro Flammini, and Filippo Menczer. 2011. Predicting the political alignment of Twitter users. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom). IEEE, 192–199.
  • [10] Philip E. Converse. 1987. Changing conceptions of public opinion in the political process. The Public Opinion Quarterly 51 (1987), S12–S24.
  • [11] Robert M. Entman. 2007. Framing bias: Media in the distribution of power. Journal of Communication 57, 1 (2007), 163–173.
  • [12] Adam Fourney, Miklos Z. Racz, Gireeja Ranade, Markus Mobius, and Eric Horvitz. 2017. Geographic and temporal trends in fake news consumption during the 2016 US presidential election. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 2071–2074.
  • [13] Daniel Gayo-Avello, Panagiotis T. Metaxas, and Eni Mustafaraj. 2011. Limits of electoral predictions using Twitter. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. AAAI.
  • [14] Andrew Gelman, Sharad Goel, Douglas Rivers, David Rothschild, et al. 2016. The mythical swing voter. Quarterly Journal of Political Science 11, 1 (2016), 103–130.
  • [15] Benjamin Ginsberg. 1986. The Captive Public: How Mass Opinion Promotes State Power. Basic Books, New York.
  • [16] Jennifer Golbeck and Derek Hansen. 2014. A method for computing political preference among Twitter followers. Social Networks 36 (2014), 177–184.
  • [17] Jeffrey Gottfried, Michael Barthel, Elisa Shearer, and Amy Mitchell. 2016. The 2016 presidential campaign: A news event that’s hard to miss. Pew Research Center 4 (2016).
  • [18] Pamela Grimm. 2010. Social desirability bias. Wiley International Encyclopedia of Marketing (2010).
  • [19] Justin Grimmer. 2015. We are all social scientists now: How big data, machine learning, and causal inference work together. PS: Political Science & Politics 48, 1 (2015), 80–83.
  • [20] Justin Grimmer and Brandon M. Stewart. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21, 3 (2013), 267–297.
  • [21] Robert M. Groves, Floyd J. Fowler Jr., Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau. 2011. Survey Methodology. Vol. 561. John Wiley & Sons.
  • [22] Susan Herbst. 1993. Numbered Voices: How Opinion Polling Has Shaped American Politics. University of Chicago Press.
  • [23] Daniel W. Hill and Zachary M. Jones. 2014. An empirical evaluation of explanations for state repression. American Political Science Review 108, 3 (2014), 661–687.
  • [24] Kosuke Imai and Aaron Strauss. 2010. Estimation of heterogeneous treatment effects from randomized experiments, with application to the optimal planning of the get-out-the-vote campaign. Political Analysis 19, 1 (2010), 1–19.
  • [25] Andreas Jungherr, Pascal Jürgens, and Harald Schoen. 2012. Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T.O., Sandner, P.G., & Welpe, I.M., “Predicting elections with Twitter: What 140 characters reveal about political sentiment”. Social Science Computer Review 30, 2 (2012), 229–234.
  • [26] Aaron Kaufman. 2018. Estimating the partisan bias of survey questions. Working paper (2018).
  • [27] Aaron Kaufman, Peter Kraft, and Maya Sen. 2018. Improving Supreme Court forecasting using boosted decision trees. (2018).
  • [28] Courtney Kennedy, Mark Blumenthal, Scott Clement, Joshua D. Clinton, Claire Durand, Charles Franklin, Kyley McGeeney, Lee Miringoff, Kristen Olson, Douglas Rivers, Lydia Saad, G. Evans Witt, and Christopher Wlezien. 2018. An evaluation of the 2016 election polls in the United States. Public Opinion Quarterly (February 3, 2018).
  • [29] Ryan Kennedy, Stefan Wojcik, and David Lazer. 2017. Improving election prediction internationally. Science 355, 6324 (2017), 515–520.
  • [30] Valdimar Orlando Key. 1961. Public Opinion and American Democracy. (1961).
  • [31] L. Kish. 1965. Survey Sampling. Wiley, New York.
  • [32] Jon A. Krosnick, Neil Malhotra, and Urja Mittal. 2014. Public misunderstanding of political facts: How question wording affected estimates of partisan differences in birtherism. Public Opinion Quarterly 78, 1 (2014), 147–165.
  • [33] Hendrik Kück and Nando de Freitas. 2012. Learning about individuals from group statistics. CoRR abs/1207.1393 (2012).
  • [34] Huyen T. Le, G. R. Boynton, Yelena Mejova, Zubair Shafiq, and Padmini Srinivasan. 2017. Revisiting The American Voter on Twitter. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems.
  • [35] Michael S. Lewis-Beck. 2005. Election forecasting: Principles and practice. The British Journal of Politics & International Relations 7, 2 (2005), 145–164.
  • [36] S. Lohr. 1999. Sampling: Design and Analysis. Duxbury Press.
  • [37] Jacob M. Montgomery and Santiago Olivella. 2016. Tree-based models for political science data. American Journal of Political Science (2016).
  • [38] Brendan O’Connor, Ramnath Balasubramanyan, Bryan R. Routledge, Noah A. Smith, et al. 2010. From tweets to polls: Linking text sentiment to public opinion time series. ICWSM 11, 122-129 (2010), 1–2.
  • [39] Giorgio Patrini, Richard Nock, Paul Rivera, and Tiberio Caetano. 2014. (Almost) no label no cry. In Advances in Neural Information Processing Systems.
  • [40] Marco Pennacchiotti and Ana-Maria Popescu. 2011. A machine learning approach to Twitter user classification. ICWSM 11, 1 (2011), 281–288.
  • [41] Philip M. Podsakoff, Scott B. MacKenzie, Jeong-Yeon Lee, and Nathan P. Podsakoff. 2003. Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology 88, 5 (2003), 879.
  • [42] Novi Quadrianto, Alex J. Smola, Tiberio S. Caetano, and Quoc V. Le. 2008. Estimating labels from label proportions. In International Conference on Machine Learning.
  • [43] Kevin M. Quinn, Burt L. Monroe, Michael Colaresi, Michael H. Crespin, and Dragomir R. Radev. 2010. How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54, 1 (2010), 209–228.
  • [44] Nate Silver. 2016. https://fivethirtyeight.com/features/
  • [45] Paul M. Sniderman and Sean M. Theriault. 2004. The structure of political argument and the logic of issue framing. Studies in Public Opinion: Attitudes, Nonattitudes, Measurement Error, and Change (2004), 133–65.
  • [46] Stefan Stieglitz and Linh Dang-Xuan. 2013. Social media and political communication: A social media analytics framework. Social Network Analysis and Mining 3, 4 (2013), 1277–1291.
  • [47] Roger Tourangeau and Ting Yan. 2007. Sensitive questions in surveys. Psychological Bulletin 133, 5 (2007), 859.
  • [48] Andranik Tumasjan, Timm Oliver Sprenger, Philipp G. Sandner, and Isabell M. Welpe. 2010. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media.
  • [49] Amos Tversky and Daniel Kahneman. 1981. The framing of decisions and the psychology of choice. Science 211, 4481 (1981), 453–458.
  • [50] Sidney Verba. 1996. The citizen as respondent: Sample surveys and American democracy (presidential address, American Political Science Association, 1995). American Political Science Review 90, 1 (1996), 1–7.
  • [51] Herbert F. Weisberg. 2009. The Total Survey Error Approach: A Guide to the New Science of Survey Research. University of Chicago Press.
  • [52] Felix X. Yu, Dong Liu, Sanjiv Kumar, Tony Jebara, and Shih-Fu Chang. 2013. ∝SVM for learning with label proportions. In International Conference on Machine Learning.
  • [53] Daniel John Zizzo. 2010. Experimenter demand effects in economic experiments. Experimental Economics 13, 1 (2010), 75–98.