SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

    EPJ Data Sci., Volume 5, Issue 1, 2016, Pages 23

    Keywords:
    sentiment analysis, benchmark, methods evaluation

    Abstract:

    In the last few years thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiments, including lexical-based and supervised machine learning…

    Introduction
    • Sentiment analysis has become an extremely popular tool, applied in several analytical domains, especially on the Web and social media.
    • Sentiment analysis can provide analytical perspectives for financial investors who want to discover and respond to market opinions [ , ].
    • Another important set of applications is in politics, where marketing campaigns are interested in tracking sentiments expressed by voters associated with candidates [ ].
    • Although lexical-based methods do not rely on labeled data, it is hard to create a single lexical-based dictionary that works across all different contexts
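    The lexical-based approach can be sketched in a few lines; the toy dictionary and negation rule below are illustrative assumptions, not any of the benchmarked lexicons:

```python
# Minimal lexicon-based polarity scorer: sum word valences, flipping the sign
# of a word that immediately follows a negation. Toy lexicon for illustration.
LEXICON = {"good": 1, "great": 2, "bad": -1, "awful": -2, "love": 2, "hate": -2}
NEGATIONS = {"not", "never", "no"}

def polarity(sentence):
    score, negate = 0, False
    for token in sentence.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love this movie"))   # → positive
print(polarity("this is not good"))    # → negative
```

    The context-dependence problem noted above is visible even in this sketch: the same lexicon that works for movie reviews would mis-score domain-specific words in, say, finance.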
    Highlights
    • Sentiment analysis has become an extremely popular tool, applied in several analytical domains, especially on the Web and social media
    • Recent efforts to analyze the moods embedded in Web content have adopted various sentiment analysis methods, which were originally developed in linguistics and psychology
    • Several of these methods became widely used in their knowledge fields and have been applied as tools to quantify moods in the context of unstructured short messages in online social networks
    • We present a thorough comparison of twenty-four popular sentence-level sentiment analysis methods using gold standard datasets that span different types of data sources
    • We highlight that although our results identified a few methods able to appear among the best for different datasets, the overall prediction performance still leaves considerable room for improvement
    • We show that the prediction performance of methods varies widely across datasets
    Methods
    • Methods not included

      Despite the effort to include most of the highly cited and important methods in the comparison, the authors could not include a few of them, for different reasons.
    • NRC SVM [ ] is likewise not available; however, the lexical resources its authors used are available and were considered in the evaluation, resulting in the methods NRC Hashtag and Sentiment.
    • The authors note that the performance of the evaluated methods is reasonable, but there is still considerable room for improvement.
    • [Mean-rank table residue: rankings for the 2-class and 3-class experiments, listing SentiStrength, Sentiment140, Opinion Lexicon, OpinionFinder and SentiWordNet among the top-ranked methods]
    • If the authors look at the Macro-F1 values only for the best method on each dataset, they can note that the overall prediction performance of the methods is still low; the reported Macro-F1 values stand out only for methods with low coverage in the 3-class experiments
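    Macro-F1, the metric behind these comparisons, averages per-class F1 so that each class weighs equally regardless of its size; a minimal, self-contained sketch over an invented 3-class confusion matrix (rows = true class, columns = predicted):

```python
def macro_f1(confusion):
    """Macro-F1 from a square confusion matrix (rows = true, cols = predicted)."""
    n = len(confusion)
    f1s = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, wrongly
        fn = sum(confusion[c][p] for p in range(n)) - tp   # true c, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / n

# Invented negative/neutral/positive confusion matrix, for illustration only:
cm = [[50, 10, 5],
      [8, 30, 12],
      [4, 6, 40]]
print(round(macro_f1(cm), 3))  # → 0.72
```

    Because every class contributes equally, a method that ignores the minority neutral class is penalized here even if its overall accuracy looks good.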
    Conclusion
    • Concluding remarks

      Recent efforts to analyze the moods embedded in Web content have adopted various sentiment analysis methods, which were originally developed in linguistics and psychology.
    • Several of these methods became widely used in their knowledge fields and have been applied as tools to quantify moods in the context of unstructured short messages in online social networks.
    • The authors' effort quantifies the prediction performance of twenty-four popular sentiment analysis methods across eighteen datasets for two tasks: differentiating between two classes and between three classes.
    Tables
    • Table1: Overview of the sentence-level methods available in the literature (continued)
    • Table2: Table 2
    • Table3: Labeled datasets
    • Table4: Confusion matrix for experiments with three classes
    • Table5: Confusion matrix for experiments with two classes
    • Table6: Table 6
    • Table7: Table 7
    • Table8: Mean rank table for all datasets
    • Table9: Friedman’s test results
    • Table10: Best method for each dataset - 2-class experiments
    • Table11: Best method for each dataset - 3-class experiments
    • Table12: Contexts’ groups
    • Table13: Mean rank table for datasets of social networks
    • Table14: Mean rank table for datasets of comments
    • Table15: Mean rank table for datasets of reviews
    • Table16: Friedman’s test results per contexts
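    Tables 8, 9 and 13-16 rank the methods per dataset and test the rankings with Friedman's test; a pure-Python sketch of the Friedman chi-square statistic over an invented score table (the scores are made up, not the paper's results):

```python
def friedman_statistic(scores):
    """Friedman chi-square over a table of scores: rows = datasets (blocks),
    cols = methods. Higher score = better; rank 1 is best, ties get the
    average rank of their group."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best first
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1                      # extend the tie group
            avg = (i + j) / 2 + 1           # average rank of positions i..j
            for m in range(i, j + 1):
                ranks[order[m]] = avg
            i = j + 1
        for col in range(k):
            rank_sums[col] += ranks[col]
    # Standard Friedman statistic: 12/(n k (k+1)) * sum(R_j^2) - 3 n (k+1)
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Invented Macro-F1 scores: 4 datasets (rows) x 3 methods (cols).
table = [[0.60, 0.55, 0.50],
         [0.62, 0.58, 0.49],
         [0.59, 0.61, 0.48],
         [0.65, 0.57, 0.52]]
print(round(friedman_statistic(table), 2))  # → 6.5
```

    With SciPy available, `scipy.stats.friedmanchisquare` computes the same statistic together with a p-value.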
    Funding
    • This work was partially funded by projects InWeb (grant MCT/CNPq 573871/2008-6) and MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), and by the authors’ individual grants from CNPq, CAPES and FAPEMIG.
    Endnotes
    • a. https://www.google.com/trends/explore#q=sentiment%20analysis
    • b. Except for paid methods.
    • c. http://www.ifeel.dcc.ufmg.br
    • d. http://www.nltk.org/_modules/nltk/sentiment/vader.html
    • e. http://mpqa.cs.pitt.edu/opinionfinder/
    Received: 3 February 2016. Accepted: 19 June 2016.
    References
    • 1. Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82-89. doi:10.1145/2436256.2436274
    • 2. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: KDD’04, pp 168-177. http://doi.acm.org/10.1145/1014052.1014073
    • 3. Oliveira N, Cortez P, Areal N (2013) On the predictability of stock market behavior using stocktwits sentiment and posting volume. In: Progress in artificial intelligence. Lecture notes in computer science, vol 8154
    • 4. Bollen J, Mao H, Zeng X-J (2010) Twitter mood predicts the stock market. arXiv:1010.3003
    • 5. Tumasjan A, Sprenger TO, Sandner PG, Welpe IM (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. In: 4th international AAAI conference on weblogs and social media (ICWSM)
    • 6. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the 2002 conference on empirical methods in natural language processing (EMNLP ’02), pp 79-86
    • 7. Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1):24-54
    • 8. Gonçalves P, Benevenuto F, Cha M (2013) PANAS-t: a pychometric scale for measuring sentiments on Twitter. arXiv:1308.1857v1
    • 9. Bollen J, Pepe A, Mao H (2009) Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. arXiv:0911.1583
    • 10. Kramer ADI, Guillory JE, Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci USA 111(24):8788-8790. doi:10.1073/pnas.1320040111
    • 11. Thelwall M (2013) Heart and soul: sentiment strength detection in the social web with SentiStrength. http://sentistrength.wlv.ac.uk/documentation/SentiStrengthChapter.pdf
    • 12. Reis J, Goncalves P, Vaz de Melo P, Prates R, Benevenuto F (2014) Magnet news: you choose the polarity of what you read. In: 8th international AAAI conference on weblogs and social media (ICWSM)
    • 13. Reis J, Benevenuto F, Vaz de Melo P, Prates R, Kwak H, An J (2015) Breaking the news: first impressions matter on online news. In: 9th international AAAI conference on weblogs and social media (ICWSM)
    • 14. Tamersoy A, De Choudhury M, Chau DH (2015) Characterizing smoking and drinking abstinence from social media. In: Proceedings of the 26th ACM conference on hypertext and social media (HT)
    • 15. Hutto C, Gilbert E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: 8th international AAAI conference on weblogs and social media (ICWSM)
    • 16. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67-82
    • 17. Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Discov 24(3):478-514. doi:10.1007/s10618-011-0238-6
    • 18. Levallois C (2013) Umigon: sentiment analysis for tweets based on terms lists and heuristics. In: The second joint conference on lexical and computational semantics (*SEM), volume 2: proceedings of the seventh international workshop on semantic evaluation (SemEval 2013), pp 414-417. http://www.aclweb.org/anthology/S13-2068
    • 19. Abbasi A, Hassan A, Dhar M (2014) Benchmarking Twitter sentiment analysis tools. In: 9th international conference on language resources and evaluation (LREC)
    • 20. Gonçalves P, Araujo M, Benevenuto F, Cha M (2013) Comparing and combining sentiment analysis methods. In: Proceedings of the 1st ACM conference on online social networks (COSN’13)
    • 21. Araujo M, Diniz JP, Bastos L, Soares E, Júnior M, Ferreira M, Ribeiro F, Benevenuto F (2016) iFeel 2.0: a multilingual benchmarking system for sentence-level sentiment analysis. In: 10th international AAAI conference on weblogs and social media (ICWSM)
    • 22. Wilson T, Hoffmann P, Somasundaran S, Kessler J, Wiebe J, Choi Y, Cardie C, Riloff E, Patwardhan S (2005) OpinionFinder: a system for subjectivity analysis. In: HLT/EMNLP on interactive demonstrations, pp 34-35
    • 23. Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the conference on human language technology and empirical methods in natural language processing (HLT ’05), pp 347-354
    • 24. Esuli A, Sebastiani F (2006) SentiWordNet: a publicly available lexical resource for opinion mining. In: 5th international conference on language resources and evaluation (LREC), pp 417-422
    • 25. Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: 7th international conference on language resources and evaluation (LREC), pp 2200-2204
    • 26. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39-41
    • 27. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision
    • 28. Cambria E, Olsher D, Rajagopal D (2014) SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis. In: 28th AAAI conference on artificial intelligence, pp 1515-1521
    • 29. Nielsen F (2011) A new ANEW: evaluation of a word list for sentiment analysis in microblogs. arXiv:1103.2903
    • 30. Bradley MM, Lang PJ (1999) Affective norms for English words (ANEW): stimuli, instruction manual, and affective ratings. Technical report, Center for Research in Psychophysiology, University of Florida, Gainesville, FL
    • 31. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267-307
    • 32. Hannak A, Anderson E, Barrett LF, Lehmann S, Mislove A, Riedewald M (2012) Tweetin’ in the rain: exploring societal-scale effects of weather on mood. In: 6th international AAAI conference on weblogs and social media (ICWSM)
    • 33. Mohammad S (2012) #emotional tweets. In: The first joint conference on lexical and computational semantics, volume 1: proceedings of the main conference and the shared task, and volume 2: proceedings of the sixth international workshop on semantic evaluation (SemEval 2012), pp 246-255. http://www.aclweb.org/anthology/S12-1033
    • 34. De Smedt T, Daelemans W (2012) Pattern for Python. J Mach Learn Res 13(1):2063-2067
    • 35. Wang H, Can D, Kazemzadeh A, Bar F, Narayanan S (2012) A system for real-time Twitter sentiment analysis of 2012 U.S. presidential election cycle. In: ACL system demonstrations, pp 115-120
    • 36. Watson D, Clark L (1985) Development and validation of brief measures of positive and negative affect: the PANAS scales. J Pers Soc Psychol 54(1):1063-1070
    • 37. Mohammad S, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Comput Intell 29(3):436-465
    • 38. Plutchik R (1980) A general psychoevolutionary theory of emotion. Academic Press, New York, pp 3-33
    • 39. Pappas N, Katsimpras G, Stamatatos E (2013) Distinguishing the popularity between topics: a system for up-to-date opinion retrieval and mining in the web. In: 14th international conference on intelligent text processing and computational linguistics
    • 40. Mohammad SM, Kiritchenko S, Zhu X (2013) NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: Proceedings of the seventh international workshop on semantic evaluation exercises (SemEval 2013)
    • 41. Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing (EMNLP ’13), pp 1631-1642
    • 42. Warriner AB, Kuperman V, Brysbaert M (2013) Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav Res Methods 45(4):1191-1207
    • 43. Brysbaert M, New B (2009) Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behav Res Methods 41(4):977-990
    • 44. Lexalytics (2015) Sentiment extraction - measuring the emotional tone of content. Technical report, Lexalytics
    • 45. Wiebe J, Wilson T, Cardie C (2005) Annotating expressions of opinions and emotions in language. Lang Resour Eval 39(2):165-210
    • 46. Stone PJ, Dunphy DC, Smith MS, Ogilvie DM (1966) The general inquirer: a computer approach to content analysis. MIT Press, Cambridge
    • 47. Biever C (2010) Twitter mood maps reveal emotional states of America. New Sci 207(2771):14
    • 48. Taboada M, Anthony C, Voll K (2006) Methods for creating semantic orientation dictionaries. In: 5th international conference on language resources and evaluation (LREC), pp 427-432
    • 49. Mohammad S, Dunne C, Dorr B (2009) Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In: Proceedings of the 2009 conference on empirical methods in natural language processing (EMNLP ’09), pp 599-608. http://dl.acm.org/citation.cfm?id=1699571.1699591
    • 50. Taboada M, Anthony C, Voll K (2006) Methods for creating semantic orientation dictionaries. In: 5th international conference on language resources and evaluation (LREC), pp 427-432
    • 51. Cha M, Haddadi H, Benevenuto F, Gummadi KP (2010) Measuring user influence in Twitter: the million follower fallacy. In: 4th international AAAI conference on weblogs and social media (ICWSM)
    • 52. Strapparava C, Mihalcea R (2007) SemEval-2007 task 14: affective text. In: Proceedings of the 4th international workshop on semantic evaluations (SemEval ’07), pp 70-74. http://dl.acm.org/citation.cfm?id=1621474.1621487
    • 53. Nakov P, Kozareva Z, Ritter A, Rosenthal S, Stoyanov V, Wilson T (2013) SemEval-2013 task 2: sentiment analysis in Twitter. In: The second joint conference on lexical and computational semantics (*SEM), volume 2: proceedings of the seventh international workshop on semantic evaluation (SemEval 2013), pp 312-320
    • 54. Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting of the Association for Computational Linguistics, pp 271-278
    • 55. Cambria E, Speer R, Havasi C, Hussain A (2010) SenticNet: a publicly available semantic resource for opinion mining. In: AAAI fall symposium series
    • 56. Liu B (2012) Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, vol 5(1). doi:10.2200/s00416ed1v01y201204hlt016
    • 57. Godbole N, Srinivasaiah M, Skiena S (2007) Large-scale sentiment analysis for news and blogs. In: 1st international AAAI conference on weblogs and social media (ICWSM)
    • 58. Kouloumpis E, Wilson T, Moore J (2011) Twitter sentiment analysis: the good the bad and the OMG! In: 5th international AAAI conference on weblogs and social media (ICWSM)
    • 59. Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1555-1565
    • 60. Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 655-665
    • 61. Johnson R, Zhang T (2015) Effective use of word order for text categorization with convolutional neural networks. In: Human language technologies: the 2015 annual conference of the North American chapter of the ACL, pp 103-112
    • 62. Valitutti R (2004) WordNet-affect: an affective extension of WordNet. In: 4th international conference on language resources and evaluation (LREC), pp 1083-1086
    • 63. Dodds PS, Danforth CM (2009) Measuring the happiness of large-scale written expression: songs, blogs, and presidents. J Happiness Stud 11(4):441-456. doi:10.1007/s10902-009-9150-9
    • 64. Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast - but is it good?: evaluating non-expert annotations for natural language tasks. In: Proceedings of the 2008 conference on empirical methods in natural language processing (EMNLP ’08)
    • 65. Pappas N, Popescu-Belis A (2013) Sentiment analysis of user comments for one-class collaborative filtering over TED talks. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, pp 773-776
    • 66. Diakopoulos NA, Shamma DA (2010) Characterizing debate performance via aggregated Twitter sentiment. In: Proceedings of the 28th international conference on human factors in computing systems, pp 1195-1198
    • 67. Narr S, Hülfenhaus M, Albayrak S (2012) Language-independent Twitter sentiment analysis. In: Workshop on knowledge discovery, data mining and machine learning (KDML-2012)
    • 68. Aisopos F (2014) Manually annotated sentiment analysis Twitter dataset NTUA. www.grid.ece.ntua.gr
    • 69. Sanders N (2011) Twitter sentiment corpus by Niek Sanders. http://www.sananalytics.com/lab/twitter-sentiment/
    • 70. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159-174
    • 71. Berenson ML, Levine DM, Szabat KA (2014) Basic business statistics - concepts and applications, 13th edn. Pearson
    • 72. Jain R (1991) The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling. Wiley, New York
    • 73. Garcia D, Garas A, Schweitzer F (2012) Positive words carry less information than negative words. EPJ Data Sci 1(1):3
    • 74. Dodds PS, Clark EM, Desu S, Frank MR, Reagan AJ, Williams JR, Mitchell L, Harris KD, Kloumann IM, Bagrow JP, Megerdoomian K, McMahon MT, Tivnan BF, Danforth CM (2015) Human language reveals a universal positivity bias. Proc Natl Acad Sci USA 112(8):2389-2394. doi:10.1073/pnas.1411678112. http://www.pnas.org/content/112/8/2389.full.pdf