Explicit factor models for explainable recommendation based on phrase-level sentiment analysis

SIGIR, pp. 83-92, 2014.

Keywords:
collaborative filtering; recommendation explanation; information filtering; recommender systems; natural language processing

Abstract:

Collaborative Filtering (CF)-based recommendation algorithms, such as Latent Factor Models (LFM), work well in terms of prediction accuracy. However, the latent features make it difficult to explain the recommendation results to the users. Fortunately, with the continuous growth of online user reviews, the information available for training…

Introduction
  • In the last few years, researchers have found or argued that explanations in recommendation systems could be very beneficial.
  • By explaining how the system works and/or why a product is recommended, the system becomes more transparent and has the potential to allow users to tell when the system is wrong, increase users’ confidence or trust in the system, help users make better and faster decisions, convince users to try or buy, and increase user enjoyment.
  • Lack of explainability weakens the ability to persuade users and help users make better decisions in practical systems [39]
Highlights
  • In the last few years, researchers have found or argued that explanations in recommendation systems could be very beneficial
  • We mainly focus on the following research questions: 1) How much do users care about the various explicit product features extracted from reviews? 2) How does the Explicit Factor Model framework perform in both the task of rating prediction and the more practical task of top-K recommendation?
  • Top-K recommendation results for the baselines (MP, SO, NMF — Non-negative Matrix Factorization, BPRMF, HFT — Hidden Factors as Topics) and the Explicit Factor Model (EFM, reported at k = 50):
      NDCG: MP 0.244, SO 0.212, NMF 0.216, BPRMF 0.238, HFT 0.261, EFM 0.284
      AUC:  MP 0.837, SO 0.785, NMF 0.832, BPRMF 0.856, HFT 0.873, EFM 0.884
    Five-fold cross-validation is used for parameter tuning and performance evaluation.
  • We study how the performance (NDCG and Area Under the ROC Curve) changes as k increases from 5 to the maximum value possible (96 for Yelp10 and 113 for Dianping), and the results on Yelp10 are shown in Figure 6
  • We propose to leverage phrase-level sentiment analysis of user reviews for personalized recommendation
  • Our analysis shows that different users could focus on different product aspects, and our experiments suggest that the size of the underlying feature space that users care about varies for different users, domains and countries
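The two ranking metrics quoted in the highlights, NDCG and AUC, can be computed as in the following sketch. These are the standard textbook definitions, not the authors' evaluation code:

```python
import math

def ndcg_at_k(ranked_relevance, k):
    """NDCG@k: DCG of the ranked list divided by the DCG of the ideal ranking."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def auc(scores, labels):
    """AUC: probability that a random positive item is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    hits = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return hits / (len(pos) * len(neg))
```

A perfect ranking yields NDCG = 1 and AUC = 1; both degrade as relevant items slip down the list.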
Methods
  • The authors study how the performance (NDCG and AUC) changes as k increases from 5 to the maximum value possible (96 for Yelp10 and 113 for Dianping), and the results on Yelp10 are shown in Figure 6
  • It shows that the performance of EFM rises as k increases up to around 15, then stays stable until it begins to drop at around k = 45.
  • The standard deviations of both NDCG and AUC in five-fold cross-validation, for each baseline algorithm and at each experimental point of EFM, are ≤ 0.006.
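The sweep over the number of explicit factors k with five-fold cross-validation described above can be sketched as follows; `train_fn` and `eval_fn` are hypothetical placeholders standing in for EFM training and NDCG/AUC evaluation:

```python
import random

def cross_validate_k(records, k_values, train_fn, eval_fn, folds=5, seed=42):
    """For each candidate k, average eval_fn over `folds` train/test splits.

    train_fn(train, k) -> model and eval_fn(model, test) -> score are
    placeholders for the model-specific fitting and evaluation routines."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    results = {}
    for k in k_values:
        scores = []
        for f in range(folds):
            test = shuffled[f::folds]                                  # every folds-th record
            train = [r for i, r in enumerate(shuffled) if i % folds != f]
            model = train_fn(train, k)
            scores.append(eval_fn(model, test))
        results[k] = sum(scores) / folds                               # mean over folds
    return results
```

Plotting `results` against k would reproduce the shape reported above: a rise, a plateau, and an eventual drop once k exceeds the number of features users actually care about.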
Results
  • The authors study how the performance (NDCG and AUC) changes as k increases from 5 to the maximum value possible (96 for Yelp10 and 113 for Dianping), and the results on Yelp10 are shown in Figure 6
  • It shows that the performance of EFM rises as k increases up to around 15, then stays stable until it begins to drop at around k = 45.
Conclusion
  • In this paper, the authors propose to leverage phrase-level sentiment analysis of user reviews for personalized recommendation.
  • The authors' analysis shows that different users could focus on different product aspects, and the experiments suggest that the size of the underlying feature space that users care about varies for different users, domains and countries
  • Both online and offline experiments show that the framework compares favourably with baseline methods in three tasks: rating prediction, top-K recommendation, and explanation-based user persuasion.
  • The authors focused on persuasiveness in explanation generation and experimental design; studying other utilities of explanations, and automatically generating explanations to optimize one utility or a combination of them, is left for future work.
Tables
  • Table1: Table of notations in the framework
  • Table2: Statistics of Yelp and Dianping datasets
  • Table3: Some statistics and evaluation results of the sentiment lexicons, where ‘F, O, S’ stand for feature word, opinion word and sentiment, respectively
  • Table4: Sampled entries from the Yelp dataset
  • Table5: Top-5 frequent features of the 5 clusters
  • Table6: Top-K recommendation results on Dianping dataset, where the result listed for EFM is the best performance with the corresponding k
  • Table7: Key statistics of synonym clusters
  • Table8: Word clusters of the top 15 features
  • Table9: The number of browsing records, clicks and click through rate for the three user types
  • Table10: The overall confusion matrix corresponding to the 1328 common items of A (with explanations) and B (without explanations) users
Related work
  • With the ability to take advantage of the wisdom of crowds, Collaborative Filtering (CF) [33] techniques have achieved great success in personalized recommender systems, especially in rating prediction tasks. Recently, Latent Factor Models (LFM) based on Matrix Factorization (MF) [14] techniques have gained great popularity, as they usually outperform traditional methods and have achieved state-of-the-art performance on some benchmark datasets [33]. Various MF algorithms have been proposed for different problem settings, such as Singular Value Decomposition (SVD) [14, 32], Non-negative Matrix Factorization (NMF) [15], Max-Margin Matrix Factorization (MMMF) [29], Probabilistic Matrix Factorization (PMF) [30], and Localized Matrix Factorization (LMF) [45, 44]. They aim at learning latent factors from user-item rating matrices to make rating predictions, from which personalized recommendations are generated. However, their latent nature makes it difficult to make recommendations in situations where we know a user cares about certain particular product features. Furthermore, it is also difficult to generate intuitive explanations for the recommendation results. Besides, frequently used metrics such as RMSE and MAE do not necessarily have a direct relationship with performance in practical top-K recommendation scenarios [4].

    [Figure: users pay attention to, and items perform well on, different product features, e.g. Battery, OS, Color, Memory, Earphone, Price, Screen, Service, Brand.]
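The latent factor models surveyed above share one core idea: approximating the sparse user-item rating matrix by a product of low-rank user and item factors. A minimal SGD sketch of plain matrix factorization follows (illustrative only; it is not EFM, which additionally models explicit feature information):

```python
import random

def factorize(ratings, n_users, n_items, rank=8, lr=0.01, reg=0.02, epochs=50, seed=0):
    """Plain matrix factorization via SGD: r_ui ≈ p_u · q_i.

    ratings is a list of (user, item, rating) triples; reg is the L2 penalty."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(rank))
            err = r - pred
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)   # gradient step on user factor
                Q[i][f] += lr * (err * pu - reg * qi)   # gradient step on item factor
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the dot product of the learned factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

The learned factors P and Q are exactly the "latent" quantities criticized above: they reconstruct ratings well, but individual dimensions carry no human-readable meaning, which is what motivates replacing them with explicit product features.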
Funding
  • Part of this work was supported by the Chinese Natural Science Foundation (60903107, 61073071) and the National High Technology Research and Development Program (2011AA01A205); the fourth author is sponsored by the National Science Foundation (IIS-0713111)
Reference
  • [1] S. Aciar, D. Zhang, S. Simoff, and J. Debenham. Informed Recommender: Basing Recommendations on Consumer Product Reviews. IEEE Intelligent Systems, 22(3):39–47, 2007.
  • [2] M. Bilgic and R. J. Mooney. Explaining Recommendations: Satisfaction vs. Promotion. IUI, 2005.
  • [3] H. Cramer, V. Evers, S. Ramlal, M. van Someren, et al. The Effects of Transparency on Trust in and Acceptance of a Content-Based Art Recommender. User Modeling and User-Adapted Interaction, 18(5):455–496, 2008.
  • [4] P. Cremonesi, Y. Koren, and R. Turrin. Performance of Recommender Algorithms on Top-N Recommendation Tasks. RecSys, pages 39–46, 2010.
  • [5] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering. KDD, pages 126–135, 2006.
  • [6] X. Ding, B. Liu, and P. S. Yu. A Holistic Lexicon-Based Approach to Opinion Mining. WSDM, 2008.
  • [7] G. Ganu, N. Elhadad, and A. Marian. Beyond the Stars: Improving Rating Predictions using Review Text Content. WebDB, 2009.
  • [8] X. He, M. Gao, M. Kan, Y. Liu, and K. Sugiyama. Predicting the Popularity of Web 2.0 Items based on User Comments. SIGIR, 2014.
  • [9] X. He, M. Kan, P. Xie, and X. Chen. Comment-based Multi-View Clustering of Web 2.0 Items. WWW, 2014.
  • [10] J. Herlocker, J. Konstan, and J. Riedl. Explaining Collaborative Filtering Recommendations. CSCW, 2000.
  • [11] M. Hu and B. Liu. Mining and Summarizing Customer Reviews. KDD, pages 168–177, 2004.
  • [12] N. Jakob, S. H. Weber, M. C. Muller, et al. Beyond the Stars: Exploiting Free-Text User Reviews to Improve the Accuracy of Movie Recommendations. TSA, 2009.
  • [13] C. W.-k. Leung, S. C.-f. Chan, and F.-l. Chung. Integrating Collaborative Filtering and Sentiment Analysis: A Rating Inference Approach. ECAI, 2006.
  • [14] Y. Koren, R. Bell, and C. Volinsky. Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 2009.
  • [15] D. D. Lee and H. S. Seung. Algorithms for Non-negative Matrix Factorization. NIPS, 2001.
  • [16] D. Lemire and A. Maclachlan. Slope One Predictors for Online Rating-Based Collaborative Filtering. SDM, 2005.
  • [17] B. Liu, M. Hu, and J. Cheng. Opinion Observer: Analyzing and Comparing Opinions on the Web. WWW, 2005.
  • [18] B. Liu and L. Zhang. A Survey of Opinion Mining and Sentiment Analysis. Mining Text Data, 2012.
  • [19] Y. Lu, M. Castellanos, U. Dayal, and C. Zhai. Automatic Construction of a Context-Aware Sentiment Lexicon: An Optimization Approach. WWW, 2011.
  • [20] J. McAuley and J. Leskovec. Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text. RecSys, pages 165–172, 2013.
  • [21] C. Musat, Y. Liang, and B. Faltings. Recommendation Using Textual Opinions. IJCAI, 2013.
  • [22] T. Nakagawa, K. Inui, and S. Kurohashi. Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables. NAACL, 2010.
  • [23] M. Newman and M. Girvan. Finding and Evaluating Community Structure in Networks. Physical Review E, 2004.
  • [24] B. Pang and L. Lee. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2), 2008.
  • [25] B. Pang, L. Lee, et al. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP, 2002.
  • [26] N. Pappas and A. Popescu-Belis. Sentiment Analysis of User Comments for One-Class Collaborative Filtering over TED Talks. SIGIR, pages 773–776, 2013.
  • [27] S. Pero and T. Horvath. Opinion-Driven Matrix Factorization for Rating Prediction. UMAP, 2013.
  • [28] S. Rendle, C. Freudenthaler, et al. BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI, 2009.
  • [29] J. Rennie and N. Srebro. Fast Maximum Margin Matrix Factorization for Collaborative Prediction. ICML, 2005.
  • [30] R. Salakhutdinov and A. Mnih. Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo. ICML, 2008.
  • [31] A. Sharma and D. Cosley. Do Social Explanations Work? Studying and Modeling the Effects of Social Explanations in Recommender Systems. WWW, 2013.
  • [32] N. Srebro and T. Jaakkola. Weighted Low-Rank Approximations. ICML, pages 720–727, 2003.
  • [33] X. Su and T. M. Khoshgoftaar. A Survey of Collaborative Filtering Techniques. Advances in Artificial Intelligence, 2009.
  • [34] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede. Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics, 37(2), 2011.
  • [35] Y. Tan, Y. Zhang, M. Zhang, Y. Liu, and S. Ma. A Unified Framework for Emotional Elements Extraction based on Finite State Matching Machine. NLPCC, 400:60–71, 2013.
  • [36] M. Terzi, M. A. Ferrario, and J. Whittle. Free Text in User Reviews: Their Role in Recommender Systems. RecSys, 2011.
  • [37] N. Tintarev and J. Masthoff. A Survey of Explanations in Recommender Systems. ICDE, 2007.
  • [38] N. Tintarev and J. Masthoff. Designing and Evaluating Explanations for Recommender Systems. Recommender Systems Handbook, pages 479–510, 2011.
  • [39] J. Vig, S. Sen, and J. Riedl. Tagsplanations: Explaining Recommendations Using Tags. IUI, 2009.
  • [40] J. Wiebe, T. Wilson, and C. Cardie. Annotating Expressions of Opinions and Emotions in Language. LREC, 2005.
  • [41] T. Wilson, J. Wiebe, et al. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. EMNLP, 2005.
  • [42] A. Yessenalina, Y. Yue, et al. Multi-Level Structured Models for Document-Level Sentiment Classification. EMNLP, 2010.
  • [43] Y. Zhang, H. Zhang, M. Zhang, Y. Liu, et al. Do Users Rate or Review? Boost Phrase-Level Sentiment Labeling with Review-Level Sentiment Classification. SIGIR, 2014.
  • [44] Y. Zhang, M. Zhang, Y. Liu, and S. Ma. Improve Collaborative Filtering Through Bordered Block Diagonal Form Matrices. SIGIR, 2013.
  • [45] Y. Zhang, M. Zhang, Y. Liu, S. Ma, and S. Feng. Localized Matrix Factorization for Recommendation based on Matrix Block Diagonal Forms. WWW, 2013.