FRAUDAR: Bounding Graph Fraud in the Face of Camouflage

    KDD, 2016.

    Cited by: 109
    Keywords:
    detection method, singular value decomposition, social network, fake review, real world
    In brief:
    We propose FRAUDAR, a fraud detection algorithm that provably bounds the amount of fraud adversaries can commit, even in the face of camouflage

    Abstract:

    Given a bipartite graph of users and the products that they review, or followers and followees, how can we detect fake reviews or follows? Existing fraud detection methods (spectral, etc.) try to identify dense subgraphs of nodes that are sparsely connected to the remaining graph. Fraudsters can evade these methods using camouflage, by ad…

    Introduction
    • How can the authors detect if a politician has purchased fake followers on Twitter, or if a product’s reviews on Amazon are genuine? More challengingly, how can the authors provably prevent fraudsters who sell fake followers and reviews for various web services from evading the detection systems? In this paper the authors focus on precisely this problem – how can the authors design a fraud detection system with strong, provable guarantees of robustness?

      Given the rise in popularity of social networks and other web services in recent years, fraudsters have strong incentives to manipulate these services.
    • Fake Amazon and TripAdvisor reviews are available for sale, misleading consumers about restaurants, hotels, and other services and products
    • Detecting and neutralizing these actions is important for companies and consumers alike.
    • Graph-based approaches detect groups of spammers, often by identifying unexpectedly dense regions of the graph of users and products
    • Such methods are potentially harder to evade, as creating fake reviews unavoidably generates edges in the graph.
    • Attack models considered: (a) attacks with random camouflage; (b) attacks with biased camouflage; (c) hijacked accounts
    Highlights
    • How can we detect if a politician has purchased fake followers on Twitter, or if a product’s reviews on Amazon are genuine? More challengingly, how can we provably prevent fraudsters who sell fake followers and reviews for various web services from evading our detection systems? In this paper we focus on precisely this problem – specifically, how can we design a fraud detection system with strong, provable guarantees of robustness?

      Given the rise in popularity of social networks and other web services in recent years, fraudsters have strong incentives to manipulate these services
    • Belief propagation has been used for fraud classification on eBay [21] and for fraud detection [1]
    • All of these methods have been successful in finding fraud, but they offer no guarantees of robustness. [25] performs an adversarial analysis for spectral algorithms, showing that attacks of small enough scale will necessarily evade detection methods that rely on the top-k singular value decomposition components
    • Our goal is to find a fraud detection approach satisfying the criteria of Problem Definition 1 (Dense Subgraph Detection)
    • We propose FRAUDAR, a fraud detection algorithm that provably bounds the amount of fraud adversaries can commit, even in the face of camouflage
    Methods
    • Given this problem definition and attack model, the authors offer FRAUDAR and its theoretical analysis.

      The authors propose a class of metrics g that have desirable properties when used as suspiciousness metrics.
    • The authors will show that if g takes the form in (1) and (2), it can be optimized in a way that (a) is scalable, (b) offers theoretical guarantees, and (c) is robust to camouflage.
    • Let A ⊆ U be a subset of users and B ⊆ W be a subset of objects.
    • Note that g has a single argument, which is the union of the users and objects whose suspiciousness the authors are evaluating.
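This summary does not reproduce equations (1) and (2), but the class of metrics described includes the classic average-degree density g(S) = f(S)/|S|, where f(S) counts edges inside the node set S. A minimal sketch of the greedy peeling approach is below; note this is an illustrative simplification, not the paper's exact algorithm — FRAUDAR additionally down-weights edges incident to high-degree objects to resist camouflage, and uses priority queues to achieve near-linear runtime.

```python
from collections import defaultdict

def greedy_densest(edges):
    """Greedy peeling for the density metric g(S) = f(S) / |S|.

    edges: list of (user, object) pairs of a bipartite graph.
    Repeatedly removes the node whose removal reduces f(S) the least,
    and returns the best-scoring node subset seen along the way.
    """
    adj = defaultdict(set)
    for u, w in edges:
        adj[("u", u)].add(("w", w))
        adj[("w", w)].add(("u", u))

    nodes = set(adj)
    f = sum(len(nbrs) for nbrs in adj.values()) // 2  # edges inside S
    best_score, best_set = f / len(nodes), set(nodes)

    while len(nodes) > 1:
        # remove the node with the fewest remaining edges
        victim = min(nodes, key=lambda n: len(adj[n]))
        f -= len(adj[victim])
        for nb in adj[victim]:
            adj[nb].discard(victim)
        del adj[victim]
        nodes.discard(victim)
        score = f / len(nodes)
        if score > best_score:
            best_score, best_set = score, set(nodes)
    return best_set, best_score
```

For this unweighted density, greedy peeling is the well-known Charikar-style 1/2-approximation; planting a dense user–object block in a sparse graph and running the sketch recovers the block.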
    Results
    • The authors compare FRAUDAR to SPOKEN in terms of F measure in detecting the fake users.
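For reference, the F measure used here is the harmonic mean of precision and recall over the detected user set. The helper below is a minimal sketch for illustration, not code from the paper:

```python
def f_measure(detected, true_fake):
    """F measure (harmonic mean of precision and recall) for detected fake users."""
    detected, true_fake = set(detected), set(true_fake)
    tp = len(detected & true_fake)  # correctly flagged fake users
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(true_fake)
    return 2 * precision * recall / (precision + recall)
```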
    Conclusion
    • The authors propose FRAUDAR, a fraud detection algorithm that provably bounds the amount of fraud adversaries can commit, even in the face of camouflage.
    • It detected a large block of fraudulent activity in the Twitter follower–followee graph. Scalability: FRAUDAR runs near-linearly in the input size (see Figure 7).
    Tables
    • Table1: Comparison between FRAUDAR and other fraud detection algorithms
    • Table2: Symbols and Definitions
    • Table3: Bipartite graph datasets used in our experiments
    Funding
    • This material is based upon work supported by the National Science Foundation under Grant No CNS-1314632, DGE-1252522, and IIS-1408924
    References
    • [1] L. Akoglu, R. Chandy, and C. Faloutsos. Opinion fraud detection in online reviews by network effects. In ICWSM, 2013.
    • [2] A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos. Copycatch: stopping group attacks by spotting lockstep behavior in social networks. In 22nd WWW, pages 119–130. International World Wide Web Conferences Steering Committee, 2013.
    • [3] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro. Aiding the detection of fake accounts in large scale social online services. In NSDI, 2012.
    • [5] C. Cortes, D. Pregibon, and C. Volinsky. Communities of interest. Springer, 2001.
    • [6] S. Ghosh, B. Viswanath, F. Kooti, N. K. Sharma, G. Korlam, F. Benevenuto, N. Ganguly, and K. P. Gummadi. Understanding and combating link farming in the twitter social network. In 21st WWW, pages 61–70. ACM, 2012.
    • [7] C. Giatsidis, D. M. Thilikos, and M. Vazirgiannis. Evaluating cooperation in communities with the k-core structure. In Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on, pages 87–93. IEEE, 2011.
    • [8] Z. Gu, K. Pei, Q. Wang, L. Si, X. Zhang, and D. Xu. Leaps: Detecting camouflaged attacks with statistical learning guided by program analysis.
    • [9] Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with trustrank. In VLDB Endowment, pages 576–587, 2004.
    • [10] M. Jiang, A. Beutel, P. Cui, B. Hooi, S. Yang, and C. Faloutsos. A general suspiciousness metric for dense blocks in multimodal data. In Data Mining (ICDM), 2015 IEEE International Conference on, pages 781–786. IEEE, 2015.
    • [11] M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang. Catchsync: catching synchronized behavior in large directed graphs. In 20th KDD, pages 941–950. ACM, 2014.
    • [12] M. Jiang, P. Cui, A. Beutel, C. Faloutsos, and S. Yang. Inferring strange behavior from connectivity pattern in social networks. In Advances in Knowledge Discovery and Data Mining, pages 126–138.
    • [13] N. Jindal and B. Liu. Opinion spam and analysis. In ICDM 2008, pages 219–230. ACM, 2008.
    • [14] G. Karypis and V. Kumar. METIS: Unstructured graph partitioning and sparse matrix ordering system. The University of Minnesota, 2, 1995.
    • [15] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.
    • [16] H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In 19th WWW, pages 591–600. ACM, 2010.
    • [17] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1361–1370. ACM, 2010.
    • [18] J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems, pages 165–172. ACM, 2013.
    • [19] A. Molavi Kakhki, C. Kliman-Silver, and A. Mislove. Iolaus: Securing online content rating systems. In 22nd WWW, pages 919–930. International World Wide Web Conferences Steering Committee, 2013.
    • [20] M. Ott, Y. Choi, C. Cardie, and J. T. Hancock. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 309–319. Association for Computational Linguistics, 2011.
    • [21] S. Pandit, D. H. Chau, S. Wang, and C. Faloutsos. Netprobe: a fast and scalable system for fraud detection in online auction networks. In 16th WWW, pages 201–210. ACM, 2007.
    • [22] B. Perozzi, L. Akoglu, P. Iglesias Sánchez, and E. Müller. Focused clustering and outlier detection in large attributed graphs. In 20th KDD, pages 1346–1355. ACM, 2014.
    • [23] B. Prakash, M. Seshadri, A. Sridharan, S. Machiraju, and C. Faloutsos. Eigenspokes: Surprising patterns and community structure in large graphs. PAKDD, 2010a, 84, 2010.
    • [24] A. Rajaraman and J. D. Ullman. Mining of massive datasets, volume 1. Cambridge University Press, 2012.
    • [25] N. Shah, A. Beutel, B. Gallagher, and C. Faloutsos. Spotting suspicious link behavior with fbox: An adversarial perspective. arXiv preprint arXiv:1410.3915, 2014.
    • [26] D. N. Tran, B. Min, J. Li, and L. Subramanian. Sybil-resilient online content voting. In NSDI, volume 9, pages 15–28, 2009.
    • [27] C. Tsourakakis. The k-clique densest subgraph problem. In 24th WWW, pages 1122–1132. International World Wide Web Conferences Steering Committee, 2015.
    • [28] S. Virdhagriswaran and G. Dakin. Camouflaged fraud detection in domains with complex relationships. In 12th KDD, pages 941–947. ACM, 2006.
    • [29] H. Wang, Y. Lu, and C. Zhai. Latent aspect rating analysis without aspect keyword supervision. In 17th KDD, pages 618–626. ACM, 2011.
    • [30] B. Wu, V. Goel, and B. D. Davison. Propagating trust and distrust to demote web spam. MTW, 190, 2006.
    • [31] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao. Sybillimit: A near-optimal social network defense against sybil attacks. In Security and Privacy, 2008. SP 2008. IEEE Symposium on, pages 3–17. IEEE, 2008.
    • [32] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman. Sybilguard: defending against sybil attacks via social networks. ACM SIGCOMM Computer Communication Review, 36(4):267–278, 2006.
    Best Paper of KDD, 2016