Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

    TKDD, Volume 10, Issue 4, 2016.

    Keywords:
    Defense Advanced Research Projects Agency, distributed denial of service, certain account, graph structure, large directed graph
    Wei bo:
    We propose a novel method called CATCHSYNC that exploits two signs of artificial and non-organic behavior, synchronicity and normality, to automatically report and catch suspicious nodes on large directed graphs

    Abstract:

    Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial of service...

    Introduction
    • Given a directed graph with many millions of nodes, can one tell which nodes are suspicious based only on the graph structure? In many real-world applications, fraudsters try to manipulate networks for personal gain.
    • In social networks, such as Twitter’s “who-follows-whom” graph, fraudsters are paid to make certain accounts seem more legitimate or popular by giving them many additional followers.
    • The spammers deliver these purchases either by generating fake accounts or by controlling real accounts through malware, and then using those accounts to follow their “customers.” This phenomenon creates distorted images of popularity and trustworthiness, with unpleasant or even dangerous effects on honest users.
    • The authors take a strictly graph mining approach, using only the graph structure to find nodes that are suspicious because of their position in the graph.
    Highlights
    • Given a directed graph with many millions of nodes, can we tell which nodes are suspicious just based on the graph structure? For many real-world applications, fraudsters try to manipulate networks for personal gain
    • We provide a theorem on the normal shape of the SN-plot, which could be the basis for catching suspicious nodes
    • Much of the research on anomaly detection frames the problem as a labeling task, but in the real world anomaly detection is a combination of machine learning, manual verification, and discovering new types of attacks as they arise
    • For each dataset CATCHSYNC only uses the graph structure, but we have the user ID and name associated with the nodes, so that we can provide real links to check the users’ profile information
    • While we have demonstrated that CATCHSYNC is successful at detecting classic spammy behavior, it also discovers more subtle types of suspicious behavior that a simpler labeling analysis would miss
    • We propose a novel method called CATCHSYNC that exploits two signs of artificial and non-organic behavior, synchronicity and normality, to automatically report and catch suspicious nodes on large directed graphs
    Methods
    • The authors present an empirical evaluation of CATCHSYNC, demonstrating its effectiveness in spotting suspicious behavior.
    • The authors provide evidence that CATCHSYNC is effective both at the classic problem of labeling suspicious behavior and at surfacing new patterns of unusual group behavior:
    • —Detection effectiveness: The authors demonstrate CATCHSYNC’s ability to accurately label suspicious behavior and remove anomalies through three techniques.
    • (b) Labeling task: The authors test the accuracy, precision, and recall on two real datasets, where the authors use the hand-labeled nodes from a random sample of accounts as ground truth.
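    The labeling task above scores a detector against hand-labeled ground truth. A minimal sketch of that scoring follows; the function name `labeling_metrics` and the dictionary-based input format are assumptions made for illustration, not the authors' evaluation code. Only nodes in the hand-labeled sample are scored, and a node the detector never reported counts as "not suspicious".

```python
def labeling_metrics(predicted, truth):
    """Return (accuracy, precision, recall) for binary suspicious/normal labels.

    predicted: dict mapping node id -> True if the detector flags the node.
    truth:     dict mapping node id -> True if a human labeled it suspicious.
    Hypothetical helper: only nodes that appear in `truth` are scored.
    """
    tp = fp = tn = fn = 0
    for node, is_suspicious in truth.items():
        flagged = predicted.get(node, False)  # unreported nodes count as normal
        if flagged and is_suspicious:
            tp += 1
        elif flagged and not is_suspicious:
            fp += 1
        elif not flagged and is_suspicious:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy sample: three accounts labeled suspicious by hand, two labeled normal.
truth = {"a": True, "b": True, "c": True, "d": False, "e": False}
predicted = {"a": True, "b": True, "d": True}
accuracy, precision, recall = labeling_metrics(predicted, truth)
```

    In this toy sample the detector finds two of the three suspicious accounts and raises one false alarm, so precision and recall are both 2/3 and accuracy is 3/5.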
    Results
    • For each dataset CATCHSYNC only uses the graph structure, but the authors have the user ID and name associated with the nodes, so that the authors can provide real links to check the users’ profile information.
    • The 5 volunteers are all 20 to 25-year-old college students who have been social network users for at least 3 years
    • They are provided URL links directed to the 1,000 users’ Twitter or Tencent Weibo pages, and read their tweets and profile information.
    • A user is labeled as a suspicious one if the volunteer finds he or she matches too many of the following clues:
    Conclusion
    • The authors propose a novel method called CATCHSYNC that exploits two signs of artificial and non-organic behavior, synchronicity and normality, to automatically report and catch suspicious nodes on large directed graphs.
    • CATCHSYNC has the following desirable properties.
    • Effectiveness: it spots synchronized behavior and catches suspicious source-target groups.
    • Scalability: its complexity is linear in the number of edges.
    • Parameter free: the operator can run the algorithm without specifying any parameters such as the density, number, or scale of groups.
    • It is based solely on topology and requires neither labeled nodes nor node attributes, though it can incorporate them for better performance.
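    This summary names synchronicity and normality but does not define them, so the sketch below rests on stated assumptions rather than the authors' definitions. It assumes each target node is placed in a cell of a feature space (here only a log-binned in-degree, a deliberate simplification), that synchronicity is the chance that two followees of a source fall in the same cell, and that normality is the chance that a followee of the source shares a cell with a randomly chosen target in the graph. The function name `sync_and_norm` is hypothetical, and the edge list is assumed to contain no duplicate edges.

```python
from collections import Counter, defaultdict


def sync_and_norm(edges):
    """Return {source: (synchronicity, normality)} for a directed edge list.

    Illustrative sketch under assumed definitions; not the published algorithm.
    edges: iterable of (source, target) pairs without duplicates.
    """
    edges = list(edges)

    # Pass 1 over the edges: in-degree of every target node.
    in_degree = Counter(target for _, target in edges)

    # Assumed feature cell: floor(log2(in-degree)) of the target.
    cell_of = {node: deg.bit_length() - 1 for node, deg in in_degree.items()}

    # Background: share of all targets that fall in each cell.
    cell_sizes = Counter(cell_of.values())
    n_targets = len(cell_of)

    # Pass 2 over the edges: per-source histogram of followee cells.
    histograms = defaultdict(Counter)
    for source, target in edges:
        histograms[source][cell_of[target]] += 1

    scores = {}
    for source, hist in histograms.items():
        out_degree = sum(hist.values())
        # Chance that two followees of this source share a cell.
        sync = sum((count / out_degree) ** 2 for count in hist.values())
        # Chance that a followee shares a cell with a random target.
        norm = sum(
            (count / out_degree) * (cell_sizes[cell] / n_targets)
            for cell, count in hist.items()
        )
        scores[source] = (sync, norm)
    return scores


# Toy graph: four lockstep sources (b1..b4) all follow x and y,
# while n1..n3 follow a mix of other targets.
toy_edges = [(b, t) for b in ("b1", "b2", "b3", "b4") for t in ("x", "y")]
toy_edges += [("n1", "p1"), ("n1", "p2"), ("n1", "q"),
              ("n2", "p3"), ("n2", "p4"), ("n2", "q"),
              ("n3", "p5"), ("n3", "p6")]
scores = sync_and_norm(toy_edges)
```

    Each edge is visited a constant number of times, which is in keeping with the linear scalability claimed above. In the toy graph the lockstep sources score a synchronicity of 1 with a normality of 2/9, whereas `n3` also has synchronicity 1 but a normality of 2/3, which suggests why the two scores are read together rather than separately.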
    Tables
    • Table 1: Compare CATCHSYNC with Existing Approaches
    • Table 2: Synthetic Data
    • Table 3: Symbols and Definitions
    • Table 4: Symbols and Descriptions Used in Theorem 3.2
    • Table 5: CATCHSYNC+SPOT Outperforms Every Single Other Method
    • Table 6: Real Data from Twitter and Tencent Weibo
    • Table 7: Plots and Descriptions
    • Table 8: CATCHSYNC Consistently Beats Competitors
    Related work
    • There is a significant body of research related to our proposed problem, which we categorize into three groups: graph-based anomaly detection, subgraph mining algorithms, and social spammer detection. Table 1 compares the majority of these methods in terms of effectiveness, parameter setting, and side information, and it shows the advantages of our new approach, CATCHSYNC.

      2.1. Graph-based Anomaly Detection

      Many anomaly detection techniques have been developed based on graphs [Shekhar et al 2001; Noble and Cook 2003; Chandola et al 2009]. AUTOPART [Chakrabarti 2004] groups similar nodes into clusters, and tags the edges deviating from the overall structure as outliers. However, we often have to face the lack of similarity between suspicious/normal behaviors on the graphs. Some recent works based on graphs propose to detect suspicious nodes and edges by discovering structural anomalies and propagating beliefs for fraudulent nodes [Sun et al 2005; Chau et al 2006; Eberle and Holder 2007; Chen et al 2009]. OUTRANK [Moonesinghe and Tan 2008] is a random walk-based approach using a graph to detect outliers with similarity of objects. ODDBALL [Akoglu et al 2010] gives rules in density, weights, ranks, and eigenvalues that are related to “neighborhood undirected sub-graphs,” assuming near-cliques and stars are suspicious. NETPROBE [Pandit et al 2007] uses a list of committed frauds to blame likely
    Funding
    • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DARPA, or other funding parties
    Reference
    • Charu C. Aggarwal. 201An Introduction to Social Network Data Analytics. Springer.
    • Leman Akoglu, Mary McGlohon, and Christos Faloutsos. 2010. Oddball: Spotting anomalies in weighted graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 410–421.
    • Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, IW3C2, 119–130.
    • Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the web. Comput. Netw. 33, 1 (2000), 309–320.
    • Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. 201Aiding the detection of fake accounts in large scale social online services. In NSDI. USENIX, 197–210.
    • Deepayan Chakrabarti. 2004. Autopart: Parameter-free graph partitioning and outlier detection. In Knowledge Discovery in Databases: PKDD 2004. Springer, 112–124.
    • Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 41, 3 (2009), 15.
    • Duen Horng Chau, Shashank Pandit, and Christos Faloutsos. 2006. Detecting fraudulent personalities in networks of online auctioneers. In Knowledge Discovery in Databases: PKDD 2006.
    • Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014a. CatchSync: Catching synchronized behavior in large directed graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 941–950.
    • Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014b. Detecting suspicious following behavior in multimillion-node social networks. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 305–306.
    • Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014c. Inferring strange behavior from connectivity pattern in social networks. In Advances in Knowledge Discovery and Data Mining. Springer, 126–138.
    • George Karypis and Vipin Kumar. 1995. Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0. (1995).
    • Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46, 5 (1999), 604–632.
    • Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2010. Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications. Springer, 337–357.
    • Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 20What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web. ACM, 591–600.
    • Kyumin Lee, James Caverlee, and Steve Webb. 2010. Uncovering social spammers: Social honeypots+ machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 435–442.
    • Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and Philip S. Yu. 2005. Mining behavior graphs for “backtrace” of noncrashing bugs. In SDM. SIAM, 286–297.
    • H. D. K. Moonesinghe and Pang-Ning Tan. 2008. Outrank: A graph-based outlier detection framework using random walk. Int. J. Artif. Intell. Tools 17, 01 (2008), 19–36.
    • Caleb C. Noble and Diane J. Cook. 2003. Graph-based anomaly detection. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 631–636.
    • Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web. ACM, 201–210.
    • Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 228–238.
    • Charles Perez, Marc Lemercier, Babiga Birregah, and Alain Corpel. 2011. Spot 1.0: Scoring suspicious profiles on twitter. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2011). IEEE, 377–381.
    • B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. Eigenspokes: Surprising patterns and scalable community chipping in large graphs. In Advances in Knowledge Discovery and Data Mining. Springer, 435–448.
    • Shashi Shekhar, Chang-Tien Lu, and Pusheng Zhang. 2001. Detecting graph-based spatial outliers: Algorithms and applications (a summary of results). In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 371–376.
    • Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. 2010. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 1–9.
    • Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos. 2005. Neighborhood formation and anomaly detection in bipartite graphs. In 5th IEEE International Conference on Data Mining. IEEE, 418–425.
    • David M. J. Tax and Robert P. W. Duin. 1998. Outlier detection using classifier instability. In Advances in Pattern Recognition. Springer, 593–601.
    • Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 104–112.
    • Xifeng Yan and Jiawei Han. 2003. CloseGraph: Mining closed frequent graph patterns. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 286–295.
    • Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang. 2010. Mining frequent subgraph patterns from uncertain graph data. IEEE Trans. Knowl. Data Eng. 22, 9 (2010), 1203–1218.
    • Received September 2014; revised March 2015; accepted March 2015