A non-parameter outlier detection algorithm based on Natural Neighbor

    Knowledge-Based Systems, Volume 92, 2016.

    Keywords:
    Natural Neighbor Graph, outlier detection algorithm, numerous applications, Natural Outlier Factor, data mining
    Wei bo:
    In addition to the Natural Outlier Factor, the Natural Value can be used in other outlier detection algorithms, such as the local outlier factor and INS, and it yields good results.

    Abstract:

    Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection, video surveillance, etc. Although many outlier detection algorithms have been...

    Introduction
    • An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism [8].
    • Research on outlier detection is very active.
    • Many outlier detection algorithms have been proposed.
    • Outlier detection algorithms can be roughly divided into distribution-based, depth-based, distance-based, clustering-based, and density-based methods.
    • In distribution-based methods, observations that deviate from a standard distribution are considered outliers [7].
    • Distribution-based methods are not applicable to datasets that are multidimensional or whose distribution is unknown.
    • The depth-based [10,11]
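To make the distribution-based idea above concrete, here is a minimal sketch that assumes the data are roughly Gaussian and flags values far from the mean. The function name `zscore_outliers` and the threshold of 3 standard deviations are illustrative assumptions, not something taken from the paper.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds `threshold`.

    Assumption: the data follow a roughly Gaussian distribution.
    `threshold` is an illustrative choice, not a value from the paper.
    """
    x = np.asarray(values, dtype=float)
    std = x.std()
    if std == 0.0:
        # All values are identical, so nothing deviates from the distribution.
        return np.zeros(x.shape, dtype=bool)
    z = (x - x.mean()) / std
    return np.abs(z) > threshold

# Twenty values near 10 and one value far away.
data = [10.0] * 10 + [11.0] * 10 + [50.0]
flags = zscore_outliers(data)
```

The sketch also shows the limitation noted above: it only makes sense when a single standard distribution is a plausible model for the data, which is rarely known in advance for multidimensional datasets.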
    Highlights
    • Outlier detection is an important data mining activity with numerous applications, including credit card fraud detection, discovery of criminal activities in electronic commerce, video surveillance, weather prediction, and pharmaceutical research [1,2,3,4,5,6,7,8,9].

      An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism [8]
    • In Section 4, we propose an outlier detection algorithm based on Natural Neighbor
    • The proposed method combines the concept of the Natural Neighbor with previous density-based methods
    • As in most previous outlier detection methods, an object with a high outlierness score is a promising candidate for an outlier
    • In addition to the Natural Outlier Factor, the Natural Value can be used in other outlier detection algorithms, such as the local outlier factor and INS, and it yields good results
    • We confirmed that the proposed method accurately detects outliers in most patterns, that the approach is non-parametric, and that the Natural Value is applicable to other outlier detection methods
    Conclusion
    • The authors propose a new density-based algorithm for outlier detection.
    • The proposed method combines the concept of the Natural Neighbor with previous density-based methods.
    • Unlike most previous outlier detection approaches, the method is non-parametric.
    • To further demonstrate the effectiveness of the Natural Value, the authors will apply it to more outlier detection and clustering algorithms in future studies
    Tables
    • Table 1: Performance on the synthetic dataset
    Related work
    • In this section, we briefly introduce the concepts of LOF and INS.

      LOF is a well-known density-based outlier detection algorithm, and INS is an outlier detection algorithm proposed in 2014. Interested readers are referred to [17] and [19].

      Let D be a database, let p, q, and o be objects in D, and let k be a positive integer. We use d(p,q) to denote the Euclidean distance between objects p and q.
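As a rough illustration of how the notation above is used, the following sketch computes LOF scores with NumPy. It follows the standard LOF formulation from [17] (k-distance, reachability distance, local reachability density), which is not spelled out in this excerpt. The function name `lof_scores` and the array `X` are hypothetical, and the sketch assumes 1 <= k < n and no more than k duplicate points.

```python
import numpy as np

def lof_scores(X, k):
    """Sketch of the Local Outlier Factor for each row of X.

    Assumptions: 1 <= k < len(X), and the data contain no more than k
    duplicates of any point (otherwise a density would be infinite).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances d(p, q).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an object is not its own neighbor

    # k-distance(p): distance from p to its k-th nearest neighbor.
    k_dist = np.sort(dist, axis=1)[:, k - 1]

    # N_k(p): all objects within k-distance(p); ties are included.
    neighbors = [np.flatnonzero(dist[p] <= k_dist[p]) for p in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd = np.empty(n)
    for p in range(n):
        nb = neighbors[p]
        reach = np.maximum(k_dist[nb], dist[p, nb])
        lrd[p] = 1.0 / reach.mean()

    # LOF(p): mean density of p's neighbors relative to p's own density.
    return np.array([lrd[neighbors[p]].mean() / lrd[p] for p in range(n)])

# A 3x3 grid of points plus one isolated point at (10, 10).
grid = [(i, j) for i in range(3) for j in range(3)]
scores = lof_scores(grid + [(10, 10)], k=3)
```

Scores near 1 indicate an object whose density matches that of its neighbors, while larger scores indicate a likely outlier. The parameter k must be chosen by the user here, which is the kind of parameter dependence the Natural Neighbor approach is meant to avoid.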
    Funding
    • This research was supported by the National Natural Science Foundation of China (Nos. 61272194 and 61073058)
    References
    • [1] W. Jin, A.K. Tung, J. Han, Mining top-n local outliers in large databases, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2001.
    • [2] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Elsevier, 2011.
    • [3] T. Pang-Ning, M. Steinbach, V. Kumar, Introduction to Data Mining, Library of Congress, 2006.
    • [4] E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, Proceedings of the International Conference on Very Large Data Bases, Citeseer, 1998.
    • [5] E.M. Knorr, R.T. Ng, A unified notion of outliers: properties and computation, in: Proceedings of International Conference on Knowledge Discovery and Data Mining, KDD, 1997.
    • [6] E.M. Knorr, R.T. Ng, V. Tucakov, Distance-based outliers: algorithms and applications, VLDB J. – Int. J. Very Large Data Bases 8 (3–4) (2000) 237–253.
    • [7] V. Barnett, T. Lewis, Outliers in Statistical Data, vol. 3, Wiley, New York, 1994.
    • [8] D.M. Hawkins, Identification of Outliers, vol. 11, Springer, 1980.
    • [9] S. Shekhar, S. Chawla, A Tour of Spatial Databases, Prentice Hall, Upper Saddle River, New Jersey, 2002.
    • [10] I. Ruts, P.J. Rousseeuw, Computing depth contours of bivariate point clouds, Comput. Stat. Data Anal. 23 (1) (1996) 153–168.
    • [11] T. Johnson, I. Kwok, R.T. Ng, Fast computation of 2-dimensional depth contours, Proceedings of International Conference on Knowledge Discovery and Data Mining, KDD, Citeseer, 1998.
    • [12] M. Ester, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of International Conference on Knowledge Discovery and Data Mining, KDD, 1996.
    • [13] T.N. Raymond, J. Han, Efficient and effective clustering methods for spatial data mining, in: Proceedings of the 20th International Conference on Very Large Data Bases, 1994.
    • [14] G. Karypis, E.-H. Han, V. Kumar, Chameleon: hierarchical clustering using dynamic modeling, Computer 32 (8) (1999) 68–75.
    • [15] T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, ACM SIGMOD Record, ACM, 1996.
    • [16] S. Guha, R. Rastogi, K. Shim, CURE: an efficient clustering algorithm for large databases, ACM SIGMOD Record, ACM, 1998.
    • [17] M.M. Breunig, LOF: identifying density-based local outliers, ACM SIGMOD Record, ACM, 2000.
    • [18] W. Jin, Ranking outliers using symmetric neighborhood relationship, Advances in Knowledge Discovery and Data Mining, Springer, 2006, pp. 577–593.
    • [19] J. Ha, S. Seok, J.-S. Lee, Robust outlier detection using the instability factor, Knowledge-Based Syst. 63 (2014) 15–23.
    • [20] J.L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM 18 (9) (1975) 509–517.
    • [21] X. Luo, Boosting the K-nearest-neighborhood based incremental collaborative filtering, Knowledge-Based Syst. 53 (2013) 90–99.
    • [22] S.S. Stevens, Mathematics, Measurement, and Psychophysics, 1951.
    • [23] C. Lijun, A data stream outlier detection algorithm based on reverse k nearest neighbors, Proceedings of the 2010 International Symposium on Computational Intelligence and Design (ISCID), IEEE, 2010.
    • [24] J. Tang, Enhancing effectiveness of outlier detections for low density patterns, Advances in Knowledge Discovery and Data Mining, Springer, 2002, pp. 535–548.