# A non-parameter outlier detection algorithm based on Natural Neighbor

Knowledge-Based Systems, Volume 92, 2016.

Keywords:

Natural Neighbor Graph, outlier detection algorithm, numerous application, Natural Outlier Factor, data mining

Abstract:

Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection and video surveillance. Although many outlier detection algorithms have been...

Introduction

- An outlier is an observation that deviates so much from other observations that it arouses suspicion that it was generated by a different mechanism [8].
- Research on outlier detection is very active.
- Many outlier detection algorithms have been proposed.
- Outlier detection algorithms can be roughly divided into distribution-based, depth-based, distance-based, clustering-based, and density-based methods.
- In distribution-based methods, observations that deviate from a standard distribution are considered outliers [7].
- Distribution-based methods are not applicable to datasets that are multidimensional or whose distribution is unknown.
- The depth-based [10,11]

Highlights

- Outlier detection is an important data mining activity with numerous applications, including credit card fraud detection, discovery of criminal activities in electronic commerce, video surveillance, weather prediction, and pharmaceutical research [1,2,3,4,5,6,7,8,9].

- An outlier is an observation that deviates so much from other observations that it arouses suspicion that it was generated by a different mechanism [8]
- In Section 4, we propose an outlier detection algorithm based on Natural Neighbor
- The proposed method combines the concept of the Natural Neighbor with previous density-based methods
- As in most previous outlier detection methods, an object with a high outlierness score is a promising candidate for an outlier
- In addition to the Natural Outlier Factor, the Natural Value can be used in other outlier detection algorithms, such as local outlier factor and INS, with good results
- We confirmed that the proposed method accurately detects outliers in most patterns, that the approach is non-parametric, and that the Natural Value is applicable to other outlier detection methods

Conclusion

- The authors propose a new density-based algorithm for outlier detection.
- The proposed method combines the concept of the Natural Neighbor with previous density-based methods.
- Unlike most previous outlier detection approaches, the method is non-parametric.
- To further demonstrate the effectiveness of the Natural Value, the authors will apply it to more outlier detection and clustering algorithms in future studies.

- Table 1: Performance of the synthetic dataset

Related work

- In this section, we briefly introduce the concepts of LOF and INS.

LOF is a well-known density-based outlier detection algorithm, and INS is a more recent outlier detection algorithm proposed in 2014. Interested readers are referred to papers [17] and [19].

Let D be a database, p, q, and o be some objects in D, and k be a positive integer. We use d(p,q) to denote the Euclidean distance between objects p and q.
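To make the notation concrete, the following is a minimal sketch of the standard LOF score as defined in [17], written with the symbols introduced above (D, k, and d(p,q)). It is not the Natural Neighbor method proposed in the paper. The function names `euclidean` and `lof_scores` and the sample points are illustrative assumptions, and the sketch assumes D contains no more than k exact duplicates of any object.

```python
import math

def euclidean(p, q):
    """d(p, q): Euclidean distance between two objects given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def lof_scores(D, k):
    """Return the LOF score of every object in D for neighborhood size k.

    Hypothetical helper for illustration; assumes 1 <= k < len(D) and
    no more than k exact duplicates of any object.
    """
    n = len(D)
    dist = [[euclidean(D[i], D[j]) for j in range(n)] for i in range(n)]

    k_dist = []     # k-distance(p): distance from p to its k-th nearest neighbor
    neighbors = []  # N_k(p): objects within k-distance(p), ties included
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]
        k_dist.append(kd)
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF(p): mean ratio of each neighbor's density to p's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Four corners of a unit square plus one isolated object.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
```

With these sample points, the four corner objects receive a score of 1.0 (their density matches that of their neighbors), while the isolated object scores well above 1. The result depends on the choice of k, which is the parameter the paper's non-parametric approach is meant to avoid.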

Funding

- This research was supported by the National Natural Science Foundation of China (Nos. 61272194 and 61073058)

Reference

- W. Jin, A.K. Tung, J. Han, Mining top-n local outliers in large databases, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2001.
- J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, Elsevier, 2011.
- T. Pang-Ning, M. Steinbach, V. Kumar, Introduction to Data Mining, Library of Congress, 2006.
- E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, Proceedings of the International Conference on Very Large Data Bases, Citeseer, 1998.
- E.M. Knorr, R.T. Ng, A unified notion of outliers: properties and computation, in: Proceedings of International Conference on Knowledge Discovery and Data Mining, KDD, 1997.
- E.M. Knorr, R.T. Ng, V. Tucakov, Distance-based outliers: algorithms and applications, VLDB J. – Int. J. Very Large Data Bases 8 (3–4) (2000) 237–253.
- V. Barnett, T. Lewis, Outliers in Statistical Data, vol. 3, Wiley, New York, 1994.
- D.M. Hawkins, Identification of Outliers, vol. 11, Springer, 1980.
- S. Shekhar, S. Chawla, A Tour of Spatial Databases, Prentice Hall, Upper Saddle River, New Jersey, 2002.
- I. Ruts, P.J. Rousseeuw, Computing depth contours of bivariate point clouds, Comput. Stat. Data Anal. 23 (1) (1996) 153–168.
- T. Johnson, I. Kwok, R.T. Ng, Fast computation of 2-dimensional depth contours, Proceedings of International Conference on Knowledge Discovery and Data Mining, KDD, Citeseer, 1998.
- M. Ester, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of International Conference on Knowledge Discovery and Data Mining, KDD, 1996.
- R.T. Ng, J. Han, Efficient and effective clustering methods for spatial data mining, in: Proceedings of the 20th International Conference on Very Large Data Bases, 1994.
- G. Karypis, E.-H. Han, V. Kumar, Chameleon: hierarchical clustering using dynamic modeling, Computer 32 (8) (1999) 68–75.
- T. Zhang, R. Ramakrishnan, M. Livny, BIRCH: an efficient data clustering method for very large databases, ACM SIGMOD Record, ACM, 1996.
- S. Guha, R. Rastogi, K. Shim, CURE: an efficient clustering algorithm for large databases, ACM SIGMOD Record, ACM, 1998.
- M.M. Breunig, LOF: identifying density-based local outliers, ACM Sigmod Record, ACM, 2000.
- W. Jin, Ranking outliers using symmetric neighborhood relationship, Advances in Knowledge Discovery and Data Mining, Springer, 2006, pp. 577–593.
- J. Ha, S. Seok, J.-S. Lee, Robust outlier detection using the instability factor, Knowledge-Based Syst. 63 (2014) 15–23.
- J.L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM 18 (9) (1975) 509–517.
- X. Luo, Boosting the K-nearest-neighborhood based incremental collaborative filtering, Knowledge-Based Syst. 53 (2013) 90–99.
- S.S. Stevens, Mathematics, Measurement, and Psychophysics, 1951.
- C. Lijun, A data stream outlier detection algorithm based on reverse k nearest neighbors, Proceedings of the 2010 International Symposium on Computational Intelligence and Design (ISCID), IEEE, 2010.
- J. Tang, Enhancing effectiveness of outlier detections for low density patterns, Advances in Knowledge Discovery and Data Mining, Springer, 2002, pp. 535–548.
