ESIREOS: Efficient, Scalable, Internal, Relative Evaluation of Outliers Solutions.

William A. Alves, Henrique O. Marques, Murilo Coelho Naldi, Jörg Sander

International Conference on Parallel and Distributed Systems (2023)

Abstract
Anomaly (outlier) detection is one of the main tasks of data mining. Because anomalies can carry important information in numerous fields, several methods have been developed to identify them. Unsupervised methods for outlier detection, which are the focus of this work, have become increasingly important due to the lack of labeled data in many applications. A common challenge with unsupervised methods, however, is how to evaluate the quality of their results. Without labels, one has to rely on so-called internal evaluation, which is based solely on the data and the assessed solutions. In this context, IREOS was proposed as the first internal evaluation measure for unsupervised anomaly detection. IREOS allows one to select better solutions (algorithms, parameters) for a given problem using only intrinsic information from the data. One major limitation of IREOS, however, is the need to train many highly complex classifiers, which makes it impractical for large datasets. In this work, we propose ESIREOS, the first efficient and scalable version of IREOS. We address the computational performance shortcomings of IREOS by using Massive Parallel Computing (MPC) techniques that efficiently implement horizontal computational scaling for many machine learning problems. ESIREOS also uses approximate nearest neighbor graphs (NNGs) to reduce the volume of data and the processing power demanded by IREOS without any significant loss in the quality of the results. We evaluate ESIREOS theoretically, by estimating its asymptotic complexity, and empirically, with experiments on real and synthetic datasets that assess its effectiveness and efficiency compared to the original version. Our results show that ESIREOS significantly reduces runtime compared to the original IREOS while maintaining quality. ESIREOS also proved capable of evaluating solutions for very large datasets, including those that IREOS cannot evaluate in a feasible time. This efficient and scalable new version can therefore be used in many scenarios, mainly, but not limited to, those with large or distributed data.
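
The idea of internal evaluation described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: IREOS measures separability with trained classifiers, whereas the sketch below swaps in a much simpler proxy (the mean distance to the k nearest neighbors), and the function names `knn_separability` and `internal_index` are hypothetical. It only shows how a candidate solution (a vector of outlier scores) might be rated from the data alone, and why restricting work to each point's neighborhood is attractive.

```python
import math


def knn_separability(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Assumption: a simple stand-in for the classifier-based separability
    that IREOS uses; larger values mean a more isolated point.
    """
    n = len(points)
    sep = []
    for i in range(n):
        # Exact brute-force search; an approximate nearest neighbor graph
        # would replace this step on large data.
        dists = sorted(
            math.dist(points[i], points[j]) for j in range(n) if j != i
        )
        sep.append(sum(dists[:k]) / k)
    return sep


def internal_index(points, outlier_scores, k=3):
    """Score-weighted average separability of a candidate solution.

    Higher values mean the solution assigns high outlier scores to
    points that are isolated according to the proxy above.
    """
    sep = knn_separability(points, k)
    total = sum(outlier_scores)
    return sum(w * s for w, s in zip(outlier_scores, sep)) / total


# Toy data: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
good = [0.0, 0.0, 0.0, 0.0, 1.0]  # flags the isolated point
bad = [1.0, 0.0, 0.0, 0.0, 0.0]   # flags a cluster member

print(internal_index(points, good) > internal_index(points, bad))
```

No labels are used at any point: the two candidate solutions are compared using only the data and the scores themselves, which is the setting the abstract calls internal evaluation.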
Keywords
Data Mining, Outlier Detection, Unsupervised Evaluation, Massive Parallel Computing