Towards an Efficient and Distributed DBSCAN Algorithm Using MapReduce

Lecture Notes in Business Information Processing (2015)

Abstract
Clustering is a major data mining technique that groups a set of objects so that objects in the same group are more similar to each other than to those in other groups. Among the various types of clustering, density-based algorithms are more effective at detecting clusters of varied density and arbitrary shape. One of the most important density-based clustering algorithms is DBSCAN. Because of the huge volume of data generated by the widespread diffusion of wireless technologies, and the complexity of big data analysis, new scalable algorithms are needed to process such data efficiently. In this chapter, we are particularly interested in using traffic data to find congested areas in a city. For this purpose, we developed a new, efficient, distributed strategy for the DBSCAN algorithm that uses MapReduce to detect dense areas based on the input parameters. We conducted experiments using real traffic data from Fortaleza, a Brazilian city, and compared our approach with the centralized and MapReduce-based approaches. Our preliminary results confirm that our approach is scalable and more efficient than the others. We also present an incremental version of DBSCAN based on its MapReduce version.
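For readers unfamiliar with DBSCAN, the following is a minimal sketch of the textbook, single-machine form of the algorithm. It is not the authors' distributed MapReduce strategy, whose details are not given here. It assumes points are 2-D coordinate tuples (for example, hypothetical locations of traffic events), Euclidean distance, and the usual two input parameters `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size, counting the point itself). The function names `dbscan` and `region_query` are illustrative only.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN (illustrative sketch): returns one label per point.

    Cluster ids start at 0; NOISE (-1) marks points in no cluster.
    Naive O(n^2) version: every region query scans all points.
    """
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: density-reachable, not core
            if labels[j] is not None:
                continue  # already assigned; nothing more to expand
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: expand through it
    return labels
```

For example, `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two dense groups and one isolated noise point. The quadratic neighbor scan in this naive version is one reason a single-machine DBSCAN becomes costly on large inputs.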
Keywords
DBSCAN, MapReduce, Traffic data