A distributed evolutionary multivariate discretizer for Big Data processing on Apache Spark.

Swarm and Evolutionary Computation(2018)

Abstract
Nowadays, the phenomenon of Big Data is overwhelming our capacity to extract relevant knowledge through classical machine learning techniques. Discretization, as part of data reduction, is presented as a real solution to reduce this complexity. However, standard discretizers are not designed to perform well with such amounts of data. This paper proposes a distributed discretization algorithm for Big Data analytics based on evolutionary optimization. Compared with a distributed discretizer based on the Minimum Description Length Principle, our approach yields more accurate and simpler solutions in a reasonable time.
Keywords
Discretization, Evolutionary computation, Big Data, Data Mining, Apache Spark