MetaPrep - Data preparation pipelines recommendation via meta-learning.

ICMLA (2021)

Abstract
Data preparation is a mandatory phase in the machine learning pipeline. Its goal is to convert noisy and disordered data into refined data that learning algorithms can use. However, data preparation is time-consuming and requires specialized knowledge about the data and the algorithms. Automating it is therefore essential to reduce the effort data scientists spend developing satisfactory models. Despite its relevance, current AutoML platforms either disregard data preparation or rely on simple hardcoded pipelines. To fill this gap, we present a meta-learning-based recommendation system for data preparation. Our system recommends five pipelines, ranked by relevance, making it useful for users with varying degrees of experience. Using the top-1 pipeline, we demonstrate that our proposal improves the performance of an AutoML system. Furthermore, the accuracy of our method was comparable to that achieved by a reinforcement-learning-based algorithm with the same goal, while being up to two orders of magnitude faster. Finally, we tested our method in a real-world application and evaluated its benefits and limitations in this scenario.
Keywords
Meta-learning, data preparation, automated machine learning
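
The abstract does not say which meta-features, similarity measure, or ranking model MetaPrep uses, so the following is only a minimal sketch of how a meta-learning recommender of this general kind could produce a ranked top-5 list. It assumes a nearest-neighbour scheme, which is a stand-in technique and not the authors' method. The meta-base, the two meta-features (missing-value ratio and categorical-column ratio), the pipeline names, and all scores are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical meta-base: for each previously seen dataset, a meta-feature
# vector [missing-value ratio, categorical-column ratio] and the score each
# candidate pipeline reached on it. All values are illustrative only.
META_BASE = [
    {"meta": [0.30, 0.10], "scores": {
        "impute_mean>standardize": 0.82, "impute_median>minmax": 0.80,
        "impute_mean>onehot": 0.70, "drop_na>standardize": 0.65,
        "onehot>standardize": 0.60, "no_prep": 0.50}},
    {"meta": [0.25, 0.15], "scores": {
        "impute_mean>standardize": 0.78, "impute_median>minmax": 0.84,
        "impute_mean>onehot": 0.72, "drop_na>standardize": 0.60,
        "onehot>standardize": 0.62, "no_prep": 0.48}},
    {"meta": [0.00, 0.90], "scores": {
        "impute_mean>standardize": 0.60, "impute_median>minmax": 0.58,
        "impute_mean>onehot": 0.85, "drop_na>standardize": 0.61,
        "onehot>standardize": 0.88, "no_prep": 0.55}},
]


def recommend(meta_features, meta_base, k=2, top_n=5):
    """Rank pipelines by their mean score on the k most similar datasets."""
    # Similarity is plain Euclidean distance between meta-feature vectors
    # (an assumption made for this sketch).
    neighbours = sorted(
        meta_base, key=lambda entry: math.dist(meta_features, entry["meta"])
    )[:k]

    # Collect each pipeline's scores across the selected neighbours.
    collected = defaultdict(list)
    for entry in neighbours:
        for pipeline, score in entry["scores"].items():
            collected[pipeline].append(score)

    # Average the scores, then sort best first; ties break alphabetically.
    means = {p: sum(s) / len(s) for p, s in collected.items()}
    return sorted(means, key=lambda p: (-means[p], p))[:top_n]


if __name__ == "__main__":
    for rank, pipeline in enumerate(recommend([0.28, 0.12], META_BASE), start=1):
        print(rank, pipeline)
```

A lookup like this needs no model training or search at recommendation time, which gives some intuition for why a meta-learning approach can be much faster than a reinforcement-learning-based search. The sketch itself makes no claim about the speed-up reported in the paper.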