DescribeML: A dataset description tool for machine learning

Science of Computer Programming (2024)

Abstract
Datasets are essential for training and evaluating machine learning models. However, they are also the root cause of many undesirable model behaviors, such as biased predictions. To address this issue, the machine learning community is proposing as a best practice the adoption of common guidelines for describing datasets. However, these guidelines are based on natural language descriptions of the dataset, hampering the automatic computation and analysis of such descriptions. To overcome this situation, we present DescribeML, a language engineering tool to precisely describe machine learning datasets in terms of their composition, provenance, and social concerns in a structured format. The tool is implemented as a Visual Studio Code extension. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Datasets, Machine learning, Model-driven engineering, Fairness, Domain-specific languages