MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

Katherine R. Maffey, Kyle Dotterrer, Jennifer Niemann, Iain Cruickshank, Grace A. Lewis, Christian Kästner

arXiv (2023)

Abstract

Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as "melt"), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.

Keywords

machine learning, test and evaluation, machine learning evaluation, responsible AI
