The performance comparison problem: Universal task access for cross-framework evaluation, Turing tests, grand challenges, and cognitive decathlons

Biologically Inspired Cognitive Architectures (2016)

Abstract
A driver for achieving human-level AI and high-fidelity cognitive architectures is the ability to easily test and compare the performance and behavior of computational agents/models against humans and against one another. One major difficulty in setting up and attracting participation in large-scale cognitive decathlon and grand challenge competitions, or even in smaller-scale cross-framework evaluation and Turing testing, is that there is no standard interface protocol that enables and facilitates human and computational agent “plug-and-play” participation across various tasks. We identify three major issues. First, human-readable task interfaces are often not translated into machine-readable form. Second, in the cases where a task interface is made available in a machine-readable protocol, the protocol is often task-specific and differs from other task protocols. Finally, where both human- and machine-readable versions of the task interface exist, the two versions often differ in content. This makes the barrier to entry extremely high for comparing humans and multiple computational frameworks across multiple tasks. This paper proposes a standard approach to task design in which all task interactions adhere to a standard API. We provide examples of how this method can be employed to gather human and computational simulation data in text-and-button tasks, visual and animated tasks, and real-time robotics tasks.
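To make the idea concrete, the following is a minimal sketch of what a framework-agnostic task API in the spirit of the abstract might look like: every task exposes the same reset/step loop, so a human front end (text and buttons, an animated display, a robot teleoperation console) or a computational agent can be plugged in interchangeably and their interaction logs compared directly. All class and method names here are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of a standard task/participant interface.
# Names (Task, Agent, run_trial) are illustrative, not from the paper.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class Task(ABC):
    """Standard task interface, identical for human and machine participants."""

    @abstractmethod
    def reset(self) -> Dict[str, Any]:
        """Start a new trial and return the initial observation."""

    @abstractmethod
    def step(self, action: Any) -> Dict[str, Any]:
        """Apply an action; return the next observation, including a 'done' flag."""


class Agent(ABC):
    """Standard participant interface: a cognitive model, a robot controller,
    or a human front end that forwards the person's button presses or text."""

    @abstractmethod
    def act(self, observation: Dict[str, Any]) -> Any:
        """Choose an action given the current observation."""


def run_trial(task: Task, agent: Agent, max_steps: int = 1000) -> List[Tuple[Dict[str, Any], Any]]:
    """Run one trial and log every (observation, action) pair for later comparison."""
    log = []
    obs = task.reset()
    for _ in range(max_steps):
        action = agent.act(obs)
        log.append((obs, action))
        obs = task.step(action)
        if obs.get("done", False):
            break
    return log
```

Because the same `run_trial` loop drives both human and computational participants, performance comparison reduces to comparing the logged observation/action sequences, which is the kind of plug-and-play evaluation the abstract argues a standard API would enable.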
Keywords
Grand challenge, Cognitive decathlon, Turing test, Performance comparison, Simulation, API, Standards