iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries
Proceedings of the 29th International Conference on Intelligent User Interfaces(2024)
摘要
The recent explosion in popularity of large language models (LLMs) has
inspired learning engineers to incorporate them into adaptive educational tools
that automatically score summary writing. Understanding and evaluating LLMs is
vital before deploying them in critical learning environments, yet their
unprecedented size and expanding number of parameters inhibits transparency and
impedes trust when they underperform. Through a collaborative user-centered
design process with several learning engineers building and deploying summary
scoring LLMs, we characterized fundamental design challenges and goals around
interpreting their models, including aggregating large text inputs, tracking
score provenance, and scaling LLM interpretability methods. To address their
concerns, we developed iScore, an interactive visual analytics tool for
learning engineers to upload, score, and compare multiple summaries
simultaneously. Tightly integrated views allow users to iteratively revise the
language in summaries, track changes in the resulting LLM scores, and visualize
model weights at multiple levels of abstraction. To validate our approach, we
deployed iScore with three learning engineers over the course of a month. We
present a case study where interacting with iScore led a learning engineer to
improve their LLM's score accuracy by three percentage points. Finally, we
conducted qualitative interviews with the learning engineers that revealed how
iScore enabled them to understand, evaluate, and build trust in their LLMs
during deployment.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要