Quantifying Semantic Query Similarity for Automated Linear SQL Grading: A Graph-based Approach
arXiv (2024)
Abstract
Quantifying the semantic similarity between database queries is a critical
challenge with broad applications, ranging from query log analysis to automated
educational assessment of SQL skills. Traditional methods often rely solely on
syntactic comparisons or are limited to checking for semantic equivalence.
This paper introduces a novel graph-based approach to measure the semantic
dissimilarity between SQL queries. Queries are represented as nodes in an
implicit graph; the transitions between nodes, called edits, are
weighted by semantic dissimilarity. We employ shortest-path algorithms to
identify the lowest-cost edit sequence between two given queries, thereby
defining a quantifiable measure of semantic distance.
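
The idea of a shortest path over an implicit graph of queries can be illustrated with a minimal sketch. The abstract does not specify the paper's edit set or cost model, so everything below is an assumption made for illustration: the toy query representation (a set of selected columns plus a set of WHERE predicates), the names COLUMNS, PREDICATES, neighbors, semantic_distance and make_query, and the edit costs. Dijkstra's algorithm stands in for the unspecified "shortest path algorithms"; neighbors are generated on demand, so the graph is never materialized.

```python
import heapq
from itertools import count

# Hypothetical toy model: a query is (selected columns, WHERE predicates).
COLUMNS = ("id", "name", "age")
PREDICATES = ("age > 18", "name IS NOT NULL")
# Assumed edit costs, chosen only for illustration (not taken from the paper).
COLUMN_COST = 1.0
PREDICATE_COST = 2.0


def neighbors(query):
    """Lazily yield (edit label, cost, resulting query) for one query node."""
    cols, preds = query
    for c in COLUMNS:
        verb = "drop" if c in cols else "add"
        yield f"{verb} column {c}", COLUMN_COST, (cols ^ {c}, preds)
    for p in PREDICATES:
        verb = "drop" if p in preds else "add"
        yield f"{verb} predicate {p}", PREDICATE_COST, (cols, preds ^ {p})


def semantic_distance(source, target):
    """Dijkstra over the implicit graph; returns (total cost, list of edits)."""
    tie = count()  # tie-breaker so the heap never has to compare queries
    frontier = [(0.0, next(tie), source, [])]
    settled = set()
    while frontier:
        cost, _, query, edits = heapq.heappop(frontier)
        if query == target:
            return cost, edits
        if query in settled:
            continue
        settled.add(query)
        for label, weight, nxt in neighbors(query):
            if nxt not in settled:
                heapq.heappush(
                    frontier, (cost + weight, next(tie), nxt, edits + [label])
                )
    return float("inf"), []  # target unreachable with the available edits


def make_query(cols, preds=()):
    """Build a hashable query node from plain iterables."""
    return (frozenset(cols), frozenset(preds))


reference = make_query(["id", "name"], ["age > 18"])
submission = make_query(["id", "age"])
distance, path = semantic_distance(submission, reference)
```

In this toy setting the submission differs from the reference by two column edits and one predicate edit, so the distance is 4.0 under the assumed costs, and the returned edit list doubles as an explanation of what was wrong, which is the kind of comprehensible feedback a grading tool could build on.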
A prototype implementation of this technique has been evaluated through an
empirical study, which strongly suggests that our method provides more accurate
and comprehensible grading than existing techniques. Moreover, the
results indicate that our approach comes close to the quality of manual
grading, making it a robust tool for diverse database query comparison tasks.