Gar: A Generate-and-Rank Approach for Natural Language to SQL Translation.

ICDE (2023)

Abstract
A Natural Language (NL) Interface to Databases (NLIDB) aims to help end-users access databases. State-of-the-art approaches primarily construct language translation models to convert NL queries to SQL queries. While these models exhibit good performance on NLIDB benchmarks, the translation accuracy seems to have stalled at between 70% and 75%, and most erroneous translations happen with complex queries that require an understanding of the structure and semantics specific to a database. This paper proposes a Generate-And-Rank approach called Gar. Gar assumes that a set of sample SQL queries is given to represent the possible user-intended queries to the database. To provide broad coverage, akin to avoiding over-fitting, Gar extracts the basic components from the sample set and uses them as building blocks to generate a set of generalized SQL queries. Using a simple rule-based SQL-to-NL technique, Gar obtains a less natural NL expression, called a dialect expression, for each sample and generalized SQL query. Finally, for a given NL query, a learning-to-rank method retrieves the best dialect expression and hence the resulting SQL query. Extensive experiments are performed to study Gar in comparison with other approaches. The results show that Gar achieves better performance on the NLIDB benchmarks, including, in particular, a 78.5% translation accuracy on the popular Spider benchmark, outperforming the best reported accuracy in the literature. An extension to Gar, called Gar-j, is further introduced to aid the translation by annotating join semantics in the sample queries. The experimental results show that Gar-j can further improve translation accuracy on queries with joins. Code for Gar can be found at https://github.com/Kaimary/GAR.
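
The generate-and-rank idea described above can be illustrated with a toy sketch. This is not the Gar implementation: the sample queries, the single-table query shape, the helper names (`components`, `generalize`, `dialect`, `score`, `translate`), and the dialect template are all hypothetical, and a simple token-overlap (Jaccard) score stands in for the learned ranking model that the paper uses.

```python
import re
from itertools import product

# Hypothetical sample queries over a toy schema (not taken from the paper).
SAMPLES = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT country FROM singer WHERE name = 'Joe'",
]

# Assumption: every query has the shape SELECT <col> FROM <table> WHERE <cond>.
PATTERN = re.compile(r"SELECT (\w+) FROM (\w+) WHERE (.+)")


def components(sql):
    """Split a simple single-table query into (column, table, condition)."""
    match = PATTERN.fullmatch(sql)
    if match is None:
        raise ValueError(f"unsupported query shape: {sql}")
    return match.groups()


def generalize(samples):
    """Recombine the components of the samples into candidate queries."""
    parts = [components(s) for s in samples]
    cols = sorted({p[0] for p in parts})
    tables = sorted({p[1] for p in parts})
    conds = sorted({p[2] for p in parts})
    return [
        f"SELECT {c} FROM {t} WHERE {w}"
        for c, t, w in product(cols, tables, conds)
    ]


def dialect(sql):
    """Rule-based SQL-to-NL rendering (a made-up template, for illustration)."""
    col, table, cond = components(sql)
    cond = cond.replace(">", "greater than").replace("=", "equal to")
    cond = cond.replace("'", "")
    return f"find the {col} of {table} where {cond}"


def tokens(text):
    """Lowercased word tokens of a string, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def score(nl_query, expression):
    """Jaccard token overlap; a stand-in for a learning-to-rank model."""
    a, b = tokens(nl_query), tokens(expression)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate(nl_query, samples):
    """Return the candidate SQL whose dialect expression ranks highest."""
    candidates = generalize(samples)
    return max(candidates, key=lambda sql: score(nl_query, dialect(sql)))
```

With these toy samples, calling `translate("find the country of singer where age greater than 30", SAMPLES)` selects a query that combines the column of one sample with the condition of the other, so the result is a generalized query rather than one of the given samples. A real system would replace `score` with a trained ranker and would need a far richer component extractor than the single regular expression assumed here.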
Keywords
NLIDB, NL2SQL, SQL, learning-to-rank