Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models
CoRR(2023)
Abstract
We introduce VL2NL, a Large Language Model (LLM) framework that generates
rich and diverse NL datasets using only Vega-Lite specifications as input,
thereby streamlining the development of Natural Language Interfaces (NLIs) for
data visualization. To synthesize relevant chart semantics accurately and
enhance syntactic diversity in each NL dataset, we leverage 1) a guided
discovery incorporated into prompting so that LLMs can steer themselves to
create faithful NL datasets in a self-directed manner; and 2) score-based
paraphrasing to augment NL syntax along four language axes. We also
present a new collection of 1,981 real-world Vega-Lite specifications with
greater diversity and complexity than existing chart collections. When tested
on our chart collection, VL2NL extracted chart semantics and generated L1/L2
captions with 89.4
generating and paraphrasing utterances and questions with greater diversity
compared to the benchmarks. Lastly, we discuss how our NL datasets and framework
can be utilized in real-world scenarios. The code and chart collection are
available at https://github.com/hyungkwonko/chart-llm.
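Since VL2NL takes only a Vega-Lite specification as input, a minimal sketch of the first step helps: reading basic chart semantics (mark type and encoded fields) out of a spec and turning them into a step-by-step prompt. This is not the authors' implementation. The function names `extract_semantics` and `build_prompt` and the prompt wording are hypothetical. The sketch also assumes a single-view spec, in which `mark` is either a string or an object with a `type` key and `encoding` maps channels to field definitions.

```python
def extract_semantics(spec: dict) -> dict:
    """Collect the mark type and field-bearing encoding channels of a single-view Vega-Lite spec."""
    mark = spec.get("mark")
    # In Vega-Lite, "mark" may be a string ("bar") or an object ({"type": "bar", ...}).
    mark_type = mark.get("type") if isinstance(mark, dict) else mark
    channels = {}
    for channel, enc in spec.get("encoding", {}).items():
        # Only channels bound to a data field are kept (e.g. a bare count aggregate is skipped).
        if isinstance(enc, dict) and "field" in enc:
            channels[channel] = {
                "field": enc["field"],
                "type": enc.get("type"),
                "aggregate": enc.get("aggregate"),
            }
    return {"mark": mark_type, "channels": channels}


def build_prompt(spec: dict) -> str:
    """Hypothetical guided-discovery-style prompt: list the semantics, then ask step by step."""
    sem = extract_semantics(spec)
    lines = [f"Chart mark: {sem['mark']}"]
    for channel, info in sem["channels"].items():
        agg = f" ({info['aggregate']})" if info["aggregate"] else ""
        lines.append(f"- {channel}: {info['field']} [{info['type']}]{agg}")
    # Illustrative instructions only; not the prompts used by VL2NL.
    lines.append("Step 1: Identify the chart type and the data fields it encodes.")
    lines.append("Step 2: Write an L1 caption describing these encodings.")
    return "\n".join(lines)


if __name__ == "__main__":
    spec = {
        "mark": "bar",
        "encoding": {
            "x": {"field": "category", "type": "nominal"},
            "y": {"field": "amount", "type": "quantitative", "aggregate": "sum"},
        },
    }
    print(build_prompt(spec))
```

The resulting prompt string would then be sent to an LLM. Keeping extraction separate from prompting makes it possible to check the extracted semantics against the spec before any text is generated.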
Keywords
visualizations powered