An Automatic Prompt Generation System for Tabular Data Tasks
arXiv (2024)
Abstract
Efficient processing of tabular data is important in various industries,
especially when working with datasets containing a large number of columns.
Large language models (LLMs) have demonstrated their ability on several tasks
through carefully crafted prompts. However, creating effective prompts for
tabular datasets is challenging due to the structured nature of the data and
the need to manage numerous columns. This paper presents an innovative
auto-prompt generation system suitable for multiple LLMs, with minimal
training. It proposes two novel methods: (1) a reinforcement learning-based
algorithm for identifying and sequencing task-relevant columns, and (2) a
cell-level similarity-based approach for enhancing few-shot example selection. Our
approach has been extensively tested across 66 datasets, demonstrating improved
performance in three downstream tasks: data imputation, error detection, and
entity matching, using two distinct LLMs: Google flan-t5-xxl and Mixtral 8x7B.
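
The abstract does not spell out how cell-level similarity is computed, so the following is only a rough sketch of the general idea of choosing few-shot examples by comparing rows cell by cell. Everything in it is an assumption made for illustration: the Jaccard token overlap as the per-cell score, the unweighted mean across columns, and the hypothetical names `cell_similarity`, `row_similarity`, and `select_few_shot`. It should not be read as the paper's actual method.

```python
from typing import Dict, List

Row = Dict[str, str]


def cell_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase token sets of two cells.

    Assumption: the paper's similarity measure is not given in the abstract;
    Jaccard overlap is used here purely as a simple stand-in.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty cells are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def row_similarity(query: Row, candidate: Row, columns: List[str]) -> float:
    """Mean cell-level similarity over the selected columns."""
    if not columns:
        return 0.0
    total = sum(
        cell_similarity(query.get(col, ""), candidate.get(col, ""))
        for col in columns
    )
    return total / len(columns)


def select_few_shot(query: Row, pool: List[Row], columns: List[str], k: int = 2) -> List[Row]:
    """Return the k pool rows most similar to the query, best first."""
    ranked = sorted(
        pool,
        key=lambda row: row_similarity(query, row, columns),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data, not taken from the paper's datasets.
pool = [
    {"name": "Apple iPhone 13", "brand": "Apple"},
    {"name": "Samsung Galaxy S21", "brand": "Samsung"},
    {"name": "Apple iPhone 12 Pro", "brand": "Apple"},
]
query = {"name": "Apple iPhone 13 Pro", "brand": "Apple"}
examples = select_few_shot(query, pool, columns=["name", "brand"], k=2)
```

In this sketch the `columns` argument stands in for the output of a column-selection step; restricting the comparison to task-relevant columns keeps unrelated cells from diluting the score.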