Leveraging large language models for data analysis automation

Jacqueline A Jansen, Artür Manukyan, Nour Al Khoury, Altuna Akalin

bioRxiv (2023)

Abstract
Data analysis is constrained by a shortage of skilled experts, particularly in biology, where detailed data interpretation is vital for understanding complex biological processes and developing new treatments and diagnostics. To address this, we developed mergen, an R package that leverages Large Language Models (LLMs) for data analysis code generation and execution. Our primary goal is to enable humans to conduct data analysis by simply describing, in plain text, their objectives and the analyses they want for specific datasets. Our approach improves code generation via specialized prompt engineering and error feedback mechanisms. In addition, our system can execute the data analysis workflows prescribed by the LLM and provide the results for human review. We evaluated the performance of this system on a range of data analysis tasks. Our evaluation revealed that while LLMs effectively generate code for some data analysis tasks, challenges remain in generating executable code, especially for complex tasks. Our study contributes to a better understanding of LLM capabilities and limitations, providing software infrastructure and practical insights for their effective integration into data analysis workflows.

Competing Interest Statement: The authors have declared no competing interest.
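The error feedback idea mentioned in the abstract can be illustrated with a minimal sketch. This is not mergen's API (mergen is an R package, and the abstract does not describe its functions); it is a generic Python illustration under stated assumptions. The names run_with_error_feedback and fake_llm are hypothetical, and fake_llm is a stand-in for a real model call so the example runs offline.

```python
import traceback
from typing import Callable


def run_with_error_feedback(task: str,
                            generate_code: Callable[[str], str],
                            max_attempts: int = 3) -> dict:
    """Generate code for a task, execute it, and feed errors back on failure.

    Hypothetical helper for illustration only; not part of mergen.
    """
    prompt = task
    code = ""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(prompt)
        namespace = {}
        try:
            # Executing generated code is unsafe outside a sandbox;
            # exec is used here only to keep the sketch short.
            exec(code, namespace)
            return {"ok": True, "attempts": attempt, "code": code,
                    "result": namespace.get("result")}
        except Exception:
            last_error = traceback.format_exc(limit=1)
            # Build a follow-up prompt that includes the failing code
            # and the error message, then ask for a correction.
            prompt = (f"{task}\n\nThe previous code failed:\n{code}\n\n"
                      f"Error:\n{last_error}\nPlease return corrected code.")
    return {"ok": False, "attempts": max_attempts, "code": code,
            "error": last_error}


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): fails once, then self-corrects."""
    if "Error:" in prompt:
        return "values = [1, 2, 3]\nresult = sum(values) / len(values)"
    # First answer references an undefined variable and raises NameError.
    return "result = sum(values) / len(values)"


outcome = run_with_error_feedback("Compute the mean of 1, 2, 3", fake_llm)
print(outcome["ok"], outcome["attempts"], outcome["result"])
```

In this sketch the first generated snippet fails because values is undefined, the error text is appended to the prompt, and the second attempt succeeds. A real system would replace fake_llm with an actual model call and run the generated code in an isolated environment.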