Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
arXiv (2024)
Abstract
Recent advances in large language models have demonstrated their potential
for automated generation of hardware description language (HDL) code from
high-level prompts. Researchers have utilized fine-tuning to enhance the
ability of these large language models (LLMs) in the field of Chip Design.
However, the lack of Verilog data hinders further improvement in the quality of
Verilog generation by LLMs. Additionally, the absence of a Verilog and
Electronic Design Automation (EDA) script data augmentation framework
significantly increases the time required to prepare the training dataset for
LLM trainers. This paper proposes an automated design-data augmentation
framework, which generates high-volume and high-quality natural language
aligned with Verilog and EDA scripts. For Verilog generation, it translates
Verilog files into an abstract syntax tree and then maps the tree's nodes to
natural language with predefined templates. For Verilog repair, it uses
predefined rules to generate incorrect Verilog files and then pairs EDA tool
feedback with the correct and incorrect files. For EDA script generation, it
uses an existing LLM (GPT-3.5) to obtain descriptions of the scripts. To evaluate the
effectiveness of our data augmentation method, we finetune Llama2-13B and
Llama2-7B models using the dataset generated by our augmentation framework. The
results demonstrate a significant improvement in the Verilog generation tasks
with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the
current state-of-the-art open-source Verilog generation model, increasing from
58.8% […] rate improvement compared with GPT-3.5 in Verilog generation, and
outperforms it in EDA script (i.e., SiliconCompiler) generation with only 200
EDA script data samples.
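The Verilog-generation branch of the framework can be pictured as a small transformation: parse a Verilog source into structure, then render that structure into natural language through a fixed template. Below is a minimal, hypothetical sketch of that idea. The paper uses a full abstract syntax tree; here a regular expression over the module header stands in for the parser, and the template wording is invented for illustration, not taken from the paper.

```python
import re

# Matches a module header: "module <name>(<port list>);"
MODULE_RE = re.compile(r"module\s+(\w+)\s*\((.*?)\);", re.DOTALL)
# Matches one port declaration, with an optional bit range like [3:0].
PORT_RE = re.compile(r"(input|output)\s+(?:\[(\d+):(\d+)\]\s*)?(\w+)")

def describe_module(verilog: str) -> str:
    """Render a Verilog module header as a natural-language sentence.

    Stand-in for the paper's AST-node-to-template mapping: each parsed
    port becomes one templated phrase, joined into a single description.
    """
    m = MODULE_RE.search(verilog)
    if m is None:
        raise ValueError("no module header found")
    name, port_text = m.group(1), m.group(2)
    phrases = []
    for direction, hi, lo, pname in PORT_RE.findall(port_text):
        width = 1 if hi == "" else abs(int(hi) - int(lo)) + 1
        phrases.append(f"a {width}-bit {direction} '{pname}'")
    # Predefined template: one sentence per module.
    return f"Module '{name}' has " + ", ".join(phrases) + "."

src = """
module adder(input [3:0] a, input [3:0] b, output [4:0] sum);
  assign sum = a + b;
endmodule
"""
print(describe_module(src))
```

Pairing each generated description with its source file yields one (natural language, Verilog) training example; the paper applies the same alignment idea, driven by a real AST rather than a header regex.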