Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs
CoRR(2024)
摘要
Identifying user intents in information-seeking dialogs is crucial for a
system to meet user's information needs. Intent prediction (IP) is challenging
and demands sufficient dialogs with human-labeled intents for training.
However, manually annotating intents is resource-intensive. While large
language models (LLMs) have been shown to be effective in generating synthetic
data, there is no study on using LLMs to generate intent-aware
information-seeking dialogs. In this paper, we focus on leveraging LLMs for
zero-shot generation of large-scale, open-domain, and intent-aware
information-seeking dialogs. We propose SOLID, which has novel self-seeding and
multi-intent self-instructing schemes. The former improves the generation
quality by using the LLM's own knowledge scope to initiate dialog generation;
the latter prompts the LLM to generate utterances sequentially, and mitigates
the need for manual prompt design by asking the LLM to autonomously adapt its
prompt instruction when generating complex multi-intent utterances.
Furthermore, we propose SOLID-RL, which is further trained to generate a dialog
in one step on the data generated by SOLID. We propose a length-based quality
estimation mechanism to assign varying weights to SOLID-generated dialogs based
on their quality during the training process of SOLID-RL. We use SOLID and
SOLID-RL to generate more than 300k intent-aware dialogs, surpassing the size
of existing datasets. Experiments show that IP methods trained on dialogs
generated by SOLID and SOLID-RL achieve better IP quality than ones trained on
human-generated dialogs.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要