GrounDial: Human-norm Grounded Safe Dialog Response Generation
CoRR (2024)
Abstract
Current conversational AI systems based on large language models (LLMs) are
known to generate unsafe responses, either agreeing with offensive user input
or including toxic content. Previous research aimed to reduce this toxicity by
fine-tuning LLMs on manually annotated safe dialogue histories. However, this
dependency on additional tuning incurs substantial cost. To remove the
dependency, we propose GrounDial, which achieves response safety by grounding
responses in commonsense social rules without requiring fine-tuning.
GrounDial's hybrid approach of in-context learning and human-norm-guided
decoding makes responses quantitatively and qualitatively safer, even without
additional data or tuning.
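
The two ingredients named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the prompt template or the decoding rule, so the prompt layout, the fixed score bonus, and the helper names `build_grounded_prompt`, `boost_norm_tokens`, and `pick_next_token` are all assumptions made for illustration. The first function places a social rule in the context (in-context grounding); the other two nudge next-token scores toward words that appear in that rule (a stand-in for norm-guided decoding).

```python
from typing import Dict, List


def build_grounded_prompt(norm: str, history: List[str]) -> str:
    """Prepend a social rule to the dialogue context (hypothetical template)."""
    lines = [f"Rule of thumb: {norm}"]
    for i, turn in enumerate(history):
        # Assumption: turns alternate, starting with the user.
        speaker = "User" if i % 2 == 0 else "Bot"
        lines.append(f"{speaker}: {turn}")
    lines.append("Bot:")
    return "\n".join(lines)


def boost_norm_tokens(
    scores: Dict[str, float], norm: str, bonus: float = 1.0
) -> Dict[str, float]:
    """Add a fixed bonus to candidate tokens that also occur in the norm.

    Assumption: a constant additive bonus; the actual guidance rule used by
    GrounDial is not given in the abstract.
    """
    norm_words = {w.strip(".,!?").lower() for w in norm.split()}
    return {
        tok: (s + bonus if tok.lower() in norm_words else s)
        for tok, s in scores.items()
    }


def pick_next_token(scores: Dict[str, float], norm: str, bonus: float = 1.0) -> str:
    """Greedy choice over the norm-adjusted scores."""
    boosted = boost_norm_tokens(scores, norm, bonus)
    return max(boosted, key=boosted.get)


if __name__ == "__main__":
    norm = "It is wrong to insult people."
    history = ["Everyone from that town is stupid."]
    print(build_grounded_prompt(norm, history))

    # Made-up candidate scores standing in for a model's next-token logits.
    candidates = {"right": 2.0, "wrong": 1.5}
    print(pick_next_token(candidates, norm))
```

In a real system the scores would come from an LLM's next-token distribution and the adjustment would be applied at every decoding step; neither step changes model weights, which matches the abstract's claim that no fine-tuning is required.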