IEEE Transactions on Artificial Intelligence (2023)
Abstract
Generative Pre-trained Transformer (GPT) models, renowned for generating human-like text, occasionally produce "hallucinations": outputs that diverge from human expectations. Current mitigation strategies for these hallucinations rely largely on algorithmic automation, overlooking the complexities of human judgment and cultural influence, particularly in the interpretation of facts. To address this issue, we introduce a Culturally Sensitive Test that integrates language subjectivity, cultural nuances, and GPT idiosyncrasies. We applied this test to five GPT models (OpenAI's ChatGPT-3.5 and ChatGPT-4, Google's Bard, Perplexity AI, and TruthGPT), evaluating their responses to 70 questions across seven categories designed to provoke hallucinations. The evaluated models demonstrated varying performance, with controversial topics, topics lacking clear scientific consensus, and brain teasers proving the most susceptible to hallucinations. Our study paves the way for a nuanced assessment of GPT hallucinations.