What Do Language Models Hear? Probing for Auditory Representations in Language Models
CoRR(2024)
摘要
This work explores whether language models encode meaningfully grounded
representations of sounds of objects. We learn a linear probe that retrieves
the correct text representation of an object given a snippet of audio related
to that object, where the sound representation is given by a pretrained audio
model. This probe is trained via a contrastive loss that pushes the language
representations and sound representations of an object to be close to one
another. After training, the probe is tested on its ability to generalize to
objects that were not seen during training. Across different language models
and audio models, we find that the probe generalization is above chance in many
cases, indicating that despite being trained only on raw text, language models
encode grounded knowledge of sounds for some objects.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要