Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

CoRR (2023)

Abstract
Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) can now synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a quick switch between users troublesome. Even for the same speaker, these models perform poorly across sessions, i.e. after dismounting and re-mounting the recording equipment. To enable quick speaker and session adaptation of ultrasound tongue imaging-based SSI models, we extend our deep networks with a spatial transformer network (STN) module, capable of performing an affine transformation on the input images. Although the STN part accounts for only about 10% of the network, our experiments show that adapting just the STN module can reduce MSE by 88% on average, compared to retraining the whole network. The improvement is even larger (around 92%) when adapting the network to different recording sessions of the same speaker.
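The paper's own implementation is not reproduced here. As a rough illustration of the idea, the sketch below shows a minimal spatial transformer front-end in PyTorch: a small localization network predicts a 2x3 affine matrix from the input image, and the image is warped accordingly before entering the synthesis network. All layer sizes, the 64x64 input resolution, and the placeholder backbone are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNFrontEnd(nn.Module):
    """Spatial transformer front-end: predicts a 2x3 affine matrix
    from the input image and warps the image with it.
    Layer sizes are illustrative, not the paper's architecture."""

    def __init__(self, in_ch=1):
        super().__init__()
        # Small localization network that regresses the 6 affine parameters
        self.loc = nn.Sequential(
            nn.Conv2d(in_ch, 8, 5, stride=2), nn.ReLU(),
            nn.Conv2d(8, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize the last layer to the identity transform,
        # so training starts from an unwarped image
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)          # per-image affine matrix
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Session-adaptation sketch: freeze the synthesis backbone and tune only
# the small STN module on data from the new speaker/session.
stn = STNFrontEnd()
backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 25))  # placeholder
model = nn.Sequential(stn, backbone)
for p in backbone.parameters():
    p.requires_grad = False  # only STN parameters receive gradients
```

Because the STN is initialized to the identity transform, the adapted model starts from the behavior of the pre-trained network and only learns the affine correction (shift, rotation, scaling) between the old and new probe placements.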
Keywords
silent speech interfaces, tongue, transformer, ultrasound-based