STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
arxiv(2023)
Abstract
Despite the strong performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter counts and computational cost make them difficult to deploy. In this study, we propose to compress speech SSL models by distilling speech temporal relation (STaR). Unlike previous works that directly match the representation of each speech frame, STaR distillation transfers the temporal relation between speech frames, which is more suitable for a lightweight student with limited capacity. We explore three STaR distillation objectives and select the best combination as the final STaR loss. Our model distilled from HuBERT BASE achieves an overall score of 79.8 on the SUPERB benchmark, the best performance among models with up to 27 million parameters. We show that our method is applicable across different speech SSL models and maintains robust performance with further reduced parameters.
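As a rough illustration of relation-based distillation (not the paper's exact objectives, which combine three STaR losses), the sketch below assumes the temporal relation is a frame-to-frame cosine-similarity (Gram) matrix and matches it between teacher and student with an MSE loss; the function names and the choice of similarity and loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_relation(hidden):
    # hidden: (batch, frames, dim) frame-level representations.
    # Normalize each frame and compute the frame-to-frame cosine-similarity
    # (Gram) matrix, i.e. the "temporal relation" between frames.
    hidden = F.normalize(hidden, dim=-1)
    return torch.matmul(hidden, hidden.transpose(1, 2))  # (batch, frames, frames)

def star_distill_loss(teacher_hidden, student_hidden):
    # Instead of matching per-frame representations directly, match the
    # pairwise temporal relations of teacher and student (one possible
    # instantiation of a STaR-style objective).
    with torch.no_grad():
        teacher_rel = temporal_relation(teacher_hidden)
    student_rel = temporal_relation(student_hidden)
    return F.mse_loss(student_rel, teacher_rel)
```

Because the target is a frames-by-frames relation matrix rather than the teacher's feature vectors, the student is not forced to reproduce high-dimensional representations it may lack the capacity to match.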