DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design

Conference of the International Speech Communication Association (INTERSPEECH) (2022)

Abstract
Embeddings extracted by deep neural networks have become the state-of-the-art utterance representation in speaker verification (SV). Although previous work has investigated a variety of network architectures, how to design and scale up networks in a principled manner to achieve a better trade-off between performance and complexity has rarely been discussed in the SV field. In this paper, we first systematically study model scaling from the perspective of network depth and width, and empirically find that depth is more important than width for the speaker verification task. Based on this observation, we follow a depth-first rule to design a new backbone built entirely from standard convolutional network modules: we significantly increase the number of layers while keeping the network complexity unchanged. We then scale this backbone up to obtain a family of much deeper models, dubbed DF-ResNets. Comprehensive comparisons with other state-of-the-art systems on the VoxCeleb dataset demonstrate that DF-ResNets achieve a much better trade-off between performance and complexity than previous SV systems.
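
As a rough illustration of the depth-versus-width trade-off discussed above, the sketch below counts the weights in a plain stack of 3x3 convolutions. It is not the DF-ResNet architecture: the helper names (`conv_params`, `stack_params`) and the layer and channel counts are hypothetical, chosen only to show that doubling depth doubles the parameter count, doubling width roughly quadruples it, and a much deeper but narrower stack can match the original budget.

```python
def conv_params(c_in, c_out, kernel=3):
    """Number of weights in one 2-D convolution (bias terms omitted)."""
    return c_in * c_out * kernel * kernel


def stack_params(depth, width, kernel=3):
    """Weights in `depth` stacked conv layers, each with `width` channels.

    Toy model for illustration only; it ignores stems, residual
    shortcuts, normalization layers, and pooling.
    """
    return depth * conv_params(width, width, kernel)


# Hypothetical baseline: 8 layers, 32 channels.
base = stack_params(depth=8, width=32)

# Doubling depth scales the parameter count linearly.
deeper = stack_params(depth=16, width=32)

# Doubling width scales the parameter count quadratically.
wider = stack_params(depth=8, width=64)

# Four times the depth at half the width keeps the same budget.
depth_first = stack_params(depth=32, width=16)

print(f"base={base} deeper={deeper} wider={wider} depth_first={depth_first}")
```

Under this simplified count, depth is the cheaper axis to grow, which is consistent with increasing the number of layers while holding complexity fixed. Whether the extra depth actually improves verification accuracy is an empirical question that the paper addresses experimentally.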
Keywords
boosting speaker verification performance, DF-ResNet, depth-first