DyConvMixer: Dynamic Convolution Mixer Architecture for Open-Vocabulary Keyword Spotting

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
User-defined keyword spotting has gained popularity in recent years, but building an open-vocabulary keyword spotting system that combines high accuracy with low power consumption remains a challenging problem. In this paper, we propose the DyConvMixer model to tackle this problem. By combining dynamic convolution with a convolutional equivalent of the MLP-Mixer architecture, we obtain an efficient and effective model that has fewer than 200K parameters and uses fewer than 11M MACs. Although our model is less than half the size of state-of-the-art RNN and CNN models, it achieves competitive results on the publicly available Hey-Snips and Hey-Snapdragon datasets. We also discuss the importance of designing an effective evaluation system and detail our evaluation pipeline to enable comparison with future work.
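The abstract does not specify how dynamic convolution is wired into the ConvMixer-style block, so the following is only a minimal pure-Python sketch under stated assumptions. It assumes (a) dynamic convolution in the sense of Chen et al. (2020), where K candidate kernels are blended by softmax attention computed from the globally pooled input, and (b) that this dynamic kernel serves as the depthwise (time-mixing) convolution of a ConvMixer block (depthwise conv with residual, then pointwise channel-mixing conv), with batch normalization omitted for brevity. The names `DynamicDepthwiseConv1d`, `ConvMixerBlock`, and `count_params`, as well as the kernel count, hidden size, and random initialization, are hypothetical and are not the authors' configuration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(v):
    # Exact GELU via the Gaussian error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear(weight, bias, vec):
    # weight: out x in, bias: out, vec: in -> out
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


class DynamicDepthwiseConv1d:
    """Hypothetical dynamic depthwise 1-D conv: K candidate kernels mixed by input-dependent attention."""

    def __init__(self, channels, kernel_size, num_kernels=4, hidden=8, temperature=1.0, seed=0):
        assert kernel_size % 2 == 1, "odd kernel size keeps 'same' padding symmetric"
        rng = random.Random(seed)
        self.kernel_size = kernel_size
        self.temperature = temperature
        # kernels[k][c][j]: K candidate depthwise kernels, one per channel each.
        self.kernels = [[[rng.gauss(0, 0.1) for _ in range(kernel_size)]
                         for _ in range(channels)] for _ in range(num_kernels)]
        # Small two-layer attention head: pooled channels -> hidden -> K logits.
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(channels)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(num_kernels)]
        self.b2 = [0.0] * num_kernels

    def attention(self, x):
        pooled = [sum(ch) / len(ch) for ch in x]  # global average pool over time
        hidden = [max(0.0, h) for h in linear(self.w1, self.b1, pooled)]  # ReLU
        return softmax(linear(self.w2, self.b2, hidden), self.temperature)

    def __call__(self, x):
        # x: list of C channels, each a list of T frames.
        pi = self.attention(x)
        pad = self.kernel_size // 2
        out = []
        for c, ch in enumerate(x):
            # Aggregate the K candidates into a single kernel for this channel.
            w = [sum(pi[k] * self.kernels[k][c][j] for k in range(len(pi)))
                 for j in range(self.kernel_size)]
            row = []
            for t in range(len(ch)):
                acc = 0.0
                for j in range(self.kernel_size):
                    src = t + j - pad
                    if 0 <= src < len(ch):  # zero padding at the edges
                        acc += w[j] * ch[src]
                row.append(acc)
            out.append(row)
        return out


class ConvMixerBlock:
    """Hypothetical ConvMixer-style block (batch norm omitted)."""

    def __init__(self, channels, kernel_size, num_kernels=4, seed=0):
        rng = random.Random(seed + 1)
        self.depthwise = DynamicDepthwiseConv1d(channels, kernel_size, num_kernels, seed=seed)
        self.pointwise = [[rng.gauss(0, 0.1) for _ in range(channels)] for _ in range(channels)]
        self.pointwise_bias = [0.0] * channels

    def __call__(self, x):
        # Time mixing: dynamic depthwise conv + GELU, added back to the input.
        mixed = self.depthwise(x)
        x = [[xi + gelu(mi) for xi, mi in zip(xc, mc)] for xc, mc in zip(x, mixed)]
        # Channel mixing: pointwise (1x1) conv + GELU, applied frame by frame.
        n_ch, n_t = len(x), len(x[0])
        frames = [[x[c][t] for c in range(n_ch)] for t in range(n_t)]
        out = [[gelu(v) for v in linear(self.pointwise, self.pointwise_bias, f)] for f in frames]
        return [[out[t][c] for t in range(n_t)] for c in range(n_ch)]


def count_params(channels, kernel_size, num_kernels, hidden=8):
    # Weights of one ConvMixerBlock as defined above.
    dynamic = num_kernels * channels * kernel_size                           # candidate kernels
    attention = channels * hidden + hidden + hidden * num_kernels + num_kernels
    pointwise = channels * channels + channels                                # 1x1 conv + bias
    return dynamic + attention + pointwise
```

For a hypothetical width of 64 channels, kernel size 9, and 4 candidate kernels, `count_params(64, 9, 4)` gives 7020, of which the candidate depthwise kernels account for only 2304 while the pointwise layer takes 4160. Under these assumptions, blending several depthwise kernels adds capacity at a modest parameter cost, which is one plausible reading of why dynamic convolution suits a tight budget; the paper's actual layer sizes are not given here.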
Keywords
Dynamic Convolution, Open-vocabulary Keyword Spotting, User-defined Keyword Spotting, Query-by-Example, ConvMixer