Real-world Performance Estimation of Liquid State Machines for Spoken Digit Classification

2023 International Joint Conference on Neural Networks (IJCNN 2023)

Abstract
The Liquid State Machine (LSM) is a brain-inspired neural network architecture for solving temporal classification problems such as speech recognition. The simple structure of the LSM, a reservoir followed by a single-layer classifier, is attractive from a hardware implementation perspective. When the LSM is considered for low-power hardware implementation in real-world command word recognition tasks, nonidealities in the sensor filter response and ambient noise become critical concerns. In this work, we evaluate the performance of the LSM with respect to two aspects: (1) ambient noise and (2) sensor/preprocessing circuit nonidealities. For ambient noise, we use additive white Gaussian noise (AWGN) and recorded ambient noise from the iNoise Indian Noise dataset, which covers various natural indoor, outdoor, and travel-related environmental sounds. To understand the impact of input hardware nonidealities, we analyze how the audio preprocessing filter's quality factor (Q), order, center frequency variations, and output nonlinearity affect LSM performance. We use spoken digit classification on the TI46 dataset as the benchmark. The findings provide design guidelines for system designers intending to use liquid state machines for speech classification tasks. In terms of filter design, first, there is a broad region of the (Q, order) design space where performance is high. We use a hardware-friendly parallel 4th-order Butterworth bandpass filter model to provide a baseline accuracy of 98% in speech classification tasks. Second, the performance of the LSM degrades in proportion to the variation in the center frequencies of the bandpass filters in the filter bank. Third, nonlinearity with a third-order harmonic of 50 dBc can be tolerated. Regarding ambient noise, first, a 40 dB signal-to-noise ratio (SNR) for AWGN is sufficient for ideal performance. Second, the best case of "home" noise leads to a performance of 91.4%, while outdoor and travel noise reduce the classification performance to 78.8% and 62.4%, respectively. However, ideal performance is recovered if the SNR is increased, specifically by 10 dB in indoor conditions and 30 dB in outdoor conditions. Thus, our study presents an engineering evaluation of real-world spoken digit recognition using LSMs.
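The abstract states noise conditions as SNR values in dB. As an illustration only, the sketch below shows one common way to add white Gaussian noise to a waveform at a chosen SNR. It is not the authors' pipeline: the function names, the 440 Hz test tone, and the 8 kHz sample rate are hypothetical stand-ins for a real speech recording.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a sequence."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power matches snr_db."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


def add_awgn(signal, snr_db, seed=0):
    """Return the signal plus white Gaussian noise at the requested SNR."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    scaled = scale_noise_to_snr(signal, noise, snr_db)
    return [s + n for s, n in zip(signal, scaled)]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


# Hypothetical stand-in for a speech waveform: one second of a 440 Hz tone
# sampled at an assumed 8 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_awgn(clean, snr_db=40.0)
```

The same `scale_noise_to_snr` helper could be applied to a recorded noise clip instead of synthetic Gaussian samples, which is one plausible way to mix environmental noise with speech at a target SNR.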
Keywords
speech recognition, reservoir computing, spiking neural networks