CQT-Based Cepstral Features for Classification of Normal vs. Pathological Infant Cry

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023)

Abstract
Infant cry classification is an important area of research that involves analyzing cry signals to distinguish normal from pathological cries. Signal-processing-based state-of-the-art feature sets, such as Short-Time Fourier Transform (STFT) representations and Mel Frequency Cepstral Coefficients (MFCC), have previously been reported for this task. However, quasi-periodic sampling of the vocal tract spectrum by high-pitch source harmonics results in poor spectral resolution in the STFT, and hence these feature sets fail to produce satisfactory classification performance. In contrast to linearly spaced frequency bins, this study proposes to use the geometrically spaced frequency bins employed in CQT-based features, namely, Constant Q Cepstral Coefficients (CQCC), to systematically emphasize the fundamental frequency (F0) and its harmonics (kF0, k ∈ Z) for infant cry classification. For a comprehensive evaluation of the proposed feature set, two datasets are considered in this work, namely, the Baby Chilanto and In-House DA-IICT datasets. The performance of the proposed CQCC feature set is compared against state-of-the-art MFCC, Linear Frequency Cepstral Coefficients (LFCC), and Cepstral feature sets. Experiments were performed using 10-fold cross-validation on two traditional classifiers, namely, Gaussian Mixture Model (GMM) and Support Vector Machine (SVM). The best results were obtained using the CQCC-GMM architecture, with classification accuracies of 99.8% and 98.24% on the Baby Chilanto and In-House DA-IICT datasets, respectively. This work also illustrates the effectiveness of the form-invariance property of the CQT over the traditional narrowband STFT-based spectrogram, and presents the effect of parameter tuning and of the dimension of the feature vector.
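To make the contrast between linearly and geometrically spaced frequency bins concrete, here is a minimal Python sketch. It only computes bin centre frequencies; it is not the authors' CQCC pipeline. The sampling rate, FFT size, minimum frequency, and bins-per-octave values are illustrative assumptions and are not taken from the paper.

```python
import math

def linear_bin_centers(sample_rate, n_fft):
    """Centre frequencies (Hz) of the non-negative STFT bins (uniform spacing)."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]

def cqt_bin_centers(f_min, bins_per_octave, n_bins):
    """Centre frequencies (Hz) of CQT bins: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def q_factor(bins_per_octave):
    """Centre-frequency-to-bandwidth ratio, identical for every CQT bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def count_bins_in_band(centers, low_hz, high_hz):
    """Number of bin centres falling in the half-open band [low_hz, high_hz)."""
    return sum(1 for f in centers if low_hz <= f < high_hz)

# Assumed, illustrative settings (not from the paper).
SAMPLE_RATE = 16000      # Hz
N_FFT = 512              # STFT bin spacing of 31.25 Hz
F_MIN = 62.5             # Hz, lowest CQT bin
BINS_PER_OCTAVE = 24
N_BINS = 7 * BINS_PER_OCTAVE  # seven octaves, ending just below 8 kHz

stft_centers = linear_bin_centers(SAMPLE_RATE, N_FFT)
cqt_centers = cqt_bin_centers(F_MIN, BINS_PER_OCTAVE, N_BINS)

# Compare how many bins each transform places in the 250-500 Hz octave.
stft_count = count_bins_in_band(stft_centers, 250.0, 500.0)
cqt_count = count_bins_in_band(cqt_centers, 250.0, 500.0)
print("STFT bins in 250-500 Hz:", stft_count)
print("CQT bins in 250-500 Hz:", cqt_count)
print("CQT Q factor:", round(q_factor(BINS_PER_OCTAVE), 2))
```

Under these assumed settings the geometric grid places more bins in the low-frequency octave than the linear grid does, which is the kind of resolution trade-off the abstract refers to.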
Furthermore, this study presents the first-ever cross-database and combined-dataset scenarios, with an overall improvement of 1.59% for the proposed CQCC feature set. Additionally, the robustness of CQCC is evaluated under signal degradation conditions with additive babble noise at various Signal-to-Noise Ratio (SNR) levels on both datasets. Next, the performance of the proposed CQCC feature set is compared with the other feature sets using statistical measures, such as F1-score, J-statistic, and violin plots, together with an analysis of the latency period relevant to deploying a practical system. Finally, this study compares the best results obtained with CQCC against existing studies on the Baby Chilanto dataset.
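As a small illustration of two of the statistical measures named above, the sketch below computes the F1-score and a J-statistic from binary confusion-matrix counts. It assumes that "J-statistic" refers to Youden's J (sensitivity + specificity − 1), which the abstract does not state explicitly. The counts are hypothetical and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; 0.0 if undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def youden_j(tp, fp, tn, fn):
    """Youden's J = sensitivity + specificity - 1 (assumed meaning of J-statistic)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0

# Hypothetical counts, with "pathological" treated as the positive class.
tp, fp, tn, fn = 90, 5, 95, 10
f1 = f1_score(tp, fp, fn)
j = youden_j(tp, fp, tn, fn)
print("F1-score:", round(f1, 4))
print("Youden's J:", round(j, 4))
```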
Keywords
Infant Cry Classification, Short-Time Fourier Transform, Constant Q Transform, Form-Invariance, Cepstral Features, Baby Chilanto, Cross-Dataset