Studies on Cross-modal Feature-based Mapping from Voice-source to Texture through Image Association by Listening Speech

Journal of Imaging Science and Technology (2022)

Abstract
Direct correlations between the modality-driven parameters of voice source and texture were investigated. A perceptual experiment was conducted using vowel sounds with three representative phonation types (modal, creaky and breathy) and texture images annotated with semantic terms. For quantitative analysis, acoustic features measuring vocal fold vibration, periodicity, spectral noise level, fundamental frequency and energy were calculated, and computational textural features comprising coarseness, contrast, directionality, busyness, complexity, strength and brightness were extracted. The results showed that the most important feature is the amplitude difference between the first two harmonics (H1-H2), which correlates significantly with coarseness, contrast, busyness, complexity, strength and brightness. Harmonic-to-Noise Ratios (HNRs) correlate strongly with coarseness, busyness, complexity and strength. Significant correlations were also observed between Cepstral Peak Prominence (CPP) and coarseness, between fundamental frequency (F0) and both complexity and brightness, and between energy and strength. These parametric correlations can serve as basic scientific knowledge for cross-modal mapping. (C) 2022 Society for Imaging Science and Technology.
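To make two of the steps named in the abstract concrete (measuring H1-H2 and correlating an acoustic feature with a texture feature), here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which extraction tools or correlation statistic were used. The sketch assumes F0 is already known, measures uncorrected H1-H2 with a plain DFT evaluated at F0 and 2·F0 (no formant correction), and uses a Pearson coefficient. All function names are hypothetical.

```python
import math
from statistics import mean


def harmonic_amplitude(signal, fs, freq):
    """Magnitude of the Fourier component of `signal` at `freq` Hz (direct DFT sum).

    Assumption: the analysis window holds an integer number of F0 periods,
    so neighbouring harmonics do not leak into each other.
    """
    re = sum(x * math.cos(2 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    im = -sum(x * math.sin(2 * math.pi * freq * n / fs) for n, x in enumerate(signal))
    return math.hypot(re, im)


def h1_minus_h2(signal, fs, f0):
    """Uncorrected H1-H2 in dB: first-harmonic level minus second-harmonic level."""
    h1 = harmonic_amplitude(signal, fs, f0)
    h2 = harmonic_amplitude(signal, fs, 2 * f0)
    return 20 * math.log10(h1 / h2)


def pearson_r(xs, ys):
    """Pearson correlation between paired per-stimulus values,
    e.g. H1-H2 of each vowel vs. a texture feature such as coarseness."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Synthetic "vowel": 100 Hz fundamental plus a second harmonic at half amplitude,
    # 800 samples at 8 kHz = exactly 10 periods of F0.
    fs = 8000
    demo = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
            for n in range(800)]
    print(round(h1_minus_h2(demo, fs, 100.0), 2))  # → 6.02
```

In this toy signal, H1 is twice H2, so H1-H2 is 20·log10(2) ≈ 6.02 dB. Breathier phonation typically raises H1 relative to H2, which is why H1-H2 is a common index of phonation type. In a real pipeline, F0 and the analysis windows would come from a pitch tracker, and significance would be tested on the resulting coefficients.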