Personalized Audio Quality Preference Prediction
2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023
Abstract
This paper proposes using both audio input and subject information to predict a subject's personalized preference between two audio segments that have the same content but differ in quality. A siamese network compares the two inputs and predicts the preference. Several structures for each side of the siamese network are investigated. The baseline structure, which uses only audio information, consists of a pretrained audio encoder followed by fully connected layers. Among the proposed structures, concatenating subject information with the audio embedding before feeding it into the fully connected layers improves on the baseline the most, raising overall accuracy from 77.56% to 78.04%. Experimental results also show that using the complete set of subject information (age, gender, and headphone/earphone specifications such as impedance, frequency response range, and sensitivity) is more effective than using only a subset of it.
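The concatenation structure described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. Several things are assumptions made only for illustration: the audio embeddings are taken as precomputed vectors standing in for the output of a pretrained encoder, the subject features are hypothetical normalized values, the layer sizes and random weights are arbitrary, and the two branch scores are combined with a sigmoid of their difference, which the abstract does not specify.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, layer):
    """Apply a dense layer to the input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class SiameseBranch:
    """One side of the siamese network. Both sides reuse this same instance,
    so the weights are shared. Hidden size is an arbitrary assumption."""

    def __init__(self, embed_dim, subject_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.fc1 = make_linear(embed_dim + subject_dim, hidden_dim, rng)
        self.fc2 = make_linear(hidden_dim, 1, rng)

    def score(self, audio_embedding, subject_features):
        # Concatenate subject information with the audio embedding
        # before the fully connected layers.
        x = list(audio_embedding) + list(subject_features)
        hidden = relu(linear(x, self.fc1))
        return linear(hidden, self.fc2)[0]


def predict_preference(branch, emb_a, emb_b, subject_features):
    """Probability that the subject prefers segment A over segment B.
    Combining the scores via sigmoid(score_a - score_b) is an assumption."""
    diff = branch.score(emb_a, subject_features) - branch.score(emb_b, subject_features)
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical normalized subject features:
# age, gender, impedance, frequency range low, frequency range high, sensitivity
subject = [0.30, 1.0, 0.32, 0.02, 0.20, 0.98]
# Placeholder embeddings for the two quality versions of the same content
emb_a = [0.1, -0.2, 0.4, 0.0]
emb_b = [0.0, 0.3, -0.1, 0.2]

branch = SiameseBranch(embed_dim=4, subject_dim=6)
p_a_over_b = predict_preference(branch, emb_a, emb_b, subject)
```

Because both segments pass through the same branch, swapping the inputs yields the complementary probability, and two identical segments yield exactly 0.5. With untrained random weights the output carries no meaning; in practice the branch would be trained on labeled preference pairs.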