Sensory Media Association through Reciprocating Training

2019 IEEE International Symposium on Multimedia (ISM)

Abstract
Machine learning has achieved great progress in recent years. However, state-of-the-art machine learning systems still fall far behind biological learning systems at learning directly from sensors without offline labeling. This paper proposes an approach for automating machine learning from multi-modal sensors. In this learning setup, the system has no access to any human labeling tool, which is likewise unavailable to a biological learning system such as a dog or a newborn baby. We tested the proposal with audiovisual data. The test system contains two deep autoencoders, one for learning speech representations and the other for learning image representations. Two deep networks are trained to bridge the latent spaces of the two autoencoders, yielding representation mappings in both directions, speech-to-image and image-to-speech. To improve feature clustering in both latent spaces, the system alternately uses one modality to guide the learning of the other. Unlike traditional approaches that use a fixed modality for supervision (e.g., text labels for image classification), the proposed approach lets a machine learn from sensory inputs of two or more modalities through alternating guidance among them. We evaluated the proposed model on MNIST digit images paired with the corresponding spoken digits from the Google Command Digit Dataset (GCDD) and obtained very promising results.
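
The setup the abstract describes can be sketched compactly. Below is a minimal PyTorch sketch of the two-autoencoder, two-bridge architecture with alternating guidance; the latent size (64), layer widths, the flattened speech feature size (1024), the loss weighting, and the alternation schedule are all illustrative assumptions, not values taken from the paper.

import torch
import torch.nn as nn

LATENT = 64  # assumed latent dimensionality

def autoencoder(in_dim: int):
    """Build a simple fully connected encoder/decoder pair (assumed layout)."""
    enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, LATENT))
    dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, in_dim))
    return enc, dec

# One autoencoder per modality; input sizes are placeholders
# (28x28 MNIST images, a flattened speech feature of assumed size 1024).
img_enc, img_dec = autoencoder(28 * 28)
spc_enc, spc_dec = autoencoder(1024)

# Bridge networks between the two latent spaces, one per direction.
img2spc = nn.Sequential(nn.Linear(LATENT, LATENT), nn.ReLU(), nn.Linear(LATENT, LATENT))
spc2img = nn.Sequential(nn.Linear(LATENT, LATENT), nn.ReLU(), nn.Linear(LATENT, LATENT))

mse = nn.MSELoss()
params = [p for m in (img_enc, img_dec, spc_enc, spc_dec, img2spc, spc2img)
          for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

def step(img_batch, spc_batch, guide_with_speech: bool):
    """One reciprocating-training step on paired image/speech batches.

    When guide_with_speech is True, the speech latent is detached and acts
    as the fixed target guiding the image side; otherwise the roles swap.
    """
    z_img, z_spc = img_enc(img_batch), spc_enc(spc_batch)
    # Per-modality reconstruction losses keep both autoencoders faithful.
    loss = mse(img_dec(z_img), img_batch) + mse(spc_dec(z_spc), spc_batch)
    if guide_with_speech:
        loss = loss + mse(img2spc(z_img), z_spc.detach())
    else:
        loss = loss + mse(spc2img(z_spc), z_img.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Alternate the guiding modality between steps (schedule is an assumption).
imgs = torch.rand(32, 28 * 28)
spcs = torch.rand(32, 1024)
for t in range(4):
    step(imgs, spcs, guide_with_speech=(t % 2 == 0))

Detaching the guiding latent is one plausible way to realize "one modality guides the other": gradients then flow only into the guided modality's encoder and bridge, so the two sides take turns serving as a fixed supervision signal.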
Keywords
machine learning,biological learning,learning directly from sensors,supervised learning,unsupervised learning,IoT learning,learning without labels,learning with privacy,streaming data learning,robotic learning,learning like a human