Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018.
There is a natural correlation between the visual and auditory elements of a video. In this work we leverage this connection to learn general and effective models for both audio and video analysis from self-supervised temporal synchronization. We demonstrate that a calibrated curriculum learning scheme, a careful choice of negative examples...
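The self-supervised task described above asks a model whether an audio clip and a video clip are temporally in sync, so the training signal depends on how negative (out-of-sync) pairs are drawn and when harder ones are introduced. The sketch below is an illustrative assumption, not the authors' implementation. `ClipPair`, `sample_pair`, the step schedule (`hard_start_epoch`, `hard_fraction`) and `min_shift` are all hypothetical names. It treats audio from a different video as an "easy" negative and time-shifted audio from the same video as a "hard" negative, and it only enables hard negatives later in training as a simple curriculum.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipPair:
    video_id: int       # video the visual clip is taken from
    video_start: float  # start time (s) of the visual clip
    audio_id: int       # video the audio clip is taken from
    audio_start: float  # start time (s) of the audio clip
    label: int          # 1 = synchronized, 0 = not synchronized


def sample_pair(rng, num_videos, duration, clip_len, epoch,
                hard_start_epoch=10, hard_fraction=0.25, min_shift=0.5):
    """Sample one audio-video training pair (hypothetical helper).

    Positives: audio and video from the same video at the same time.
    Easy negatives: audio from a different video.
    Hard negatives: audio from the same video, shifted by >= min_shift seconds.
    Hard negatives are only used once epoch >= hard_start_epoch (a simple
    step curriculum, assumed for illustration).
    """
    max_start = duration - clip_len
    # max_start >= 2 * min_shift guarantees a valid hard-negative offset exists.
    if num_videos < 2 or max_start < 2 * min_shift:
        raise ValueError("need >= 2 videos and room for a min_shift offset")

    v = rng.randrange(num_videos)
    t = rng.uniform(0.0, max_start)

    if rng.random() < 0.5:
        return ClipPair(v, t, v, t, 1)  # positive: audio in sync with video

    use_hard = epoch >= hard_start_epoch and rng.random() < hard_fraction
    if use_hard:
        # Same video, audio offset by at least min_shift in either direction.
        sides = []
        if t - min_shift >= 0.0:
            sides.append((0.0, t - min_shift))
        if t + min_shift <= max_start:
            sides.append((t + min_shift, max_start))
        lo, hi = rng.choice(sides)
        return ClipPair(v, t, v, rng.uniform(lo, hi), 0)

    # Easy negative: audio from a different, uniformly chosen video.
    u = rng.randrange(num_videos - 1)
    if u >= v:
        u += 1  # skip v so that u != v
    return ClipPair(v, t, u, rng.uniform(0.0, max_start), 0)
```

For example, drawing pairs with `sample_pair(random.Random(0), 100, 10.0, 1.0, epoch=0)` never yields a same-video negative, while the same call with `epoch=20` starts mixing in time-shifted hard negatives. A step schedule is used here only because it is the simplest curriculum to read; the actual schedule and offsets used in the paper are not stated in this abstract.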