Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

Takaki Makino
Yannis M. Assael
Basilio Garcia
Otavio Braga

ASRU, pp. 905-912, 2019.

Abstract:

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the development of such a system, we built a large audio-visual (A/V) dataset of segmented utterances extracted from public YouTube videos, leading to 31k hours of audio-visual training co...
