VisualBERT: A Simple and Performant Baseline for Vision and Language


Abstract:

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives...
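The core idea in the abstract is that text tokens and image-region features are embedded into a shared space, concatenated into one sequence, and processed jointly so self-attention can align the two modalities. A minimal NumPy sketch of that joint self-attention step follows; the dimensions, random features, and single-head attention are illustrative assumptions, not the paper's actual BERT-sized model or Faster R-CNN region features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only; VisualBERT uses BERT-sized Transformers
# and detector-derived region features).
d = 8          # shared hidden size
n_text = 4     # number of text tokens
n_regions = 3  # number of image regions

# Text token embeddings and visual region features projected to the same space.
text_emb = rng.normal(size=(n_text, d))
region_emb = rng.normal(size=(n_regions, d))

# Regions are treated as unordered "visual tokens": the two sequences are
# concatenated and fed through the same Transformer stack.
x = np.concatenate([text_emb, region_emb], axis=0)  # (n_text + n_regions, d)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over positions
    return weights @ v, weights

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)

# Every position attends over both modalities, so a text token can place
# attention mass on image regions: the "implicit alignment" of the abstract.
print(out.shape)   # (7, 8)
print(attn.shape)  # (7, 7)
```

Because no modality mask separates the two spans, each of the 7 positions attends over all 7 others, which is what lets alignment emerge during pre-training rather than being hand-engineered.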
