Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms

Yasunori Ohishi
Akisato Kimura
Takahito Kawanishi
Kunio Kashino
David Harwath

ICASSP, pp. 4352-4356, 2020.


Abstract:

We propose a trilingual semantic embedding model that associates visual objects in images with segments of speech signals corresponding to spoken words in an unsupervised manner. Unlike the existing models, our model incorporates three different languages, namely, English, Hindi, and Japanese. To build the model, we used the existing Engl...
