JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing.
CoRR(2023)
摘要
Advancements in sign language processing have been hindered by a lack of
sufficient data, impeding progress in recognition, translation, and production
tasks. The absence of comprehensive sign language datasets across the world's
sign languages has widened the gap in this field, resulting in a few sign
languages being studied more than others, making this research area extremely
skewed mostly towards sign languages from high-income countries. In this work
we introduce a new large and highly multilingual dataset for sign language
translation: JWSign. The dataset consists of 2,530 hours of Bible translations
in 98 sign languages, featuring more than 1,500 individual signers. On this
dataset, we report neural machine translation experiments. Apart from bilingual
baseline systems, we also train multilingual systems, including some that take
into account the typological relatedness of signed or spoken languages. Our
experiments highlight that multilingual systems are superior to bilingual
baselines, and that in higher-resource scenarios, clustering language pairs
that are related improves translation quality.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要