Unsupervised Cross-Lingual Sentence Representation Learning via Linguistic Isomorphism

KSEM (2) (2019)

Abstract
Recently, much research on learning cross-lingual word embeddings without parallel data has succeeded by exploiting word-level isomorphism among languages. However, unsupervised cross-lingual sentence representation, which aims to learn a unified semantic space without parallel data, has not been well explored. Although many cross-lingual tasks can be solved by learning a unified sentence representation across languages on top of cross-lingual word embeddings, the resulting performance is not competitive with supervised counterparts. In this paper, we propose a novel framework for unsupervised cross-lingual sentence representation learning that exploits linguistic isomorphism at both the word and sentence levels. After generating pseudo-parallel sentences from pre-trained cross-lingual word embeddings, the framework iteratively performs sentence modeling, word embedding tuning, and parallel-sentence updating. Our experiments show that the proposed framework achieves state-of-the-art results on many cross-lingual tasks and also improves the quality of the cross-lingual word embeddings. The code and pre-trained encoders will be released upon publication.
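The pseudo-parallel sentence generation step described above can be illustrated with a minimal sketch: embed each sentence as the mean of its cross-lingual word vectors and pair every source sentence with its nearest target sentence by cosine similarity. The toy vocabulary and vectors below are hypothetical, and this is only a crude stand-in for the paper's full iterative framework.

```python
import numpy as np

# Hypothetical cross-lingual word embeddings; in practice these would come
# from an unsupervised cross-lingual embedding method.
emb = {
    "cat":   np.array([1.00, 0.10]),
    "dog":   np.array([0.90, 0.30]),
    "gato":  np.array([0.95, 0.12]),   # Spanish "cat"
    "perro": np.array([0.88, 0.28]),   # Spanish "dog"
}

def sent_vec(tokens):
    """Embed a sentence as the mean of its word vectors."""
    return np.mean([emb[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mine_pseudo_parallel(src_sents, tgt_sents):
    """Pair each source sentence with the most similar target sentence,
    yielding pseudo-parallel pairs for the next training iteration."""
    pairs = []
    for s in src_sents:
        sv = sent_vec(s)
        best = max(tgt_sents, key=lambda t: cosine(sv, sent_vec(t)))
        pairs.append((s, best))
    return pairs

print(mine_pseudo_parallel([["cat"], ["dog"]], [["perro"], ["gato"]]))
```

In the full framework, these mined pairs would then train the sentence encoder, whose improved representations in turn refresh the pseudo-parallel pairs on the next iteration.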
Keywords
Unsupervised learning, Cross-lingual, Sentence representation, Language model