Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023)

Abstract
Word embedding models such as Word2vec and fastText simultaneously learn dual representations: input vectors and output vectors. In contrast, almost all existing unsupervised bilingual lexicon induction (UBLI) methods use only the input vectors, leaving the output vectors unused. In this article, we propose a novel approach that makes full use of both input and output vectors for more robust UBLI. We identify the Common Difference Property: a single orthogonal transformation can connect not only the input vectors of two languages but also their output vectors. We can therefore learn just one transformation and use it to induce two different dictionaries, one from the input vectors and one from the output vectors. Taking the intersection of these two quite different dictionaries during the UBLI procedure yields a more accurate lexicon with less noise. Extensive experiments show that our method achieves substantially more robust results than state-of-the-art methods on distant language pairs, while retaining comparable performance on similar language pairs.
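The induce-then-intersect step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, toy dimensions, and the assumption that a single orthogonal matrix `W` is already available (in practice it would be learned by an unsupervised alignment method) are all placeholders for this example.

```python
import numpy as np

def induce_dictionary(src, tgt, W):
    """Map source embeddings with W and pick each word's nearest
    target word by cosine similarity; returns an index array."""
    mapped = src @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt_n.T, axis=1)

def intersect_dictionaries(dict_in, dict_out):
    """Keep only translation pairs on which the input-vector and
    output-vector dictionaries agree, reducing noise."""
    return {i: t for i, (t, t2) in enumerate(zip(dict_in, dict_out))
            if t == t2}

# Toy setup: random "input" and "output" embeddings for 8 words in
# 5 dimensions, with the target language generated by one shared
# orthogonal rotation W (the Common Difference Property in miniature).
rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # an exactly orthogonal matrix
src_in = rng.normal(size=(8, 5))
src_out = rng.normal(size=(8, 5))
tgt_in = src_in @ W
tgt_out = src_out @ W

d_in = induce_dictionary(src_in, tgt_in, W)     # dictionary from input vectors
d_out = induce_dictionary(src_out, tgt_out, W)  # dictionary from output vectors
lexicon = intersect_dictionaries(d_in, d_out)   # pairs both dictionaries agree on
```

Because both embedding spaces here are related by the same exact rotation, both induced dictionaries recover the identity mapping and the intersection keeps all pairs; with real, noisy embeddings the two dictionaries differ and the intersection filters out the disagreements.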
Keywords
Dictionaries, Linear programming, Training, Task analysis, Speech processing, Mathematical models, Transforms, Word embedding, input vector, output vector, unsupervised bilingual lexicon induction