Finely Tuned, 2 Billion Token Based Word Embeddings for Portuguese.
LREC(2018)
摘要
A distributional semantics model - also known as word embeddings - is a major asset for any language as the research results reported in the literature have consistently shown that it is instrumental to improve the performance of a wide range of applications and processing tasks for that language. In this paper, we describe the development of an advanced distributional model for Portuguese, with the largest vocabulary and the best evaluation scores published so far. This model was made possible by resorting to new languages resources we recently developed: to a much larger training corpus than before and to a more sophisticated evaluation supported by new and more fine-grained evaluation tasks and data sets. We also indicate how the new language resource reported on here is being distributed and where it can be obtained for free under a most permissive license.
更多查看译文
关键词
distributional semantics, word embeddings, word analogy, lexical similarity, conceptual categorization, Portuguese
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络