Measuring Lexical Semantic Variation using Word Embeddings

semanticscholar(2019)

Abstract
This paper discusses an approach to the unsupervised study of lexical semantic variation across languages, dialects, and linguistic variants, based on a comparison of Distributed Semantics models of lexical items. To this end, I use word vectors and embeddings trained on large corpora. My focus in this article is on the South Slavic languages and variants Bosnian, Croatian, and Serbian, taking into account texts, corpora, and language models that are explicitly written in Serbo-Croatian. My aim is to quantify the lexical overlap and the semantic fields or properties of the lexical items in a purely unsupervised empirical study based on language-use data. There is a long history of studies of the similarities and dissimilarities between the languages of the Balkans, which I will ignore here entirely. The notion of language as a potentially defining feature for