Measuring Lexical Semantic Variation using Word Embeddings
Semantic Scholar (2019)
Abstract
This paper discusses an unsupervised approach to studying lexical semantic variation across languages, dialects, and linguistic variants, based on a comparison of Distributed Semantics models of lexical items. To this end, I use word vectors and embeddings trained on large corpora. My focus in this article is on the South Slavic languages and variants Bosnian, Croatian, and Serbian, also taking into account texts, corpora, and language models that are explicitly written in Serbo-Croatian. The goal is to quantify the lexical overlap and the semantic fields or properties of the lexical items, using a purely unsupervised empirical study based on language use data. There is a long history of studies of similarities and dissimilarities between the languages of the Balkans, which I will ignore here entirely. The notion of language as a potentially defining feature for