An Adaptive Wordpiece Language Model For Learning Chinese Word Embeddings

2019 IEEE 15TH INTERNATIONAL CONFERENCE ON AUTOMATION SCIENCE AND ENGINEERING (CASE)(2019)

Cited by 6 | Viewed 17
Abstract
Word representations are crucial for many natural language processing tasks. Most existing approaches learn contextual information by assigning a distinct vector to each word and pay little attention to morphology, which makes it difficult for them to handle large vocabularies and rare words. In this paper we propose an Adaptive Wordpiece Language Model for learning Chinese word embeddings (AWLM), inspired by the previous observation that subword units are important for improving the learning of Chinese word representations. Specifically, a novel approach called BPE+ is devised to adaptively generate variable-length grams, breaking the limitation of fixed-length stroke n-grams. Semantic information is then captured through three carefully designed stages: extraction of morphological information, reinforcement of fine-grained information, and extraction of semantic information. Empirical results on word similarity, word analogy, text classification, and question answering verify that our method significantly outperforms several state-of-the-art methods.
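The abstract does not specify the BPE+ algorithm itself, but it describes it as adaptively generating variable-length grams in contrast to fixed-length stroke n-grams. As a rough, hedged illustration of the underlying idea, classic byte-pair encoding over stroke-code sequences already produces variable-length units by repeatedly merging the most frequent adjacent pair. A minimal sketch (function name, inputs, and stroke codes are hypothetical, not taken from the paper):

```python
from collections import Counter

def bpe_merges(sequences, num_merges):
    """Greedy byte-pair-encoding merges over symbol sequences.

    `sequences` is a list of lists of symbols (e.g. stroke codes for
    Chinese words). Each round merges the most frequent adjacent pair
    into one longer unit, yielding variable-length grams rather than
    fixed-length n-grams.
    """
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs across the corpus.
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        # Merge the most frequent pair everywhere it occurs.
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        merged = a + b
        for seq in seqs:
            i = 0
            while i < len(seq) - 1:
                if seq[i] == a and seq[i + 1] == b:
                    seq[i:i + 2] = [merged]
                else:
                    i += 1
    return seqs, merges
```

For example, over the toy stroke strings "1213" and "1212", the pair ("1", "2") is most frequent and is merged first, turning both sequences into mixes of one- and two-stroke grams.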
Keywords
word similarity,word analogy,learning Chinese word embeddings,contextual information,Chinese word representation,semantic information extraction,morphological information,adaptive wordpiece language model,natural language processing tasks,stroke n-grams,text classification,question answering,AWLM