Cross-domain Chinese Word Segmentation Based on New Word Discovery

Zhang Jun, Lai Zhipeng, Li Xue,Ning Gengxin,Yang Cui

Journal of Electronics & Information Technology(2022)

引用 0|浏览9
暂无评分
摘要
Deep Neural Network (DNN) is the major method in current Chinese word segmentation. However, its performance is significantly degraded when the network trained for one domain is used in other domains due to the Out Of Vocabulary (OOV) words and expression gaps. In this paper, a cross domain Chinese word segmentation system based on new word discovery is built to handle the OOV word and expression gap problems. An unsupervised new word discovery algorithm based on vector enhanced mutual information and weighted adjacency entropy, and a Chinese word segmentation model based on adversarial training are also proposed to improve the performance of the baseline system. Experimental results show that the proposed method is superior to the conventional methods in the OOV rates, precisions, recalls and F-scores.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要