Reassembling Fragmented Entity Names: A Novel Model for Chinese Compound Noun Processing

Yuze Pan,Xiaofeng Fu

Electronics(2023)

引用 0|浏览0
暂无评分
摘要
In the process of classifying intelligent assets, we encountered challenges with a limited dataset dominated by complex compound noun phrases. Training classifiers directly on this dataset posed risks of overfitting and potential misinterpretations due to inherent ambiguities in these phrases. Recognizing the gap in the current literature for tailored methods addressing this challenge, this paper introduces a refined approach for the accurate extraction of entity names from such structures. We leveraged the Chinese pre-trained BERT model combined with an attention mechanism, ensuring precise interpretation of each token's significance. This was followed by employing both a multi-layer perceptron (MLP) and an LSTM-based Sequence Parsing Model, tailored for sequence annotation and rule-based parsing. With the aid of a rule-driven decoder, we reconstructed comprehensive entity names. Our approach adeptly extracts structurally coherent entity names from fragmented compound noun phrases. Experiments on a manually annotated dataset of compound noun phrases demonstrate that our model consistently outperforms rival methodologies. These results compellingly validate our method's superiority in extracting entity names from compound noun phrases.
更多
查看译文
关键词
compound noun phrases, entity name extraction, fragmentation, sequence labeling, sequence parsing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要