Majority or Minority: Data Imbalance Learning Method for Named Entity Recognition
CoRR (2024)
Abstract
Data imbalance presents a significant challenge in various machine learning
(ML) tasks, particularly named entity recognition (NER) within natural language
processing (NLP). NER exhibits a data imbalance with a long-tail distribution,
featuring numerous minority classes (i.e., entity classes) and a single
majority class (i.e., O-class). This imbalance leads to the misclassification
of entity classes as the O-class. To tackle this imbalance, we propose a
simple and effective learning method, named majority or minority (MoM)
learning. MoM learning incorporates the loss computed only for samples whose
ground truth is the majority class (i.e., the O-class) into the loss of the
conventional ML model. Evaluation experiments on four NER datasets (Japanese
and English) showed that MoM learning improves the prediction performance of
the minority classes without sacrificing the performance of the majority class,
and is more effective than widely known and state-of-the-art methods. We also
evaluated MoM learning using frameworks such as sequential labeling and machine
reading comprehension, which are commonly used in NER. Furthermore, MoM
learning has achieved consistent performance improvements regardless of
language, model, or framework.
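The core idea described above — adding a loss term computed only over tokens whose ground truth is the O-class to the conventional loss — can be sketched as follows. This is a minimal illustration under the assumption that both terms are standard cross-entropy and are combined by a plain sum; the function names (`mom_loss`, `cross_entropy`) and the choice of O-class index are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the class dimension
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true class
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

def mom_loss(logits, labels, o_class=0):
    """Sketch of MoM learning: conventional cross-entropy plus a term
    computed only on samples whose ground truth is the majority O-class.
    The plain sum of the two terms is an assumption for illustration."""
    probs = softmax(logits)
    base = cross_entropy(probs, labels)          # loss over all tokens
    mask = labels == o_class                     # majority-class tokens only
    majority = cross_entropy(probs[mask], labels[mask]) if mask.any() else 0.0
    return base + majority
```

When no token in the batch belongs to the O-class, the extra term vanishes and the loss reduces to the conventional one; otherwise the O-class term adds extra pressure to classify majority-class tokens correctly, which is what discourages entity tokens from being absorbed into the O-class.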