RNN Transducers for Named Entity Recognition with constraints on alignment for understanding medical conversations

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Understanding medical conversations requires detecting entities such as Medications, Symptoms, Treatment, Conditions and Diagnosis, which leads to large ontologies with overlapping spans. Moreover, for ease of adoption by clinicians, inference also needs to locate the position of the entities in the conversation. Popular solutions to Named Entity Recognition (NER), such as conditional random fields, sequence-to-sequence models, or the question-answering framework, are not suitable for this task. We address this problem by proposing a new model for the NER task: an RNN transducer (RNN-T), which has hitherto been used only in speech recognition. Like other seq-to-seq models, RNN-T models are trained on paired input and output sequences without explicitly specifying the alignment between them; they learn the alignment using a loss function that sums over all alignments. In NER tasks, however, the alignment between words and target labels is available from the human annotations. We propose a fixed alignment RNN-T model that utilizes the given alignment while preserving the benefits of RNN-Ts, such as modeling output dependencies. As a more general case, we also propose a constrained alignment model in which users can specify a relaxation of the given input alignment and the model learns an alignment within the given constraints. In other words, we propose a family of seq-to-seq models that can leverage alignments between input and target sequences when they are available. Through empirical experiments on a challenging real-world medical NER task with multiple nested ontologies, we demonstrate that our fixed alignment model outperforms the standard RNN-T model, improving F1-score from 0.70 to 0.74.
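
To make the difference between the three alignment regimes concrete, the following is a minimal toy sketch, not the authors' implementation. It runs the standard transducer forward recursion over a lattice of T input positions and U output labels, and optionally restricts where each label may be emitted. The function name `transducer_log_likelihood` and the parameters `emit_at` and `slack` are hypothetical; in particular, expressing the constraint as "label u may be emitted within `slack` positions of `emit_at[u]`" is an assumption made for illustration, and the paper may specify its constraints differently.

```python
import math

NEG_INF = float("-inf")


def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def transducer_log_likelihood(log_blank, log_label, emit_at=None, slack=None):
    """Log-likelihood of a target sequence under a transducer lattice.

    log_blank[t][u]: log-prob of emitting blank at node (t, u), u in 0..U
                     (consumes one input position).
    log_label[t][u]: log-prob of emitting target label u at node (t, u),
                     u in 0..U-1 (produces one output label).
    emit_at, slack:  hypothetical constraint; label u may only be emitted
                     at positions t with |t - emit_at[u]| <= slack.
                     Leaving either as None sums over all alignments.
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1

    def allowed(t, u):
        if emit_at is None or slack is None:
            return True
        return abs(t - emit_at[u]) <= slack

    # alpha[t][u]: log-prob of all admissible paths reaching node (t, u).
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            via_blank = alpha[t - 1][u] + log_blank[t - 1][u] if t > 0 else NEG_INF
            via_label = NEG_INF
            if u > 0 and allowed(t, u - 1):
                via_label = alpha[t][u - 1] + log_label[t][u - 1]
            alpha[t][u] = logsumexp(via_blank, via_label)
    # A final blank at the last input position terminates every path.
    return alpha[T - 1][U] + log_blank[T - 1][U]


# Toy lattice: 3 input words, 2 target labels, every transition has prob 0.5.
T, U = 3, 2
log_blank = [[math.log(0.5)] * (U + 1) for _ in range(T)]
log_label = [[math.log(0.5)] * U for _ in range(T)]
annotated = [0, 2]  # assumed annotation: label 0 at word 0, label 1 at word 2

standard = transducer_log_likelihood(log_blank, log_label)
fixed = transducer_log_likelihood(log_blank, log_label, annotated, slack=0)
constrained = transducer_log_likelihood(log_blank, log_label, annotated, slack=1)
```

Because every transition in this toy lattice has probability 0.5 and every complete path has five transitions, each likelihood is simply the number of admissible paths times 0.5^5: the unconstrained sum covers all six alignments, `slack=0` keeps only the annotated one, and `slack=1` keeps four. With `slack=0` the recursion collapses to scoring a single path, which is the sense in which a fixed alignment removes the sum over alignments while leaving the label-to-label dependencies of the model untouched.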