TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
CoRR (2024)
Abstract
Confidence estimation of predictions from an End-to-End (E2E) Automatic
Speech Recognition (ASR) model benefits ASR's downstream and upstream tasks.
Class-probability-based confidence scores do not accurately represent the
quality of overconfident ASR predictions. An ancillary Confidence Estimation
Model (CEM) calibrates the predictions. State-of-the-art (SOTA) solutions use
binary target scores for CEM training. However, the binary labels do not reveal
the granular information of predicted words, such as temporal alignment between
reference and hypothesis and whether the predicted word is entirely incorrect
or contains spelling errors. Addressing this issue, we propose a novel
Temporal-Lexeme Similarity (TeLeS) confidence score to train CEM. To address
the data imbalance of target scores while training CEM, we use shrinkage loss
to focus on hard-to-learn data points and minimise the impact of easily learned
data points. We conduct experiments with ASR models trained in three languages,
namely Hindi, Tamil, and Kannada, with varying training data sizes. Experiments
show that TeLeS generalises well across domains. To demonstrate the
applicability of the proposed method, we formulate a TeLeS-based Acquisition
(TeLeS-A) function for sampling uncertainty in active learning. We observe a
significant reduction in the Word Error Rate (WER) as compared to SOTA methods.
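
The abstract does not give the form of the shrinkage loss, so the following is only a minimal sketch of one commonly used formulation: the squared error is scaled by a sigmoid-shaped factor, so small (easily learned) errors are down-weighted while large (hard-to-learn) errors are left almost unchanged. The function names, the hyperparameters `a` (shrinkage speed) and `c` (localisation), their default values, and the assumption that target scores lie in [0, 1] are illustrative and may differ from what the authors use.

```python
import math


def modulation(abs_error, a=10.0, c=0.2):
    """Sigmoid-shaped factor: near 0 for small errors, near 1 for large ones.

    `a` and `c` are assumed, illustrative hyperparameters.
    """
    return 1.0 / (1.0 + math.exp(a * (c - abs_error)))


def shrinkage_loss(target, prediction, a=10.0, c=0.2):
    """Squared error scaled by the modulation factor (assumed formulation)."""
    abs_error = abs(target - prediction)
    return abs_error * abs_error * modulation(abs_error, a, c)


if __name__ == "__main__":
    # Hypothetical (target score, CEM prediction) pairs.
    for target, pred in [(1.0, 0.95), (1.0, 0.2)]:
        err = abs(target - pred)
        print(f"error={err:.2f}  squared={err * err:.4f}  "
              f"shrinkage={shrinkage_loss(target, pred):.4f}")
```

With these assumed settings, a data point with a small error contributes far less than its plain squared error would, which is one way to keep a large number of easy examples from dominating CEM training.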