Automatic Disfluency Detection from Untranscribed Speech.
CoRR(2023)
Abstract
Speech disfluencies, such as filled pauses or repetitions, are disruptions in
the typical flow of speech. Stuttering is a speech disorder characterized by a
high rate of disfluencies, but all individuals speak with some disfluencies and
the rates of disfluencies may be increased by factors such as cognitive load.
Clinically, automatic disfluency detection may help in treatment planning for
individuals who stutter. Outside of the clinic, automatic disfluency detection
may serve as a pre-processing step to improve natural language understanding in
downstream applications. With this wide range of applications in mind, we
investigate language, acoustic, and multimodal methods for frame-level
automatic disfluency detection and categorization. Each of these methods relies
on audio as an input. First, we evaluate several automatic speech recognition
(ASR) systems in terms of their ability to transcribe disfluencies, measured
using disfluency error rates. We then use these ASR transcripts as input to a
language-based disfluency detection model. We find that disfluency detection
performance is largely limited by the quality of transcripts and alignments. We
also find that an acoustic-based approach that does not require transcription as an
intermediate step outperforms the ASR language approach. Finally, we present
multimodal architectures which we find improve disfluency detection performance
over the unimodal approaches. Ultimately, this work introduces novel approaches
for automatic frame-level disfluency detection and categorization. In the long term, this
will help researchers incorporate automatic disfluency detection into a range
of applications.
Keywords
untranscribed speech