Word segmentation from transcriptions of child-directed speech using lexical and sub-lexical cues.

Zébulon Goriely,Andrew Caines,Paula Buttery

Journal of child language(2023)

引用 0|浏览15
暂无评分
摘要
We compare two frameworks for the segmentation of words in child-directed speech, PHOCUS and MULTICUE. PHOCUS is driven by lexical recognition, whereas MULTICUE combines sub-lexical properties to make boundary decisions, representing differing views of speech processing. We replicate these frameworks, perform novel benchmarking and confirm that both achieve competitive results. We develop a new framework for segmentation, the DYnamic Programming MULTIple-cue framework (DYMULTI), which combines the strengths of PHOCUS and MULTICUE by considering both sub-lexical and lexical cues when making boundary decisions. DYMULTI achieves state-of-the-art results and outperforms PHOCUS and MULTICUE on 15 of 26 languages in a cross-lingual experiment. As a model built on psycholinguistic principles, this validates DYMULTI as a robust model for speech segmentation and a contribution to the understanding of language acquisition.
更多
查看译文
关键词
transcriptions,speech,segmentation,child-directed,sub-lexical
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要