A Truly Joint Neural Architecture for Segmentation and Parsing
CoRR (2024)
Abstract
Contemporary multilingual dependency parsers can parse a diverse set of
languages, but for Morphologically Rich Languages (MRLs), performance is
attested to be lower than for other languages. The key challenge is that, due to
high morphological complexity and ambiguity of the space-delimited input
tokens, the linguistic units that act as nodes in the tree are not known in
advance. Pre-neural dependency parsers for MRLs subscribed to the joint
morpho-syntactic hypothesis, stating that morphological segmentation and
syntactic parsing should be solved jointly, rather than as a pipeline where
segmentation precedes parsing. However, neural state-of-the-art parsers to date
use a strict pipeline. In this paper we introduce a joint neural architecture
where a lattice-based representation preserving all morphological ambiguity of
the input is provided to an arc-factored model, which then solves the
morphological segmentation and syntactic parsing tasks at once. Our experiments
on Hebrew, a rich and highly ambiguous MRL, demonstrate state-of-the-art
performance on parsing, tagging and segmentation of the Hebrew section of UD,
using a single model. The proposed architecture is LLM-based and
language-agnostic, providing a solid foundation for MRLs to obtain further
performance improvements and bridge the gap with other languages.
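
To make the joint hypothesis concrete, the following is a minimal toy sketch, not the paper's neural model. It assumes a lattice that lists, for each space-delimited token, its candidate segmentations, and a hand-written arc-scoring table standing in for a learned arc-factored scorer. The segment names (`ab`, `a`, `b`, `c`), the scores, and the helper names (`paths`, `is_tree`, `best_tree`, `joint_decode`) are all hypothetical, and the search is brute force, which is only feasible for tiny inputs.

```python
from itertools import product

ROOT = "ROOT"

def paths(lattice):
    """Yield every lattice path: one analysis per token, flattened into segments."""
    for choice in product(*lattice):
        yield [seg for analysis in choice for seg in analysis]

def is_tree(heads):
    """heads[i] is the head of node i+1 (0 is ROOT); True if no node lies on a cycle."""
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(segments, arc_score):
    """Brute-force the highest-scoring single-rooted tree over the given segments."""
    nodes = [ROOT] + segments
    n = len(segments)
    best = (float("-inf"), None)
    for heads in product(range(n + 1), repeat=n):
        if sum(h == 0 for h in heads) != 1 or not is_tree(heads):
            continue
        # Arc-factored: the tree score is the sum of independent arc scores.
        total = sum(arc_score(nodes[h], nodes[d + 1]) for d, h in enumerate(heads))
        if total > best[0]:
            best = (total, heads)
    return best

def joint_decode(lattice, arc_score):
    """Pick the segmentation and the tree together, by total arc score."""
    best = (float("-inf"), None, None)
    for segments in paths(lattice):
        score, heads = best_tree(segments, arc_score)
        if score > best[0]:
            best = (score, segments, heads)
    return best

# Hypothetical lattice: token 1 is ambiguous ("ab" vs. "a" + "b"), token 2 is not.
lattice = [
    [("ab",), ("a", "b")],
    [("c",)],
]

# Hypothetical arc scores (head, dependent); unlisted arcs get a penalty.
SCORES = {(ROOT, "c"): 2.0, ("c", "ab"): 1.0, ("c", "b"): 1.5, ("b", "a"): 1.0}

def arc_score(head, dep):
    return SCORES.get((head, dep), -1.0)

score, segments, heads = joint_decode(lattice, arc_score)
print(score, segments, heads)
```

In this toy setup the segmentation of the first token is decided by which reading yields the better tree, rather than being fixed before parsing as in a pipeline. Summing raw arc scores across paths of different lengths is a simplification made for illustration only.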