MRL Parsing Without Tears: The Case of Hebrew
arXiv (2024)
Abstract
Syntactic parsing remains a critical tool for relation extraction and
information extraction, especially in resource-scarce languages where LLMs are
lacking. Yet in morphologically rich languages (MRLs), where parsers need to
identify multiple lexical units in each token, existing systems suffer in
latency and setup complexity. Some use a pipeline to peel away the layers:
first segmentation, then morphological tagging, and then syntactic parsing;
however, errors in earlier layers are then propagated forward. Others use a joint
architecture to evaluate all permutations at once; while this improves
accuracy, it is notoriously slow. In contrast, and taking Hebrew as a test
case, we present a new "flipped pipeline": decisions are made directly on the
whole-token units by expert classifiers, each one dedicated to one specific
task. The classifiers are independent of one another, and only at the end do we
synthesize their predictions. This blazingly fast approach sets a new state of
the art (SOTA) in Hebrew POS tagging and dependency parsing, while also reaching near-SOTA
performance on other Hebrew NLP tasks. Because our architecture does not rely
on any language-specific resources, it can serve as a model to develop similar
parsers for other MRLs.
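
The "flipped pipeline" idea can be illustrated with a minimal sketch. This is not the authors' implementation: the expert functions, the lookup tables, the transliterated toy tokens, and the output layout are all hypothetical stand-ins for trained classifiers. It shows only the control flow the abstract describes: each expert labels whole tokens without seeing the others' outputs, and the predictions are combined in one final step.

```python
from typing import Callable, Dict, List

Tokens = List[str]
Expert = Callable[[Tokens], list]

# Toy lookup tables standing in for trained classifiers (hypothetical values).
_PREFIX_LEN = {"bbyt": 1}                    # characters belonging to a prefix
_POS = {"bbyt": "ADP+NOUN", "gdwl": "ADJ"}   # one tag bundle per whole token
_HEAD = {0: -1, 1: 0}                        # head index per token, -1 = root


def segment_expert(tokens: Tokens) -> List[int]:
    """Predict, per whole token, how many leading characters form a prefix."""
    return [_PREFIX_LEN.get(t, 0) for t in tokens]


def pos_expert(tokens: Tokens) -> List[str]:
    """Predict a POS label per whole token."""
    return [_POS.get(t, "X") for t in tokens]


def head_expert(tokens: Tokens) -> List[int]:
    """Predict a syntactic head index per whole token."""
    return [_HEAD.get(i, -1) for i in range(len(tokens))]


EXPERTS: Dict[str, Expert] = {
    "prefix_len": segment_expert,
    "pos": pos_expert,
    "head": head_expert,
}


def parse(tokens: Tokens) -> List[dict]:
    # Step 1: every expert runs independently on the same whole-token input,
    # so no expert consumes (or inherits errors from) another's output.
    predictions = {name: expert(tokens) for name, expert in EXPERTS.items()}

    # Step 2: only now are the separate predictions synthesized.
    analyses = []
    for i, tok in enumerate(tokens):
        cut = predictions["prefix_len"][i]
        analyses.append({
            "token": tok,
            "segments": [s for s in (tok[:cut], tok[cut:]) if s],
            "pos": predictions["pos"][i],
            "head": predictions["head"][i],
        })
    return analyses
```

Calling `parse(["bbyt", "gdwl"])` returns one analysis per input token. Because the experts share no intermediate state, they could in principle be run in parallel, which is one plausible reading of where the speed advantage over a sequential pipeline comes from.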