Better Arabic parsing: baselines, evaluations, and analysis
COLING(2010)
摘要
In this paper, we offer broad insight into the underperformance of Arabic constituency parsing by analyzing the interplay of linguistic phenomena, annotation choices, and model design. First, we identify sources of syntactic ambiguity understudied in the existing parsing literature. Second, we show that although the Penn Arabic Treebank is similar to other tree-banks in gross statistical terms, annotation consistency remains problematic. Third, we develop a human interpretable grammar that is competitive with a latent variable PCFG. Fourth, we show how to build better models for three different parsers. Finally, we show that in application settings, the absence of gold segmentation lowers parsing performance by 2--5% F1.
更多查看译文
关键词
different parsers,existing parsing literature,broad insight,better arabic parsing,gold segmentation,penn arabic treebank,arabic constituency,annotation choice,annotation consistency,application setting,better model
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要