Better Arabic parsing: baselines, evaluations, and analysis

COLING (2010)

Citations: 229 | Views: 33
Abstract
In this paper, we offer broad insight into the underperformance of Arabic constituency parsing by analyzing the interplay of linguistic phenomena, annotation choices, and model design. First, we identify sources of syntactic ambiguity understudied in the existing parsing literature. Second, we show that although the Penn Arabic Treebank is similar to other treebanks in gross statistical terms, annotation consistency remains problematic. Third, we develop a human-interpretable grammar that is competitive with a latent-variable PCFG. Fourth, we show how to build better models for three different parsers. Finally, we show that in application settings, the absence of gold segmentation lowers parsing performance by 2–5% F1.
Keywords
different parsers, existing parsing literature, broad insight, better Arabic parsing, gold segmentation, Penn Arabic Treebank, Arabic constituency, annotation choice, annotation consistency, application setting, better model