Closed yesterday and closed minds: asking the right questions of the corpus to distinguish thematic from sentential relations

COLING '92: Proceedings of the 14th Conference on Computational Linguistics - Volume 4 (1992)

Abstract
Collocation-based tagging and bracketing programs have attained promising results. Yet they have not arrived at the stage where they could be used as preprocessors for full-fledged parsing: accuracy is still not high enough.

To improve accuracy, it is necessary to investigate the points where statistical data is being misinterpreted, leading to incorrect results.

In this paper we investigate the inaccuracy that is injected when a preprocessor relies solely on collocations and blurs the distinction between two separate relations: thematic relations and sentential relations.

Thematic relations are word pairs, not necessarily adjacent (e.g., adjourn a meeting), that encode information at the concept level. Sentential relations, on the other hand, concern adjacent word pairs that form a noun group. For example, preferred stock is a noun group that must be identified as such at the syntactic level.

Blurring the difference between these two phenomena contributes to errors in tagging pairs such as expressed concerns, a verb-noun construct, as opposed to preferred stocks, an adjective-noun construct. Although both relations are manifested in the corpus as high mutual-information collocations, they possess different properties and need to be separated.

In our method, we distinguish between these two cases by asking additional questions of the corpus. By definition, thematic relations take on further variations in the corpus. Expressed concerns (a thematic relation) takes concerns expressed, expressing concerns, express his concerns, etc. On the other hand, preferred stock (a sentential relation) does not take any such syntactic variations.

We show how this method impacts preprocessing and parsing, and we provide empirical results based on the analysis of an 80-million-word corpus.
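The idea of "asking additional questions of the corpus" can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not describe in detail; it is a toy approximation under stated assumptions. The names `crude_stem`, `variation_profile`, and `classify`, the three-token window, and the 0.2 threshold are all hypothetical choices made for illustration. The sketch counts how often a collocation appears in its fixed adjacent form versus in a varied form (reversed order, a different inflection of the first word, or with intervening words), and labels the pair thematic when variations are common.

```python
import re


def crude_stem(word):
    """Toy suffix stripper (an assumption, not a real morphological analyzer)."""
    w = word.lower()
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def variation_profile(sentences, first, second, window=3):
    """Return (fixed, variants) counts for the collocation `first second`.

    fixed:    adjacent, same order, first word in its original surface form
              (number inflection on the second word is tolerated).
    variants: any other co-occurrence of the two stems within `window` tokens.
    """
    first, second = first.lower(), second.lower()
    s1, s2 = crude_stem(first), crude_stem(second)
    fixed = variants = 0
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        stems = [crude_stem(t) for t in tokens]
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pair = (stems[i], stems[j])
                if pair not in ((s1, s2), (s2, s1)):
                    continue
                if j == i + 1 and pair == (s1, s2) and tokens[i] == first:
                    fixed += 1
                else:
                    variants += 1
    return fixed, variants


def classify(fixed, variants, min_variant_ratio=0.2):
    """Label a pair by its share of varied occurrences (threshold is assumed)."""
    total = fixed + variants
    if total == 0:
        return "unseen"
    return "thematic" if variants / total >= min_variant_ratio else "sentential"


corpus = [
    "Investors expressed concerns about the merger",
    "The concerns expressed by analysts were ignored",
    "She kept expressing concerns over costs",
    "He wanted to express his concerns",
    "The company issued preferred stock last year",
    "Holders of preferred stock receive dividends",
    "Preferred stocks fell on Monday",
]

for first, second in [("expressed", "concerns"), ("preferred", "stock")]:
    profile = variation_profile(corpus, first, second)
    print(first, second, profile, classify(*profile))
```

On this seven-sentence toy corpus, expressed concerns shows one fixed and three varied occurrences and is labeled thematic, while preferred stock shows three fixed occurrences and no variations and is labeled sentential. A realistic version would need a proper lemmatizer and corpus-scale counts in place of the toy stemmer and the hand-picked threshold.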
Keywords
noun group, preferred stock, thematic relation, sentential relation, concept level, collocation-based tagging, adjacent word pairs, right question, closed mind, 80-million-word corpus