Glen, Glenda or Glendale: Unsupervised and Semi-supervised Learning of English Noun Gender.

CoNLL '09: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (2009)

Abstract
English pronouns like "he" and "they" reliably reflect the gender and number of the entities to which they refer. Pronoun resolution systems can use this fact to filter out noun candidates that do not agree with the pronoun's gender. Indeed, broad-coverage models of noun gender have proved to be the most important source of world knowledge in automatic pronoun resolution systems. Previous approaches predict gender by counting the co-occurrence of nouns with pronouns of each gender class. While this provides useful statistics for frequent nouns, many infrequent nouns cannot be classified this way. Rather than using co-occurrence information directly, we use it to automatically annotate training examples for a large-scale discriminative gender model. Our model collectively classifies all occurrences of a noun in a document using a wide variety of contextual, morphological, and categorical gender features. By leveraging large volumes of unlabeled data, our full semi-supervised system reduces error by 50% over the existing state of the art in gender classification.
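The co-occurrence step the abstract builds on (count how often a noun appears with pronouns of each gender class, then use those counts to automatically label training examples) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the pronoun-to-class mapping, the functions `count_gender` and `auto_label`, and the `min_count`/`min_ratio` thresholds are all hypothetical, and the (noun, pronoun) pairs are assumed to come from an upstream extractor that is not shown. The discriminative model itself is not implemented here.

```python
from collections import Counter, defaultdict

# Hypothetical pronoun-to-class mapping (an assumption for illustration,
# not necessarily the paper's exact class inventory).
PRONOUN_CLASS = {
    "he": "masculine", "him": "masculine", "his": "masculine",
    "she": "feminine", "her": "feminine", "hers": "feminine",
    "it": "neutral", "its": "neutral",
    "they": "plural", "them": "plural", "their": "plural",
}


def count_gender(pairs):
    """Count how often each noun co-occurs with pronouns of each class.

    `pairs` is an iterable of (noun, pronoun) tuples, assumed to come from
    a hypothetical upstream noun-pronoun extractor. Unknown pronouns are skipped.
    """
    counts = defaultdict(Counter)
    for noun, pronoun in pairs:
        cls = PRONOUN_CLASS.get(pronoun.lower())
        if cls is not None:
            counts[noun.lower()][cls] += 1
    return counts


def auto_label(counts, min_count=3, min_ratio=0.8):
    """Turn co-occurrence counts into automatically annotated labels.

    A noun is labeled only if it was seen at least `min_count` times and its
    most frequent class covers at least `min_ratio` of those sightings
    (both thresholds are hypothetical). Rare or ambiguous nouns stay unlabeled.
    """
    labels = {}
    for noun, class_counts in counts.items():
        total = sum(class_counts.values())
        if total < min_count:
            continue
        cls, n = class_counts.most_common(1)[0]
        if n / total >= min_ratio:
            labels[noun] = cls
    return labels


if __name__ == "__main__":
    # Illustrative toy pairs, not data from the paper.
    pairs = (
        [("Glenda", "she")] * 4
        + [("Glen", "he")] * 5 + [("Glen", "she")]
        + [("Glendale", "it")] * 2
    )
    print(auto_label(count_gender(pairs)))
    # → {'glenda': 'feminine', 'glen': 'masculine'}
```

Here "glendale" is left unlabeled because it was seen only twice. Under the approach the abstract describes, such infrequent nouns are exactly the cases a discriminative model trained on the automatically labeled examples would need to classify from contextual, morphological, and categorical features.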
Keywords
categorical gender feature, gender class, gender classification, large-scale discriminative gender model, noun gender, pronoun gender, English pronoun, automatic pronoun resolution system, frequent noun, infrequent noun, English noun gender, semi-supervised learning