Unsupervised learning of field segmentation models for information extraction

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics(2005)

引用 92|浏览1
暂无评分
摘要
The applicability of many current information extraction techniques is severely limited by the need for supervised training data. We demonstrate that for certain field structured extraction tasks, such as classified advertisements and bibliographic citations, small amounts of prior knowledge can be used to learn effective models in a primarily unsupervised fashion. Although hidden Markov models (HMMs) provide a suitable generative model for field structured text, general unsupervised HMM learning fails to learn useful structure in either of our domains. However, one can dramatically improve the quality of the learned structure by exploiting simple prior knowledge of the desired solutions. In both domains, we found that unsupervised methods can attain accuracies with 400 unlabeled examples comparable to those attained by supervised methods on 50 labeled examples, and that semi-supervised methods can make good use of small amounts of labeled data.
更多
查看译文
关键词
unsupervised learning,supervised method,small amount,current information extraction technique,certain field,unsupervised fashion,simple prior knowledge,extraction task,unsupervised method,field segmentation model,prior knowledge,general unsupervised hmm learning,information extraction,hidden markov model
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要