Automatic Creation of Domain Templates

ACL (2006)

Abstract
Recently, many Natural Language Processing (NLP) applications have improved the quality of their output by using various machine learning techniques to mine Information Extraction (IE) patterns for capturing information from the input text. Currently, to mine IE patterns one should know in advance the type of the information that should be captured by these patterns. In this work we propose a novel methodology for corpus analysis based on cross-examination of several document collections representing different instances of the same domain. We show that this methodology can be used for automatic domain template creation. As the problem of automatic domain template creation is rather new, there is no well-defined procedure for the evaluation of the domain template quality. Thus, we propose a methodology for identifying what information should be present in the template. Using this information we evaluate the automatically created domain templates through the text snippets retrieved according to the created templates.
Keywords
text snippet, domain template quality, novel methodology, input text, corpus analysis, domain template, automatic domain template creation, automatic creation, information extraction, IE pattern, natural language processing, computer science, information technology, machine learning