Training And Evaluating A Statistical Part-Of-Speech Tagger For Natural Language Applications Using Kepler Workflows

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012 (2012)

Abstract
A core technology of natural language processing (NLP) incorporated into many text processing applications is a part-of-speech (POS) tagger, a software component that labels words in text with syntactic tags such as noun, verb, and adjective. These tags may then be used within more complex tasks such as parsing, question answering, and machine translation (MT). In this paper we describe the phases of our work training and evaluating statistical POS taggers on Arabic texts and their English translations using Kepler workflows. While the original objectives for encapsulating our research code within Kepler workflows were driven by software engineering needs to document and verify the re-usability of our software, our research benefitted as well: the ease of rapid retraining and testing enabled us to detect reporting discrepancies, document their source, and independently validate the correct results.
Keywords
natural language processing, part-of-speech tagging, computational linguistics, parallel corpora, machine translation, Arabic NLP, Penn treebank