A novel method for parallel resources acquisition from bilingual web pages

Journal of Computational Information Systems (2013)

Abstract
A new approach is proposed for acquiring parallel resources by expanding seeds (high-quality parallel sentence pairs) within the same page. Given a bilingual web page, the method consists of three challenging tasks: 1) the page's content structure is detected in order to locate its main data region; 2) high-quality parallel sentence pairs in the main data region are selected as seeds for the next task; 3) wrappers for mining parallel resources are learnt by expanding the seeds. We then use the learnt wrappers to identify and extract further parallel resources in the same page. Our method can be applied to web documents written in any markup language and in any human language. Evaluated on 500 bilingual web pages randomly selected from 12 popular English-learning websites, it achieves an F-score of 85.68%, and the experimental results suggest that the method is quite promising. Copyright © 2013 Binary Information Press.
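The abstract does not say how wrappers are represented or learnt, so the following is only a rough illustration of step 3 under stated assumptions. It assumes the main data region has already been flattened into a list of text nodes, each tagged with its markup path, and that step 2 has already produced seed pairs as (English index, translation index). A "wrapper" here is simply a hypothetical (English path, translation path, offset) triple that enough seeds agree on; `TextNode`, `Wrapper`, `learn_wrappers`, `apply_wrappers` and `min_support` are illustrative names, and this support-counting scheme is a generic wrapper-induction heuristic, not the authors' algorithm. The keywords mention VIPS and SVM, which presumably serve steps 1 and 2; neither is modelled here. The English-Chinese example page is likewise invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextNode:
    """A text leaf of the page's main data region (hypothetical flat form)."""
    path: str  # markup path from the region root, e.g. "table/tr/td[1]"
    text: str


@dataclass(frozen=True)
class Wrapper:
    """Hypothetical wrapper: where the English text sits, where its
    translation sits, and how many nodes later the translation appears."""
    en_path: str
    zh_path: str
    offset: int


def learn_wrappers(nodes, seeds, min_support=2):
    """Generalize seed pairs (en_index, zh_index) into wrappers.

    A wrapper is kept only if at least `min_support` seeds share it,
    so a single noisy seed cannot produce a wrapper on its own.
    """
    counts = Counter(
        Wrapper(nodes[en].path, nodes[zh].path, zh - en) for en, zh in seeds
    )
    return [w for w, c in counts.items() if c >= min_support]


def apply_wrappers(nodes, wrappers):
    """Return every (source, translation) text pair matched by a wrapper."""
    pairs = []
    for i, node in enumerate(nodes):
        for w in wrappers:
            j = i + w.offset
            if (node.path == w.en_path
                    and 0 <= j < len(nodes)
                    and nodes[j].path == w.zh_path):
                pairs.append((node.text, nodes[j].text))
    return pairs


# Invented two-column lesson page: English in td[1], Chinese in td[2].
nodes = [
    TextNode("h2", "Lesson 1"),
    TextNode("table/tr/td[1]", "Good morning."),
    TextNode("table/tr/td[2]", "早上好。"),
    TextNode("table/tr/td[1]", "How are you?"),
    TextNode("table/tr/td[2]", "你好吗？"),
    TextNode("table/tr/td[1]", "See you tomorrow."),
    TextNode("table/tr/td[2]", "明天见。"),
]
seeds = [(1, 2), (3, 4)]  # assumed output of the seed-selection step

pairs = apply_wrappers(nodes, learn_wrappers(nodes, seeds))
# → [('Good morning.', '早上好。'), ('How are you?', '你好吗？'),
#    ('See you tomorrow.', '明天见。')]
```

The third pair is not a seed; it is recovered only because it shares the layout that both seeds agree on, which is the kind of in-page expansion the abstract describes. Requiring `min_support` agreeing seeds trades recall for precision: with a single seed, this sketch learns no wrapper at all.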
Keywords
parallel resource, SVM, VIPS, wrapper construction