Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop

WAC-6 '10: Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop (2010)

Abstract
More and more people are using Web data for linguistic and NLP research. The workshop, the sixth in an annual series, provides a venue for exploring how we can use it effectively and what we will find if we do, with particular attention to:

• Web corpus collection projects, or modules for one part of the process (crawling, filtering, de-duplication, language-id, tokenising, indexing, ...)
• characteristics of Web data from a linguistics/NLP perspective, including registers, domains, frequency distributions, and comparisons between datasets
• using crawled Web data for NLP purposes (with emphasis on the data rather than the use)

Previous WAC workshops have been in Europe and Africa. The west coast of the US is the global centre for web development, hosting Google, Microsoft, Yahoo and a thousand others, so we are glad to be here!
Keywords
NLP research,Corpus Workshop,web corpus collection project,Previous WAC workshop,Web data,global centre,NLP perspective,NAACL HLT,frequency distribution,annual series,web development,Sixth Web,NLP purpose