The Coronavirus Corpus: Design, construction, and use

International Journal of Corpus Linguistics (2021)

Abstract
This paper discusses the creation and use of the Coronavirus Corpus, which is currently (March 2021) 900 million words in size, and which will probably be about one billion words in size by May-June 2021. The Coronavirus Corpus is a subset of the NOW Corpus (News on the Web), which is currently about 12.1 billion words in size and which grows by about two billion words each year. These two corpora are updated every night, with about 6-10 million words for NOW and 2-3 million words for the Coronavirus Corpus. The Coronavirus Corpus allows users to see the frequency of words and phrases over time (even by individual day), and users can find all words that are more frequent in one time period than another. Users can also see the collocates for words and phrases, and compare the collocates to see what is being said about particular topics over time.
Keywords
corpus design, NOW corpus, text archive, Coronavirus, COVID-19