MegaLite-2: An Extended Bilingual Comparative Literary Corpus

SAI (1)(2021)

Abstract
In this paper we introduce an extended bilingual version of the literary corpus MegaLite. This new version contains literary documents in Spanish and French. The motivation is to provide the community with a specialized, free linguistic resource for different NLP tasks. Corpora of this kind are very important for designing algorithms for Text Generation, Text Classification and Sentiment Analysis. The corpora contain about 6,500 documents: 1,500 in French (MegaLite-Fr) and nearly 5,000 in Spanish (MegaLite-Es), all collected manually and organized into the genres narrative, poetry and plays. A shallow linguistic comparison using the Jensen-Shannon divergence is presented and discussed. The MegaLite-2 corpora will be available to the community as a free resource in several suitable formats.
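For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of the Jensen-Shannon divergence between two discrete distributions. The abstract does not say which linguistic units are compared or which logarithm base is used, so the token lists, the helper names (`distribution`, `kl`, `js_divergence`) and the choice of base 2 are all assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def distribution(tokens, vocab):
    """Relative frequency of each vocab item in tokens (hypothetical helper)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total for w in vocab]


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy, made-up label sequences standing in for two sub-corpora (assumption).
sample_a = ["NOUN", "VERB", "NOUN", "ADJ"]
sample_b = ["NOUN", "VERB", "VERB", "ADV"]
vocab = sorted(set(sample_a) | set(sample_b))

p = distribution(sample_a, vocab)
q = distribution(sample_b, vocab)
print(js_divergence(p, q))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with no shared support), and it is symmetric in its two arguments.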
Keywords
Spanish and French Comparative Literary Corpus, Machine Learning Algorithms, Divergence of probability distribution, Linguistic resources