TweetMT: A Parallel Microblog Corpus.
LREC (2016)
Abstract
We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.
Keywords
Machine Translation, Microblogs, Tweets, Social Media, Parallel Corpus