langid.py: An Off-the-shelf Language Identification Tool.

ACL '12: Proceedings of the ACL 2012 System Demonstrations (2012)

Abstract
We present langid.py, an off-the-shelf language identification tool. We discuss the design and implementation of langid.py and provide an empirical comparison on 5 long-document datasets and 2 datasets from the microblog domain. We find that langid.py maintains consistently high accuracy across all domains, making it ideal for end-users who require language identification without wanting to invest in preparation of in-domain training data.
Keywords
langid.py, language identification, long-document datasets, off-the-shelf language identification tool, empirical comparison, high accuracy, in-domain training data, microblog domain