Language Geometry Using Random Indexing

QUANTUM INTERACTION, QI 2016(2017)

引用 51|浏览4
暂无评分
摘要
Random Indexing is a simple implementation of Random Projections with a wide range of applications. It can solve a variety of problems with good accuracy without introducing much complexity. Here we demonstrate its use for identifying the language of text samples, based on a novel method of encoding letter N-grams into high-dimensional Language Vectors. Further, we show that the method is easily implemented and requires little computational power and space. As proof of the method's statistical validity, we show its success in a language-recognition task. On a difficult data set of 21,000 short sentences from 21 different languages, we achieve 97.4% accuracy, comparable to state-of-the-art methods.
更多
查看译文
关键词
N-gram vector, Language profile, Vector symbolic architecture, Multiply-Add-Permute Algebra
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要