Outlier Detection in Sparse Data with Factorization Machines.

CIKM (2017)

Abstract
In sparse data, a large fraction of the entries take on zero values. Examples include short text snippets (such as tweets on Twitter) and feature representations of categorical data sets with a large number of values. Traditional outlier detection methods typically fail on such data because distances are difficult to compute. To address this, it is important to use the latent relations between such values. Factorization machines are a natural methodology for this and are well suited to the massive-domain setting because of their emphasis on sparse data sets. In this study, we propose an outlier detection approach for sparse data with factorization machines. Factorization machines are also efficient due to their linear complexity in the number of non-zero values. In fact, because of their efficiency, they can even be extended to traditional settings for numerical data by an appropriate feature engineering effort. We show through an extensive experimental study that our approach is both effective and efficient for sparse categorical, short text, and numerical data.
Keywords
outlier detection, sparse data, factorization machines
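
The abstract's efficiency claim (linear complexity in the number of non-zero values) can be illustrated with the standard second-order factorization machine model. The sketch below is not the paper's outlier-scoring method; it only shows how a generic FM output can be evaluated for one sparse row by touching only its non-zero entries. The function names `fm_score` and `fm_score_naive`, the parameter names `w0`, `w`, `V`, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np


def fm_score(idx, vals, w0, w, V):
    """Second-order FM output for one sparse row (illustrative sketch).

    idx  : indices of the non-zero features
    vals : values at those indices
    w0   : global bias (scalar)
    w    : per-feature linear weights, shape (n,)
    V    : per-feature latent factors, shape (n, k)

    Uses the well-known identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    so the cost is O(nnz * k) rather than O(nnz^2 * k).
    """
    idx = np.asarray(idx, dtype=int)
    vals = np.asarray(vals, dtype=float)
    linear = w[idx] @ vals
    Vx = V[idx] * vals[:, None]           # (nnz, k): factors scaled by values
    s = Vx.sum(axis=0)                    # (k,): per-factor sums
    pairwise = 0.5 * (s @ s - (Vx * Vx).sum())
    return w0 + linear + pairwise


def fm_score_naive(idx, vals, w0, w, V):
    """Same quantity via the explicit double loop, for checking only."""
    total = w0 + sum(w[i] * x for i, x in zip(idx, vals))
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            total += (V[idx[a]] @ V[idx[b]]) * vals[a] * vals[b]
    return total


# Toy usage with randomly drawn (untrained) parameters.
rng = np.random.default_rng(0)
n, k = 50, 4                              # hypothetical feature count and rank
w0, w, V = 0.3, rng.normal(size=n), rng.normal(size=(n, k))
row_idx, row_vals = [2, 7, 19, 41], [1.0, 1.0, 2.0, 0.5]
print(fm_score(row_idx, row_vals, w0, w, V))
```

Only the rows of `V` selected by `idx` are read, which is why the cost depends on the number of non-zero values rather than on the full dimensionality.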