An Analysis of the t-SNE Algorithm for Data Visualization.

COLT (2018)

Citations: 101 | Views: 157
Abstract
A first line of attack in exploratory data analysis is _data visualization_, i.e., generating a 2-dimensional representation of data that makes _clusters_ of similar points visually identifiable. Standard Johnson-Lindenstrauss (JL) dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications. This work gives a formal framework for the problem of _data visualization_: finding a 2- or 3-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the "ground-truth" clustering (similar to conditions assumed in earlier analyses of clustering) in the underlying data. These are the first provable guarantees on t-SNE for constructing good data visualizations. We show that our deterministic condition is satisfied by considerably general probabilistic generative models for clusterable data, such as mixtures of arbitrary well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in _partially_ recovering cluster structure even when the above deterministic condition is not met.