Visualizing The Finer Cluster Structure Of Large-Scale And High-Dimensional Data

KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III (2021)

Abstract
Dimension reduction and visualization of high-dimensional data have become very important research topics because of the rapid growth of large databases with high dimensions in data science. A successful dimension reduction and visualization method seeks to produce a low-dimensional representation of high-dimensional data that preserves both the global and local structure of the data. In this paper, we propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional space. In particular, a single parameter v is introduced to the generalized sigmoid function in low-dimensional space, so that we can adjust the slope and the heaviness of the function tail by changing the value of the parameter easily. Using real-world data sets with different sample sizes and dimensions, we show that our proposed method can generate visualization results that are competitive with those of the state-of-the-art methods, such as uniform manifold approximation and projection (UMAP), t-distributed stochastic neighbor embedding (t-SNE), and related methods. In addition, by adjusting the value of v, our proposed method can preserve more of both the global and finer cluster structure of the data. Furthermore, like UMAP, our proposed method can easily scale to massive high-dimensional data. Finally, we use domain knowledge to demonstrate that the finer subclusters that are revealed with small values of v are meaningful.
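The abstract does not state the functional form of the generalized sigmoid, so the sketch below does not attempt to reproduce it. As an assumption for illustration only, it uses a Student-t style kernel as a stand-in to show how a single parameter (here also called v) can control tail heaviness in a low-dimensional similarity function. The function name `heavy_tail_similarity` is hypothetical.

```python
import math


def heavy_tail_similarity(d, v):
    """Similarity in (0, 1] for a low-dimensional distance d >= 0.

    Illustrative stand-in (Student-t style kernel), NOT the paper's
    generalized sigmoid. Smaller v gives a heavier tail; v = 1 gives
    the 1 / (1 + d^2) kernel used by t-SNE; large v approaches the
    Gaussian-like exp(-d^2 / 2).
    """
    if d < 0 or v <= 0:
        raise ValueError("d must be non-negative and v must be positive")
    return (1.0 + d * d / v) ** (-(v + 1.0) / 2.0)


if __name__ == "__main__":
    # Compare how quickly similarity decays with distance for several v.
    for v in (0.5, 1.0, 10.0):
        row = [round(heavy_tail_similarity(d, v), 4) for d in (0.0, 1.0, 3.0)]
        print(f"v={v}: {row}")
```

With a heavier tail (small v in this stand-in), moderately distant points retain more similarity in the embedding, which is one way such a parameter can change how tightly or loosely clusters are drawn.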
Keywords
Data visualization, Manifold learning, Nonlinear dimension reduction, Cluster structure, Generalized sigmoid function