
Variable Feature Weighted Fuzzy K-Means Algorithm for High Dimensional Data

Multimedia Tools and Applications (2025) | CCF C | SCI Zone 4

Indian Institute of Technology

Cited 4 | Views 16
Abstract
This paper presents a new fuzzy k-means algorithm for clustering high-dimensional data in various subspaces. In high-dimensional data, some features may be irrelevant, and even the relevant ones may differ in their significance to the clustering process. For better clustering, it is crucial to account for the contribution of each feature. To do so, we propose a novel fuzzy k-means clustering algorithm that modifies the objective function of fuzzy k-means with two different entropy terms. The first entropy term helps minimize the within-cluster dispersion and maximize the negative entropy to determine the clusters that contribute to the association of data points. The second entropy term helps control the feature weights, because different features contribute with different weights during clustering, and so helps obtain a better partition. The performance of the proposed approach is reported using several clustering measures (AR, RI, and NMI) on multiple datasets and compared with six other state-of-the-art methods.

Impact Statement: In real-world applications, cluster-dependent feature weights help partition a data set into more meaningful clusters. Features may be relevant, irrelevant, or redundant, and each contributes differently during the clustering process. This paper presents a cluster-dependent feature-weighting approach based on fuzzy k-means that assigns higher weights to relevant features and lower weights to irrelevant features during clustering. The method is validated using both supervised and unsupervised performance measures on real-world and synthetic datasets to demonstrate its effectiveness compared to state-of-the-art methods.
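The abstract does not give the objective function, so the following is only a minimal sketch of the general idea, assuming a commonly used entropy-regularized form: weighted within-cluster dispersion, plus an entropy term on the memberships (scaled by `lam`) and an entropy term on the per-cluster feature weights (scaled by `gamma`). Under that assumption both the memberships and the feature weights have softmax-style closed-form updates. The function name, the parameters `lam`, `gamma`, and `init_centers`, and the update rules are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np


def _softmax(z, axis):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def entropy_weighted_fuzzy_kmeans(X, n_clusters, lam=1.0, gamma=1.0,
                                  n_iter=50, init_centers=None, seed=0):
    """Illustrative entropy-regularized, feature-weighted fuzzy k-means.

    Assumed objective (NOT taken from the paper):
        sum_ik u_ik * sum_j w_kj * (x_ij - c_kj)^2
        + lam   * sum_ik u_ik * log(u_ik)
        + gamma * sum_kj w_kj * log(w_kj)
    with each row of U and each row of W summing to 1.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    else:
        centers = np.asarray(init_centers, dtype=float).copy()
    W = np.full((n_clusters, d), 1.0 / d)  # start with uniform feature weights

    for _ in range(n_iter):
        # Per-point, per-cluster, per-feature squared differences: (n, k, d).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        # Feature-weighted distance of every point to every cluster: (n, k).
        D = (sq * W[None, :, :]).sum(axis=2)
        # Membership update: closer clusters receive higher membership.
        U = _softmax(-D / lam, axis=1)
        # Center update: membership-weighted mean of the data.
        centers = (U.T @ X) / (U.sum(axis=0)[:, None] + 1e-12)
        # Per-cluster, per-feature dispersion: (k, d).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        E = (U[:, :, None] * sq).sum(axis=0)
        # Weight update: low-dispersion features receive higher weight.
        W = _softmax(-E / gamma, axis=1)

    return U, centers, W
```

In this sketch, calling `entropy_weighted_fuzzy_kmeans(X, 2, gamma=10.0)` returns the membership matrix, the cluster centers, and one weight vector per cluster. Larger `gamma` pushes the weights toward uniform, while smaller `gamma` concentrates weight on the features with the least within-cluster dispersion.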
Key words
k-means, Fuzzy k-means, Cluster validation, Sparse data, Fuzzy entropy
Chat Paper

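Key points: This paper proposes an entropy-based variable feature weighted fuzzy k-means algorithm for subspace clustering of high-dimensional data. Two entropy terms are added to the objective function so that feature weights are adjusted dynamically, which improves the clustering results.

Methods: The objective function of fuzzy k-means is modified by adding two entropy terms. The first entropy term is used to minimize the within-cluster dispersion and maximize the negative entropy to determine cluster associations. The second entropy term controls the feature weights to obtain a better partition of the data.

Experiments: The authors tested the proposed algorithm on multiple datasets and compared it with existing state-of-the-art methods, demonstrating its effectiveness. The specific dataset names are not given in the abstract.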
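The abstract names RI (the Rand index) as one of the measures used in the comparison. As a small illustration of what that measure computes, here is a sketch of the standard Rand index: the fraction of point pairs on which two labelings agree about "same cluster" versus "different cluster". The function name `rand_index` is a hypothetical helper, and this is not the authors' evaluation code.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two points
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two points: nothing to disagree on
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The index depends only on how points are grouped, not on the label values themselves, so `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0. This pairwise version is quadratic in the number of points, which is adequate for illustration but slow for large datasets.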