Developing effective clustering and statistical methods for high-dimensional sparse
data presents challenges that do not arise with traditional low-dimensional data. To address
these challenges, a novel approach is proposed that leverages fuzzy data principles to improve
clustering and statistical performance on high-dimensional sparse datasets. The method
builds upon the fuzzy C-means clustering algorithm, with key modifications that make
it better suited to high-dimensional sparse data. One crucial enhancement addresses
the algorithm's tendency to converge to local optima by optimizing the initial cluster
centers, which also significantly reduces clustering statistical time. A second
modification replaces the original Euclidean distance with cosine distance, which
improves the clustering and statistical performance on high-dimensional sparse data.
Experimental results show that the method achieves superior clustering statistical
performance across different data dimensions.
When the data dimension is low, the clustering statistical effect is optimal at a
blocking ratio of 10%; when the data dimension is high, it is optimal at a blocking
ratio of 40%. The method also achieves higher hit rates and clustering statistical
efficiency across different sparsity levels.
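
The two modifications described above can be illustrated with a minimal sketch of fuzzy C-means that uses cosine distance in place of Euclidean distance. This is not the authors' implementation. The text does not specify how the initial cluster centers are optimized, so the farthest-point heuristic in `farthest_point_init` is an assumption, as are the function names, the fuzzifier `m`, the tolerance `tol`, and the guard value `eps`. The center update keeps the standard weighted mean, which is a simplification under cosine distance.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b); defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    # Clamp to guard against tiny negative values caused by rounding.
    return max(0.0, 1.0 - dot / (na * nb))


def farthest_point_init(data, c):
    """Assumed heuristic: start from data[0], then repeatedly add the point
    whose nearest already-chosen center is farthest away."""
    centers = [list(data[0])]
    while len(centers) < c:
        best = max(data, key=lambda p: min(cosine_distance(p, v) for v in centers))
        centers.append(list(best))
    return centers


def memberships(data, centers, m, eps):
    """Standard FCM membership update with cosine distance plugged in:
    u_ik is proportional to d_ik ** (-2 / (m - 1)), normalized per point."""
    u = []
    for p in data:
        inv = [max(cosine_distance(p, v), eps) ** (-2.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def fuzzy_c_means_cosine(data, c, m=2.0, max_iter=100, tol=1e-6, eps=1e-12):
    """Return (centers, u), where u[i][k] is the membership of point i in cluster k."""
    centers = farthest_point_init(data, c)
    dim = len(data[0])
    for _ in range(max_iter):
        u = memberships(data, centers, m, eps)
        new_centers = []
        for k in range(c):
            # Weighted mean of the points, with weights u_ik ** m.
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[j] for w, p in zip(weights, data)) / total for j in range(dim)]
            )
        # Stop once no center has changed direction by more than tol.
        shift = max(cosine_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the returned centers.
    return centers, memberships(data, centers, m, eps)
```

Calling `fuzzy_c_means_cosine(data, c)` on a list of equal-length numeric vectors returns the cluster centers and a membership matrix whose rows each sum to 1. Because cosine distance depends only on vector direction, two sparse vectors that share the same nonzero dimensions are treated as close regardless of magnitude.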