
An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis

  • Universidad de Cuenca
  • Universidad Estatal de Milagro

Research output: Chapter in book/report/conference proceeding › Conference contribution › Peer-reviewed

Abstract

Principal Component Analysis (PCA) and K-means are fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially when K-means is used to cluster variables rather than observations, has been scarcely explored. This study addresses this gap by proposing a method that analyzes the relationship between the clusters of variables obtained by applying K-means to transposed data and the principal components of PCA. Our approach applies PCA to the original data and K-means to the transposed data set, in which the original variables become the observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. This process provides a tool to explore and understand the clustering of variables and how such clusters contribute to the principal dimensions of variation identified by PCA. We analyze multiple data sets with varying variability structures (USArrests, Iris, Decathlon2) to show that the correspondence between clusters of variables and principal components depends on the data's inherent structure. For cases of simple variability, such as USArrests, the clusters group variables with high loadings on specific components, facilitating a clear interpretation. In more complex structures, such as Iris and Decathlon2, the correspondence is less clear-cut, although K-means clustering of variables on the transposed data still provides useful complementary information on the joint behavior of the variables. The method not only enriches the interpretation of PCA by linking principal components to meaningful groups of variables but also provides a reproducible methodological framework for exploring and understanding variable clustering in multivariate analysis.
The proposed method itself becomes a valuable tool for exploratory data analysis and applications with high-dimensional data, facilitating pattern identification, variable selection and feature engineering, contributing to a deeper understanding of complex data sets.
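The workflow described in the abstract can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the data are synthetic (two hypothetical latent factors driving five variables) rather than USArrests, Iris, or Decathlon2; variables are standardized before both steps; K-means is a small hand-written Lloyd iteration with farthest-point initialization; and the cluster contribution is taken to be the sum of squared loadings (unit-norm eigenvector entries) of the variables in each cluster, which is only one possible loading-based measure, since the abstract does not specify the exact one.

```python
import numpy as np

def standardize(X):
    # Center and scale each column (variable) to unit sample variance.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_loadings(Z):
    # Eigen-decomposition of the correlation matrix of standardized data.
    # Columns of `vecs` are unit-norm loading vectors, sorted by eigenvalue.
    R = np.cov(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def kmeans(points, k, n_iter=100):
    # Lloyd's algorithm with a deterministic farthest-point initialization
    # (an assumption made here for reproducibility).
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_contributions(vecs, labels, k):
    # Assumed measure: share of each component's squared loadings that
    # falls in each cluster. Every column sums to 1 across clusters.
    sq = vecs ** 2
    return np.array([sq[labels == j].sum(axis=0) for j in range(k)])

# Synthetic data: variables 0-2 follow factor f1, variables 3-4 follow f2.
rng = np.random.default_rng(42)
n = 1000
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(2)]
)

Z = standardize(X)
eigvals, loadings = pca_loadings(Z)   # PCA on the original (standardized) data
labels = kmeans(Z.T, k=2)             # K-means on the transposed data: rows are variables
contrib = cluster_contributions(loadings, labels, k=2)

print("variable clusters:", labels)
print("cluster-by-component contributions:\n", contrib.round(3))
```

In this deliberately simple setting, each variable cluster should account for most of the squared loadings of one leading component. With less separable structure the rows of `contrib` would be expected to spread across components, in line with the less clear-cut correspondence the abstract reports for Iris and Decathlon2.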

Original language: English
Title of host publication: 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025
Editors: Gaston Lefranc, Claudio Cubillos
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350357363
DOI
Status: Published - 2025
Event: 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025 - Valparaiso, Chile
Duration: 28 Oct 2025 to 30 Oct 2025

Publication series

Name: Proceedings - IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, ChileCon
ISSN (print): 2832-1529
ISSN (electronic): 2832-1537

Conference

Conference: 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025
Country/Territory: Chile
City: Valparaiso
Period: 28/10/25 to 30/10/25
