TY - GEN
T1 - An Approach to Variable Clustering
T2 - 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025
AU - Saquicela, Victor
AU - Palacio, Kenneth
AU - Chifla, Mario
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Principal Component Analysis (PCA) and K-means are fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially when K-means is used to cluster variables rather than observations, has scarcely been explored. This study addresses this gap by proposing a method that analyzes the relationship between the clusters of variables obtained by applying K-means to transposed data and the principal components of PCA. Our approach applies PCA to the original data and K-means to the transposed data set, in which the original variables become the observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. This process provides a tool to explore and understand the clustering of variables and how such clusters contribute to the principal dimensions of variation identified by PCA. We analyze multiple data sets with differing variability structures (USArrests, Iris, Decathlon2) to show that the correspondence between clusters of variables and principal components depends on the data’s inherent structure. For cases of simple variability, such as USArrests, the clusters group together variables with high loadings on specific components, facilitating a clear interpretation. In more complex structures, such as Iris and Decathlon2, the relationship is less clear-cut, although K-means clustering of variables on the transposed data still provides useful complementary information on the joint behavior of the variables. The method not only enriches the interpretation of PCA by linking principal components to meaningful groups of variables but also provides a reproducible methodological framework for exploring and understanding variable clustering in multivariate analysis.
The proposed method is therefore a useful tool for exploratory data analysis and for applications with high-dimensional data, facilitating pattern identification, variable selection, and feature engineering, and contributing to a deeper understanding of complex data sets.
KW - EDA
KW - Kmeans
KW - PCA
KW - Transposed
UR - https://www.scopus.com/pages/publications/105038008695
U2 - 10.1109/CHILECON66915.2025.11476364
DO - 10.1109/CHILECON66915.2025.11476364
M3 - Conference contribution
AN - SCOPUS:105038008695
T3 - Proceedings - IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, ChileCon
BT - 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025
A2 - Lefranc, Gaston
A2 - Cubillos, Claudio
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 October 2025 through 30 October 2025
ER -