An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis

  • Universidad de Cuenca
  • Universidad Estatal de Milagro

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Principal Component Analysis (PCA) and K-means are fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them, especially when K-means is used to cluster variables rather than observations, has scarcely been explored. This study seeks to address this gap by proposing a method that analyzes the relationship between the clusters of variables obtained by applying K-means to transposed data and the principal components of PCA. Our approach involves applying PCA to the original data and K-means to the transposed data set, in which the original variables become the observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. This process provides a tool to explore and understand the clustering of variables and how such clusters contribute to the principal dimensions of variation identified by PCA. We analyze multiple data sets with varying variability structures (USArrests, Iris, Decathlon2) to show that the correspondence between clusters of variables and principal components depends on the data's inherent structure. For cases of simple variability, such as USArrests, the clusters group variables with high loadings on specific components, facilitating a clear interpretation. In more complex structures, such as Iris and Decathlon2, the relationship is less clear-cut, although K-means clustering of variables on the transposed data still provides useful complementary information on the joint behavior of the variables. The method not only enriches the interpretation of PCA by linking principal components to meaningful groups of variables but also provides a reproducible methodological framework for exploring and understanding variable clustering in multivariate analysis.
The proposed method itself becomes a valuable tool for exploratory data analysis and applications with high-dimensional data, facilitating pattern identification, variable selection and feature engineering, contributing to a deeper understanding of complex data sets.
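
The pipeline described in the abstract (PCA on the original data, K-means on the transposed data, then a loading-based contribution of each variable cluster to each component) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Several choices are assumptions the abstract does not specify: the variables are standardized first, K-means uses a deterministic farthest-first initialization, and the contribution measure is the share of each component's squared loadings carried by a cluster. The function names and the synthetic data (two latent factors driving five variables) are hypothetical and stand in for data sets such as USArrests.

```python
import numpy as np

def standardize(X):
    """Center and scale each column (variable) to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_loadings(Z, n_components):
    """Return unit-norm loadings (variables x components) from the SVD of Z."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_components].T

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
    return labels

def cluster_contributions(loadings, labels, k):
    """Rows: clusters. Columns: components. Each column sums to 1."""
    sq = loadings ** 2  # squared loadings of a unit-norm component sum to 1
    return np.array([sq[labels == c].sum(axis=0) for c in range(k)])

# Hypothetical synthetic data: variables 0-2 follow factor f1, variables 3-4 follow f2.
rng = np.random.default_rng(0)
n = 1000
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(2)]
)

Z = standardize(X)
loadings = pca_loadings(Z, n_components=2)   # PCA on the original (standardized) data
labels = kmeans(Z.T, k=2)                    # K-means on the transposed data: rows are variables
contrib = cluster_contributions(loadings, labels, k=2)

print("cluster label per variable:", labels)
print("cluster-by-component contributions:")
print(contrib.round(3))
```

With standardized variables, the squared Euclidean distance between two rows of the transposed data equals 2(n - 1)(1 - r), where r is their sample correlation, so K-means here groups positively correlated variables. In a simple structure like this synthetic one, each cluster should dominate one component; the abstract reports that the correspondence is weaker for data with more complex variability.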

Original language: English
Title of host publication: 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025
Editors: Gaston Lefranc, Claudio Cubillos
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350357363
State: Published - 2025
Event: 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025 - Valparaiso, Chile
Duration: 28 Oct 2025 to 30 Oct 2025

Publication series

Name: Proceedings - IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, ChileCon
ISSN (Print): 2832-1529
ISSN (Electronic): 2832-1537

Conference

Conference: 2025 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2025
Country/Territory: Chile
City: Valparaiso
Period: 28/10/25 to 30/10/25

Keywords

  • EDA
  • Kmeans
  • PCA
  • Transposed
