TY - JOUR
T1 - Water quality assessment with emphasis in parameter optimisation using pattern recognition methods and genetic algorithm
AU - Sotomayor, Gonzalo
AU - Hampel, Henrietta
AU - Vázquez, Raúl F.
N1 - Publisher Copyright:
© 2017 Elsevier Ltd
PY - 2018/3/1
Y1 - 2018/3/1
N2 - A non-supervised (k-means) and a supervised (k-Nearest Neighbour in combination with genetic algorithm optimisation, k-NN/GA) pattern recognition algorithms were applied for evaluating and interpreting a large complex matrix of water quality (WQ) data collected during five years (2008, 2010–2013) in the Paute river basin (southern Ecuador). 21 physical, chemical and microbiological parameters collected at 80 different WQ sampling stations were examined. At first, the k-means algorithm was carried out to identify classes of sampling stations regarding their associated WQ status by considering three internal validation indexes, i.e., Silhouette coefficient, Davies-Bouldin and Caliński-Harabasz. As a result, two WQ classes were identified, representing low (C1) and high (C2) pollution. The k-NN/GA algorithm was applied on the available data to construct a classification model with the two WQ classes, previously defined by the k-means algorithm, as the dependent variables and the 21 physical, chemical and microbiological parameters being the independent ones. This algorithm led to a significant reduction of the multidimensional space of independent variables to only nine, which are likely to explain most of the structure of the two identified WQ classes. These parameters are, namely, electric conductivity, faecal coliforms, dissolved oxygen, chlorides, total hardness, nitrate, total alkalinity, biochemical oxygen demand and turbidity. Further, the land use cover of the study basin revealed a very good agreement with the WQ spatial distribution suggested by the k-means algorithm, confirming the credibility of the main results of the used WQ data mining approach.
AB - A non-supervised (k-means) and a supervised (k-Nearest Neighbour in combination with genetic algorithm optimisation, k-NN/GA) pattern recognition algorithms were applied for evaluating and interpreting a large complex matrix of water quality (WQ) data collected during five years (2008, 2010–2013) in the Paute river basin (southern Ecuador). 21 physical, chemical and microbiological parameters collected at 80 different WQ sampling stations were examined. At first, the k-means algorithm was carried out to identify classes of sampling stations regarding their associated WQ status by considering three internal validation indexes, i.e., Silhouette coefficient, Davies-Bouldin and Caliński-Harabasz. As a result, two WQ classes were identified, representing low (C1) and high (C2) pollution. The k-NN/GA algorithm was applied on the available data to construct a classification model with the two WQ classes, previously defined by the k-means algorithm, as the dependent variables and the 21 physical, chemical and microbiological parameters being the independent ones. This algorithm led to a significant reduction of the multidimensional space of independent variables to only nine, which are likely to explain most of the structure of the two identified WQ classes. These parameters are, namely, electric conductivity, faecal coliforms, dissolved oxygen, chlorides, total hardness, nitrate, total alkalinity, biochemical oxygen demand and turbidity. Further, the land use cover of the study basin revealed a very good agreement with the WQ spatial distribution suggested by the k-means algorithm, confirming the credibility of the main results of the used WQ data mining approach.
KW - Genetic algorithm
KW - Land cover
KW - Pattern recognition
KW - Water quality
UR - https://www.scopus.com/pages/publications/85038026385
U2 - 10.1016/j.watres.2017.12.010
DO - 10.1016/j.watres.2017.12.010
M3 - Artículo
C2 - 29248805
AN - SCOPUS:85038026385
SN - 0043-1354
VL - 130
SP - 353
EP - 362
JO - Water Research
JF - Water Research
ER -