TY - JOUR
T1 - Optimisation in machine learning
T2 - An application to topsoil organic stocks prediction in a dry forest ecosystem
AU - Gebauer, Anika
AU - Brito Gómez, Victor M.
AU - Ließ, Mareike
N1 - Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/11/15
Y1 - 2019/11/15
AB - Soil organic carbon (SOC) sequestration plays a key role in reducing atmospheric greenhouse gas concentrations. However, dry forest ecosystems in Ecuador are at risk of becoming a source of carbon emissions because of deforestation. The spatial information necessary to quantify potential carbon loss to the atmosphere is often missing, particularly for remote areas of limited accessibility. This study aims to regionalise the SOC stocks of a small and poorly accessible dry forest ecosystem in southwestern Ecuador using boosted regression tree (BRT) models. Resampling in a nested repeated k-fold cross-validation approach was applied to develop robust models for a dataset of 118 samples with limited predictor information. Model parameters were tuned by optimisation with differential evolution (DE) to select an optimal parameter set. Predictor selection was implemented using the same optimisation algorithm. This study demonstrates how the predictive performance of BRT models can be improved by applying an optimisation approach to parameter tuning and predictor selection. Model performance improved by approximately 40% in terms of R². Still, the results also demonstrated the difficulties of machine learning applications in small and highly heterogeneous natural areas. Highly variable or even random factors were assumed to distort the relationship between predictor and response variables. We assume that the presented approach is particularly successful in the case of a real-valued multivariate space of tuning parameters. However, this requires testing in further machine learning applications and algorithms.
KW - Cross validation
KW - Differential evolution
KW - Dry forest
KW - Machine learning
KW - Model fitting
KW - Soil organic carbon
UR - https://www.scopus.com/pages/publications/85069648816
U2 - 10.1016/j.geoderma.2019.07.004
DO - 10.1016/j.geoderma.2019.07.004
M3 - Article
AN - SCOPUS:85069648816
SN - 0016-7061
VL - 354
JO - Geoderma
JF - Geoderma
M1 - 113846
ER -