Towards interpretable machine learning for observational quantification of soil heavy metal concentrations under environmental constraints

Heavy metals, unlike organic pollutants, are non-biodegradable and bio-accumulative. Soil contamination occurs when concentrations exceed the soil's self-purification capacity, resulting in prolonged effects on terrestrial ecosystems (Nriagu, 1996). Rapidly rising demand for agricultural land and the growth of global industry are dispersing specific elements at local and regional scales, which makes it increasingly unsustainable to monitor changes in metal concentrations effectively with traditional field surveys alone (Hou and Ok, 2019; Schmidt-Traub et al., 2017). As a result, vast regions of the globe remain unexplored (Smith et al., 2016). The scientific community has recognized the necessity for global multi-scale soil quantitative management to guide conservation actions, and this requires satellite observations (Lara-Alvarez et al., 2024; Wang et al., 2023; Xi et al., 2023; Burke et al., 2021; Feng et al., 2023).

Quantitative practices based on hyperspectral data primarily involve feature extraction and modelling. Feature extraction aims to recover as many sensitive spectral features as possible from hyperspectral data. Wavelet analysis is well suited to spectral analysis because it can mine rich and subtle information across multiple decomposition scales (Meng et al., 2020; Rossel and Behrens, 2010). The Discrete Wavelet Packet Transform (DWPT) extends the spectral information in a way that accounts for the specific properties of each metal, rather than applying the same spectral features to all metals. Although wavelet processing can theoretically capture the main expressive features of the spectral set, the complexity of the dataset leads to models whose attributes vary. Some excellent dimensionality reduction algorithms have proven efficient at filtering features, but the lack of interpretability of their outputs remains their most significant restriction (Sun et al., 2023). Recursive feature elimination (RFE), an efficient backward-selection wrapper algorithm, lets a specific underlying algorithm determine a robust ordering of features iteratively. This gives RFE a distinct advantage over other feature selection methods (Poggio et al., 2021).
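
The backward-selection idea behind RFE can be illustrated with a minimal sketch. It is not the method used in the study: to stay dependency-free it swaps in ordinary least squares as the underlying algorithm, uses the absolute coefficient as the importance score, and runs on synthetic data. The helper names (`solve_linear`, `fit_linear`, `rfe`) are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(X, y, cols):
    """Least-squares coefficients (no intercept) for the chosen columns."""
    gram = [[sum(row[p] * row[q] for row in X) for q in cols] for p in cols]
    rhs = [sum(row[p] * t for row, t in zip(X, y)) for p in cols]
    return solve_linear(gram, rhs)


def rfe(X, y, n_keep):
    """Refit on the surviving features, drop the weakest one, repeat."""
    remaining = list(range(len(X[0])))
    eliminated = []  # features in the order they were removed
    while len(remaining) > n_keep:
        coefs = fit_linear(X, y, remaining)
        weakest = min(range(len(remaining)), key=lambda i: abs(coefs[i]))
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated


# Synthetic data (assumption): six standard-normal features, of which only
# features 0 and 2 drive the target. Equal feature scales make the
# coefficient magnitudes comparable as importance scores.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in X]

kept, dropped = rfe(X, y, n_keep=2)
print("kept:", kept, "dropped (first to last):", dropped)
```

Because the model is refitted after every removal, the ranking reflects each feature's importance conditional on the features still in play, which is what distinguishes a wrapper method from a one-shot filter.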

Modelling is advancing through improved machine learning (ML) methodologies, and artificial intelligence (AI) modelling has been applied successfully to the prediction of heavy metals. Most notably, extreme gradient boosting (XGB) has emerged as the leading algorithm in numerous ML evaluations, owing to its computational performance, ability to handle high-dimensional data, and robustness against overfitting (Xu et al., 2023). It has demonstrated remarkable efficacy across various domains and has previously proven successful in the quantification of soil properties (Liu et al., 2021; Sun et al., 2023). However, the black-box nature of highly predictive ML models often hinders their application in decision-making. To tackle this issue, interpretable ML has recently been introduced to understand the underlying mechanisms involved in AI-based modelling (Murdoch et al., 2019; Zhong et al., 2022).

Interpretability, a term more often encountered in public discourse, is not a novel concept in ML, and the scientific community is increasingly focused on “interpretable AI or ML” (Miller, 2019). Consequently, to enhance quantifications, features can be combined with interpretable ML to improve the interpretation of uncertainty in data and accuracy requirements of observational models (Gevaert, 2022). Within the field of observational quantification, it is of primary importance to comprehend how models arrive at particular predictions (Roscher et al., 2020). A notable application across multiple domains is SHapley Additive exPlanations (SHAP), which facilitates the identification of the features that tree ensemble algorithms rely on to produce estimations (Lundberg et al., 2020). Recent practices in computer science suggest that SHAP can provide a consistent measure of feature relevance across model tuning elements and offer local interpretability for links with environmental features. Through both global and local explanations, it can also be used to identify the features that are essential to predictive algorithms (Abdulalim Alabdullah et al., 2022; Huang et al., 2023).
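
SHAP attributions are built on Shapley values, and a small worked example may make the idea concrete. The sketch below computes exact Shapley values by enumerating every feature coalition, which is only feasible for a handful of features; it is not the tree-specific algorithm of the SHAP library. The toy model, the feature names (a spectral band, pH, OC) and the all-zero baseline are assumptions made purely for illustration, and "absent" features are simply replaced by their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features outside the coalition are set to their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical predictor with one band-by-OC interaction term."""
    band, ph, oc = z
    return 2.0 * band - 1.0 * ph + 0.5 * band * oc


x = [1.0, 2.0, 4.0]         # sample to explain: band, pH, OC
baseline = [0.0, 0.0, 0.0]  # reference point (assumption)
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["band", "pH", "OC"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the interaction term is shared equally between the band and OC. This additivity is what allows local explanations of single samples to be aggregated into global feature rankings.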

To date, several studies probing satellite observations for heavy metal modelling have provided quantifications based on associations with spectrally active components (Malmir et al., 2019; Zhang et al., 2018). This reliance can be attributed to the limited spectral response of heavy metals, particularly at low and medium concentrations. Bioavailability, a prerequisite for metal toxicity, is strongly influenced by soil pH and organic carbon (OC) (Semple et al., 2004). Because these components interact with metals, they may play a role in estimations, but they also contribute significant uncertainties to the spectral response. A recent study revealed that, despite strong correlations between spectrally active elements in soil sediments and trace metal levels, the accuracy of inversion models did not improve (Zhao et al., 2022). Indeed, soil ecosystems are inherently very heterogeneous at local scales, particularly regarding pH and organic matter, which have been widely demonstrated to exhibit spectral activity with strong local interdependencies (Guerra et al., 2020). Therefore, a comprehensive assessment of the spectral behaviour of various heavy metals across multiple gradients under environmental constraints is needed to facilitate the application of spaceborne hyperspectral remote sensing. An interpretable prediction path is a prerequisite for this.

In summary, two aspects of quantitative methodologies for soil heavy metal concentrations require attention: interpretable ML is rarely used to reveal the modelling process, and the potential of interpretable models to enhance prediction and to identify feature interactions between spectra and environments has yet to be fully exploited.

To bridge these knowledge gaps, we present an interpretable machine learning strategy that integrates DWPT-extended onboard hyperspectral data and environmental data (pH and OC). This approach utilises XGBoost with Recursive Feature Elimination (RFE) for model development and SHAP for analytical interpretation to predict and explain the concentrations of heavy metals such as chromium (Cr) and cadmium (Cd). This study aims to: (i) evaluate spectral predictability for metal concentrations across multiple gradients, (ii) establish best practices for soil element concentration quantification under environmental constraints, and (iii) explore the interpretability of machine learning for heavy metal prediction across different concentration gradients and within the range of soil environmental factors.
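
To give a rough idea of what a DWPT-extended feature vector combined with pH and OC might look like, the sketch below runs a wavelet packet decomposition on a toy spectrum and appends the two environmental covariates. The Haar wavelet, the decomposition level, the eight reflectance values and the function names are all assumptions chosen for brevity; the study's actual wavelet and settings are not stated in this section.

```python
from math import sqrt


def haar_step(signal):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in pairs]
    return approx, detail


def wavelet_packet(signal, level):
    """Full packet tree: every node is split, not only the approximation."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        # Each node yields two children, so the node count doubles per level.
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes  # 2**level nodes in natural (not frequency) order


def extended_features(spectrum, ph, oc, level=2):
    """Raw bands + packet coefficients + environmental covariates."""
    coeffs = [c for node in wavelet_packet(spectrum, level) for c in node]
    return list(spectrum) + coeffs + [ph, oc]


# Hypothetical reflectance values for eight bands, with pH and OC (%).
spectrum = [0.30, 0.32, 0.35, 0.33, 0.40, 0.42, 0.41, 0.39]
nodes = wavelet_packet(spectrum, level=2)
features = extended_features(spectrum, ph=6.5, oc=1.8, level=2)

print(len(nodes), "packet nodes;", len(features), "features per sample")
```

Unlike the ordinary discrete wavelet transform, the packet transform also splits the detail branches, so the high-frequency part of the spectrum is resolved as finely as the low-frequency part. A feature selector such as RFE can then choose among these sub-bands separately for each metal.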

source: https://www.sciencedirect.com/science/article/pii/S0048969724020746?dgcid=rss_sd_all
