https://www.nature.com/articles/s41598-026-39496-2
Authors: Azam Jafari, Fereydoon Sarmadian, Ahmad Heidari & Zahra Rasaei
13 February 2026
Abstract
Soil organic carbon (SOC) is a key indicator of soil health, stability, and security, and a major component of global carbon sequestration. Accurate SOC modeling is vital for environmental management and climate change mitigation. Despite machine learning advances, spatial autocorrelation remains a challenge in SOC prediction. This study assesses its impact by comparing four Random Forest (RF) models at 0–30 cm soil depth: (1) classic RF with non-spatial variables and random cross-validation, (2) classic RF with spatial and non-spatial variables and random cross-validation, (3) classic RF with non-spatial variables and spatial cross-validation, and (4) spatial RF with non-spatial variables and random cross-validation. Using 281 soil samples collected over six years in Abyek, Iran, the study integrates spatial and non-spatial predictors, including topographic, climatic, and vegetation indices. Scenario 4 yields the highest predictive accuracy (R² = 0.86, RMSE = 0.11) and eliminates spatial autocorrelation in residuals (Moran’s I = -0.01, p = 0.82). In contrast, models lacking spatial components (e.g., Scenario 1) show residual clustering and biased predictions (Moran’s I > 0.15, p < 0.001). These results highlight the importance of incorporating spatial dependencies and proper validation to enhance SOC prediction. The study informs digital soil mapping, carbon stock assessment, and land management for sustainable agriculture.
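The residual diagnostic reported in the abstract, Moran's I, can be illustrated with a minimal sketch. The abstract does not state which spatial weights the authors used, so the binary distance-band weights, the function name `morans_i`, the `bandwidth` parameter, and the toy 4 x 4 grid below are all assumptions made for illustration; the values are synthetic and are not the study's data.

```python
import math


def morans_i(coords, values, bandwidth):
    """Global Moran's I with binary distance-band weights.

    Assumed weighting scheme (not taken from the paper):
    w_ij = 1 if 0 < distance(i, j) <= bandwidth, else 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0  # sum over i != j of w_ij * dev_i * dev_j
    s0 = 0.0     # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= bandwidth:
                cross += dev[i] * dev[j]
                s0 += 1.0

    variance_sum = sum(x * x for x in dev)
    if s0 == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined: no neighbours or constant values")
    return (n / s0) * (cross / variance_sum)


# Synthetic 4 x 4 grid of sample locations with unit spacing (illustrative only).
coords = [(x, y) for y in range(4) for x in range(4)]

# "Residuals" that cluster in space: positive on the left half, negative on the right.
clustered = [1.0 if x < 2 else -1.0 for (x, y) in coords]

# "Residuals" that alternate like a checkerboard: every neighbour has the opposite sign.
checker = [1.0 if (x + y) % 2 == 0 else -1.0 for (x, y) in coords]

i_clustered = morans_i(coords, clustered, bandwidth=1.0)
i_checker = morans_i(coords, checker, bandwidth=1.0)

print(f"clustered residuals:    I = {i_clustered:.2f}")
print(f"checkerboard residuals: I = {i_checker:.2f}")
```

Values well above zero indicate that similar residuals sit next to each other, which is the pattern the abstract reports for the models without spatial components; values near zero, as reported for Scenario 4, indicate no detectable spatial structure. The sketch computes only the statistic. A p-value such as those quoted in the abstract would additionally require a permutation or analytical significance test, which libraries such as PySAL's `esda` provide.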
Source: Scientific Reports