Integrating transformer-based learning and Sentinel-2 bare soil composites for soil organic carbon mapping in the black soil region of Northeast China


Geoengineering News

Jan 6, 2026, 6:04:27 PM
to CarbonDiox...@googlegroups.com
https://www.nature.com/articles/s41598-025-33682-4
Authors: Na Chen, Zhikang Wei, Xuancheng Jin, Nan Lin, Fan Yang, Ling Zhao & Song Wu

05 January 2026

Abstract
Accurate assessment of soil organic carbon (SOC) is essential for sustainable cropland management and carbon sequestration monitoring. However, high-resolution SOC mapping remains challenging due to two persistent limitations: (1) the difficulty of extracting true bare-soil reflectance, especially when single-date imagery is used and spectral signals remain influenced by vegetation, residue, and soil moisture; and (2) reliance on models that require large training datasets and may underperform in typical small-sample soil survey settings. To address these challenges, we developed an approach that integrates multi-temporal Sentinel-2 bare-soil composites with a transformer-based foundation model, the Tabular Prior-data Fitted Network (TabPFN), for SOC prediction in the black soil region of Northeast China. Bare soil pixels were extracted using a Normalized Difference Vegetation Index (NDVI) threshold (0.1–0.4), and two compositing strategies, the 50th percentile (P50) and the 90th percentile (P90), were compared. We systematically evaluated three advanced algorithms: TabPFN, convolutional neural network (CNN), and Extreme Gradient Boosting (XGBoost). Results demonstrated that the TabPFN model coupled with P50 composites achieved the highest prediction accuracy (R² = 0.78, RMSE = 1.90 g kg⁻¹), outperforming CNN and XGBoost by 4–6%. TabPFN’s distinct advantage lies in its design as a prior-data fitted transformer, which enables robust generalization from limited samples (N = 174) without extensive hyperparameter tuning, effectively addressing the “small data” challenge pervasive in digital soil mapping. SHapley Additive exPlanations (SHAP) analysis indicated that the shortwave infrared band (B12) and precipitation have the greatest effect on model output, pointing to the joint role of soil spectral response and climate variability.
This is one of the first studies to apply the TabPFN architecture to SOC estimation, offering a novel, interpretable, and scalable workflow that bridges the gap between data scarcity and model complexity. The proposed framework provides a reliable tool for high-resolution SOC mapping in heterogeneous croplands, supporting precision agriculture and long-term carbon accounting initiatives.
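
For readers unfamiliar with bare-soil compositing, the NDVI masking and percentile step described in the abstract can be sketched roughly as follows. This is only an illustration for a single pixel, not the authors' implementation: the function names, the inclusive treatment of the 0.1 and 0.4 bounds, the linear-interpolation percentile, and the sample reflectance values are all assumptions made for the example.

```python
from typing import Optional, Sequence


def ndvi(red: float, nir: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 if both bands are zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bare_soil_composite(
    red_series: Sequence[float],
    nir_series: Sequence[float],
    band_series: Sequence[float],
    q: float,
    ndvi_min: float = 0.1,  # lower NDVI bound quoted in the abstract
    ndvi_max: float = 0.4,  # upper NDVI bound quoted in the abstract
) -> Optional[float]:
    """Composite one band of one pixel over time.

    Keeps only acquisition dates whose NDVI falls inside the bare-soil
    range (bounds assumed inclusive), then takes the q-th percentile of
    the band values on those dates. Returns None if no date qualifies.
    """
    bare = [
        b
        for r, n, b in zip(red_series, nir_series, band_series)
        if ndvi_min <= ndvi(r, n) <= ndvi_max
    ]
    return percentile(bare, q) if bare else None


# Made-up reflectances for one pixel on five dates (illustrative only).
red = [0.10, 0.12, 0.05, 0.11, 0.13]
nir = [0.15, 0.18, 0.40, 0.16, 0.20]  # third date is vegetated (high NDVI)
b12 = [0.30, 0.34, 0.15, 0.32, 0.36]

p50 = bare_soil_composite(red, nir, b12, q=50)
p90 = bare_soil_composite(red, nir, b12, q=90)
print(p50, p90)
```

With these made-up values the vegetated third date is masked out, and the P50 and P90 composites of B12 come to approximately 0.33 and 0.354. In a real workflow the same logic would be applied per pixel and per band across the whole image stack.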

Source: Scientific Reports