TY - JOUR
T1 - Enhancing soil organic carbon estimation with generative AI and Nix color sensor
AU - Singh, Rachna
AU - De, Mriganka
AU - Banerjee, Rintu
AU - Nayak, Anshuman
AU - Dasgupta, Shubhadip
AU - Das, Ayan
AU - Dey, Subhadip
AU - Biswas, Asim
AU - Weindorf, David C.
AU - Chakraborty, Somsubhra
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - Soil organic carbon (SOC) is a key indicator of soil health, yet conventional laboratory assays are labor-intensive and costly. This study investigated a rapid, low-cost alternative using a handheld Nix Spectro 2 Color Sensor, which captured high-resolution color data from air-dried soil samples. These color parameters were used to predict SOC with four data-driven prediction engines: Random Forest (RF), Gradient Boosting Regression (GBR), Extreme Gradient Boosting (XGBoost), and an Artificial Neural Network (ANN), which were further strengthened with synthetic data augmentation techniques. A total of 641 surface soil samples collected from six districts in West Bengal, India, were divided into 70% calibration and 30% validation subsets. Synthetic samples were produced using a combination of generative artificial intelligence (AI) techniques [generative adversarial networks (GANs) and Gaussian mixture models (GMM)] and non-parametric/statistical data augmentation methods [k-nearest neighbors (KNN) and bootstrapping] to fill critical gaps in the SOC range (3–14%). Among the baseline models using raw Nix color data, RF achieved the best validation accuracy (R² = 0.71, RMSE = 0.93%). After augmenting the calibration set with 44 GMM-generated samples (3–7% SOC), RF performance rose to R² = 0.77 and RMSE = 0.84%, while bias dropped and coverage across the SOC distribution improved markedly. The incorporation of synthetic data mitigated model bias and enhanced predictive accuracy, even though Levene’s test revealed significant variance differences between the calibration and validation datasets. The enhanced generalization of the model was attributed to better coverage of the SOC distribution, which reduced underrepresented gaps in the dataset.
The study highlighted the potential of AI-driven soil monitoring techniques in precision agriculture, demonstrating that integrating the Nix color sensor with synthetic data augmentation provides a rapid and cost-effective solution for on-site soil assessments. Future research should expand these methodologies to multi-parameter soil assessments, digital soil mapping, and broader applications in sustainable soil management and climate change mitigation.
KW - Artificial intelligence
KW - GMM
KW - Nix color sensor
KW - Random forest
KW - Soil color
KW - Soil organic carbon
UR - https://www.scopus.com/pages/publications/105022222902
U2 - 10.1038/s41598-025-24236-9
DO - 10.1038/s41598-025-24236-9
M3 - Article
C2 - 41253941
AN - SCOPUS:105022222902
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 40628
ER -