Enhancing soil organic carbon estimation with generative AI and Nix color sensor

Rachna Singh, Mriganka De, Rintu Banerjee, Anshuman Nayak, Shubhadip Dasgupta, Ayan Das, Subhadip Dey, Asim Biswas, David C. Weindorf, Somsubhra Chakraborty

Research output: Contribution to journal › Article › peer-review

Abstract

Soil organic carbon (SOC) is a key indicator of soil health, yet conventional laboratory assays are labor-intensive and costly. This study investigates a rapid, low-cost alternative: a handheld Nix Spectro 2 Color Sensor, which captured high-resolution color data from air-dried soil samples. These color parameters were used to predict SOC with four data-driven models: Random Forest (RF), Gradient Boosting Regression (GBR), Extreme Gradient Boosting (XGBoost), and an Artificial Neural Network (ANN). The models were then strengthened with synthetic data augmentation. A total of 641 surface soil samples collected from six districts in West Bengal, India, were divided into 70% calibration and 30% validation subsets. Synthetic samples were produced using generative artificial intelligence (AI) techniques [generative adversarial networks (GANs) and Gaussian mixture models (GMM)] together with non-parametric/statistical data augmentation methods [k-nearest neighbors (KNN) and bootstrapping] to fill critical gaps in the SOC range (3–14%). Among the baseline models using raw Nix color data, RF achieved the best validation accuracy (R² = 0.71, RMSE = 0.93%). After the calibration set was augmented with 44 GMM-generated samples (3–7% SOC), RF performance rose to R² = 0.77 and RMSE = 0.84%, while bias dropped and coverage across the SOC distribution improved markedly. Synthetic data mitigated model bias and enhanced predictive accuracy even though Levene’s test revealed significant variance differences between the calibration and validation datasets. The improved generalization of the model was attributed to better coverage of the SOC distribution, which reduced underrepresented gaps in the dataset.
The study highlights the potential of AI-driven soil monitoring techniques in precision agriculture, demonstrating that integrating the Nix color sensor with synthetic data augmentation provides a rapid and cost-effective solution for on-site soil assessments. Future research should extend these methods to multi-parameter soil assessments, digital soil mapping, and broader applications in sustainable soil management and climate change mitigation.
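
The 70/30 calibration/validation split and the reported validation statistics (R², RMSE, bias) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a simple random split (the abstract does not say how samples were assigned), takes R² to be the coefficient of determination, and takes bias to be the mean of predicted minus observed SOC. The function names `split_calibration_validation` and `validation_metrics` are hypothetical.

```python
import math
import random


def split_calibration_validation(samples, calibration_fraction=0.7, seed=0):
    """Shuffle the samples and split them into calibration and validation subsets.

    Assumption: a plain random split; the study may have used another scheme.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


def validation_metrics(observed, predicted):
    """Return R^2, RMSE, and mean bias for paired observed/predicted SOC values (%)."""
    n = len(observed)
    mean_observed = sum(observed) / n
    # Residuals are predicted minus observed, so a positive bias means overprediction.
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,   # coefficient of determination
        "rmse": math.sqrt(ss_res / n),  # same units as SOC (%)
        "bias": sum(residuals) / n,
    }


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    observed_soc = [1.0, 2.0, 3.0, 4.0]
    predicted_soc = [1.5, 2.5, 3.5, 4.5]
    print(validation_metrics(observed_soc, predicted_soc))
```

Here `predicted` would come from whichever model is being evaluated (for example, an RF fitted on the calibration subset), and the metrics are computed on the validation subset only, so that adding synthetic samples to the calibration set does not leak into the evaluation.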

Original language: English
Article number: 40628
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025

Scopus Subject Areas

  • General

Keywords

  • Artificial intelligence
  • GMM
  • Nix color sensor
  • Random forest
  • Soil color
  • Soil organic carbon
