Open-Source AI Model Predicts Groundwater Levels for Proactive Water Governance

Researchers from Université Paris Cité, Sorbonne Université, Telecom SudParis, Institut Polytechnique de Paris, and Shanghai Jiao Tong University have developed a machine learning pipeline that classifies groundwater levels across France using 3.4 million observations, achieving high predictive accuracy. The open-source system provides early drought warnings and guides resource allocation, offering a scalable, data-driven tool for proactive groundwater management under climate stress.


CoE-EDP, VisionRI | Updated: 29-08-2025 09:46 IST | Created: 29-08-2025 09:46 IST

Groundwater, the planet’s most vital hidden reservoir, underpins ecosystems, sustains agriculture, and secures drinking water for billions, yet remains one of the hardest resources to monitor effectively. A new study led by researchers from LIPADE at Université Paris Cité, Sorbonne Université, Telecom SudParis, and Institut Polytechnique de Paris, in collaboration with Shanghai Jiao Tong University, introduces a machine learning–driven framework that could transform how nations anticipate shortages and allocate this fragile resource. The urgency is undeniable: climate change is disrupting rainfall patterns, evapotranspiration is intensifying during hotter summers, and water extraction is climbing in regions where aquifers serve as lifelines for agriculture and public supply. Conventional modeling tools like MODFLOW and FEFLOW provide scientific rigor but depend on dense monitoring networks, extensive hydrogeological surveys, and heavy computational power, leading to delays in delivering actionable insights at national scales.

Building a National-Scale Open-Source System

The Franco-Chinese research team responded by constructing a national-scale, open-source pipeline, tested in France, where groundwater plays a critical role in both agriculture and public supply. The system integrates more than 3.4 million groundwater observations from 1,500 monitoring wells in the ADES database maintained by the French Geological Survey. These records were combined with high-resolution meteorological data from Météo-France SAFRAN and physiographic layers from the European Corine Land Cover database. By harmonizing these inputs, the pipeline generates a consistent, information-rich dataset. A key innovation lies in its feature engineering: rainfall totals, temperature averages, evapotranspiration levels, and seasonal cycles were distilled into meaningful predictors of groundwater variability. These features were then processed by AutoGluon, an automated machine learning framework that ensembles models such as LightGBM, CatBoost, XGBoost, Random Forests, and neural networks. The system classifies groundwater into five operational states—Very Low, Low, Average, High, and Very High—mirroring the categories used by water managers for drought warnings and allocation decisions.
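To make the feature-engineering step concrete, here is a minimal Python sketch of how daily weather series might be distilled into rainfall totals, temperature averages, and seasonal markers. It is an illustration under stated assumptions, not the authors' code: the function names, the trailing-window approach, the window length, and the sine/cosine encoding of the day of year are all hypothetical choices that the article does not specify.

```python
import math
from datetime import date, timedelta


def trailing_sum(values, window):
    """Sum of the last `window` values up to and including each position."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running)
    return out


def trailing_mean(values, window):
    """Mean over the same trailing window (shorter at the start of the series)."""
    sums = trailing_sum(values, window)
    return [s / min(i + 1, window) for i, s in enumerate(sums)]


def seasonal_markers(day):
    """Encode the day of year as a point on a circle (assumed encoding)."""
    angle = 2 * math.pi * day.timetuple().tm_yday / 365.25
    return math.sin(angle), math.cos(angle)


def build_features(dates, rain_mm, temp_c, window=3):
    """Combine the aggregates into one feature row per date.

    `window` is an illustrative value; the study's window lengths are not
    given in the article.
    """
    rain_tot = trailing_sum(rain_mm, window)
    temp_avg = trailing_mean(temp_c, window)
    rows = []
    for d, r, t in zip(dates, rain_tot, temp_avg):
        s, c = seasonal_markers(d)
        rows.append({
            "date": d,
            "rain_sum": r,
            "temp_mean": t,
            "season_sin": s,
            "season_cos": c,
        })
    return rows


if __name__ == "__main__":
    days = [date(2023, 7, 1) + timedelta(days=i) for i in range(4)]
    for row in build_features(days, [1, 2, 3, 4], [10, 20, 30, 40]):
        print(row)
```

In a real pipeline the resulting rows would be joined to each well's observations and passed to the model-training step; longer windows would capture the long-term rainfall aggregates that matter for slow-responding aquifers.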

Capturing Local Realities with Smart Data Links

Spatial realism is ensured through a k-nearest-neighbor matching system, illustrated in the study’s schematic diagrams. Each well is automatically paired with the closest meteorological, hydrological, and abstraction stations, with 90 percent of connections falling within 25 kilometers. This keeps groundwater observations tied to local environmental drivers without resorting to heavy geostatistical interpolation. The pipeline also incorporates rolling-window features and interaction terms to mimic the delayed effects of wetness and energy availability. For instance, combining rainfall totals with temperature averages helps capture evapotranspiration dynamics, which play a decisive role in recharge and depletion processes. This coherent data framework is then fed into AutoGluon, which automatically selects and tunes the best-performing models, stacking them in layers to maximize predictive power. The models are optimized for the weighted F1 score, which averages per-class F1 in proportion to class frequency, with the aim of correctly identifying rare but critical states like Very Low groundwater.
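The station-matching idea can be sketched in a few lines of Python. This is a simplified illustration, not the study's implementation: it assumes great-circle (haversine) distance as the metric, uses a brute-force search, and the dictionary layout, function names, and example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_stations(well, stations, k=1):
    """Return the k closest stations as (station_id, distance_km) pairs."""
    ranked = sorted(
        (haversine_km(well["lat"], well["lon"], s["lat"], s["lon"]), s["id"])
        for s in stations
    )
    return [(sid, dist) for dist, sid in ranked[:k]]


def share_within(wells, stations, limit_km=25.0):
    """Fraction of wells whose nearest station lies within limit_km."""
    if not wells or not stations:
        return 0.0
    hits = sum(
        1 for w in wells
        if nearest_stations(w, stations, k=1)[0][1] <= limit_km
    )
    return hits / len(wells)


if __name__ == "__main__":
    # Illustrative coordinates only; these are not stations from the study.
    stations = [
        {"id": "versailles", "lat": 48.8049, "lon": 2.1204},
        {"id": "lyon", "lat": 45.7640, "lon": 4.8357},
    ]
    wells = [
        {"id": "paris", "lat": 48.8566, "lon": 2.3522},
        {"id": "marseille", "lat": 43.2965, "lon": 5.3698},
    ]
    for w in wells:
        print(w["id"], nearest_stations(w, stations, k=1))
    print("share within 25 km:", share_within(wells, stations))
```

A brute-force search like this compares every well with every station. At national scale a spatial index such as a ball tree would be the more likely choice, but the matching logic stays the same.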

Strong Results and Real-World Scenarios

On validation data, the ensemble achieved a weighted F1 score of 0.927, with precision and recall both exceeding 0.92. On a temporally distinct 2023 test set of over 600,000 records, performance dropped to a weighted F1 of 0.67 and an accuracy of 0.66, a gap consistent with natural distribution shifts between the training and test periods. Importantly, the extremes were predicted most reliably: an F1 of 0.78 for Very Low and 0.72 for Very High, the categories that matter most for proactive drought and flood-risk management. Feature importance analysis confirmed hydrological logic: long-term rainfall aggregates and seasonal markers dominated the predictions, along with well depth and temperature proxies. To showcase operational value, the authors conducted a drought-simulation experiment for southern France in summer 2023. Imposing a 30 percent rainfall deficit and a 1.5°C temperature rise, the model projected that nearly 38 percent of wells would fall into the Very Low category by mid-July, with another 25 percent classified as Low. Such insights, available within hours, could help authorities issue early scarcity warnings, restrict abstractions, and prepare communities for water-use limits weeks in advance.
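For readers unfamiliar with the weighted F1 score reported above, the short Python sketch below shows how it is computed: F1 is calculated for each class separately, then averaged with weights equal to each class's share of the true labels. The labels in the example are toy values invented for illustration and do not come from the study.

```python
from collections import Counter

CLASSES = ["Very Low", "Low", "Average", "High", "Very High"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred, labels=CLASSES):
    """Average per-class F1, weighted by each class's share of true labels."""
    total = len(y_true)
    if total == 0:
        return 0.0
    support = Counter(y_true)
    return sum(
        support[label] / total * per_class_f1(y_true, y_pred, label)
        for label in labels
    )


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["Very Low", "Very Low", "Low", "Average"]
    guess = ["Very Low", "Low", "Low", "Average"]
    for label in CLASSES:
        print(label, round(per_class_f1(truth, guess, label), 3))
    print("weighted F1:", round(weighted_f1(truth, guess), 3))
```

Because the weights follow class frequency, common classes contribute more to the overall score than rare ones, which is why per-class figures such as the 0.78 for Very Low are worth reading alongside the headline number.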

Challenges, Future Directions, and Global Promise

Despite its success, the researchers acknowledge limitations. Predictions for mid-range categories such as Low and Average remain weaker, reflecting overlapping signals and concept drift. More frequent retraining and adaptive validation are recommended. The team also sketches future improvements: moving beyond categorical classes to probabilistic forecasts, integrating Graph Neural Networks to capture spatial dependencies, and fusing Earth observation data from satellites like GRACE-FO and SMAP. They also emphasize the need for dynamic land-use and abstraction records to better represent human pressures on aquifers. Interpretability remains another frontier, with calls for explainable AI to make ensemble models more transparent to policymakers.

The broader implications are significant. By making the system open source, the team makes it easier to scale the approach and adopt it elsewhere, offering a low-cost complement to physics-based models. The study's impact statement highlights both opportunities and risks: while the pipeline can enable earlier drought alerts, reduce agricultural losses, and support equitable water sharing, careless reliance without hydrological oversight could lead to misinterpretations. Continuous validation, expert involvement, and clear communication of uncertainty are essential. By fusing piezometric, meteorological, and physiographic data into a machine learning engine, the researchers have delivered not just a technical innovation but a vision for data-driven groundwater governance. In a world where aquifers are under escalating pressure from climate change and human demand, this study offers a roadmap for nations to move from reactive crisis management to proactive, intelligent stewardship of their most indispensable hidden resource.

First published in: Devdiscourse