Machine learning powers next-gen flood risk mapping with unprecedented accuracy

CO-EDP, VisionRI | Updated: 09-07-2025 09:27 IST | Created: 09-07-2025 09:27 IST
  • Country: Romania

A new scientific study published in the journal Water has developed a highly accurate artificial intelligence-driven model to predict flood-prone areas in Romania’s Buzău River catchment. The research addresses the persistent threat of flooding in the Danube River Basin, a region that continues to suffer significant ecological and socio-economic damage from increasingly frequent extreme weather events.

The study titled "Flood Susceptibility Assessment Using Multi-Tier Feature Selection and Ensemble Boosting Machine Learning Models" introduces a data-driven framework powered by ensemble boosting machine learning algorithms. The work employs explainable AI (XAI) techniques and a meticulous feature selection strategy to improve flood risk assessments and decision-making for disaster mitigation and spatial planning.

What makes the Buzău River catchment critical for flood susceptibility modeling?

The Buzău River catchment is among Romania's most vulnerable areas to flood hazards, primarily due to its complex topography, dense river network, and a mix of forested and urbanized landscapes. Stretching across five counties and covering over 5,200 square kilometers, the region lies within the eastern Danube River Basin, an area that has seen numerous historical flood events—most recently during the 2024 floods triggered by Storm Boris.

Recurring floods in Romania have been attributed to both natural and human-induced factors. Heavy rainfall, snowmelt, deforestation, and unregulated urban expansion have exacerbated the severity of flooding. With economic damages often reaching hundreds of millions of euros annually and some counties losing over 4% of their GDP to floods, the need for advanced flood susceptibility modeling is more urgent than ever.

In this context, the research zeroes in on the Buzău River catchment to demonstrate the application of cutting-edge machine learning tools in evaluating and mapping areas at risk. The region's varied elevation and land use types make it a representative testbed for such methodologies.

How does the study utilize AI to predict flood susceptibility?

The research implements a comprehensive modeling pipeline that combines data preprocessing, multi-tier feature selection, machine learning modeling, and interpretability analysis. Initially, 13 flood conditioning factors were collected, including topographical elements like slope and elevation, hydrological indicators such as topographic wetness index and distance from rivers, remote sensing indices measuring surface imperviousness and vegetation health, and soil characteristics including clay content and bulk density.

To refine this dataset, the authors employed a four-step feature selection approach involving Variance Inflation Factor (VIF), Condition Index (CI), Mutual Information (MI), and Information Gain (IG). This rigorous filtration process ensured the removal of redundant and irrelevant factors, reducing the dataset to nine critical variables.
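The screening logic described above can be sketched in a few lines. The following is an illustrative toy example, not the paper's code: it prunes collinear factors iteratively by Variance Inflation Factor (VIF), then ranks the survivors by mutual information with the flood/non-flood label. The factor names, the VIF threshold of 10, and the synthetic data are all hypothetical stand-ins.

```python
# Sketch of a two-tier screening step (hypothetical data and thresholds):
# prune collinear factors by VIF, then rank survivors by mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1.0 - (y - A @ coef).var() / y.var()
        out.append(1.0 / max(1.0 - r2, 1e-12))
    return np.array(out)

def prune_by_vif(X, names, thresh=10.0):
    """Drop the worst-VIF column until all remaining VIFs fall below thresh."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < thresh:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Synthetic stand-ins for three conditioning factors; "elevation" nearly
# duplicates "slope", so one of that pair should be pruned.
rng = np.random.default_rng(0)
n = 410
slope = rng.normal(size=n)
elevation = 0.95 * slope + rng.normal(scale=0.1, size=n)
dist_river = rng.normal(size=n)
X = np.column_stack([slope, elevation, dist_river])
X_kept, kept_names = prune_by_vif(X, ["slope", "elevation", "dist_river"])

# Rank the surviving factors by mutual information with a binary label.
flood = (slope + dist_river + rng.normal(scale=0.5, size=n) > 0).astype(int)
mi = mutual_info_classif(X_kept, flood, random_state=0)
```

The iterative drop-the-worst loop matters: removing all high-VIF columns at once would discard both members of a collinear pair, when keeping one of them preserves the underlying signal.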

Four ensemble boosting algorithms (AdaBoost, CatBoost, LightGBM, and XGBoost) were then trained and evaluated on a balanced dataset of 410 points, evenly split between known flood and non-flood locations. All geospatial inputs were standardized to a 30-meter spatial resolution, and the models were run on the Google Colab and Kaggle cloud computing platforms.

Performance metrics including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), R-squared (R²), precision, recall, F1-score, accuracy, κ-index, ROC-AUC, and Precision-Recall Average Precision (PRC-AP) were used to compare the models. CatBoost emerged as the top performer across all criteria, demonstrating superior predictive accuracy and robustness.
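The evaluation protocol can be illustrated with a minimal sketch. This toy example uses scikit-learn's GradientBoostingClassifier as a stand-in for the paper's CatBoost model (which requires the separate catboost package), trained on synthetic data sized like the study's 410-point balanced sample, and computes the classification metrics listed above.

```python
# Minimal sketch of the evaluation protocol (synthetic data; a scikit-learn
# boosting model stands in for the paper's CatBoost).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score,
                             average_precision_score)

rng = np.random.default_rng(42)
n = 410                      # balanced flood / non-flood sample, as in the study
X = rng.normal(size=(n, 9))  # nine conditioning factors after feature selection
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]   # flood probability per location
pred = (proba >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "precision": precision_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "f1": f1_score(y_te, pred),
    "kappa": cohen_kappa_score(y_te, pred),
    "roc_auc": roc_auc_score(y_te, proba),   # threshold-free ranking quality
    "prc_ap": average_precision_score(y_te, proba),
}
```

Note that ROC-AUC and average precision are computed from the predicted probabilities rather than the thresholded labels, which is what makes them useful for comparing models independently of any particular decision cutoff.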

CatBoost’s exceptional performance was attributed to its ordered boosting and use of symmetric decision trees, which handle categorical and numerical data well without extensive preprocessing. The model achieved an ROC-AUC of 0.972 and an average precision score of 0.971, outperforming even the highly regarded XGBoost algorithm.

What are the implications for flood risk management and policy?

Beyond achieving technical excellence, the study underscores the practical utility of explainable AI in flood risk management. Using SHAP (SHapley Additive exPlanations) values, the authors analyzed the contribution of each variable to the model's output, providing insights into the physical drivers of flood risk in the region.

The most influential conditioning factor was slope, followed by proximity to rivers, topographic wetness index, and land use/land cover. These variables significantly influence runoff patterns, water accumulation, and flood susceptibility, validating long-held assumptions in hydrological studies. Conversely, factors like soil clay content and vegetation indices played more marginal roles in this specific catchment, highlighting the importance of localized analysis.
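The study uses SHAP values for this attribution; as a lighter-weight illustration of the same idea, ranking factors by their contribution to a trained model's predictions, the sketch below uses scikit-learn's permutation importance on synthetic data. The factor names are hypothetical stand-ins for the paper's variables, and "clay_content" is deliberately constructed to be uninformative, mirroring the marginal role reported for soil clay content.

```python
# Illustration of feature attribution via permutation importance (a simpler
# stand-in for SHAP): shuffle one factor at a time and measure how much the
# model's score degrades. Synthetic data; factor names are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
n = 410
slope = rng.normal(size=n)
dist_river = rng.normal(size=n)
clay = rng.normal(size=n)   # constructed to carry no signal in this toy setup
y = (1.5 * slope + dist_river + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = np.column_stack([slope, dist_river, clay])
names = ["slope", "dist_river", "clay_content"]

model = GradientBoostingClassifier(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)

# Sort factors from most to least influential.
ranking = [names[i] for i in np.argsort(imp.importances_mean)[::-1]]
```

Unlike SHAP, permutation importance gives only a global ranking, not per-location explanations, but it captures the same qualitative finding: dominant drivers stand out sharply while uninformative factors score near zero.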

The study advocates for the integration of such high-resolution, AI-driven models into regional land-use planning and disaster mitigation strategies. It recommends stricter zoning laws, real-time flood monitoring systems, and sustainable watershed management practices to minimize risk in high-susceptibility zones.

The research also highlights the importance of continuous data updates. Although recent remote sensing indices were included, the land cover data date from 2018, which could limit the model’s applicability to newly urbanized areas. The authors identify this as a limitation and suggest that future models incorporate more dynamic datasets and include sensitivity and uncertainty analyses.

  • FIRST PUBLISHED IN: Devdiscourse