Research & Publications

An optimized ensemble ML-WQI model for reliable water quality prediction by minimizing the eclipsing and ambiguity issues

By Rajaul Karim | 03 May, 2025

Type: Journal Paper.
Journal Name: Applied Water Science (Impact Factor: 5.8 | Q1 Journal).
Publisher: Springer Nature
Date: 29 April 2025


Abstract: Monitoring water quality is essential for sustaining ecosystems and the various forms of life on Earth. Water quality index (WQI) models are a widely adopted approach to water quality monitoring. However, they have been widely criticized as unreliable and inconsistent, problems often triggered by eclipsing and ambiguity issues. To address these issues, data-driven approaches that integrate machine learning or deep learning (ML/DL) techniques have recently been applied to develop improved WQI models. Although these models perform better than the conventional ones, recent studies have reported that they often produce inconsistent results due to data variability and outliers. The purpose of this research is to define a robust and reliable ensemble ML-WQI model that is optimized to attenuate the effects of data variability, eclipsing, and ambiguity for accurate water quality prediction. To define the ensemble model, eight prominent regression ML models are evaluated to select the best-performing base-estimators and the meta-learner. The Irish WQI dataset used in the study includes 29,159 samples spanning over 15 years. Each data sample records 11 (eleven) water quality parameters and the corresponding measurement and classification of WQI, calculated using three traditional WQI models, namely, CCME, Brown, and SRDD. Performance is evaluated using mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), R-squared (R²), fivefold cross-validation, and a comparative evaluation with existing ML models. In addition, resilience to eclipsing, ambiguity, and outliers is quantitatively assessed using the WQI classification data.
The findings revealed that the ensemble ML-WQI model with linear regression (LR), random forest (RF), and extreme gradient boosting (XGB) as base-estimators, and decision tree (DT) as the meta-learner, achieves high classification accuracy, with MAE, MSE, RMSE, and R² scores of 0.01, 0.001, 0.0034, and 1.00, respectively. This performance exceeds that of the existing regression-based ML-WQI models. In addition, the model shows greater resilience to outliers by classifying all WQIs close to the general trend of water quality. The model has a very low eclipsing effect (23.9%) compared with CCME (50.50%), Brown (32.20%), and SRDD (77.20%). With respect to the ambiguity issue, the model demonstrates greater stability than traditional WQI models. Therefore, the proposed ensemble model is robust to the inherent variability of water quality data when predicting a reliable WQI classification. This data-driven, autonomous, cost-effective, and easy-to-comprehend ML-WQI model should provide strong support to researchers in building a comprehensive water quality monitoring and management system.
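
For readers unfamiliar with the four regression metrics named in the abstract, the following is a minimal sketch of how they are conventionally defined. It is not the paper's evaluation code; the function name `regression_metrics` and the sample WQI values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, shared by MSE and R^2.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE

    # R^2 compares the residual error with the variance of the targets.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical WQI scores, chosen only to exercise the function.
observed = [70.0, 82.0, 55.0, 91.0]
predicted = [71.0, 80.0, 55.0, 92.0]
metrics = regression_metrics(observed, predicted)
```

Lower MSE, MAE, and RMSE indicate smaller prediction errors, while an R² close to 1 indicates that the predictions explain almost all of the variance in the observed values.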

Main Architecture:

Research approach followed in this study
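
The stacked structure described in the abstract (LR, RF, and XGB as base-estimators, DT as the meta-learner) could be sketched with scikit-learn roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the data are synthetic stand-ins for the 11 water quality parameters, `GradientBoostingRegressor` is swapped in for XGB to avoid an extra dependency, and all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 11 features per sample and a
# WQI-like target on a 0-100 scale built from a weighted mean plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, 11))
weights = rng.uniform(0.5, 1.5, size=11)
y = 100.0 * (X @ weights) / weights.sum() + rng.normal(0.0, 1.0, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base-estimators feed their out-of-fold predictions to the meta-learner.
ensemble = StackingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        # Stand-in for XGB (assumption); xgboost's XGBRegressor could be
        # used here instead if that package is installed.
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=DecisionTreeRegressor(max_depth=6, random_state=0),
    cv=5,  # folds used to build the meta-learner's training inputs
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

# Hold-out metrics matching those listed in the abstract.
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mse))
r2 = r2_score(y_test, y_pred)

# Fivefold cross-validated R^2 over the full synthetic dataset.
cv_r2 = cross_val_score(ensemble, X, y, cv=5, scoring="r2")
print(f"MSE={mse:.3f} MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
print("Fivefold R2:", np.round(cv_r2, 3))
```

Because the data here are synthetic, the printed scores say nothing about the paper's results; the sketch only shows how the base-estimators and the meta-learner fit together.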

 

Method to measure eclipsing for the CCME, SRDD, Brown, and the ensemble ML-WQI model

Fig. 14: Temporal variation of WQI values for the four WQI models due to eclipsing

Fig. 15: Assessment of ambiguity in the four WQI models