Flood susceptibility mapping (FSM) is crucial for effective flood risk management, particularly in flood-prone countries such as Pakistan. This study addresses the need for accurate and scalable FSM by systematically evaluating 14 machine learning (ML) models in high-risk areas of Pakistan. Its novelty lies in the comprehensive comparison of these models and in the use of explainable artificial intelligence (XAI) techniques to identify the significant flood conditioning factors at both the model training and prediction stages. The models were assessed for accuracy and for scalability, with a specific focus on computational efficiency. Our findings indicate that LightGBM (LGBM) and XGBoost are the top performers in terms of accuracy, and that XGBoost also excels in scalability, with a prediction time of ~18 s compared with 22 s for LGBM and 31 s for random forest. The evaluation framework is applicable to other flood-prone regions and indicates that LGBM is the better choice for accuracy-focused applications, while XGBoost is better suited to scenarios with computational constraints. These findings can support accurate FSM in other regions and the scaling of the analysis to larger geographical areas, which in turn can support better decision-making and informed policy-making for flood risk management.