Explainable artificial intelligence for predicting cardiovascular events in hospitalised COVID-19 patients

Introduction Coronavirus disease 2019 (COVID-19) increases the risk of cardiovascular complications, and artificial intelligence (AI) offers promising tools for early risk prediction.
Objective To develop AI models capable of identifying predictors of cardiovascular events in hospitalised COVID-19 patients.
Methodology This retrospective multicentre cohort study included adult COVID-19 patients from 25 hospitals (March 2021 to August 2022). Cardiovascular outcomes, including arrhythmia, acute heart failure, myocardial infarction, myocarditis, and pericarditis, were combined into a composite outcome. Two predictive models were developed using the Light Gradient-Boosting Machine (LightGBM): model 1 used 59 variables (demographic, clinical, laboratory, and socioeconomic data), while model 2 used 52 variables (excluding socioeconomic factors). Model performance was assessed using accuracy, macro-F1, recall, precision, and area under the receiver operating characteristic curve (AUROC). Shapley additive explanations (SHAP) values identified the most influential predictors. To address class imbalance, we applied random oversampling.
Results Among 10,700 patients (median age 59 years [interquartile range 48–70]), 5.3% experienced the composite outcome. Both models showed moderate discrimination (AUROC: 0.752 and 0.760) and high accuracy (94.6% and 94.5%). However, class imbalance resulted in low macro-F1 scores (51.2% and 50.7%): F1 scores were high for the majority class (non-events: 97.2%) but very low for the minority class (cardiovascular events: 5.2% and 4.2%). Even after oversampling, performance for the minority class remained limited, with a maximum F1 score of 21.5%, primarily driven by gains in recall. SHAP analysis identified age, urea, platelet count, and the oxygen saturation to inspired oxygen fraction ratio (SatO2/FiO2) as key predictors.
Conclusion Despite moderate AUROC and high accuracy, both AI models demonstrated limited ability to detect cardiovascular events due to class imbalance. The persistently low F1 score for the minority class underscores this limitation. Traditional rebalancing techniques produced only small gains, mostly by improving recall at the expense of precision. Age, urea levels, platelet count, and SatO2/FiO2 were identified as the most relevant predictors of cardiovascular complications in this cohort.
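
The gap between high accuracy and low minority-class F1 reported above follows directly from how these metrics are defined when events are rare. The Python sketch below illustrates this for a binary outcome. The confusion-matrix counts are hypothetical, chosen only to mimic an event rate of about 5.3% in 1,000 patients; they are not the study's data, and the names `ConfusionMatrix`, `f1_score`, and `summarize` are illustrative helpers rather than part of the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with 'cardiovascular event' as the positive class."""
    tp: int  # events correctly flagged
    fp: int  # non-events wrongly flagged as events
    fn: int  # events missed
    tn: int  # non-events correctly cleared


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returned as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute accuracy, event precision/recall, per-class F1, and macro-F1."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    f1_event = f1_score(cm.tp, cm.fp, cm.fn)
    # For the non-event class the roles swap: TN act as its true positives,
    # FN as its false positives, and FP as its false negatives.
    f1_non_event = f1_score(cm.tn, cm.fn, cm.fp)
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision_event": cm.tp / (cm.tp + cm.fp) if cm.tp + cm.fp else 0.0,
        "recall_event": cm.tp / (cm.tp + cm.fn) if cm.tp + cm.fn else 0.0,
        "f1_event": f1_event,
        "f1_non_event": f1_non_event,
        # Macro-F1 is the unweighted mean of the per-class F1 scores.
        "macro_f1": (f1_event + f1_non_event) / 2,
    }


# HYPOTHETICAL counts (not from the study): 1,000 patients, 53 events.
# Baseline: the classifier almost always predicts "no event".
baseline = summarize(ConfusionMatrix(tp=2, fp=3, fn=51, tn=944))
# After a rebalancing step: more events caught, but many more false alarms.
rebalanced = summarize(ConfusionMatrix(tp=20, fp=120, fn=33, tn=827))

for name, metrics in (("baseline", baseline), ("rebalanced", rebalanced)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In these illustrative counts the baseline classifier is correct for 94.6% of patients while detecting only 2 of 53 events, so accuracy and non-event F1 stay high and event F1 stays near zero. The rebalanced scenario raises recall and event F1 but lowers precision and accuracy, which is the same direction of trade-off described in the Conclusion.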