This study aimed to evaluate the performance of machine learning and regression methods in predicting index scores of the 3-level version of the EQ-5D (EQ-5D-3L) from a large, diverse data set.
A total of 30 studies from 3 countries were combined. Predictions were made with eXtreme Gradient Boosting classification (XGBC), eXtreme Gradient Boosting regression (XGBR), and ordinary least squares (OLS) regression, using 10-fold cross-validation and an 80%/20% partition for training and testing. We evaluated 6 prediction scenarios defined by 3 samples (general population, patients, total) and 2 predictor sets (demographic and disease-related variables with or without patient-reported outcomes). Model performance was evaluated by the mean absolute error, the percentage of predictions within the clinically irrelevant error range, and the percentage of predictions within the correct health severity group (EQ-5D-3L index <0.45, 0.45-0.926, >0.926).
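The three evaluation criteria described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names and the error tolerance `tol` are hypothetical. The threshold that defines the clinically irrelevant error range is not stated here, so it is left as a required argument, and the boundary values 0.45 and 0.926 are assigned to the middle severity group, following the ranges as written.

```python
def severity_group(score):
    """Map an EQ-5D-3L index score to a health severity group.

    Cut-offs follow the ranges given in the text:
    group 0: index < 0.45, group 1: 0.45 to 0.926, group 2: index > 0.926.
    """
    if score < 0.45:
        return 0
    if score <= 0.926:
        return 1
    return 2


def evaluate(y_true, y_pred, tol):
    """Compute the three criteria for observed and predicted index scores.

    `tol` is an assumed, caller-supplied half-width of the clinically
    irrelevant error range (the actual value is not given in the text).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]

    # Mean absolute error between observed and predicted scores.
    mae = sum(abs_err) / n

    # Percentage of predictions whose absolute error does not exceed `tol`.
    pct_within_tol = 100.0 * sum(e <= tol for e in abs_err) / n

    # Percentage of predictions falling in the same severity group
    # as the observed score.
    pct_same_group = 100.0 * sum(
        severity_group(t) == severity_group(p) for t, p in zip(y_true, y_pred)
    ) / n

    return {
        "mae": mae,
        "pct_within_tol": pct_within_tol,
        "pct_same_group": pct_same_group,
    }
```

Calling `evaluate(observed, predicted, tol=...)` on the held-out 20% of a sample would return all three criteria in one dictionary, which makes it straightforward to compare models that rank differently depending on the criterion used.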
The data set comprised 26 318 individuals (clinical settings n = 6214, general population n = 20 104) and 26 predictor variables plus diagnoses. Using all predictors and the total sample, the mean absolute error values were 0.153, 0.126, and 0.131, the percentages of predictions within the clinically irrelevant error range were 47.6%, 39.5%, and 37.4%, and the percentages within the correct health severity group were 56.3%, 64.9%, and 63.3% for XGBC, XGBR, and OLS, respectively. Model performance depended on the evaluation criterion applied, the target population, the predictors included, and the EQ-5D-3L index score range.
Regression models (XGBR and OLS) outperformed XGBC, yet prediction errors were outside the clinically irrelevant error range for most respondents. Our results highlight the importance of systematic collection of patient-reported outcome (EQ-5D) data. Dialogue between artificial intelligence and outcomes research experts is encouraged to enhance the value of the data accumulating in health systems.