Previous analyses of xG models have aggregated data over an entire season and calculated the r² correlation coefficient between actual and expected goals. However, this discards important information about the accuracy of each individual shot. For this reason, we use the Root Mean Square Error (RMSE) on the validation dataset, defined by
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i}^{n} (xG_i - G_i)^2}{n}}$$
where xG is the model's prediction (a probability from 0 to 1) for the shot labelled by the index i, G is the true outcome (0 for a non-goal, 1 for a goal), and n is the number of shots in the validation dataset. A lower RMSE indicates better performance.
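As a minimal sketch of the definition above, the per-shot RMSE can be computed in a few lines of Python. The function name `rmse` and the three-shot example values are illustrative assumptions, not the data or code used for the results reported here.

```python
import math


def rmse(predictions, outcomes):
    """Root Mean Square Error between per-shot xG values and true outcomes.

    predictions: model probabilities in [0, 1], one per shot (hypothetical input)
    outcomes:    1 for a goal, 0 for a non-goal, in the same shot order
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("at least one shot is required")
    # Squared error for each individual shot, then the mean, then the root.
    squared_errors = [(xg - g) ** 2 for xg, g in zip(predictions, outcomes)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up example: three shots, two of which were scored.
example_xg = [0.1, 0.5, 0.9]
example_goals = [0, 1, 1]
example_rmse = rmse(example_xg, example_goals)  # approximately 0.3
```

Here the squared errors are 0.01, 0.25 and 0.01, so the mean is 0.09 and the RMSE is about 0.3. Because every shot contributes its own error term, a model cannot hide a poor prediction for one shot behind a good season-level total.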
The following table gives our RMSE values for different models trained with varying numbers of features.