A comprehensive guide to Expected Goals…
Expected goals, commonly abbreviated as xG, is a model based on historical shot data that reflects the probability (on a scale of 0 to 1) that a shot will result in a goal based on the characteristics of that shot and the events leading up to it.
These variables may include, but are not limited to: the location of the shooter (distance and angle from goal), whether the shot was a header or struck with the foot, the assist type, and other variables that affect the probability of a shot resulting in a goal. A value of ‘0 xG’ represents a shot that is certain not to be a goal, and a value of ‘1 xG’ represents a certain goal.
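To make the idea concrete, here is a minimal Python sketch of a toy xG model that uses only distance, shot angle and whether the shot was a header. The coefficients and the function names (`shot_features`, `toy_xg`) are invented for illustration; a real model is fitted to a large set of historical shots and uses far more variables.

```python
import math

GOAL_WIDTH_M = 7.32  # width of a regulation goal in metres


def shot_features(x, y):
    """Return (distance, angle) for a shot.

    x: metres from the goal line, y: metres sideways from the goal centre.
    The angle is the one subtended by the two goalposts at the shooter.
    """
    distance = math.hypot(x, y)
    angle = math.atan2(GOAL_WIDTH_M * x, x**2 + y**2 - (GOAL_WIDTH_M / 2) ** 2)
    return distance, angle


def toy_xg(x, y, is_header=False):
    """Map shot features to a probability between 0 and 1 (logistic function)."""
    distance, angle = shot_features(x, y)
    # ASSUMPTION: these coefficients are made up, not fitted to real data.
    logit = -1.0 - 0.12 * distance + 1.5 * angle - 0.8 * (1 if is_header else 0)
    return 1 / (1 + math.exp(-logit))


if __name__ == "__main__":
    for label, shot in [("6 m out", (6, 0)), ("11 m out", (11, 0)), ("30 m out", (30, 0))]:
        print(label, round(toy_xg(*shot), 3))
```

With these invented numbers, a central shot from close range gets a much higher value than one from 30 metres, and a header from the same spot is rated lower than a shot with the foot, which is the general behaviour the article describes.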
It is important to note that not all xG models are created equal, and some are better than others. Intuitively, a model built on more historical shot data and richer variables should be more accurate. StatsBomb’s expected goals model, from what I’ve seen, is the most granular in the industry and has been since 2018, as it includes ‘freeze frames’ on every shot. These freeze frames, powered by computer vision, include the position and pressure of the defenders and the goalkeeper’s positioning.
For example, if a player went 1v1 against the goalkeeper and rounded him for a shot at an open goal, but from relatively far out, other xG models may rate the opportunity as a low-xG chance, because they do not have the defender and goalkeeper positioning needed to reflect the high-quality chance as StatsBomb’s model does. Hence StatsBomb refers to these models, albeit arguably harshly, as ‘naive xG’.
Last year, StatsBomb further improved their xG model by including the variable ‘shot impact height’, which affects the probability of scoring from a shot.
xG has gained much publicity in the world of football. The big spike in interest came when the BBC’s ‘Match of the Day’ began displaying each match’s expected goals from the start of the 2017/18 season. Excitement and outcry followed, as some viewers were confused and did not know what to make of this new metric.
Although this was the first time xG was broadcast on TV, xG has been around for longer than you might think. xG was first introduced in a study published in 2004 and then grew to prominence among football analytical circles from 2012 onwards after Sam Green’s innovative article for Opta.
Expected Goals is a bit like blue cheese
xG is a bit like blue cheese: some love it, some hate it. There are three distinct ‘camps’ when it comes to opinions on xG. There is the die-hard pro-xG camp, made up of data analysts, statisticians, quantitative analysts and others well versed in the world of football analytics.
Some view them as the ‘progressives’, who see football differently to the ‘traditionalists’ and realise that there is perhaps more to understand about football performance than the final result of a match or the standings in a league table.
There is the neutral camp, who are indifferent, do not harbour strong opinions on xG and perhaps don’t quite understand it.
And then there is the dreaded ‘I f@*^!%g hate xG’ camp. This is your typical middle-aged football pundit, ex-player, manager or fan who sees football in a ‘traditional’ or ‘old-school’ way. There have been many infamous rants on expected goals by these so-called ‘experts’. Craig Burley, on ESPN, and Jeff Stelling, along with several other pundits on Sky Sports, ripped into expected goals, calling it a ‘whole lot of nerdy nonsense’ [Burley in 2018] and ‘the most useless stat in the history of football’ [Stelling in 2017] with a look of anger, bemusement and frustration, as if xG were some farce that is taking over our beloved game.
I strongly advise watching the following two videos:
Craig Burley rant:
Jeff Stelling rant:
Whilst I certainly cringe a little and strongly disagree with their viewpoints when listening to their rants, I don my empathetic hat and try to put myself in their shoes for a moment. And immediately, it becomes clear that they do not understand xG at its core and are misinformed on how to use it as a metric.
I hypothesise that because xG is derived from a model, rather than being a stat one can simply count, such as completed passes, tackles or shots on target, they find it difficult to get their heads around it. Perhaps the name ‘expected goals’ leads them to believe the statistic is ludicrous: it does not matter how many goals were expected to be scored in a match, because the final score is all that matters.
Now, I don’t know anyone in or out of football analytics who will be popping open champagne bottles after a cup final if their team has just lost the match but ‘won on xG’, that is, if their team’s xG was higher than the opposition’s. No advocate for xG has ever suggested that it is more important than the final scoreline. The scoreline is what everyone interested in the sport cares most about, and it is what directly affects the league table or who progresses to the next round of the cup.
However, from my perspective, xG is a bit like a Swiss Army knife in the sense that it can be used in so many different ways. By looking at a team’s xG over the course of a season, one can see how effective a team is at generating shooting chances in attack as well as preventing opposition shooting chances in defence. xG is often used to evaluate and quantify the quality of the shooting chances an attacking player generates for himself. xG has also given rise to new metrics that those in football analytics circles will be familiar with, such as expected assists (xA) and post-shot expected goals (PSxG or xG2), a metric predominantly used to evaluate goalkeeper performance or finishing skill.
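As a rough illustration of the season-long view, the Python sketch below adds up a team’s xG created and xG conceded from a list of shots. The shot records, team names and the helper `xg_for_and_against` are all hypothetical, and the sketch assumes every shot in the list comes from a match involving the team in question.

```python
# Hypothetical shot records: (match_id, shooting_team, xg). All values are invented.
shots = [
    (1, "Home FC", 0.35), (1, "Home FC", 0.08), (1, "Away United", 0.12),
    (2, "Away United", 0.60), (2, "Home FC", 0.05), (2, "Home FC", 0.22),
]


def xg_for_and_against(shots, team):
    """Sum xG created by `team` and xG conceded to its opponents.

    ASSUMPTION: every shot in `shots` is from a match that `team` played in.
    """
    xg_for = sum(xg for _, shooter, xg in shots if shooter == team)
    xg_against = sum(xg for _, shooter, xg in shots if shooter != team)
    return xg_for, xg_against


if __name__ == "__main__":
    created, conceded = xg_for_and_against(shots, "Home FC")
    print(f"xG for: {created:.2f}, xG against: {conceded:.2f}, "
          f"xG difference: {created - conceded:+.2f}")
```

Over a handful of shots the totals mean very little; the point of the season-long view is that the sums are taken over hundreds of shots.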
No stat is perfect, and there are certainly limitations to xG, particularly when it is used to evaluate a team’s dominance on a per-match basis. This is partly because it is challenging to assign an accurate expected goal value to each individual shot. Moreover, game state (whether a team is losing, winning or drawing at a certain point in the match) can significantly affect the match’s final xG. Also, because football is a very low-scoring sport with a lot of variance and luck, certain match events may dramatically affect xG. For example, if a through-ball puts the striker through on goal, it likely results in a high-xG chance if the striker gets the shot off, but what if he doesn’t? What if the through-ball is intercepted? Teams can be dominant and create opportunities without necessarily getting the final shot off, and only shots count towards xG.
Shots account for only approximately 2% of all actions in a football match, so we can’t rely solely on a shot-based metric to form an opinion on a single match. xG is not the only metric available for this purpose, however. Other, more advanced metrics based on mathematical models can be used in an attempt to quantify a team’s dominance, such as expected threat (xT), VAEP (Valuing Actions by Estimating Probabilities) and non-shot expected goals (NSxG).
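The role of variance in a single match can be sketched in Python. The example below treats each shot as an independent coin flip weighted by its xG (a simplifying assumption, since rebounds and follow-up shots are not independent) and computes the exact chance of each result. The shot values and the function names `goal_distribution` and `outcome_probabilities` are invented for illustration.

```python
def goal_distribution(shot_xgs):
    """Return a list where entry k is the probability of scoring exactly k goals.

    ASSUMPTION: shots are independent, each scored with probability equal to its xG.
    """
    dist = [1.0]  # with no shots, zero goals is certain
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot is missed
            new[goals + 1] += prob * p     # this shot is scored
        dist = new
    return dist


def outcome_probabilities(xgs_a, xgs_b):
    """Return (P(A wins), P(draw), P(B wins)) from the two teams' shot xG values."""
    dist_a, dist_b = goal_distribution(xgs_a), goal_distribution(xgs_b)
    win = draw = loss = 0.0
    for goals_a, pa in enumerate(dist_a):
        for goals_b, pb in enumerate(dist_b):
            if goals_a > goals_b:
                win += pa * pb
            elif goals_a == goals_b:
                draw += pa * pb
            else:
                loss += pa * pb
    return win, draw, loss


if __name__ == "__main__":
    team_a = [0.3, 0.3, 0.3, 0.3, 0.3]  # invented shots, 1.5 xG in total
    team_b = [0.7]                      # one big invented chance, 0.7 xG
    win, draw, loss = outcome_probabilities(team_a, team_b)
    print(f"A wins {win:.0%}, draw {draw:.0%}, B wins {loss:.0%}")
```

Even under these assumptions, the side with more than double the xG still loses a meaningful share of the time, which is why a single match’s xG should be read with caution.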
So, you finally ask, how the bloody ’ell should we use xG then? Well, it’s simple: you use it like you would any statistic or model. Understand it fully, including its strengths and limitations, and use it as a guide to form your evaluation. Don’t base your entire thesis on the xG of a single match, as you are only setting yourself up for misleading results.
A common phrase, particularly in the world of football, is that stats are or can be misleading. While this can indeed be the case, it is crucial to note that stats mislead only when you use or interpret them carelessly. It is not xG’s fault that Craig Burley, Jeff Stelling and, more recently, Jim White do not know how to use it.
You often hear the expression that by the end of the season ‘the league table never lies’, as if all the good luck and bad luck balance out and a worthy winner is crowned. But how true is this?
How repeatable or predictive is success? Is success always the outcome of a good process?
In Part 2, I will explore some of the metrics that can be used to forecast next season’s league table. Hint: expected goals may be a better predictor of future results and performance than points tallies from previous seasons, Mr Craig Burley.
Thank you for reading, and stay tuned for Part 2!
Words by Guy Kaye – @GuyKaye2