#Euro2020 Nothing Left to Chance: How Big Data Transformed Football

Published on Jun 24, 2021

When a football club’s star signing is a data analyst, you know something’s up.  

The phenomenon, in fact, is now common among football clubs worldwide. Bottom, middle, or top tier, clubs now heavily invest in data analytics for the same reason, say, investment banks do: to make more accurate, wide-ranging predictions; to track progress; to gain a competitive edge.  

If you have seen the Oscar-nominated film “Moneyball,” you know what we are talking about.

For the rest, understand that intuition in football, on and off the field, is important. But it’s overrated. 

Data is the future of football. And of sports, for that matter.  

Numbers provide a measure of true potential. Insights tell us how to tap it. But, as we’ll find out, only if you know what you’re looking for.  

The rise of data-driven football 

Take Michael Edwards, for example.  

Edwards is the sporting director of Liverpool Football Club, one of English football’s most decorated clubs. He is renowned, almost worshiped by the club’s fans, for leveraging data analytics to sign players.

In the last few years, Edwards has negotiated deals for players like Sadio Mané, Mohamed Salah, and Virgil van Dijk. These players were critical to the club’s historic run of winning the Premier League, the Champions League (Europe’s competition for the best of the best), and the Club World Cup.

When they were signed, the players, except perhaps van Dijk, were deemed merely better than average, and so cost the club just north of 160 million USD in aggregate.

Today, they are considered world-beaters. In the 2019 rankings of the Ballon d’Or, football’s highest annual individual honor, the three placed 4th, 5th, and 2nd, respectively. At their peaks, their joint valuation crossed 450 million USD.

That’s quite the bargain if you ask us. 

The point is, Edwards and his team of world-class analysts used data-driven metrics and KPIs to scout these players for years, identifying their lesser-known but astronomical potential before anyone else.

What’s remarkable is that the same is true of several other Liverpool players Edwards helped sign, like Andrew Robertson and Fabinho.

As a result, Liverpool has earned the badge of “making superstars, not buying them.”  

That said, Liverpool’s outrageous deals and data-driven success shouldn’t come as a surprise to those who know its owners.  

Liverpool Football Club is owned by Fenway Sports Group (FSG), the sports investment giant that also owns the Boston Red Sox, a baseball club that was among the early adopters of the data-driven model for running a sports enterprise.

The American investors took the algorithmic approach at the heart of their baseball empire and applied it to the English football club.  

The wild success of their structured and fact-based model couldn’t be more evident. And everyone wants in. 

How data-driven football works  

Using statistical analysis to predict outcomes in football, or in sports generally, is not new. Its origins are often attributed to Charles Reep, a gifted accountant who concluded from his match records that most goals came from moves of three passes or fewer.

Reep’s central idea became the cornerstone of data-driven sports modeling — past performance is an excellent indicator of future performance.  

Though the idea was a breakthrough, collecting data on past performance was itself a challenge in Reep’s time, let alone analyzing it with sophisticated computers.

That, of course, changed as time passed.  

In the 1990s, tape technology advanced and video footage of training sessions and matches became easily available.  

The data was limited, but coaches didn’t let perfect become the enemy of good. It was enough to evolve their understanding of the sport.  

Visual data made identifying weak spots in the opposition, for example, slightly easier, so that teams could develop game plans that capitalized on them. The edge would be slight, but significant.  

Then, the data revolution occurred.  

Today, not only is data abundant, but we have also made strides in computing technology, creating sophisticated and powerful analytic tools like machine learning.  

Earlier, data analysis was one-dimensional, perhaps two-dimensional. Today, the dimensionality is mind-boggling.

A boom in wearable sensors and dedicated high-definition cameras that follow individual players from different angles has led to an explosion of data. Analysts can now look beyond goals and assists and break performance down further, for a more detailed and in-depth view of each player’s abilities.

Most tech- or data-savvy fans might be familiar with Expected Goals (xG) and Expected Assists (xA), indicators, as the names suggest, of how many goals or assists a player or team is expected to score or make.

But analysts further break down the numbers in terms of specific areas, such as forward passes (further broken down in terms of different distances), the number of shots taken (where on the field, and how?), tackles, ball recoveries, and so forth.  

Here’s how one analyst put it to The Guardian: “Right now my model evaluates shot attempts across a variety of axes: where was the shot attempted from? What sort of pass assisted the shot? With what body part was the shot taken? Did the attacker dribble past his defender before trying the shot? How fast was the attacking move that led to the shot? Was the shot off a rebound or from a set play? All of these factors clearly influence the likelihood of scoring a goal. By aggregating this information into a model, I can estimate the likelihood of scoring different shooting chances in a match or over a season.” 
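To make the idea concrete, here is a minimal sketch of how such a model could turn shot features into a scoring likelihood and then add those likelihoods up into an xG figure. It is not the analyst’s actual model: the logistic form, the `Shot` fields, and every weight below are assumptions picked by hand for illustration, and it covers only some of the factors in the quote. A real model would estimate its weights from historical shot data.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_m: float    # distance from goal, in metres
    header: bool         # taken with the head rather than a foot
    after_dribble: bool  # attacker dribbled past a defender first
    fast_break: bool     # shot ended a quick attacking move
    set_play: bool       # shot came from a set play


# Hypothetical weights, chosen by hand for illustration only.
INTERCEPT = 0.3
WEIGHTS = {
    "distance_m": -0.12,
    "header": -0.8,
    "after_dribble": 0.4,
    "fast_break": 0.5,
    "set_play": -0.2,
}


def scoring_probability(shot: Shot) -> float:
    """Map a shot's features to a probability with the logistic function."""
    z = INTERCEPT + sum(
        weight * float(getattr(shot, name)) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def expected_goals(shots: list[Shot]) -> float:
    """xG for a set of shots is the sum of their scoring probabilities."""
    return sum(scoring_probability(shot) for shot in shots)


# Three made-up shots from a single match.
match_shots = [
    Shot(distance_m=6.0, header=False, after_dribble=False,
         fast_break=True, set_play=False),
    Shot(distance_m=25.0, header=False, after_dribble=True,
         fast_break=False, set_play=False),
    Shot(distance_m=9.0, header=True, after_dribble=False,
         fast_break=False, set_play=True),
]
print(f"xG for the match: {expected_goals(match_shots):.2f}")
```

The same sum works over a season: feed in every shot a player took, and the total is that player’s xG for the period.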

Then, there’s health. 

How fast does a player run? For how long? What’s the injury record? Sensors record heart rate, sweating, body heat, and muscle tension, feeding hundreds and perhaps thousands of KPIs and metrics that track fatigue, fitness, and health. And let’s not get into diet.

This is the magnitude of past performance data today — as many as 1.4 million data points per match, according to one estimate.  

The deluge of data is fed to sophisticated algorithms that offer valuable insights into improving fitness and performance. More importantly, the insights are measured and ultra-specific. 


The limits of data 

We have to admit that the title is misleading.  

Given the tremendous success of the data-driven model, most clubs are eager to adopt it. Especially those who are short on funds and want to make the most of their resources.  

Then there are clubs like Manchester City, Chelsea, and Paris Saint-Germain, whose pockets are so deep that they can break the bank to sign a superstar on short notice.

It’s not that these clubs don’t rely on data. Today, that’s outright impossible; to some extent every club does, especially the big-league ones. It’s that they can afford to take a financial risk to solve a problem immediately.

The data-driven model fails when one expects returns immediately. It delivers results in the long run, and only if the stars align. And one can never remove luck or uncertainty from the equation.

FSG acquired Liverpool FC in 2010 and made the business self-sustaining. That is great financially, but not enough for fans who desire silverware.

What changed in the last few years? Well, the management did. Liverpool found a world-class manager in Jürgen Klopp in 2015, and the rest is history.  

Data is meaningless without context. Numbers become just that — numbers — when we don’t know what we’re looking for.  

Where Liverpool excels is not in finding players who score high on certain KPIs, but in finding players whose KPI scores are relevant to the club. In other words, players who are a perfect fit for its system of play.

This is what we mean by the stars aligning. Everything must fit together like a jigsaw puzzle.  

But before we begin to put together the pieces, we must first have an understanding of what the bigger picture is, and how the pieces work. What even are the pieces?  

The challenge is to not just invest in data and software, but also in brainpower that can understand the data and analyze it to harvest groundbreaking insights.  

But even then, yes, even then, clubs can still fail, because data cannot capture everything.

What makes a manager or player world-class is not just tactics and xG or xA, but also passion, commitment, courage, how well they inspire their teammates, and other qualities that slip through the net of data analytics.  

Rich owners can adopt statistical models and still fail because they hire coaches who don’t get the best out of their players. The alternative is to splash the cash on proven superstars and eminent coaches, and hope it all works out.

To attain success, the stars must align, which is rare or time-consuming. That’s the first limit.  

The second limit is luck.  

Luis Amaral, an engineering professor at Northwestern University, is the creator of AFR, Average Footballing Rating, a score attributed to players or teams based on calculations by sophisticated coding tools and complex algorithms.  

The algorithms might be different, but the principle is identical — collecting and analyzing tons of different variables to arrive at a defined number that relates to success. In AFR’s case, a score out of 100.  

AFR is widely cited, but even Amaral admits, like any humble analyst, that uncertainty is unavoidable. Things happen. Humans err. Whether it’s missing an easy chance or getting injured. 

Data-driven models are based on probability. What they offer is not certainty, but consistency.  

If a player is expected to score one goal per game, the player will not literally score one goal every game. They might score two today, three tomorrow, and often none, such that given enough time (the longer the better), the player’s running average converges toward the expected value of one goal per game.
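A small simulation can illustrate why the time horizon matters. The sketch below assumes, purely for illustration, that a player’s goals in each game follow a Poisson distribution with an average of one; real scoring is messier, and the function names here are made up for this example. It tracks the running average as the games add up.

```python
import math
import random


def goals_in_one_game(rate: float, rng: random.Random) -> int:
    """Draw a goal count from a Poisson distribution (Knuth's method)."""
    threshold = math.exp(-rate)
    goals = 0
    product = rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def running_average(rate: float, games: int, seed: int = 0) -> list[float]:
    """Return the goals-per-game average after each successive game."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    averages = []
    for game in range(1, games + 1):
        total += goals_in_one_game(rate, rng)
        averages.append(total / game)
    return averages


averages = running_average(rate=1.0, games=20_000, seed=42)
for checkpoint in (5, 50, 500, 20_000):
    print(f"after {checkpoint:>6} games: {averages[checkpoint - 1]:.3f} goals per game")
```

Over a handful of games the average can sit well away from one. Over thousands it settles close to it, which is the consistency, rather than certainty, that these models offer.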

And that’s another reason quality is critical.  

An average player with a higher scoring average may crumble and miss a crucial chance in a big game. But a world-class player, despite a low average, could be counted on when it matters most.  

As the adage goes: Form is temporary. Class is permanent.  

And therefore, what’s recommended is neither complete reliance on intuition nor complete reliance on data.

As with everything else in life, a balance must be struck.