The midfielder algorithm explainer

Short Introduction

First of all, I would like to thank each of you for trying out the midfielder dashboard generator app and sharing your support and feedback. In this article, I will go through what each category of midfielder means and which method I followed to calculate all the scores.

(Sidenote: if you have found this article anywhere other than my Twitter, allow me to introduce myself. I am Debatra Chatterjee. In the context of this article, I am someone who does football data analysis as a hobby. I have recently launched a Streamlit application where you can generate dashboards for midfielders in various roles, and this article serves as an explainer for it. You can find the application here - https://debatra-midfielder-dashboard.streamlit.app/)

Q1. What does each category of midfielder mean?

Currently, I have divided the profile of midfielders into 6 different roles - Build-up DM, Ball-carrying DM, Ball-carrying CM, Deep-Lying Playmaker, Advanced Midfielder and Defensive Midfielder. The roles differ in how much weight they give to each of the 5 aspects of midfield play - Build-up, Progression via Pass, Progression via Carry, Chance Creation and Defending. Here is what each of them focuses on:

1. Build-up DM:

In my definition, a build-up DM mostly stays in the defensive and middle thirds of the pitch when in possession. They receive the ball in deeper areas and play it accurately to their teammates. Out of possession, they are a good defensive presence.

For this role, I have given the two highest (and equal) weights to build-up and defending, with the other three aspects sharing an equal, lower weight.

2. Ball-carrying DM:

A ball-carrying DM does not participate much in the build-up. In possession, they receive the ball from players in the same line (their midfield partner or an advanced fullback) and then carry it through an opposition line. Out of possession, they are a strong defensive presence.

For this role, progression via carry and defending share an equal, high weight, with the other three aspects sharing an equal, lower weight.

3. Ball-carrying CM:

They are in many ways the budget version of ball-carrying DMs. Exceptionally good at carrying through opposition lines, but not expected to contribute defensively. They are squad players who can be brought on during specific game-states.

For this role, only progression via carry has a very high weight, with the other four aspects sharing an equal, low weight.

4. Deep-lying playmaker:

Players who can sit deep and either slow the game down or speed it up when in possession. This type of midfielder is not expected to defend, even though they operate outside the team's attacking third. They can use the extra space in deeper areas to play longer balls that start attacks, and they are also trusted to play the ball accurately during build-up. With the increased focus on front-line pressing and rising fitness levels in football, this is also fast becoming a squad-player profile that can be brought on during specific game states.

For this role, I have given the highest weight to progression via pass, followed by build-up. The other three aspects have equal, low weights.

5. Advanced midfielder:

The chance-creating machine; the role is self-evident from its name. This should ideally shortlist the advanced 8s and the 10s (although fbref's position labels mean this is not always the case).

Chance creation has the highest weight, followed by equal weights for progression via pass and progression via carry. Build-up and defending share the lowest weight.

6. Defensive midfielder:

Again, self-evident from the name. They are pure defensive stalwarts in midfield, with no other responsibilities. The defending aspect has the single highest weight, with the other four sharing an equal, low weight.

Finally, we have the balanced role, the one where all aspects have equal weights. This is to flag the best all-round midfielders in the league.
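To make the relative weights above concrete, here is a minimal sketch of how the six roles plus the balanced role could be written down as weight tables. The numbers are hypothetical placeholders chosen only to respect the orderings described above (for example, build-up and defending tied for highest in the Build-up DM); the article does not publish the actual values, and the helper name `role` is illustrative. Each row is normalized so that its weights sum to 1.

```python
# Aspect order used by every role row below.
ASPECTS = ("build_up", "prog_pass", "prog_carry", "chance_creation", "defending")

def role(build_up, prog_pass, prog_carry, chance_creation, defending):
    """Turn relative weights into a dict whose values sum to 1."""
    raw = dict(zip(ASPECTS, (build_up, prog_pass, prog_carry, chance_creation, defending)))
    total = sum(raw.values())
    return {aspect: w / total for aspect, w in raw.items()}

# HYPOTHETICAL numbers: only the orderings come from the role descriptions.
ROLE_WEIGHTS = {
    "Build-up DM":          role(3, 1, 1, 1, 3),  # build-up = defending, highest
    "Ball-carrying DM":     role(1, 1, 3, 1, 3),  # carry = defending, highest
    "Ball-carrying CM":     role(1, 1, 5, 1, 1),  # carry alone, very high
    "Deep-Lying Playmaker": role(2, 3, 1, 1, 1),  # pass > build-up > rest
    "Advanced Midfielder":  role(1, 2, 2, 3, 1),  # chance creation > progression > rest
    "Defensive Midfielder": role(1, 1, 1, 1, 4),  # defending alone, highest
    "Balanced":             role(1, 1, 1, 1, 1),  # all aspects equal
}
```

Normalizing each row means only the ratios between aspects matter, so the integers can be tuned freely without breaking comparability between roles.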

Q2. What was the scoring methodology used?

Each of the 5 aspects is in turn made up of multiple data points, all of which are either available directly on fbref or can be feature-engineered from existing fbref columns. For example, the progression-via-pass aspect is made up of 18 data points: 9 come directly from fbref and 9 are computed from two or more fbref columns. These data points are a mix of raw numbers (for example, passes completed) and scaled numbers (for example, attacking touches per 100 touches). However, using either raw or scaled numbers in isolation can skew the results: raw volumes favor players who simply see a lot of the ball, while scaled rates can flatter players with very few touches.
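As an illustration of feature-engineering a raw and a scaled data point from fbref-style columns, here is a minimal sketch. The field names (`minutes`, `touches`, `att_third_touches`) and the `PlayerRow` type are hypothetical stand-ins for the corresponding fbref columns, not the app's actual code.

```python
from dataclasses import dataclass

@dataclass
class PlayerRow:
    # HYPOTHETICAL field names; fbref's real column labels differ.
    name: str
    minutes: float          # minutes played
    touches: int            # total touches
    att_third_touches: int  # touches in the attacking third

def attacking_touches_per_90(p: PlayerRow) -> float:
    """Raw (volume) data point: attacking-third touches per 90 minutes."""
    return p.att_third_touches * 90 / p.minutes if p.minutes else 0.0

def attacking_touches_per_100_touches(p: PlayerRow) -> float:
    """Scaled data point: attacking-third touches per 100 total touches."""
    return p.att_third_touches * 100 / p.touches if p.touches else 0.0

player = PlayerRow("Player A", minutes=1800, touches=1500, att_third_touches=300)
print(attacking_touches_per_90(player))           # 15.0
print(attacking_touches_per_100_touches(player))  # 20.0
```

The guards return 0.0 for players with no minutes or touches, which avoids a division by zero for unused squad members.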

Resolving this skewness was the biggest challenge of my project. After trying multiple approaches, including at one point full-scale linear regression models, I settled on a simpler solution. I created an attribute by pairing each scaled data point with its corresponding raw data point, and balanced the weight between the two based on the absolute value of the correlation between the scaled and raw values for that league.

For example, take the raw data point to be attacking touches per 90 and the scaled data point to be attacking touches per touch. If there were zero correlation between the two, each would get an equal weight of 0.5, and the score for this attribute would be (0.5 times the player's percentile for attacking touches per touch) + (0.5 times the player's percentile for attacking touches per 90). Any correlation, however, increases the weight of the scaled data point at the expense of the raw data point, in such a way that the two weights always sum to 1.
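The zero-correlation case above can be worked through with a minimal sketch. The percentile definition used here (share of league values at or below the player's value) is an assumption, one common choice among several, and the four-player league is made up for illustration.

```python
def percentile_rank(value, league_values):
    """Percent of league values at or below `value`.

    ASSUMPTION: one common percentile-rank definition; the app may use another.
    """
    at_or_below = sum(1 for v in league_values if v <= value)
    return 100.0 * at_or_below / len(league_values)

def attribute_score(scaled_pct, raw_pct, w_scaled):
    """Blend the scaled and raw percentiles; the two weights always sum to 1."""
    return w_scaled * scaled_pct + (1.0 - w_scaled) * raw_pct

# Hypothetical four-player league, zero-correlation case (w_scaled = 0.5).
scaled_league = [0.10, 0.15, 0.20, 0.25]  # attacking touches per touch
raw_league = [10.0, 12.0, 15.0, 20.0]     # attacking touches per 90
score = attribute_score(
    percentile_rank(0.20, scaled_league),  # 75.0
    percentile_rank(12.0, raw_league),     # 50.0
    w_scaled=0.5,
)
print(score)  # 62.5
```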

With the score of each combined attribute generated, each attribute is then assigned an individual weight based on its relative importance to the aspect (build-up, progression via pass, and so on), and the score for each aspect is calculated.

Finally, each aspect is weighted according to the midfielder's role (as discussed in Q1), which gives the player's overall score for that role. The midfielders are then ranked for each role in descending order of this overall score.
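The last two steps (attribute scores to aspect scores, then aspect scores to a role score and a ranking) are both weighted averages, so one helper covers them. This is a minimal sketch: the attribute names, the importance weights and the role weights are all hypothetical, since the article describes the procedure but does not list the attributes or the numbers.

```python
def weighted_average(scores, weights):
    """Weighted mean of `scores`, using only the keys present in `weights`."""
    total = sum(weights.values())
    return sum(scores[key] * w for key, w in weights.items()) / total

def rank_for_role(players, aspect_attr_weights, role_weights):
    """Score every player for one role and sort by overall score, highest first.

    players: {name: {attribute_name: combined attribute score (0-100)}}
    aspect_attr_weights: {aspect: {attribute_name: importance weight}}
    role_weights: {aspect: weight of that aspect for the role}
    """
    ranking = []
    for name, attr_scores in players.items():
        # Step 1: combined attribute scores -> aspect scores.
        aspect_scores = {
            aspect: weighted_average(attr_scores, attr_weights)
            for aspect, attr_weights in aspect_attr_weights.items()
        }
        # Step 2: aspect scores -> overall score for this role.
        ranking.append((name, weighted_average(aspect_scores, role_weights)))
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)

# HYPOTHETICAL attributes and weights, trimmed to two aspects for brevity.
aspect_attr_weights = {
    "defending": {"tackles_won": 1, "interceptions": 1},
    "prog_carry": {"progressive_carries": 2, "carries_into_final_third": 1},
}
ball_carrying_cm = {"defending": 1, "prog_carry": 5}
players = {
    "Player A": {"tackles_won": 90, "interceptions": 70,
                 "progressive_carries": 40, "carries_into_final_third": 50},
    "Player B": {"tackles_won": 30, "interceptions": 50,
                 "progressive_carries": 95, "carries_into_final_third": 80},
}
ranking = rank_for_role(players, aspect_attr_weights, ball_carrying_cm)
print(ranking[0][0])  # Player B
```

With carrying weighted heavily, Player B ranks first despite a much weaker defending score, which is the intended behavior for a ball-carrying CM.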

Short Conclusion

Phew. This was (I'm not kidding) a brief overview of how the algorithm uses fbref data to generate scores for the players. Thank you for your time if you have read this far. I would also love to discuss this further, so feel free to shoot me a DM on Twitter (or X).

Ciao.


Written by

Debatra Chatterjee