FIFA 2022 Distribution Analysis
- sherry salek
- Aug 5, 2022
- 4 min read
I am going to analyze the FIFA 22 soccer game from EA Games. Since 1993, FIFA, the world's governing body for soccer, and Electronic Arts, a U.S. video game company, have partnered in making the world's best-selling sports video game franchise.
EA made the game. FIFA lent its name. In almost 30 years, they've sold more than 300 million copies and made more than $20 billion. The game is sold in more than 50 countries, with commentary available in almost 20 languages.
In this analysis, I need to ensure the game is well rounded and provides a genuinely enjoyable experience
for all the customers.
The game has a professional competitive scene, so it needs to be balanced, which means that no team or individual player should invariably be the preferred option regardless
of the opposition.
Therefore, I expect to find a roughly equal number of strong and weak players in the game.
Let's see if that's the case.
I used the FIFA 22 Complete Player Dataset from Kaggle, which contains the stats for each individual player in FIFA 22. So let's have a closer look.

Let's examine the "Overall" column.
It represents the quality of a player in their natural position on a scale from 1 to 100. This value is a sort of weighted average of the many individual stats each player has. As you probably know, the importance of attributes varies for different positions on the field. For instance, acceleration and top speed are more important for a winger than tackling. However, the inverse is true for center backs. Thus, the weight for each stat is altered based on the position of the player, and there is no single formula that calculates the overall evaluation. To get an idea of how well distributed the overall values are, we can construct a histogram and set the bin size to one.
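As a minimal sketch of the histogram step, the snippet below counts how many players fall into each bin of width one. The ratings are made up for illustration and are not taken from the dataset, and `histogram_counts` is a hypothetical helper name. With the real data, the input would be the values of the overall rating column.

```python
from collections import Counter


def histogram_counts(values, bin_width=1):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Made-up ratings for illustration only, not taken from the dataset.
sample_overall = [64, 65, 65, 66, 66, 66, 67, 67, 70, 91]
counts = histogram_counts(sample_overall, bin_width=1)

# Print a rough text histogram, one row per rating.
for rating, n in counts.items():
    print(f"{rating:3d} {'#' * n}")
```

With a bin width of one, every distinct rating gets its own bar, which is what makes the bell shape visible on the full dataset.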

The graph is bell shaped and resembles a normal distribution.
Since we are dealing with rounded averages, we are inclined to believe that the overall
value is not truly discrete but rather a rounded approximation of a continuous quantity.
Let's take a closer look at the graph.
Now we can notice its thin tails, which suggests a smaller number of outliers.
This reflects real life quite accurately, since very few professional players are exceptionally good
or bad at every single aspect of the sport.
Besides, even the least skilled professional soccer players are far superior to the average person.
That explains why the lowest overall values start from around 50 rather than zero.
The stats should reflect the performance of players in the real world, and the normal distribution is one of the
most frequently observed distributions in nature.
It is only logical that the data resembles this distribution.
Moreover, the bell shaped graph with thin tails further supports this idea.
Since one of the main characteristics of a normal distribution is symmetry, the overall values are
roughly symmetrically distributed.
Thus, we can reasonably consider the game balanced and acceptable for competitive play.
It is also worth mentioning that players within a single team or division share similar stats.
This skews the data a certain way and explains why we cannot expect the values to follow a normal distribution exactly.
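The symmetry and thin-tail observations can also be checked numerically. The sketch below is not part of the original analysis: it computes sample skewness and excess kurtosis from population moments, where values near zero are what a normal distribution would give. The two lists are toy data, and `skewness_and_excess_kurtosis` is a hypothetical helper name.

```python
import statistics


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / sd ** 3, m4 / sd ** 4 - 3.0


# Toy data for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 10]

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, _ = skewness_and_excess_kurtosis(right_skewed)

print(f"symmetric data: skewness={skew_sym:.3f}, excess kurtosis={kurt_sym:.3f}")
print(f"right-skewed data: skewness={skew_right:.3f}")
```

A skewness close to zero supports the symmetry claim, while a negative excess kurtosis indicates tails thinner than those of a normal distribution.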
Now, if we wish to further test the balance of the overall stats, we can examine a small sample of players. For instance, we can construct a histogram of the first 30 players in the data set based on their ID number. Since our data is limited, we need to adjust the size of the bins. Otherwise, it is possible for each value to occur only once or twice. If we adjust the bin size to three, we will see that the graph slightly resembles a normal distribution. However, we will also notice the fatter tails, since the number of observations is limited. We can treat this sample as roughly following a Student's t-distribution.

The Student's t-distribution is also symmetric.
So we are confident that even the small sample we are examining confirms our goal of a balanced game.
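The small-sample step can be sketched as follows. The player table here is synthetic, generated with a fixed seed purely for illustration, and the `id` and `overall` keys as well as the helper names are assumptions rather than the dataset's actual schema.

```python
import random
from collections import Counter

# Synthetic stand-in for the player table, for illustration only.
rng = random.Random(0)
players = [{"id": i, "overall": round(rng.gauss(66, 7))} for i in range(1, 201)]
rng.shuffle(players)


def first_n_by_id(records, n=30):
    """Return the n records with the smallest ID numbers."""
    return sorted(records, key=lambda r: r["id"])[:n]


def binned_counts(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


sample = first_n_by_id(players, n=30)
sample_counts = binned_counts([p["overall"] for p in sample], bin_width=3)

# Rough text histogram with bins of width three.
for lower, n in sample_counts.items():
    print(f"{lower}-{lower + 2}: {'#' * n}")
```

Widening the bins to three trades resolution for stability, so that 30 observations are not spread one or two per bar.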
Now, let's explore how a single stat is distributed among the players in the game. If we examine a goalkeeping trait like GK diving, we will be able to see the division into types.

We have two completely different clusters.
The low values represent how outfield players would perform in goal, and the higher ones represent
the actual goalkeepers' performance.
If we only examine the goalies, we will see the values are normally distributed once again.
So the game is indeed balanced.
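Separating the two clusters amounts to filtering on position before looking at the stat. Below is a minimal sketch with made-up records. The `positions` and `gk_diving` keys and the comma-separated position format are assumptions for illustration, not confirmed column names.

```python
import statistics

# Made-up records for illustration only.
gk_records = [
    {"positions": "GK", "gk_diving": 82},
    {"positions": "GK", "gk_diving": 74},
    {"positions": "GK", "gk_diving": 68},
    {"positions": "ST, CF", "gk_diving": 9},
    {"positions": "CB", "gk_diving": 12},
    {"positions": "RW, LW", "gk_diving": 7},
]


def split_gk_diving(records):
    """Split GK diving values into (goalkeepers, outfield players)."""
    keepers, outfield = [], []
    for r in records:
        is_keeper = "GK" in [p.strip() for p in r["positions"].split(",")]
        (keepers if is_keeper else outfield).append(r["gk_diving"])
    return keepers, outfield


keepers, outfield = split_gk_diving(gk_records)
print(f"goalkeepers: mean GK diving = {statistics.fmean(keepers):.1f}")
print(f"outfield:    mean GK diving = {statistics.fmean(outfield):.1f}")
```

Once the goalkeepers are isolated, the same histogram approach used for the overall rating can be applied to their values alone.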
Another aspect that makes the game more enjoyable is a sense of realism. For instance, young professional soccer players outnumber the veterans, which means having fewer players above the age of 35 than players below the age of 20. To make sure the game captures this aspect of the sport, let's check out the Age column.

Age is a discrete variable representing the age of each player.
In addition, age has a minimum value of 16, since the game only consists of first team players who
have signed a professional contract.
Thus, we can consider 16 as the starting point for any player who can sign a professional contract. It serves as a sort of origin for a Poisson distribution.
Then each bar in the graph shows the likelihood that a given player in the data is a
specific age.
Since a Poisson distribution is right-skewed, the younger players outnumber the older ones.
That is also true in real life.
Therefore, this adds a layer of realism to the game and should make it more enjoyable
for the customers.
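The Poisson idea above can be sketched by shifting ages so that 16 becomes zero and estimating the rate as the mean of the shifted values. This is a rough illustration under that assumption, not a formal goodness-of-fit test. The ages are made up, and the helper names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fit_shifted_poisson(ages, origin=16):
    """Estimate the Poisson rate from ages measured relative to the origin."""
    shifted = [a - origin for a in ages]
    return sum(shifted) / len(shifted)


# Made-up ages for illustration only.
sample_ages = [17, 19, 21, 22, 24, 25, 25, 27, 30, 36]
lam = fit_shifted_poisson(sample_ages)

under_20 = sum(1 for a in sample_ages if a < 20)
over_35 = sum(1 for a in sample_ages if a > 35)
print(f"estimated rate: {lam:.1f}, under 20: {under_20}, over 35: {over_35}")

# Model probability for a few ages, using years since 16 as the count.
for age in range(16, 41, 4):
    print(f"age {age}: model probability {poisson_pmf(age - 16, lam):.3f}")
```

Comparing these model probabilities with the observed share of players at each age gives a quick sense of how well the Poisson shape describes the Age column.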