Soccer is the most played sport in the world: a way to have fun with friends, to go hoarse celebrating a game-winning goal, or to cry out loud after a disappointing loss. It mixes all these feelings, and nothing embodies that better than the World Cup, which always finds a way to mess with our emotions. Hoping to avoid a heartbreaking elimination (again), we decided to Data Science it and try to figure out which countries have the best chance of bringing the trophy home.
Guessing the future is not an easy task, and even our best colleagues here on the Data Science team can, and probably will, get it wrong. So whenever you read a probability here, remember that it is an estimate, not a certainty.
The method we apply here is one we developed ourselves. It consists of three steps: data ingestion, data processing and, finally, the calculations. The data universe is every result from the matches that led each of the 32 national squads to the big stage in Qatar.
After gathering this data, we looked at the goals scored in each 15-minute interval of the game, which we call facts. Counting these facts by interval lets us calculate descriptive measures such as the mean and the standard deviation. With these numbers in hand, applying the prediction model becomes easy.
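To make the counting step concrete, here is a minimal sketch of how goals could be grouped into 15-minute intervals and summarized. The input layout (one list of goal minutes per match), the function names and the sample numbers are all assumptions made for illustration, not our actual pipeline or data. Stoppage-time goals are assumed to fall into the last interval.

```python
from statistics import mean, pstdev

INTERVAL_MINUTES = 15
N_INTERVALS = 6  # 0-15, 15-30, 30-45, 45-60, 60-75, 75-90

def interval_index(minute):
    """Map a goal minute to one of the six 15-minute intervals.

    Assumption: minutes past 90 (stoppage time) count in the last interval.
    """
    index = max(minute - 1, 0) // INTERVAL_MINUTES
    return min(index, N_INTERVALS - 1)

def goals_per_interval(matches):
    """Count goals per interval for each match.

    `matches` is a list with one inner list of goal minutes per match
    (a hypothetical layout). Returns N_INTERVALS lists, each holding
    one goal count per match.
    """
    counts = [[0] * len(matches) for _ in range(N_INTERVALS)]
    for match_number, goal_minutes in enumerate(matches):
        for minute in goal_minutes:
            counts[interval_index(minute)][match_number] += 1
    return counts

def interval_stats(matches):
    """Return a (mean, standard deviation) pair for each interval."""
    return [(mean(c), pstdev(c)) for c in goals_per_interval(matches)]

# Made-up goal minutes for four matches of one team, for illustration only.
sample_matches = [[12, 47], [], [3, 88, 90], [61]]
stats = interval_stats(sample_matches)

for i, (avg, std) in enumerate(stats):
    start = i * INTERVAL_MINUTES
    print(f"{start:2d}-{start + INTERVAL_MINUTES} min: mean={avg:.2f} std={std:.2f}")
```

The sketch uses the population standard deviation (`pstdev`); the sample version (`stdev`) would work just as well if the matches are treated as a sample.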
For example, suppose Germany scores 0.43 goals on average between minutes 0 and 15. Picture rolling a 10-sided die: every time it lands on 1, 2, 3 or 4, it’s a goal, because the roll falls inside the interval. Of course, this introduces a rounding problem (0.43 becomes 4 faces out of 10), but thanks to the magic of Python, we can roll a die with as many sides as we want. Here is an example of what the goal-scoring distribution looks like for the France National Team.
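The dice idea can be sketched in a few lines. This is an illustration under assumptions, not our production code: the function name and the `sides` parameter are made up, the interval average is treated as the probability of scoring (so at most one goal per interval), and averages above 1 are capped at 1.

```python
import random

def scores_in_interval(avg_goals, rng, sides=None):
    """Roll the dice for one 15-minute interval; True means a goal.

    With `sides` set, roll a die with that many faces and round the
    average to a whole number of winning faces (0.43 on a 10-sided
    die gives faces 1 to 4). With `sides=None`, compare against a
    random float, which behaves like a die with practically unlimited
    faces and avoids the rounding.
    """
    probability = min(max(avg_goals, 0.0), 1.0)  # assumption: cap at 1
    if sides is None:
        return rng.random() < probability
    winning_faces = round(probability * sides)
    return rng.randint(1, sides) <= winning_faces

rng = random.Random(2022)  # fixed seed so the sketch is repeatable
rolls = 100_000
ten_sided = sum(scores_in_interval(0.43, rng, sides=10) for _ in range(rolls))
unlimited = sum(scores_in_interval(0.43, rng) for _ in range(rolls))
print(f"10-sided die: goal in {ten_sided / rolls:.3f} of rolls")
print(f"no rounding:  goal in {unlimited / rolls:.3f} of rolls")
```

The 10-sided version should land near 0.40 and the unrounded version near 0.43, which is the rounding problem in action.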
Now it’s just a matter of playing the games. We have the averages for each interval for each team, so we play the teams against each other in the same way they are going to meet in real life, and that gives us the match results. A single run, however, would be at the mercy of the gods of randomness. To avoid that, we run the same model 10,000 times, so the results are less random.
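A minimal sketch of this step, under the same assumptions as the dice example: each team's interval average is treated as a scoring probability, one roll per interval, and the two teams roll independently. The function names, the team averages and the seed are invented for illustration. The sketch simply counts draws and does not attempt to resolve knockout ties.

```python
import random
from collections import Counter

def simulate_match(avgs_a, avgs_b, rng):
    """Play one match: one dice roll per 15-minute interval per team."""
    goals_a = sum(rng.random() < p for p in avgs_a)
    goals_b = sum(rng.random() < p for p in avgs_b)
    return goals_a, goals_b

def simulate_many(avgs_a, avgs_b, n_runs=10_000, seed=None):
    """Repeat the same match n_runs times and return outcome shares."""
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(n_runs):
        goals_a, goals_b = simulate_match(avgs_a, avgs_b, rng)
        if goals_a > goals_b:
            outcomes["a"] += 1
        elif goals_b > goals_a:
            outcomes["b"] += 1
        else:
            outcomes["draw"] += 1
    return {key: outcomes[key] / n_runs for key in ("a", "draw", "b")}

# Made-up interval averages for two hypothetical teams (six intervals each).
team_a = [0.43, 0.30, 0.35, 0.30, 0.40, 0.45]
team_b = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10]

shares = simulate_many(team_a, team_b, n_runs=10_000, seed=2022)
print(shares)
```

Running a single match can give almost any score; averaging over 10,000 runs is what turns the dice rolls into stable win, draw and loss shares.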
After hours of dice rolling, the top 5 favorites for the FIFA World Cup Qatar 2022 are:
The only way to know the results for sure is to wait for the games. In the meantime, let’s cheer!