This week should present FC Dallas with a very interesting opportunity: to go on the road and dominate possession. Possession is not strongly correlated with winning soccer games, nor is it (generally speaking) directly related to scoring opportunities. That said, it is easier to score when you have the ball than when you do not, so I am excited about it. Luchi has the team leading the league in possession by more than three percentage points per game, a rather big change from previous seasons with the club in Frisco.
Steve was correct in picking both of last weekend’s games, putting him at two correct and one incorrect on the season (no pushes). This week we have another two games to predict, and I am going to give you the predicted probabilities for the three possible results (win, draw, loss). Rather than a typical machine learning model, this week we used a Monte Carlo simulation to generate our predicted outcomes.
Monte Carlo Simulation
A Monte Carlo simulation is a technique used in a variety of fields, with finance being a common application. These simulations estimate the probability of different outcomes when those outcomes are difficult to predict directly. Uncertainty, hard-to-model variables, and risk/reward trade-offs come up frequently where Monte Carlo simulations are used, so I thought it would be a good method for predicting scorelines. Soccer has plenty of risk and uncertainty, and the relationship between the input variables and the final score is not always apparent. Soccer statistics can appear quite random, so this week we are using a Monte Carlo simulation.
To arrive at the predicted numbers, we repeatedly sample values of x (the input) and use each sample to produce a value of y (the outcome). Steve did this 10,000 times - much faster to do programmatically than manually - and came up with our probabilities for win, draw, and loss. Steve then calculated the variance to give us a sense of how stable the model's results are. The variance is defined as the average of the squared distances from the mean. Equivalently, we square each result, average those squares, and then subtract the square of the overall average. I will not dive too deep into this today, but 10,000 seemed to be a solid enough number and thus I stuck with it. Sometimes these simulations require more than 10,000 samples and sometimes fewer, but selecting the correct number of samples is important, which we will get into in more detail next week.
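For anyone who wants to see the idea in code, here is a minimal Python sketch of the sampling loop described above. To be clear, this is not the exact model behind this week's numbers. It assumes each team's goals follow a Poisson distribution, and the scoring rates (1.6 and 1.1 goals per game) are made-up placeholders, as are the function names. The point is only to show how 10,000 sampled matches turn into win, draw, and loss probabilities, and how the variance is computed.

```python
import math
import random


def poisson_sample(rng, rate):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    goals, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return goals
        goals += 1


def simulate_outcomes(home_rate, away_rate, n_sims=10_000, seed=0):
    """Simulate n_sims matches and return a list of 'home', 'draw', or 'away'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcomes = []
    for _ in range(n_sims):
        home_goals = poisson_sample(rng, home_rate)
        away_goals = poisson_sample(rng, away_rate)
        if home_goals > away_goals:
            outcomes.append("home")
        elif home_goals < away_goals:
            outcomes.append("away")
        else:
            outcomes.append("draw")
    return outcomes


def outcome_probabilities(outcomes):
    """Turn the simulated outcomes into the share of each result."""
    total = len(outcomes)
    return {label: outcomes.count(label) / total
            for label in ("home", "draw", "away")}


def variance(values):
    """Average of the squared distances from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical scoring rates, NOT the inputs used for this week's picks.
outcomes = simulate_outcomes(home_rate=1.6, away_rate=1.1)
probs = outcome_probabilities(outcomes)

# Variance of the "home win" indicator (1 if the home team won, else 0),
# and the standard error it implies for the estimated home-win probability.
home_indicator = [1 if o == "home" else 0 for o in outcomes]
home_variance = variance(home_indicator)
home_std_error = math.sqrt(home_variance / len(outcomes))

print(probs)
print(home_variance, home_std_error)
```

Increasing `n_sims` shrinks the standard error, which is one way to judge whether 10,000 samples is enough for a given matchup.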
FC Dallas at Columbus Crew
Projected Columbus win: 53.4%
Projected FC Dallas win: 33.1%
Projected draw: 13.5%
Los Angeles FC at New York City FC
Projected LAFC win: 64.8%
Projected NYCFC win: 26.3%
Projected draw: 8.9%
Thus, we take the highest-probability outcome and use that as our prediction: Columbus beats FC Dallas (boo) and LAFC beats NYCFC. There are some caveats to using these types of simulations, most notably that they assume perfectly efficient markets or environments. This is likely not true in our situation, but we will make this naive assumption for the purposes of this prediction. Please comment, guys, and let me know if you have any questions. Who do you think wins today and why?