After some deliberation I decided to name the “machine” that is predicting things for us here. His name is Steve Machine and he is a friendly guy. Sometimes he is hosted elsewhere, depending on what he is working on, but for now he is on my local Macbook.
If you remember, last week we used a linear regression to predict the final score of the FC Dallas - New England Revolution game. That prediction gave us an estimated scoreline of FC Dallas 1.789 goals to New England 1.124 goals. The final score wound up 1-1. The betting line last week had FC Dallas as a half-goal favorite. Our prediction was pretty much in line with the bookies, so that’s a good place to be!
For week two, we are going with a different method for predicting the scoreline. Instead of a linear regression, we are going to use a K-Nearest Neighbors (KNN) model. Unlike last week’s regression, which gave us an actual number as its output, our KNN model is a classification model. That means we have “classes” for the model to train on and then predict. Think of it this way: at an intersection you can go a number of different directions (left, right, straight), and those would be the “classes” the model tries to predict. Using all of the available information (your age, previous turning patterns, the speed at which you approached, traffic in each direction, etc.), the model would predict which direction you would go. That’s one example of a classification problem and one method of solving it.
I should probably give a high-level overview of how exactly the KNN model determines what is “closest” and thus gives us our prediction. Well, that varies depending on the distance metric we select. The most common choice, and the one we used here for simplicity’s sake, is Euclidean distance. Euclidean distance is just the straight-line distance between two points in a multidimensional space. In other words: draw a straight line between two points and that’s the distance. There are several other methods that are super valuable, and KNN models can be enhanced in several ways; I encourage you to read up on Neighborhood Components Analysis and Large Margin Nearest Neighbors if you are interested. Also feel free to ask away.
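To make the “draw a straight line between two points” idea concrete, here is a minimal sketch of Euclidean distance in Python. The feature names (shots, shots on target, possession) are just hypothetical examples of the kind of per-game stats you might feed a model, not the actual features Steve uses:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points in n-dimensional space:
    square the difference along each dimension, sum, and take the root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical game-stat vectors: (shots, shots on target, possession %)
game_a = (14.0, 5.0, 52.0)
game_b = (10.0, 3.0, 48.0)

print(euclidean_distance(game_a, game_b))  # "how similar" the two games are
```

The smaller the distance, the more similar the two games look to the model; KNN simply ranks every training game by this number.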
Our K-Nearest Neighbors model makes the assumption that similar things stay close together. What that really means is that it will take in the data that we give it and then predict a score for each team. Each possible number of goals scored is a class. We have nine classes for this problem (zero goals all the way up to eight goals) simply because no one has scored more than eight goals in an MLS game that I could find. With all of the data that we have collected and the information we have about tomorrow’s FC Dallas game versus the LA Galaxy (presumably sans Zlatan), the machine learning model makes a prediction.
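Putting the pieces together, here is a toy from-scratch KNN classifier: find the k nearest training games by Euclidean distance, then take a majority vote among their goal counts. The training rows and features below are made up for illustration, not Steve’s real data:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a class for `query` by majority vote among the k nearest
    training points, using straight-line (Euclidean) distance."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (shots on target, possession %) -> goals scored
train_X = [(2, 40), (3, 45), (5, 55), (6, 60), (7, 65), (1, 35)]
train_y = [0, 1, 1, 2, 3, 0]

# Predict goals for a new game with 5 shots on target and 58% possession
print(knn_predict(train_X, train_y, (5, 58), k=3))
```

In practice you would reach for something like scikit-learn’s `KNeighborsClassifier` rather than rolling your own, but the logic is exactly this simple, which is a big part of KNN’s appeal.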
I like KNN models for a number of reasons, but possibly the biggest one is that it is relatively simple. The whole “birds of a feather flock together” saying is pretty well represented in the methodology behind building KNN models. If previous game results and the statistics surrounding those results are good predictors then we will have a halfway decent model on our hands.
Steve predicts a final score of (Bovada betting lines in parentheses):
FC Dallas: 2 (-1.0 / +120)
LA Galaxy: 1 (+1.0 / -145)
The betting lines have shifted toward FC Dallas of late, likely because of word coming out about Zlatan missing the game. I will say that I am rather optimistic about this result prediction. Possible poor weather tomorrow does have me a bit worried, but for now the forecast is clear. The FCD facilities guys are the absolute best, so any weather in the morning should not be a problem.
At the behest of some friends and colleagues I have decided to add a game this week that is not FC Dallas related. DC United vs New York City FC was the game that was most requested and here is the predicted result by Steve:
DC United: 1 (+0.5 / +110)
NYCFC: 1 (-0.5 / -130)
This one is a bit more interesting because Steve thinks the teams won’t combine for the three goals implied by most betting models, and Steve actually gives DCU a higher likelihood of winning than NYCFC. Something interesting to keep an eye on. Next week Steve and I will predict another two match results, and if there is enough interest we might work our way up to predicting the entire league’s week. Who knows?
Let me know in the comments what you liked and what you would like to see next week.