
Machine Learning Season Prediction: After Week 1

How do things look after the first week of the season?

Here we are again after an interesting weekend of results. Our guys in Frisco worked hard and fought out a draw in their first game under new manager Luchi Gonzalez. LAFC came back against Sporting KC at the absolute death. FC Cincinnati made its MLS debut and was promptly throttled by an energetic and clinical Seattle Sounders team.

We could go on and on about the results, but the real meat here is that we now have more data with which to predict the 2019 season! Last week our neural network predicted final point totals, with some understandable and some more surprising results. Several questions came up, both on Reddit.com/r/MLS and here. I want to emphasize that this is an exercise not meant to produce the most accurate model; if we wanted that, we would do several things differently. What this method and model do offer is a chance to explore relationships between team success and pieces of data that, at least on the surface, appear weakly correlated or unrelated.


The Neural Network - a bit deeper

If you remember from last week, I walked through what machine learning is and its two big use cases: regression and classification. We are going to keep using our original model for the season-long predictions but will use some different methods for game-to-game predictions and analysis. Our regression model predicts a specific number (final point totals) rather than picking from a set of categories (think movie genres).

I was asked several times why I didn’t use linear regression for this prediction. The short answer: it wasn’t what I set out to do. You could certainly use the dataset to predict point totals for 2019 with linear regression, but as I said earlier, the goal here is to find possible correlations and relationships between the input variables (the data collected) and the output variable (the prediction itself). I was already building a broadly similar neural network for another project, so I decided to use that type of model here.
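For comparison, here is roughly what the linear regression alternative looks like in its simplest form: a straight-line fit of points against a single feature. This is a minimal sketch; the feature (previous-season goal difference) and all of the numbers are made up for illustration and are not the data behind these predictions.

```python
from __future__ import annotations


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy, made-up numbers: previous-season goal difference vs. points.
goal_diff = [-20.0, -5.0, 0.0, 10.0, 25.0]
points = [30.0, 42.0, 45.0, 52.0, 63.0]

slope, intercept = fit_line(goal_diff, points)
print(f"points ≈ {slope:.2f} * goal_diff + {intercept:.2f}")
```

A real version would use several features at once (multiple regression), but the idea is the same: the prediction is a weighted sum of the inputs, with the weights chosen to minimise squared error.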

All of that being said, I am going to explain a little about what our neural network does and why it is an interesting method of prediction to examine. Neural networks have three parts: the input layer (the stuff we put in), one or more hidden layers (where the magic happens), and the output layer (where we get told how amazing our predictions are). Neurons, nodes, and units are terms that all refer to the same thing. I tend to say nodes but may use the other terms as well; just know that they refer to the input and output values, as well as the transformed versions of the input data computed in the hidden layers.

Here’s a basic diagram breaking down the three parts of a neural network again.

A very basic neural network

A multilayer perceptron (MLP) regressor uses fully connected layers (every node in one layer is connected to every node in the next, rather than to just a few of them) and a number of hidden layers chosen by hand. I chose the number of hidden layers for this model and will keep it the same throughout the season for consistency.
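To make "fully connected" concrete, here is a minimal forward pass through an MLP written in plain Python. It is a sketch, not the model behind these predictions: the layer sizes, the ReLU activation on the hidden layers and the random, untrained weights are all assumptions chosen for illustration.

```python
import random


def relu(z: float) -> float:
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output node takes a weighted sum of *every* input."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


def forward(features, layers):
    """Pass features through each (weights, biases) layer; ReLU on hidden layers, linear output."""
    values = features
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        values = dense(values, weights, biases, None if is_output else relu)
    return values[0]  # a regressor ends in one number, e.g. predicted points


def random_layer(n_in, n_out, rng):
    """Untrained layer: n_out nodes, each with one weight per input node."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Hypothetical shape: 3 input features -> two hidden layers of 4 nodes -> 1 output.
sizes = [3, 4, 4, 1]
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
print(forward([0.5, -1.2, 2.0], layers))
```

The number of hidden layers is simply the length of `sizes` minus two; fixing it by hand, as described above, means fixing that list before training starts.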

One of my favorite things about using a neural network is how easy back propagation makes training. Back propagation works backwards from the error between the predicted and actual values, figuring out how much each connection contributed to that error so its weight can be adjusted. Doing this by hand takes a lot of math (chiefly the chain rule from calculus), but machine learning libraries handle it automatically once you set the model up. Think of it this way: you tried to hit a ball in a batting cage and missed. In your head you went over why you missed and adjusted accordingly for future swings, and you do this every time you see a pitch. You attempted, failed, and then worked backwards to make your next attempt even better. That’s essentially what back propagation is; computers just do it much faster than we do.
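Here is a minimal sketch of back propagation in plain Python, using a tiny network with one hidden layer, a tanh activation, squared error and plain stochastic gradient descent. Everything here (layer size, learning rate, number of epochs, the toy data) is an assumption chosen for illustration; libraries such as scikit-learn's MLPRegressor do the same chain-rule bookkeeping internally, typically with more refined optimizers.

```python
import math
import random


def train(data, hidden=3, lr=0.05, epochs=200, seed=1):
    """Fit y ≈ f(x) with one tanh hidden layer via backpropagation + gradient descent.

    Returns the mean squared error recorded during each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    history = []
    for _ in range(epochs):
        total_error = 0.0
        for x, y in data:
            # Forward pass: the swing.
            h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
            pred = sum(w * hj for w, hj in zip(w2, h)) + b2
            err = pred - y
            total_error += err ** 2
            # Backward pass: work out why we missed, starting at the output.
            d_pred = 2.0 * err                  # d(err^2) / d(pred)
            for j in range(hidden):
                d_h = d_pred * w2[j]            # error signal reaching hidden node j
                d_z = d_h * (1.0 - h[j] ** 2)   # tanh'(z) = 1 - tanh(z)^2
                w2[j] -= lr * d_pred * h[j]     # nudge each weight against its gradient
                w1[j] -= lr * d_z * x
                b1[j] -= lr * d_z
            b2 -= lr * d_pred
        history.append(total_error / len(data))
    return history


# Toy data (made up): a roughly linear relationship.
data = [(-1.0, -0.5), (-0.5, -0.2), (0.0, 0.0), (0.5, 0.2), (1.0, 0.5)]
history = train(data)
print(f"error in epoch 1: {history[0]:.4f}, in epoch {len(history)}: {history[-1]:.4f}")
```

Note that the error signal for each hidden node is computed from `w2[j]` before that weight is updated; comparing the two printed errors shows whether the repeated swing-and-adjust cycle is paying off.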

Predictions

Well, here we are again with the predictions for the East, the West, and the Supporters’ Shield. I’ll put the results up here and then briefly explain what changed.

Western Conference

Team Points
Los Angeles FC 55
Seattle Sounders FC 54
FC Dallas 50
Real Salt Lake 50
LA Galaxy 48
Portland Timbers 48
Sporting Kansas City 48
Vancouver Whitecaps FC 48
Colorado Rapids 42
Houston Dynamo 41
Minnesota United 40
San Jose Earthquakes 37

The West has some movers but nothing too drastic. The biggest change from last week’s prediction is Colorado, which moves up seven points into 9th place. San Jose drops three points and takes over the bottom spot, while the top three positions remain unchanged.

Now to the east...

Eastern Conference

Team Points
Atlanta United 60
Toronto FC 52
New York Red Bulls 51
New York City FC 50
New England Revolution 47
Orlando City SC 47
FC Cincinnati 46
Montreal Impact 46
Philadelphia Union 45
Chicago Fire 44
D.C. United 44
Columbus Crew SC 43

A few interesting notes here: D.C. actually lost ground after a very good win, while the Revolution gained ground after a solid away draw. The Crew stay at the bottom; it’s hard to read much into that this early, although they did lose at home. As the weeks go on we will (almost certainly) see changes here. Atlanta dropped three points, tying Orlando as the biggest movers from last week: Atlanta fell back toward the pack while Orlando surged upward.


Last week’s results reshaped the table quite strongly for only one week of data. Instead of simply projecting (extrapolating) every team’s week 1 stats over a full season, I weighted the first week of results against the previous data we used, landing on a reasonable middle ground between the two. I will do the same each week so that the process stays consistent and we can watch the teams jockey for position in our little simulation. Check back on Friday for the game score prediction (I might even throw in some score predictions beyond FC Dallas). Comment, chastise, condemn, mock, praise, thank: whatever you want, just let me know what you think!
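The post doesn’t spell out the exact weighting, so the following is only one simple reading of it: a weighted average of each team’s per-game stats, mixing the previous data with this season’s results so far. The function name, the stat names, the numbers and the weight of 0.2 are all hypothetical.

```python
from __future__ import annotations


def blend_stats(prior: dict[str, float], season: dict[str, float], weight: float) -> dict[str, float]:
    """Weighted average of each per-game stat: prior data vs. this season's results so far.

    weight = 0 ignores the new results entirely; weight = 1 treats the new
    per-game rates as if they would hold for the whole season.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return {name: weight * season[name] + (1.0 - weight) * prior[name] for name in prior}


# Hypothetical team and stats: a big opening-day win shouldn't rewrite the whole profile.
prior = {"goals_for_per_game": 1.5, "goals_against_per_game": 1.2}
week1 = {"goals_for_per_game": 4.0, "goals_against_per_game": 1.0}
print(blend_stats(prior, week1, weight=0.2))
```

With a small weight, a four-goal opener moves the team’s scoring rate only part of the way toward 4.0, which is the kind of middle ground described above.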

Oh, and here are the compiled Supporters’ Shield standings based on this machine learning model.

Supporters’ Shield

Team Predicted points
Atlanta United 60
Los Angeles FC 55
Seattle Sounders FC 54
Toronto FC 52
New York Red Bulls 51
FC Dallas 50
Real Salt Lake 50
New York City FC 50
LA Galaxy 48
Portland Timbers 48
Sporting Kansas City 48
Vancouver Whitecaps FC 48
New England Revolution 47
Orlando City SC 47
FC Cincinnati 46
Montreal Impact 46
Philadelphia Union 45
Chicago Fire 44
D.C. United 44
Columbus Crew SC 43
Colorado Rapids 42
Houston Dynamo 41
Minnesota United 40
San Jose Earthquakes 37
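Compiling the Supporters’ Shield table is just a matter of pooling both conferences and sorting by predicted points. A minimal sketch follows, using the numbers from the conference tables; the tie-breaking (teams level on points keep their listed order, West before East) is an assumption that happens to reproduce the table above, not an MLS tiebreaker rule.

```python
# Predicted points per team, copied from the two conference tables above.
west = [
    ("Los Angeles FC", 55), ("Seattle Sounders FC", 54), ("FC Dallas", 50),
    ("Real Salt Lake", 50), ("LA Galaxy", 48), ("Portland Timbers", 48),
    ("Sporting Kansas City", 48), ("Vancouver Whitecaps FC", 48), ("Colorado Rapids", 42),
    ("Houston Dynamo", 41), ("Minnesota United", 40), ("San Jose Earthquakes", 37),
]
east = [
    ("Atlanta United", 60), ("Toronto FC", 52), ("New York Red Bulls", 51),
    ("New York City FC", 50), ("New England Revolution", 47), ("Orlando City SC", 47),
    ("FC Cincinnati", 46), ("Montreal Impact", 46), ("Philadelphia Union", 45),
    ("Chicago Fire", 44), ("D.C. United", 44), ("Columbus Crew SC", 43),
]

# sorted() is stable even with reverse=True, so teams level on points keep
# their input order (West listed before East here).
standings = sorted(west + east, key=lambda team: team[1], reverse=True)

for rank, (team, points) in enumerate(standings, start=1):
    print(f"{rank:>2}. {team:<24} {points}")
```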