Steve Machine - Correlation Matrix

MLS: FC Dallas at Columbus Crew SC Greg Bartram-USA TODAY Sports

We are back with another week of Steve predicting the final standings based on some nontraditional predictors. This week I have brought with me a correlation matrix that shows the strength of the relationships between different pieces of data. The rankings changed in some interesting ways after the third round of games, and based on Steve’s predictions, we have some insights into what he feels is important.

Also, it is important to note that this series is simply a fun exercise in variable exploration and correlation between seemingly unrelated pieces of information. All comments are welcome, though.

Predictions - Week 4

First, the predictions. In the west, LAFC slips from the top spot. I suppose I should say that Seattle has overtaken them, but realistically a missed chance for LAFC in the final seconds against NYCFC cost them the chance to stay on top for another week. Note that I included the predicted points out to two decimal places this week so that everyone can see the tiebreakers more clearly. The bottom four stayed the same with only minor changes in point totals, but Sporting KC joined the non-playoff-bound teams again this week and Portland snuck up into the 7th seed (by dropping a point from their predicted season total, mind you). Vancouver dropped two points and is subsequently down to the 6th spot. Overall, no large changes here.

Western Conference

Team Points
Seattle Sounders FC 54.53
Los Angeles FC 53.89
Real Salt Lake 53.37
FC Dallas 51.74
Vancouver Whitecaps FC 50.39
LA Galaxy 49.24
Sporting Kansas City 47.05
Portland Timbers 44.83
Minnesota United 42.58
Houston Dynamo 40.32
Colorado Rapids 39.64
San Jose Earthquakes 38.00

Now to the east. Some much larger changes here, and some things worth discussing. First and foremost, Atlanta is loved by Steve. I swear he probably has a Martinez jersey somewhere around here. Regardless of the team’s struggles, he still predicts that it will catch fire and finish strong. The team from down south is back on top of the east, followed by NYCFC, who makes a strong two-and-a-half-point jump from last week. Toronto, even with the victory and perfect record, slips down a couple of spots to 3rd. Toward the bottom of the table we see more movement. Some teams’ point totals did not change much (looking at you, D.C. United), which is interesting but not shocking since goals, goal difference, assists, etc. are not considered here. Orlando drops six points from last week to sit comfortably at the bottom of the east. The Union stays 11th this week, but Cincinnati makes a substantial rise (five points and three spots) to 7th. Lastly, I will note Montreal, who, while not moving in the standings per se, gained four projected points. Lots of subtle movement and a jam-packed east should be fun to watch as the season goes along. There will likely be at least one team that fails to reach the 40-point mark in the east, probably two if we are being honest, but with our predictors this is what we are looking at presently.

Eastern Conference

Team Points
Atlanta United 55.76
New York City FC 54.47
Toronto FC 53.12
New York Red Bulls 51.07
New England Revolution 49.44
Montreal Impact 48.96
FC Cincinnati 47.53
Orlando City SC 46.03
Chicago Fire 45.62
D.C. United 45.50
Philadelphia Union 44.56
Columbus Crew SC 41.38

Correlation Matrix - Gaining Insights

Alrighty, here is the good stuff. I was going to talk about something else this week, and we will eventually get to lots of pieces of neural networks, but I wanted to focus this week on correlation. Correlation is a connection or relationship between two or more things. In this case we have lots of variables (our input and predictor data), and I wanted to show everyone the relationships that have been observed.

A correlation matrix is just a visual representation of the correlation coefficients (values) for every pair of our data points. I have also included several pieces of data that we are not using in our prediction, so that you can compare the relationships we are using with the ones we are not. Points is what we are predicting, so that is what I will focus on here.

What we typically look for in correlation matrices is either strong positive correlation or strong negative correlation. Positive correlation means that as one data point INCREASES, the other data point INCREASES as well (example: goals for & points). Dark blue results are what we are looking for to indicate positive correlation. Also important is negative correlation, which indicates that as the value of one data point DECREASES, the value of the other data point INCREASES (example: goal difference & goals against).
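For anyone who wants to see where a single value in the matrix comes from, here is a minimal sketch of the usual Pearson correlation coefficient, written out by hand. I am assuming Pearson is the kind of correlation being plotted, and every number below is made up for five hypothetical teams; none of it is Steve's actual data. The function name `pearson_r` is my own as well.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Made-up season totals for five hypothetical teams (NOT Steve's data).
goals_for = [62, 55, 48, 41, 36]
goals_against = [38, 44, 47, 55, 60]
points = [65, 57, 50, 42, 34]
goal_difference = [gf - ga for gf, ga in zip(goals_for, goals_against)]

# Goals for and points rise together, so the coefficient is positive.
r_positive = pearson_r(goals_for, points)
# As goals against rise, goal difference falls, so the coefficient is negative.
r_negative = pearson_r(goals_against, goal_difference)

print(f"goals for vs. points:              {r_positive:+.2f}")
print(f"goals against vs. goal difference: {r_negative:+.2f}")
```

The coefficient always lands between -1 and +1. The closer it is to either end, the stronger the relationship, and the sign tells you which direction it runs.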

From our selected data points, the strongest correlation to points is shots on goal, which is not surprising. Shooting more, and shooting more on target for that matter, tends to increase goals scored. Fouls committed has a moderately strong negative correlation with points, which is interesting to be sure. Lastly, average age has a negative correlation to points that is not strong but sits right around -0.4. I find this interesting and a larger exploration of longer-term results would be beneficial, but I will leave that to someone else for now.

Correlation matrix for our regression.
Jeff Loftin

Most importantly, we can see that many of our variables are NOT strongly correlated with points, either positively or negatively. That makes some sense, but again, our sample size is relatively small (only 5 seasons). Feel free to dig into the correlation matrix and see if you can pull out any insights. There are some nuggets hidden in there for insightful readers to find, so comment and contribute however you like. Steve will be back Friday with some more game predictions (hint: he is doing quite well).