We are back with another week of Steve predicting the final standings based on some nontraditional predictors. This week I have brought along a correlation matrix that shows the strength of the relationships between different pieces of data. The rankings changed in some interesting ways after the third round of games, and Steve's predictions give us some insight into what he feels is important.
Also, it is important to note that this series is simply a fun exercise in variable exploration and in correlation between seemingly unrelated pieces of information. All comments are welcome, so please share your thoughts.
Predictions - Week 4
First, the predictions. In the west, LAFC slips from the top spot. I suppose I should say that Seattle has overtaken them, but realistically a missed opportunity for LAFC in the final seconds against NYCFC cost them another week at the top. Note that I have included predicted points to two decimal places this week so that everyone can see the tiebreakers more clearly. The bottom four stayed the same with only minor changes to their point totals, but Sporting KC once again joined the non-playoff-bound teams this week, and Portland snuck up into the 7th seed (despite dropping a point from their predicted season total, mind you). Vancouver dropped two points and is now down to the 6th spot. Overall, no large changes here.
| Team | Predicted points |
|---|---|
| Seattle Sounders FC | 54.53 |
| Los Angeles FC | 53.89 |
| Real Salt Lake | 53.37 |
| Vancouver Whitecaps FC | 50.39 |
| Sporting Kansas City | 47.05 |
| San Jose Earthquakes | 38 |
Now to the east. There are some much larger changes here, and some things worth discussing. First and foremost, Steve loves Atlanta. I swear he probably has a Martinez jersey stashed somewhere around here. Regardless of the team's struggles, he still predicts that they will catch fire and finish strong. The team from down south is back on top of the east, followed by NYCFC, who make a strong two-and-a-half-point jump from last week. Toronto, even with the victory and a perfect record, slips down a couple of spots to 3rd. Toward the bottom of the table we see more movement. Some teams' point totals did not change much (looking at you, D.C. United), which is interesting but not shocking, since goals, goal difference, assists, etc. are not considered here. Orlando drops six points from last week to sit comfortably at the bottom of the east. The Union stay 11th this week, but Cincinnati makes a substantial rise (five points and three spots) to 7th. Lastly, I will note Montreal, who, while not moving in the standings per se, gained four projected points. With lots of subtle movement, a jam-packed east should be fun to watch as the season goes along. There will likely be at least one team in the east that fails to reach the 40-point mark, probably two if we are being honest, but with our predictors this is what we are looking at presently.
| Team | Predicted points |
|---|---|
| New York City FC | 54.47 |
| New York Red Bulls | 51.07 |
| New England Revolution | 49.44 |
| Orlando City SC | 46.03 |
| Columbus Crew SC | 41.38 |
Correlation Matrix - Gaining Insights
Alrighty, here is the good stuff. I was going to talk about something else this week, and we will eventually cover many more pieces of how neural networks work, but this week I wanted to focus on correlation. Correlation describes how strongly two variables move together, and a correlation matrix simply repeats that comparison for every pair of variables. In this case we have lots of variables (our input and predictor data), and I wanted to show everyone the relationships that have been observed.
A correlation matrix is just a visual representation of the correlation coefficients (values) for every pair of our variables. I have also included several pieces of data that we are not using in our prediction, so that you can compare the relationships we are using with the ones we are not. Points are what we are predicting, so that is what I will focus on here.
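To make the idea concrete, here is a minimal sketch of how a correlation matrix can be computed, assuming the standard Pearson coefficient (the post does not say which coefficient sits behind Steve's matrix). The column names and numbers are hypothetical toy values for five made-up teams, not the actual data Steve uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two spread terms (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {row_name: {col_name: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy numbers for five made-up teams, NOT the real MLS data.
toy = {
    "points": [60, 52, 47, 40, 33],
    "shots_on_goal": [180, 165, 150, 140, 120],
    "fouls": [380, 400, 410, 430, 450],
}

matrix = correlation_matrix(toy)
for row in matrix:
    print(row, {col: round(r, 2) for col, r in matrix[row].items()})
```

Each cell is just the pairwise coefficient, so the diagonal is always 1 and the matrix is symmetric. A column with zero spread (every team has the same value) would cause a division by zero and would need to be dropped first.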
What we typically look for in a correlation matrix is either strong positive or strong negative correlation. Coefficients range from -1 (perfectly negative) to +1 (perfectly positive), with values near 0 indicating little or no relationship. Positive correlation means that as one variable INCREASES, the other tends to INCREASE as well (example: goals for & points). Dark blue cells are what we are looking for to indicate positive correlation. Also important is negative correlation, which indicates that as the value of one variable DECREASES, the other tends to INCREASE (example: goal difference & goals against).
Among our selected variables, the strongest correlation with points is shots on goal, which is not surprising: shooting more, and shooting more on target for that matter, tends to increase goals scored, and goals earn points. Fouls committed has a moderately strong negative correlation with points, which is certainly interesting. Lastly, average age has a negative correlation with points that is not strong but sits right around -0.4. I find this interesting, and a larger exploration of longer-term results would be beneficial, but I will leave that to someone else for now.
Most importantly, we can see that yes, many of our variables are NOT strongly correlated, either positively or negatively, with points. That makes some sense, but again, our sample size is relatively small (only 5 seasons). Feel free to dig into the correlation matrix and see what insights you can pull out; there are some nuggets hidden in there for insightful readers. Comments and contributions are always welcome. Steve will be back Friday with more game predictions (hint: he is doing quite well).