I like college sports and I like data. Sports analytics is a growing industry, and many of its names are familiar to avid fans (Ken Pomeroy, Jeff Sagarin, Ken Massey, and Bill Connelly, to name a few). I am not a familiar name, and my approach to analytics is very different from that of the giants in the industry.
I do not use any advanced metrics in my projections. My input sources are minimal: players on the current roster, coaches, and venues. I assign a value to each of these based on what I'd call macro-data: I look at full-game and full-season performance rather than at every play or possession, which is the more common practice in the industry. I keep it simple for a few reasons. First, this is just a hobby, and the time I can spend on sports analytics is very limited. Second, I only use data that is free, publicly available, and easy to find. Finally, the results are surprisingly accurate given such minimal input data. My philosophy on what makes a team win is very similar to Bill Connelly's mantra of talent acquisition, development, and deployment. Teams with the most talent are the most likely to win, and well-coached teams are more likely to beat poorly coached teams, even when they face a modest talent disadvantage.
It is impossible to build a predictive model that is 100% accurate. Upsets happen, which is why I assign a win probability to each game rather than making a straight pick. Teams with a high win probability do win most of the time (the higher the probability, the more likely the win), but a high probability should not be confused with a guarantee. I will never give any team on this site a 100% (or 0%) win probability; a result is only 100% (or 0%) once the game has been played. I am aiming for the highest success rate possible, which I hypothesize is somewhere around 85% for college football (probably similar for basketball, but I haven't tested that model as thoroughly yet). When I tested my college football model on 2014 data, it achieved a 72% success rate, and on 2015 data a 76% success rate. I'm hoping to achieve similar results going live in 2016, and I will post progress updates throughout the season.
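The post doesn't spell out exactly how a "success rate" is scored when the model outputs probabilities rather than picks. One plausible reading, assumed here, is the share of games in which the team given the higher win probability actually won. The Python sketch below illustrates that assumption; `GamePrediction`, `success_rate`, and `expected_success_rate` are hypothetical names, not part of my actual model, and the toy numbers are made up for illustration only. The second function shows why a ceiling well below 100% is expected: if the probabilities are well calibrated, the long-run success rate should be about the average probability assigned to the favorite in each game.

```python
from dataclasses import dataclass


@dataclass
class GamePrediction:
    """One game: the win probability assigned to team A and the final result (hypothetical structure)."""
    win_prob_a: float  # probability that team A wins, kept strictly between 0 and 1
    a_won: bool        # True if team A actually won the completed game


def success_rate(games: list[GamePrediction]) -> float:
    """Fraction of games in which the favored team (probability above 0.5) actually won.

    Assumption: this is how a "success rate" is scored. Games at exactly 0.5 have
    no favorite and are skipped.
    """
    decided = [g for g in games if g.win_prob_a != 0.5]
    if not decided:
        raise ValueError("no games with a favored team")
    # A pick is correct when "A was favored" matches "A won".
    correct = sum((g.win_prob_a > 0.5) == g.a_won for g in decided)
    return correct / len(decided)


def expected_success_rate(games: list[GamePrediction]) -> float:
    """Success rate we'd expect if the probabilities were perfectly calibrated.

    For each game, the favorite wins with probability max(p, 1 - p), so the
    expected share of correct picks is the mean of those values.
    """
    if not games:
        raise ValueError("no games")
    return sum(max(g.win_prob_a, 1 - g.win_prob_a) for g in games) / len(games)


# Toy example with invented probabilities and results.
games = [
    GamePrediction(0.80, True),   # favorite A won: correct pick
    GamePrediction(0.65, False),  # favorite A lost: upset
    GamePrediction(0.30, False),  # favorite B won: correct pick
    GamePrediction(0.55, True),   # favorite A won: correct pick
]
print(success_rate(games))           # 3 correct out of 4 -> 0.75
print(expected_success_rate(games))  # mean of 0.80, 0.65, 0.70, 0.55 -> about 0.675
```

With only a handful of games the two numbers can differ a lot by chance; the comparison only becomes meaningful over a full season of predictions.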
Occasionally my data will highlight a game that appears to be a “safe” pick against the spread. This feature of the model is still in its infancy, but I will be testing it live in 2016 and aim to release a more polished Against the Spread feature in 2017.