I like college sports and I like data. Analytics in sports is a growing industry, and many of its names are familiar to avid fans (Ken Pomeroy, Jeff Sagarin, Ken Massey, and Bill Connelly, to name a few). I am not a familiar name, and I take a very different approach to analytics than the giants of the industry do.
I do not use any advanced metrics in my projections. My input sources are minimal: players on the current roster, coaches, and venues. I assign a value to each of these based on what I’d call macro-data: full-game and full-season performance, rather than every play or possession as is the more common practice in the industry. I’m keeping it simple for a few reasons. First, this is just a hobby for me, and my time for sports analytics is very limited. Second, I am mining for data that is free, publicly available, and easy to find. Finally, the results are surprisingly accurate with this minimal input data. My philosophy on what makes a team win is very similar to Bill Connelly’s mantra of talent acquisition, development, and deployment. Teams with the most talent are most likely to win, and well-coached teams are more likely to beat poorly coached teams, even at a modest talent disadvantage.
It is impossible to build a predictive model that is 100% accurate. Upsets happen, which is why I assign win probabilities rather than straight picks for each game. Teams with a high win probability do win most of the time, and the higher the probability, the more likely the win, but that should not be confused with a guarantee. I will never place a 100% (or 0%) win probability on any team on this site; it is only 100% (or 0%) once the game has been completed. I am aiming for the highest success rate possible, which I hypothesize is somewhere around 85% for college football (probably similar for basketball, but I haven’t tested that model as thoroughly yet). In testing my college football model against 2014 data, I generated a 72% success rate, and against 2015 data a 76% success rate. My first live season was 2016, which had a 71% success rate.
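For readers who want to see how a success rate can be computed from win probabilities, here is a minimal Python sketch. It assumes that a pick counts as correct when the side given more than a 50% chance wins, and that exact 50/50 games are skipped; that definition, the function name `success_rate`, and the sample games are all hypothetical illustrations, not my actual formulas or results.

```python
def success_rate(games):
    """Return the fraction of games in which the favored side won.

    `games` is a list of (win_probability, team_won) pairs, one per game,
    both given from the point of view of the same team.
    Assumption: a pick is "correct" when the side with more than a 50%
    win probability wins. Exact 50/50 games have no favorite and are skipped.
    """
    picks = [(p, won) for p, won in games if p != 0.5]
    if not picks:
        raise ValueError("no games with a favored side")
    # (p > 0.5) is True when this team was favored; the pick is correct
    # when that matches whether the team actually won.
    correct = sum(1 for p, won in picks if (p > 0.5) == won)
    return correct / len(picks)


# Hypothetical sample: four games, one upset (the 65% favorite lost).
sample_games = [(0.90, True), (0.65, False), (0.30, False), (0.55, True)]
print(f"{success_rate(sample_games):.0%}")  # prints "75%"
```

Note that this measures only whether the favorite won. It does not check whether, say, 70% favorites win about 70% of the time, which would be a separate calibration check.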
I made a few minor revisions to my formulas with the intention of increasing the success rate for 2017. I will again be posting live updates weekly throughout the 2017 season.