BJOL College Football Power Ratings - The System
By Bill James
October 23, 2007
We have added here, as our first non-baseball feature, power ratings for college football teams. There are a lot of sources that have these power ratings, and I’ve wanted to have my own for many years. For probably twenty years, since we got computers with spreadsheets, I have tried once or twice a year to make a rating system work on a spreadsheet. I would always put five or ten or twenty-five hours into it, typing in scores, before concluding that I couldn’t make it work. But this year (2007), for some reason, I was finally able to get all the columns in the right place.
Here’s how my rating system works. Every team in the country is given a starting value of 50. It doesn’t matter what the starting value is. . .in fact, you wind up with the teams in the exact same order if you use as a start point the assumption that Middle Tennessee State has an initial value of 6000 and everybody else has an initial value of zero. Giving everybody a start value of 50 is just a way of saying “we’ll start with the assumption that everybody is the same.”
For every game that is played, we alter the rankings in this way. Suppose that Clemson is playing North Carolina State, at NC State, and suppose that Clemson wins 42-20. We add the points for the two competitors together—50 for each team, a total of 100—and divide that by two. Since Clemson won by 22 points we make that +11 for Clemson and –11 for NC State, making an “output score” for the game of 61.00 for Clemson, 39.00 for NC State.
Except that the game was played at NC State—which, by the way, it actually was; this is an actual game. Since the game was played at NC State, we would expect NC State to have about a six-point advantage if the two teams were even. Clemson overcame that six-point disadvantage to win by 22, meaning that they appear to be, based on this one contest, 28 points better than NC State. So we add three to the road team, subtract three from the home team, and we make it Clemson 64.0, NC State 36.0.
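The single-game calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the function name `game_scores` and the constant `HOME_EDGE` are made up for the example.

```python
HOME_EDGE = 6.0  # points of home-field advantage assumed at this stage of the article


def game_scores(home_rating, road_rating, home_points, road_points, home_edge=HOME_EDGE):
    """Return (home_score, road_score) for one game."""
    midpoint = (home_rating + road_rating) / 2
    # Half the margin goes to each side, and half the home edge is taken
    # away from the home team and credited to the road team.
    swing = (home_points - road_points) / 2 - home_edge / 2
    return midpoint + swing, midpoint - swing


# Clemson 42, NC State 20, played at NC State, both teams starting at 50.
nc_state, clemson = game_scores(50.0, 50.0, home_points=20, road_points=42)
print(clemson, nc_state)  # 64.0 36.0
```

Note that a home team that wins by exactly six comes out even with its opponent, which is why Clemson's 24-18 home win over Florida State scores 50.0.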
But this is just one game. Clemson’s scores for their first six games are 50.0 (for beating Florida State at home 24-18), 58.5 (for beating Louisiana-Monroe at home by 49-26), 61.0 (for beating Furman at home, 38-10), 64.0 (NC State), 48.0 (for losing to Georgia Tech on the road, 3-13), and 38.0 (for losing to Virginia Tech at home, 23-41).
We throw out the Furman game, because Furman is not a Division 1 program, so they’re not in our system, and we have no way of knowing what a win over them means. The average of the other five games is 51.7:
| Florida State | 50.0 |
| Louisiana-Monroe | 58.5 |
| North Carolina State | 64.0 |
| Georgia Tech | 48.0 |
| Virginia Tech | 38.0 |
| Average: | 51.7 |
North Carolina State’s average, for their first six games, is 41.4:
| Central Florida at home | lost 23-25 | Score: 46.0 |
| Boston College on the road | lost 17-37 | Score: 43.0 |
| Wofford at home | won 38-17 | Doesn’t count; not Division 1 |
| Clemson at home | lost 20-42 | Score: 36.0 |
| Louisville at home | lost 10-29 | Score: 37.5 |
| Florida State on the road | lost 10-27 | Score: 44.5 |
| Average | | Score: 41.4 |
The output figures for the first round, then, are 51.7 for Clemson, and 41.4 for North Carolina State.
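As a check on the two tables above, here is a minimal sketch that recomputes the first-round averages from the game results quoted in the text. The helper names `game_score` and `first_round_average` are invented for this example, and the Furman and Wofford games are simply left out of the lists.

```python
HOME_EDGE = 6.0  # six-point home advantage, split as +3 / -3


def game_score(team_pts, opp_pts, at_home, team_rating=50.0, opp_rating=50.0):
    """One team's output score for one game."""
    midpoint = (team_rating + opp_rating) / 2
    site = -HOME_EDGE / 2 if at_home else HOME_EDGE / 2
    return midpoint + (team_pts - opp_pts) / 2 + site


# (opponent, points for, points against, played at home?)
clemson_games = [
    ("Florida State", 24, 18, True),
    ("Louisiana-Monroe", 49, 26, True),
    ("North Carolina State", 42, 20, False),
    ("Georgia Tech", 3, 13, False),
    ("Virginia Tech", 23, 41, True),
]
nc_state_games = [
    ("Central Florida", 23, 25, True),
    ("Boston College", 17, 37, False),
    ("Clemson", 20, 42, True),
    ("Louisville", 10, 29, True),
    ("Florida State", 10, 27, False),
]


def first_round_average(games):
    # In the first round every team is still rated 50, so the defaults apply.
    scores = [game_score(pf, pa, home) for _, pf, pa, home in games]
    return sum(scores) / len(scores)


print(round(first_round_average(clemson_games), 1))   # 51.7
print(round(first_round_average(nc_state_games), 1))  # 41.4
```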
We then repeat the process, using the first-round output figures as the starting values for the second round. Rather than adding 50 + 50 and dividing that by two (to determine the combined strength of the two combatants), we add 51.7 to 41.4 and divide that by two, making 46.55. Clemson now comes in 14 points above that number, at 60.55, and NC State comes out 14 points below it, at 32.55.
Again, these output figures are averaged for the five Division-1 games of each team, making, in the second round for Clemson:
| Florida State | 52.3 |
| Louisiana-Monroe | 56.1 |
| North Carolina State | 60.55 |
| Georgia Tech | 51.8 |
| Virginia Tech | 40.7 |
| Average: | 52.28 |
And in the second round for North Carolina State:
| Central Florida at home | 41.2 |
| Boston College on the road | 42.7 |
| Clemson at home | 32.55 |
| Louisville at home | 34.1 |
| Florida State on the road | 41.7 |
| Average | 38.43 |
We then repeat the process again, using 52.28 as the starting value for Clemson, and 38.43 as the starting value for North Carolina State. We then repeat it again, using the output of the third round as the starting value for the fourth, the output of the fourth as the starting value of the fifth, etc.
After repeating this process a large number of times—about 50—the numbers stop moving. Eventually Clemson has a value of 53.24, NC State a value of 37.10, and, if you repeat the process again, Clemson will have a value of 53.24, NC State a value of 37.10. Clemson appears to be 16 points better than North Carolina State—a little more sometimes, a little less other times, but always in that area.
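The repeat-until-it-stops-moving loop can be sketched as follows. The four-team round robin in `GAMES` is entirely hypothetical (the article's full schedule is not reproduced here), and the stopping tolerance is an assumption; the point is only to show the rounds feeding into one another.

```python
HOME_EDGE = 6.0

# Hypothetical results: (home team, road team, home points, road points).
GAMES = [
    ("A", "B", 28, 14),
    ("B", "C", 21, 20),
    ("C", "A", 10, 24),
    ("D", "A", 17, 31),
    ("C", "D", 35, 7),
    ("B", "D", 27, 13),
]


def one_round(ratings, games, home_edge=HOME_EDGE):
    """Average each team's game scores, all computed from the previous round's ratings."""
    scores = {team: [] for team in ratings}
    for home, road, home_pts, road_pts in games:
        midpoint = (ratings[home] + ratings[road]) / 2
        swing = (home_pts - road_pts) / 2 - home_edge / 2
        scores[home].append(midpoint + swing)
        scores[road].append(midpoint - swing)
    return {team: sum(s) / len(s) for team, s in scores.items()}


def rate(games, start, tolerance=1e-9, max_rounds=1000):
    """Repeat rounds until no rating moves by more than `tolerance`."""
    ratings = dict(start)
    for rounds in range(1, max_rounds + 1):
        updated = one_round(ratings, games)
        biggest_move = max(abs(updated[t] - ratings[t]) for t in ratings)
        ratings = updated
        if biggest_move < tolerance:
            break
    return ratings, rounds


even_start = {team: 50.0 for team in "ABCD"}
lopsided_start = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 6000.0}

ratings, rounds = rate(GAMES, even_start)
shifted, _ = rate(GAMES, lopsided_start)

for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 2))
```

In this toy league the lopsided start ends up with every team shifted by the same amount, so the order and the gaps between teams are unchanged, which matches the earlier remark that the starting values don't matter.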
What happens is that, as the system gradually realizes that North Carolina State isn’t very good, Clemson gradually loses credit for beating them. On the other hand, the system gradually realizes that Florida State is pretty good, so it increases the points given to Clemson for beating Florida State. Eventually, the system gives Clemson more credit for beating Florida State at home by 6 than it does for beating NC State on the road by 22, because it realizes that Florida State is a much tougher opponent than North Carolina State.
One of the problems with rating systems is that you don’t know how accurate they are because, in real life, you don’t know how good the teams really are. A couple of years ago I asked my son Isaac, who is a computer programmer, to program an experiment to dodge that problem. We actually ran the experiment with basketball teams, which is a lot more complicated—three times as many teams, each of them playing three times as many games, but you get a lot more reliable data that way. Anyway, in this experiment we generated 330 imaginary basketball teams, generated a schedule for each team, and randomly assigned “true performance levels” to each team. We then generated a score for each game, based on the assumption that the teams would play within some range of their true value—sometimes they’d play a little better than their true performance level, sometimes a little worse. Sometimes a team would lose a game that they should win.
We then built a wall between the scores and the true performance levels, and rated the teams on the output scores, but without any knowledge of their true performance level. We then rated the teams by a variety of different methods, and rated the rating systems based on how well the output matched the underlying true performance level. . .and, of course, repeated the experiment thousands of times.
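A very small version of that kind of experiment might look like the sketch below. It is not the program Isaac wrote: the league size (12 teams rather than 330), the round-robin schedule, the spread of true levels, the game-to-game noise, and the use of a simple correlation as the accuracy measure are all assumptions made for illustration.

```python
import random

rng = random.Random(2007)  # fixed seed so the run is repeatable
HOME_EDGE = 6.0
TEAMS = [f"T{i}" for i in range(12)]

# Hidden "true performance levels" (assumed: centered on 50, spread of 10).
true_level = {team: rng.gauss(50, 10) for team in TEAMS}

# Round robin with a random home team; the margin is the true gap plus the
# home edge plus noise, so a better team will sometimes lose.
games = []
for i, a in enumerate(TEAMS):
    for b in TEAMS[i + 1:]:
        home, road = (a, b) if rng.random() < 0.5 else (b, a)
        margin = true_level[home] - true_level[road] + HOME_EDGE + rng.gauss(0, 7)
        games.append((home, road, margin))


def rate(games, teams, rounds=200):
    """Rate teams from margins only; true_level is never consulted here."""
    ratings = {t: 50.0 for t in teams}
    for _ in range(rounds):
        scores = {t: [] for t in teams}
        for home, road, margin in games:
            midpoint = (ratings[home] + ratings[road]) / 2
            swing = margin / 2 - HOME_EDGE / 2
            scores[home].append(midpoint + swing)
            scores[road].append(midpoint - swing)
        ratings = {t: sum(s) / len(s) for t, s in scores.items()}
    return ratings


def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5


ratings = rate(games, TEAMS)
accuracy = correlation([true_level[t] for t in TEAMS], [ratings[t] for t in TEAMS])
print(f"correlation between ratings and true levels: {accuracy:.3f}")
```

Comparing rating systems would then mean running several rating functions over the same generated games, many times over, and seeing which one tracks the hidden levels most closely.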
Our conclusion: all of the rating systems performed at essentially the same level, except that the one system on which the NCAA places the most reliance, the RPI system, doesn’t really work at all. That’s overstating it; the RPI system distinguishes between good teams and bad teams at some level, but it was far less accurate than this system or than almost any of the competitive ranking systems. It was the least accurate system we could find. I think that says something very interesting about the relationship between authority and knowledge, but let’s move on.
Anyway, that’s how the system works. The system tries to push Clemson’s score for the Florida State game and their score for the NC State game (and every other game) to one common point, which is the true value of the Clemson team. It can’t exactly do this, of course, because teams play better sometimes than they do other times. The system is comparing every score to every other score in a vast web, searching for the perfect balance at which every score is as close as possible to what we would have expected it to be based on all of the other scores. When the maximum possible balance has been achieved, the system stops moving. Clemson is “rated” at 53.24, meaning 3.24 points better than an average Division 1 football team, and NC State is rated at 37.10, meaning 12.9 points worse than an average Division 1 football team.
The current ratings for all NCAA football teams are on this system somewhere, or anyway should be.
Note added October 23: A gentleman who works for Baseball Info Solutions, Jon Vrecsics (which I think is pronounced "Very Slick"), challenged the decision to treat the home field advantage as six points. This led to some research, by which we learned that
a) When two Division-1A football teams meet, the home team (this year) has won by an average of 6.55 points, but
b) 2.73 of those points are explained by the fact that the home teams in these games were actually the better teams, since strong programs schedule more home games.
I thus changed the +/- 3 points for home/road in the system to +/- 1.9 points.
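The arithmetic behind that change, spelled out as a short sketch (the variable names are invented for the example; the figures are the ones quoted above):

```python
raw_home_margin = 6.55    # average home-team winning margin this season
scheduling_effect = 2.73  # part explained by stronger teams playing more home games

home_edge = raw_home_margin - scheduling_effect  # about 3.82 points
adjustment = home_edge / 2                       # about 1.91, used as 1.9

# Re-scoring Clemson's 42-20 win at NC State from even starting values of 50,
# with the revised +/- 1.9 in place of +/- 3:
midpoint = (50 + 50) / 2
clemson = midpoint + (42 - 20) / 2 + 1.9
nc_state = midpoint - (42 - 20) / 2 - 1.9
print(round(home_edge, 2), round(adjustment, 2), round(clemson, 1), round(nc_state, 1))
```

Under the revised adjustment the first-round score for that game moves from 64.0 and 36.0 to roughly 62.9 and 37.1.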
That's what we mean by "Beta". ...
Note added November 4, 2007: I have now switched from ranking teams so that an average Division 1A team is 50.000 to ranking them so that an average team is 100.000. This was done because I decided to include a few more teams in the ranking system. I was going to add the Ivy League, but dropped that idea when I realized that there were NO points of contact between the Ivy League and the major college programs. Not only have the Ivy League teams not played any games against major college programs this season, I don't believe they have played any games against any teams that have played against a major college program. So I couldn't expand the system in that direction.
I decided to add the teams from the Southland Conference because teams from the Southland Conference, I believe, have played more games against Division 1A football programs than any "excluded" conference, thus providing a more solid basis for comparison. Four of the eight Southland teams are weaker than any Division 1A program, but the other four are not, and the best team in the conference, McNeese, would be essentially an average Division 1A program.
The Southland Conference is very far from being the weakest football conference in America, but as I added the Southland teams to the data, I began to get more games in which teams had negative scores. The weakest team in the Southland, Stephen F. Austin, is about 41 points behind an average Division 1A team. It was thus apparent that if we extended this further, including weaker conferences, we would have teams with negative rankings. I thought this was not good, so I moved the center from 50.00 to 100.00 so that teams would not have negative scores.
It also occurred to me, as I was working on this, that I could measure not only each team's strength, but also their consistency. . .I could produce a ranking of the most CONSISTENT football teams in the country, and the most inconsistent, based on the Standard Deviation of their weekly scores. These are the most consistent teams in college football, as of November 3:
| 1. | Minnesota | Standard Deviation: 2.700 |
| 2. | Southeastern Louisiana |
| 3. | McNeese |
| 4. | Idaho |
| 5. | Fresno State |
| 6. | Utah State |
| 7. | Kansas |
| 8. | North Carolina |
| 9. | Southern Methodist |
| 10. | Illinois |
And these are the most INconsistent:
| 1. | UCLA |
| 2. | Northwestern State |
| 3. | Iowa State |
| 4. | Utah |
| 5. | Central Michigan |
| 6. | Clemson |
| 7. | Mississippi State |
| 8. | Central Arkansas |
| 9. | UNLV |
| 10. | Colorado |
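The consistency measure can be sketched like this. The scores used are the first-round game scores for Clemson and North Carolina State from earlier in the article, not the final converged weekly scores behind the lists above, and the choice of the sample standard deviation (`statistics.stdev`) rather than the population version is an assumption, since the text does not say which was used.

```python
from statistics import stdev

# First-round game scores quoted earlier, used here only as sample data.
weekly_scores = {
    "Clemson": [50.0, 58.5, 64.0, 48.0, 38.0],
    "North Carolina State": [46.0, 43.0, 36.0, 37.5, 44.5],
}

# A smaller standard deviation means a more consistent team.
consistency = {team: stdev(scores) for team, scores in weekly_scores.items()}

for team, sd in sorted(consistency.items(), key=lambda item: item[1]):
    print(f"{team}: {sd:.3f}")
```

Even on this small sample Clemson's scores are spread much more widely than NC State's, which is in line with Clemson's place on the inconsistent list.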