Monday, April 26, 2010

My Baseball Ratings

Emphasis on the "My."
You remember a post several weeks ago, the one where I described my gambling epiphany, where I extolled the virtues of a purely objective analytic approach? Yes, of course. I was happy because I had found such a system, and put it to fairly successful use during the end of the college basketball season. But that system only worked for that sport, and of course that system was completely stolen, mostly from Ken Pomeroy's ratings.
So now that college basketball is over, and I've promised not to make subjective bets, I basically have to sit on my hands until the fall, when I could try stealing another system, this time applying it to football. Of course that's not a preferred course of action for someone like me, especially with probably my favorite sport--baseball--occurring daily over the next several months.
Baseball is a tricky sport to handicap, but it does provide about 15 games per day, and its everyday schedule forces books to produce lines on extremely short notice, so the chances are high that a rogue line appears and blesses me with a wonderful edge opportunity. I only needed to find a rating system to do the work for me.
This would seem an easy task, what with the proliferation of high-level statistics and statistical analysts working on the game. Alas, nearly all of the work being done is for "legitimate" purposes or, more often, at the individual level rather than the team level.
As I thought about it, I came to believe that I could just create my own rating system, or at least I thought I would enjoy trying. What follows is the surprisingly satisfactory result of this effort.

My first choice in creating the ratings was to use actual closing gambling lines. I get them from Pinnacle, which runs a site that is the go-to spot for intelligent gamblers (no, I don't have an account there, so I officially am still not "intelligent"), largely because their lines are so good and fair. For each game played, I convert a team's moneyline into a win%, then adjust that based on home-field advantage. I then just average all the individual game ratings to produce an overall team rating. That part is pretty simple; converting it to an opponent-neutral rating requires more work. For each team, I had to input their schedule and add a formula to the spreadsheet to automatically adjust the opponent's rating, again factoring in home field. Once I took the time to enter all this, I never have to do it again.
All that gives me both a team rating and an opponent rating for each team. To combine the two and create the master rating, I weight the team rating twice as heavily as the opponent rating, then regress the team rating toward its preseason projection and the opponent rating toward an average opponent (I haven't yet decided how much to regress these, but for now I'm using 50%. If that is right--and it seems pretty good--then I'll just need to lower the amount as the season progresses).
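For the programmers out there, here is a minimal Python sketch of the per-game step. It rests on assumptions the post doesn't spell out: the bookie's vig is stripped by normalizing both sides' implied probabilities so they sum to 1, and home-field advantage is a flat, hypothetical HOME_EDGE of 0.04 subtracted from the home team's probability (and added to the road team's). The function names are illustrative only, not the spreadsheet's actual formulas.

```python
# Hypothetical flat home-field edge in win probability; the post does not give a value.
HOME_EDGE = 0.04


def moneyline_to_prob(line: int) -> float:
    """Raw implied win probability of an American moneyline (vig still included)."""
    if line < 0:
        return -line / (-line + 100)
    return 100 / (line + 100)


def no_vig_prob(team_line: int, opp_line: int) -> float:
    """Team's win probability with the vig removed by normalizing both sides (an assumption)."""
    p_team = moneyline_to_prob(team_line)
    p_opp = moneyline_to_prob(opp_line)
    return p_team / (p_team + p_opp)


def neutral_game_rating(team_line: int, opp_line: int, at_home: bool) -> float:
    """Strip the assumed home-field edge out of one game's implied win probability."""
    p = no_vig_prob(team_line, opp_line)
    return p - HOME_EDGE if at_home else p + HOME_EDGE


def team_rating(games: list[tuple[int, int, bool]]) -> float:
    """Average the neutralized single-game ratings into one overall team rating."""
    return sum(neutral_game_rating(t, o, h) for t, o, h in games) / len(games)


# Two made-up games: (team's line, opponent's line, team at home?)
games = [(-150, 140, True), (120, -130, False)]
print(f"{team_rating(games):.3f}")  # 0.518
```

Normalizing both sides is only one way to remove the vig; it keeps each game's two ratings summing to 1, which makes a flat home-field adjustment symmetric.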
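One plausible reading of the combination step, as a sketch: the post doesn't give the exact formula, so this assumes "twice as heavily" means a 2:1 weighted average, and that each component is regressed by the current 50% before averaging. `regress` and `master_rating` are hypothetical names.

```python
AVERAGE_OPPONENT = 0.5  # an average opponent is, by definition, a .500 team


def regress(value: float, target: float, amount: float) -> float:
    """Pull value toward target; amount=0.5 moves it halfway there."""
    return value + amount * (target - value)


def master_rating(team: float, opponent: float, preseason: float,
                  regression: float = 0.5) -> float:
    """Assumed combination: regress both components, then weight team 2:1 over opponent."""
    team_r = regress(team, preseason, regression)
    opp_r = regress(opponent, AVERAGE_OPPONENT, regression)
    return (2 * team_r + opp_r) / 3


# Made-up inputs: .600 team rating, .520 opponents, .540 preseason projection.
print(f"{master_rating(0.60, 0.52, 0.54):.3f}")  # 0.550
```

Because both steps are linear and use the same regression amount, regressing before or after the weighted average gives the same result, so lowering `regression` as the season goes on is the only knob that changes.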

Now that that is out of the way, let me reveal the actual results. First, the National League, updated through right now:
1. St Louis .54841
2. Philadelphia .53803
3. Atlanta .53528
4. Los Angeles .53119
5. Colorado .52630
6. Chicago .51071
7. Milwaukee .50699
8. Florida .50060
9. Arizona .50055
10. San Francisco .49811
11. Cincinnati .48843
12. New York .48437
13. San Diego .48317
14. Houston .46747
15. Washington .46143
16. Pittsburgh .45581

The great thing about the rating is that it doubles as an expected winning percentage, neutralized for home field, against a theoretical .500 opponent.
To my eye, it looks like the extremes aren't quite extreme enough. Given those percentages, St Louis would finish a 162-game season with just 89 wins, and Pittsburgh with a relatively robust 74. (You'd need to subtract one or two from Pittsburgh's total, because they can't benefit from playing their sorry selves, so their schedule will be a bit tougher than a purely .500 one. Still, the numbers look just slightly too centralized.) Over time I think this will work itself out, especially after I start regressing less, and anyway, if your system has a slight bunching problem rather than an outlier problem, then you're in much, much better shape.
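Those win totals are just each rating times 162, assuming (as the paragraph does) a schedule made entirely of .500 opponents. A minimal sketch reproducing them:

```python
SEASON_GAMES = 162


def expected_wins(rating: float, games: int = SEASON_GAMES) -> float:
    """The rating doubles as win% vs a .500 opponent, so wins = rating * games."""
    return rating * games


for team, rating in [("St Louis", 0.54841), ("Pittsburgh", 0.45581)]:
    print(f"{team}: {expected_wins(rating):.1f} wins")
# St Louis: 88.8 wins
# Pittsburgh: 73.8 wins
```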
Here is the American League:
1. New York .56907
2. Boston .55092
3. Tampa Bay .53248
4. Minnesota .52342
5. Texas .51473
6. Los Angeles .50679
7. Chicago .50555
8. Oakland .49436
9. Seattle .49213
10. Detroit .48937
11. Cleveland .48890
12. Kansas City .47113
13. Baltimore .47070
14. Toronto .46143

A quick look at these and we can get an instant sense of where our gambling opportunities lie.
- Oakland jumps out. They're currently in first place, but I suspect most people still think they are terrible. Their pitching is actually quite good, and bookies know this. In fact, assuming neutral pitching matchups, Oakland playing at home ought to be favored over Minnesota every time, and even over Tampa some of the time.
- The Yankees are very, very good. There is necessarily some public bias in the ratings, because lines are often goosed in favor of popular teams. This doesn't happen as much at a book like Pinnacle, but it's still there. It's instructive, though, to know that they've played the toughest schedule in the AL so far, they've been the favorite in every game but the three at Boston (this includes series at Tampa and at LA), and that even their individual game lines at Boston produced close to 50-50 expectations.
- The Florida Marlins, Atlanta Braves, and Washington Nationals are three teams all in the same division who rate higher than expected. This despite the fact that Philly is very nearly the top team. The NL East is for real. Washington has played the toughest schedule thus far in the NL, tougher even than the Yankees mentioned above. The Marlins have played 12 road games already, but were actually favored in six of them, including one against Philly and the first two games of the season at the Mets (special wow-factor that they were favored at NY on opening day, opposing Johan Santana). I've got two eyes on them for the time being.
- Easiest schedules so far: the Phillies at .450, Detroit and Toronto at .477. Compare these to Washington at .546 and the Yankees at .532. In effect, the Phillies have played every game against a Pittsburgh-caliber team, while Washington has played every game against a St Louis-caliber team.
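The Oakland claim can be checked by turning two ratings into a head-to-head probability. The post doesn't say how it does this; one standard option that fits ratings expressed as win% against a .500 opponent is Bill James's log5 formula, swapped in here. The flat HOME_EDGE of 0.04 is a hypothetical value, and starting pitchers are treated as neutral.

```python
HOME_EDGE = 0.04  # hypothetical home-field bump in win probability


def log5(p_a: float, p_b: float) -> float:
    """Probability A beats B at a neutral site, given each team's win% vs a .500 opponent."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def home_win_prob(home: float, away: float) -> float:
    """Neutral-site log5 probability for the home team plus the assumed flat home edge."""
    return log5(home, away) + HOME_EDGE


ratings = {"Oakland": 0.49436, "Minnesota": 0.52342, "Tampa Bay": 0.53248}
for visitor in ("Minnesota", "Tampa Bay"):
    p = home_win_prob(ratings["Oakland"], ratings[visitor])
    print(f"Oakland vs {visitor}: {p:.3f}")
# Oakland vs Minnesota: 0.511
# Oakland vs Tampa Bay: 0.502
```

Under these assumptions Oakland at home is a slight favorite over both, but only barely over Tampa, so a modestly better Tampa starter would flip that game, which matches "some of the time."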

I still need to add a category for future schedule strength. This will actually allow me to create predicted end-of-season standings, which are very valuable for futures betting on division winners. (I can tell you now, without even having created these, that Washington at 25-1 to win the NL East, while definitely a long shot, is just as definitely a positive-value bet. At 25-1, the bet has positive value as long as the Nats have better than a 1-in-26, or roughly 3.8%, chance of winning the division.)
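The 3.8% figure is the break-even probability for a bet paying 25-to-1: you risk 1 unit to win 25, so you need to win more than 1 time in 26. A minimal sketch (the function names are illustrative):

```python
def breakeven_prob(odds_to_one: float) -> float:
    """Win probability at which a bet paying odds_to_one-to-1 has zero expected value."""
    return 1 / (odds_to_one + 1)


def expected_profit(p_win: float, odds_to_one: float, stake: float = 1.0) -> float:
    """Win odds_to_one * stake with probability p_win, otherwise lose the stake."""
    return stake * (p_win * odds_to_one - (1 - p_win))


print(f"{breakeven_prob(25):.1%}")  # 3.8%
```

For example, if the Nats' true chance were a purely hypothetical 5%, `expected_profit(0.05, 25)` comes to about +0.30 units per unit wagered.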
In addition to futures, this system will of course work for individual game lines. Just take the teams' ratings, apply a personalized factor for each starting pitcher, and you've got a fair line of your own to compare to the bookie's offering.
I'm actually almost done figuring individual pitcher factors, too. I'll report back in the next couple of days on that, since there are some interesting results. I'll also try to report back with observations and updated ratings, plus a report on the gambling results, once I feel confident enough to start placing wagers (I'm not far away).
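Comparing against the bookie means turning a win probability back into a moneyline, or the bookie's moneyline into a probability. A minimal sketch: the pitcher factors aren't described yet, so `my_prob` below is a hypothetical probability assumed to already include them, and the edge check ignores the vig on the book's side.

```python
def moneyline_to_prob(line: int) -> float:
    """Implied probability of an American moneyline (vig not removed)."""
    return -line / (-line + 100) if line < 0 else 100 / (line + 100)


def prob_to_moneyline(p: float) -> int:
    """Fair American moneyline for win probability p (favorites negative)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        return round(-100 * p / (1 - p))
    return round(100 * (1 - p) / p)


def edge(my_prob: float, book_line: int) -> float:
    """Positive when my probability beats the probability the book's line implies."""
    return my_prob - moneyline_to_prob(book_line)


my_prob = 0.55  # hypothetical: team ratings and pitcher factors already folded in
print(prob_to_moneyline(my_prob))  # -122
print(f"{edge(my_prob, -110):+.3f}")  # +0.026
```

Working in probabilities rather than raw moneylines keeps the comparison symmetric for favorites and underdogs.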
