Tuesday, November 10, 2009

A New xBABIP Calculator

I've been a big fan of The Hardball Times xBABIP calculator over the last six months or so, but there were a couple of things I didn't like about it. The first was having to enter exact counts for ABs, HRs, etc. When dealing with projections, I much prefer to work in percentages. With percentages it's much easier to see what a player's BABIP should be for a partial season, a span of several years, or a whole career. I'm also not so sure about the inclusion of stolen bases as a statistic.

I'm a big fan of the FanGraphs website, which provides a wide array of batted-ball data for each player. I determined that BABIP is very strongly determined (as much as BABIP can be, that is) by a combination of LD%, GB%, FB%, IFFB%, HR/FB%, and IFH%. This is right in line with what The Hardball Times uses, except that I'm dealing strictly with percentages and I've substituted IFH% for SBs. It's worth noting that I'm not taking ballpark factors into account (which surely have some kind of effect on BABIP as well).

I came up with my numbers by plotting a large amount of data (three years' worth of individual player statistics) and running a multi-variable regression analysis on it (I'm not sure if that's the right wording or not; I have no formal training in statistical analysis, just some stuff I've picked up).
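For readers who haven't run one before, here is a minimal sketch of what a multi-variable (ordinary least squares) regression does, in plain Python. It is not the spreadsheet workflow used for this post: the function name `fit_linear`, the two-predictor setup, and the synthetic data with made-up coefficients are all assumptions chosen only to show the mechanics.

```python
import random

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows    -- list of predictor lists, one per player-season
    targets -- list of observed values (e.g. BABIP)
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a constant column for the intercept
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic demo: the "true" weights below are invented, not the post's results.
rng = random.Random(0)
true_coefs = [0.30, 0.25, -0.10]  # made-up intercept, LD weight, GB weight
rows, targets = [], []
for _ in range(200):
    ld, gb = rng.uniform(0.15, 0.25), rng.uniform(0.35, 0.55)
    noise = rng.gauss(0, 0.001)
    rows.append([ld, gb])
    targets.append(true_coefs[0] + true_coefs[1] * ld + true_coefs[2] * gb + noise)

print(fit_linear(rows, targets))  # should land close to true_coefs
```

With real data, each row would hold one player-season's batted-ball rates and each target that season's actual BABIP. A spreadsheet's regression tool or a statistics library does the same job with less code.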

Here's the equation I came up with:

xBABIP = 0.391597252 + (LD% x 0.287709436) + ((GB% - (GB% x IFH%)) x -0.151969035) + ((FB% - (FB% x HR/FB%) - (FB% x IFFB%)) x -0.187532776) + ((IFFB% x FB%) x -0.834512464) + ((IFH% x GB%) x 0.4997192)
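If you'd rather not use a spreadsheet, the equation translates directly into a small function. This is a sketch under one assumption: every rate is entered as a fraction (19.5% becomes 0.195). The function and parameter names are my own labels, and the sample inputs are invented, not any real player's line.

```python
def xbabip(ld, gb, fb, iffb, hr_fb, ifh):
    """Expected BABIP from batted-ball rates, all given as fractions (assumed)."""
    gb_non_hits = gb - gb * ifh                # ground balls that are not infield hits
    fb_in_play = fb - fb * hr_fb - fb * iffb   # fly balls minus home runs and infield flies
    infield_flies = iffb * fb
    infield_hits = ifh * gb
    return (0.391597252
            + ld * 0.287709436
            + gb_non_hits * -0.151969035
            + fb_in_play * -0.187532776
            + infield_flies * -0.834512464
            + infield_hits * 0.4997192)

# Made-up example line: 19.5% LD, 44% GB, 36.5% FB, 10% IFFB, 10% HR/FB, 6% IFH.
print(round(xbabip(0.195, 0.44, 0.365, 0.10, 0.10, 0.06), 3))
```

Splitting the terms into named pieces makes it easier to see which part of a player's batted-ball profile is moving the result.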

Here's a published view of a spreadsheet showing it in action:

http://spreadsheets.google.com/ccc?key=0AuaVTUnZda7fdFVpY2NoRC1zS1p0UlNPaDlVdlRhN1E&hl=en

Here's a download of the spreadsheet in OpenOffice format (forgive the lame hosting service; I wasn't sure where to upload it):

http://www.filefactory.com/file/a1a2d5a/n/public_xBABIP_Calculator_ods

I've been using this calculator (along with a number of other equations) to build my own projections for 2010, and here are a few of the interesting things I've noticed.

First off, LD% has a very strong correlation to BABIP (not exactly a revolutionary statement), but it also seems very hard to project. There seems to be a lot of luck built into it, so even a career LD% still has some luck baked in, and I tend to regress it toward the league average (19.5%).
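One simple way to pull a rate toward the league average is a weighted blend. The sketch below is only an illustration: the post doesn't say how much to pull, so the `weight` parameter and its default of 0.5 are hypothetical, as is the function name.

```python
def regress_to_mean(observed, league_avg=0.195, weight=0.5):
    """Blend an observed rate with the league average.

    weight is the share given to the league average (hypothetical default):
    0.0 keeps the observed rate, 1.0 returns the league average.
    """
    return (1.0 - weight) * observed + weight * league_avg

# A career LD% of 23% pulled halfway toward the 19.5% league average.
print(regress_to_mean(0.23))
```

A smaller sample would argue for a larger weight on the league average, and a long career for a smaller one.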

GB% is a little easier to predict. A higher GB% tends to yield a higher BABIP, but how much depends on IFH% as well. A player who posts a high IFH% with a lot of ground balls will greatly increase his BABIP, while a slow player with a terrible IFH% and a lot of ground balls won't increase his BABIP nearly as much (makes sense).

FB% is, again, typically easier to predict than LD%, and a high FB% tends to yield a lower BABIP, as fly balls are more likely to become outs. But you've got to look at HR/FB and IFFB% as well to get an accurate picture. A player who hits a ton of fly balls but has a very high HR/FB rate and a very low IFFB% (Ryan Howard) can post a more respectable BABIP (fly balls have a better shot of landing for hits if they get out of the infield).

HR/FB is also a little easier to predict, and it doesn't directly affect your BABIP; it's only used to take the home runs out of your fly balls (which in turn helps your BABIP). One thing that strikes me as problematic here is line-drive home runs, since the formula subtracts every home run from the fly-ball total.

IFFB% seems somewhat player-controlled, but it also has a large luck component from year to year (probably largely due to sample size). It has a definite impact on your BABIP, as fly balls on the infield are almost automatic outs.

IFH% seems very speed-dependent. The more infield hits you have, the higher your BABIP. This can vary from year to year with luck, but generally speedy players will post better numbers (there are a few notable exceptions, like Jason Bay's abnormally high IFH%, which I chalk up to some luck). I'm sure ballpark factors play a role here as well (which I'm not accounting for).

So in the end, what we get is a way to take numbers directly from FanGraphs (over the course of a career, a full season, or even a partial season) and get a decent idea of what a player's BABIP should be, and how lucky he has been.
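The "how lucky" part comes down to comparing actual BABIP against xBABIP. Here's a small sketch; the player labels and numbers are placeholders, not real stat lines, and `babip_gap` is just a name picked for illustration.

```python
def babip_gap(actual_babip, expected_babip):
    """Positive: the player beat his xBABIP (possibly lucky). Negative: fell short."""
    return actual_babip - expected_babip

# Hypothetical (label, actual BABIP, xBABIP) rows.
players = [
    ("Player A", 0.345, 0.310),
    ("Player B", 0.270, 0.305),
    ("Player C", 0.300, 0.298),
]

# List the biggest over-performers first.
for name, actual, expected in sorted(
        players, key=lambda p: babip_gap(p[1], p[2]), reverse=True):
    gap = babip_gap(actual, expected)
    print(f"{name}: actual {actual:.3f}, xBABIP {expected:.3f}, gap {gap:+.3f}")
```

A large positive gap suggests a BABIP that may come back down, and a large negative gap suggests room for a rebound.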

I'm very interested in any feedback or critique that anyone has to offer, or any ideas on improving it. I've also got a number of other calculators (batting average, xHR, xR, xRBI, xSB, xAvg, xOBP, xSLG) that I'd be willing to throw out there as well, but I figured before I went through the trouble, I'd see what kind of buzz I get from this one.

2 comments:

  1. Have you run the r^2, corr and error on your formula? Also, I wouldn't mind seeing your ideas on batting average, xHR, xR, xRBI, xSB, xAvg, xOBP, xSLG.

  2. I'm actually looking for a xHR/FB calculator, do you have one? You said you have batting average, xHR, xR, xRBI, xSB, xAvg, xOBP, and xSLG calculators so I figured I should ask if you could give me the link for it. Thanks.
