Tuesday, November 10, 2009

A New xBABIP Calculator

I've been a big fan of the Hardball Times xBABIP calculator for the last six months or so, but there were a couple of things I didn't like about it. The first was having to plug in exact numbers for ABs, HRs, etc. When dealing with projections, I much prefer to work in percentages. With percentages, it's much easier to see what a player's BABIP should be for a partial season, a span of several years, or a whole career. I'm also not so sure about the inclusion of stolen bases as a statistic.

I'm a big fan of the FanGraphs website, which provides a wide array of batted ball data for each player. I determined that BABIP is very strongly determined (as much as BABIP can be, anyway) by a combination of LD%, GB%, FB%, IFFB%, HR/FB%, and IFH%. This is right in line with what the Hardball Times uses, except that I'm dealing strictly with percentages, and I've substituted IFH% for SBs. It's worth noting that I'm not taking ballpark factors into account (which surely have some kind of effect on BABIP as well).

I came up with my numbers by plotting a large amount of data (three years' worth of individual player statistics) and running a multi-variable regression analysis on it (I'm not sure if that's the right wording or not; I have no formal training in statistical analysis, just some stuff I've picked up).
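For anyone curious what a multi-variable regression involves, here is a minimal sketch in plain Python of ordinary least squares, the most common form of it. It is an illustration only, not the tool used for the numbers in this post. The name `fit_linear` is invented for the example, and the sample rows are made-up values generated from known coefficients so the fit can be checked against them.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows    -- list of feature lists, one per player-season
    targets -- list of observed values (e.g. BABIP), same length as rows
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic sanity check: made-up feature rows, NOT real player data.
rows = [[0.18, 0.45], [0.20, 0.40], [0.22, 0.42], [0.19, 0.50], [0.21, 0.38]]
targets = [0.30 + 0.5 * a - 0.1 * b for a, b in rows]
coefs = fit_linear(rows, targets)
print([round(c, 4) for c in coefs])
```

Because the targets here contain no noise, the fit should hand back the coefficients used to build them. With real player data the fit is only approximate, and the columns would be whatever batted-ball terms you decide to feed in.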

Here's the equation I came up with:

xBABIP = 0.391597252 + (LD% x 0.287709436) + ((GB% - (GB% x IFH%)) x -0.151969035) + ((FB% - (FB% x HR/FB%) - (FB% x IFFB%)) x -0.187532776) + ((IFFB% x FB%) x -0.834512464) + ((IFH% x GB%) x 0.4997192)
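For readers who would rather script it than use a spreadsheet, here is a minimal Python sketch of the same equation. The coefficients are copied from the equation; the function and argument names are made up for the example. It assumes every rate is entered as a fraction (0.195 rather than 19.5), which is my reading of the equation because that scale produces BABIP-sized results. The sample inputs are illustrative, not any real player's line.

```python
def xbabip(ld, gb, fb, iffb, hr_fb, ifh):
    """Expected BABIP from batted-ball rates, all given as fractions.

    ld, gb, fb -- line drive, ground ball and fly ball rates
    iffb       -- infield flies as a share of fly balls
    hr_fb      -- home runs as a share of fly balls
    ifh        -- infield hits as a share of ground balls
    """
    infield_hits = ifh * gb                  # ground balls beaten out for hits
    other_grounders = gb - infield_hits      # the remaining ground balls
    popups = iffb * fb                       # infield fly balls
    homers = hr_fb * fb                      # fly balls that left the park
    outfield_flies = fb - homers - popups    # fly balls that stayed in play
    return (
        0.391597252
        + ld * 0.287709436
        + other_grounders * -0.151969035
        + outfield_flies * -0.187532776
        + popups * -0.834512464
        + infield_hits * 0.4997192
    )


def babip_gap(actual_babip, expected_babip):
    """Positive means the hitter outperformed the expectation."""
    return actual_babip - expected_babip


# Illustrative inputs only (assumed values, not a real player).
expected = xbabip(ld=0.20, gb=0.40, fb=0.40, iffb=0.10, hr_fb=0.10, ifh=0.05)
print(round(expected, 3))  # prints 0.308
```

Comparing a player's actual BABIP to this number with `babip_gap` gives a rough read on how much of a season might have been luck.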

Here's a published view of a spreadsheet showing it in action:

http://spreadsheets.google.com/ccc?key=0AuaVTUnZda7fdFVpY2NoRC1zS1p0UlNPaDlVdlRhN1E&hl=en

Here's a download of the spreadsheet in OpenOffice format (forgive the lame hosting service, I wasn't sure where to upload it):

http://www.filefactory.com/file/a1a2d5a/n/public_xBABIP_Calculator_ods

I've been using this calculator (along with a number of other equations) to build my own projections for 2010, and here are a few of the interesting things I've noticed.

First off, LD% has a very strong correlation to BABIP (not exactly a revolutionary statement), but it also seems very hard to project. There seems to be a lot of luck built into it, so even a career LD% rate still has some luck factored in, and I tend to pull it closer to the league average (19.5%).
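A minimal sketch of that kind of pull towards the league average is below. The 19.5% figure is the one quoted above; the `weight` argument is hypothetical, since no fixed amount of adjustment is stated here, and the career rate in the example is an illustrative value.

```python
LEAGUE_LD = 0.195  # league-average LD% quoted above, written as a fraction


def regress_to_league(career_rate, weight=0.5, league_rate=LEAGUE_LD):
    """Blend a player's career rate with the league average.

    weight is the share of trust placed in the player's own rate
    (1.0 = use the career rate as is, 0.0 = use the league average).
    The default of 0.5 is an arbitrary placeholder, not a fitted value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * career_rate + (1.0 - weight) * league_rate


# Illustrative career LD% of 21.5%, pulled halfway back to the league average.
print(f"{regress_to_league(0.215):.3f}")  # prints 0.205
```

The more seasons of data a player has, the higher a weight it would make sense to give his own rate.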

GB% is a little easier to predict. A higher GB% tends to yield a higher BABIP, but how much depends on your IFH% as well. A player who can post a high IFH% with a lot of ground balls will greatly increase his BABIP, while a slow player with a terrible IFH% and a lot of ground balls won't increase his BABIP nearly as much (makes sense).

FB% is again typically easier to predict than LD%, and a high FB% tends to yield a lower BABIP, since fly balls are more likely to become outs. But you've got to look at HR/FB% and IFFB% as well to get an accurate picture. A player who hits a ton of fly balls but has a very high HR/FB rate and a very low IFFB% (Ryan Howard) can post a more respectable BABIP (fly balls have a better shot of landing if they are getting out of the infield).

HR/FB% is also a little easier to predict, and it doesn't directly affect your BABIP; it's only used to take the home runs out of your fly balls (which in turn helps your BABIP). One thing that strikes me as problematic here is line drive home runs, which the equation doesn't remove, since it only subtracts home runs from fly balls.

IFFB% seems somewhat player controlled, but it also has a large luck component from year to year (probably largely due to sample size). It has a definite impact on your BABIP, as fly balls on the infield are nearly automatic outs.

IFH% seems very speed dependent. The more infield hits you have, the higher your BABIP. This can vary from year to year with luck, but generally speedy players will post better numbers (there are a few notable exceptions, like Jason Bay's abnormally high IFH%, which I chalk up to some luck). I'm sure ballpark factors play a role here as well (which I'm not accounting for).

So in the end, what we get is a way to take numbers directly from FanGraphs (over the course of a career, a full season, or even a partial season) and get a decent idea of what a player's BABIP should be, and how lucky he has been.

I'm very interested in any feedback or critique that anyone has to offer, or any ideas on improving it. I've also got a number of other calculators (batting average, xHR, xR, xRBI, xSB, xAvg, xOBP, xSLG) that I'd be willing to throw out there as well, but I figured that before I went to the trouble, I'd see what kind of buzz I get from this one.

Wednesday, October 28, 2009

Predicting Runs and OBP

OK, so we've talked about batting average; now it's time to move on to OBP and runs. OBP is a measure of a hitter's ability to get on base, while runs measures the number of times he's crossed the plate. OBP is a product of a batter's batting average combined with his ability to take walks. Runs are a product of a runner's ability to get on base and run the bases, plus some help from his teammates.

First, how do we project on-base percentage? Just like batting average, it will fluctuate a lot from year to year based on a hitter's luck with balls in play (BABIP). Since we've already determined batting average, the main thing to look at now is a player's walk percentage. Unlike batting average, this is more about the player's skills, and thus it's easier to predict. It's a stat that players tend to improve on as they develop, so it's reasonable to expect a player to repeat, or even improve upon, last year's walk% (and thus improve his OBP). If a player's walk% has remained relatively constant over his career, I'll use that walk%. If he has shown improvement in recent years, I'll lean towards those numbers. Rarely does a player's walk% actually decline (though it does happen).

Unfortunately, OBP also has hit-by-pitches and sacrifice flies built into it (sacrifice bunts are left out of the calculation). This makes it impossible to project a player's OBP from batting average and BB% alone, since those extras vary a lot from player to player, and even from year to year. So the way I project it is to pick a year in the player's career that best represents the walk% I predict for him (if one exists), and add to or subtract from that year's OBP based on the batting average I projected. So if the batting average in my projection is 20 points better than it was that year, I'll add 20 points to that year's OBP and use that as my projection.

For younger players, this is more difficult, and I often find myself just taking an existing OBP (career, or even one year) and tweaking it upwards. I will also look at their minor league numbers as a point of reference, as young players tend to move close to (and sometimes exceed) their minor league numbers as time goes on.
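The reference-year adjustment boils down to one line of arithmetic. Here is a minimal sketch; the function name and the sample numbers are invented for illustration and are not any real player's stats.

```python
def project_obp(reference_obp, reference_avg, projected_avg):
    """Shift a reference season's OBP by the change in batting average.

    reference_obp, reference_avg -- OBP and AVG from the season whose walk%
                                    best matches the projected walk%
    projected_avg                -- the batting average being projected
    """
    return reference_obp + (projected_avg - reference_avg)


# Illustrative: reference year of .280 AVG / .360 OBP, projected AVG of .300.
# The projection is 20 points of average higher, so OBP moves up 20 points.
print(f"{project_obp(0.360, 0.280, 0.300):.3f}")  # prints 0.380
```

This treats every point of batting average as a full point of OBP, which is the same simplification described above.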

Alright, so there's OBP. My methods aren't highly mathematical, but I think taking trends in BB% into account and taking out the batting average fluctuations makes for a fairly accurate OBP projection.

Now it's on to predicting a player's runs, and this is where my research gets a little more interesting. Runs are based on a few things. Some of them (OBP, speed, plate appearances) are statistical in nature, while others (where a player hits in the lineup, and how well the hitters behind him are knocking him in) are out of the player's control and difficult to project. So what I've done is throw out what's out of the player's control and figure out a way to predict a player's runs based on his skills alone. What you get is "skill runs", that is, the number of runs that a player's skills should allow him to score. In a better batting slot, he will outperform his skill runs, while in a worse one he will fall short of them. But batting slot is very difficult to predict, so for the sake of our projections, let's throw it out entirely.

So how did I do it? I took a large sample of data (three years' worth of player data from FanGraphs) and ran a statistical analysis of players' runs scored against their stolen bases, OBP, and PA. From that analysis, I came up with the following equation to predict a player's "skill runs":

Skill Runs = -90.241129 + (PA x -90.241129) + (OBP x 200.8088179) + (SB x 0.293131537)
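Here is a minimal Python sketch of the skill runs equation, with invented function and argument names. One caveat: as printed, the plate-appearance coefficient is identical to the intercept (-90.241129), which would drive the result far below zero at 700 PA, so it looks like a copy-paste slip. Rather than guess at the intended value, the sketch takes that coefficient as a required argument (`pa_coef`). OBP is assumed to be entered as a decimal such as 0.400.

```python
INTERCEPT = -90.241129   # from the equation above
OBP_COEF = 200.8088179   # from the equation above
SB_COEF = 0.293131537    # from the equation above


def skill_runs(pa, obp, sb, pa_coef):
    """Runs a player's own skills should produce, per the equation above.

    pa      -- projected plate appearances
    obp     -- projected on-base percentage as a decimal (assumption)
    sb      -- projected stolen bases
    pa_coef -- runs per plate appearance; must be supplied by the caller
               because the printed value duplicates the intercept
    """
    return INTERCEPT + pa * pa_coef + obp * OBP_COEF + sb * SB_COEF


def lineup_effect(actual_runs, pa, obp, sb, pa_coef):
    """Actual runs minus skill runs: a rough read on lineup help."""
    return actual_runs - skill_runs(pa, obp, sb, pa_coef)
```

With the right coefficient plugged in, `lineup_effect` would show how much a player's batting slot and teammates added to, or took away from, his run total.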

Using this equation and my projections, here's a sample of what I came up with for the leaders in runs scored in 2010, and I'm pretty happy with the results:


Player      Skill Runs
Pujols      115
Ellsbury    111
Reyes       109
Figgins     108
Abreu       107


Now obviously there is a good chance that team factors will push these guys, and others, up and down the list, but given skills alone, this is where I project them to be. Note: I currently have everyone set to 700 plate appearances, which also skews the results; more accurate PA projections will change this.

At the bottom end of the spectrum is Bengie Molina with 80 runs. Remember, that's with him projected to have 700 plate appearances, which he's not going to get, nor has he at any point in his career. Interestingly, there is not a huge difference in runs scored between the top and bottom players. This just shows that, by a large margin, plate appearances are the biggest factor in a player's ability to score runs (which makes sense).

Tuesday, October 27, 2009

Predicting Batting Average

I'm just coming off my third season of fantasy baseball, and I must say, I'm hooked! I haven't been this into baseball since I was a kid, collecting baseball cards and watching all the Cubs games on WGN. It's interesting; in retrospect I now think to myself: "Boy, those stats on the back of the card actually mean something".

Anyway, I've had enough of picking through other people's rankings of players. This year I decided not to let them have all the fun, and to do it myself. First up, I'm going to put together my own projections. I know that projection systems exist, but I think it's fun to do my own, making the numbers better match my opinions of players.

So first up is batting average. Batting average is one of the more variable stats, and it's based pretty heavily on luck (which is why it fluctuates from year to year). So I don't expect my projected batting averages to be spot on. Actual averages will always fluctuate, so I will try to project something in the middle. I do not have a highly scientific way of calculating batting average, but I'll go over what I know about it and give a rough idea of how I make my projections.

Batting average is determined by a couple of things: BABIP and strikeout %. Strikeout % is something that's almost completely within a player's control, so it's a good stat to look at. Strikeout rate is also something that players tend to improve on as they develop. I generally consider players under 27 to be still in development, and I'm more likely to believe in, or project, improved strikeout rates for those younger players.

BABIP is a highly complicated stat that takes a lot of things into account, so I'll just talk about it briefly. First off, speed is a factor: faster players can post a higher BABIP because they beat out more infield grounders. LD%, GB%, and FB% are all factors as well, as line drives have the best chance of landing for a hit (by a long shot), ground balls are second most likely, and fly balls are least likely. So what does this mean? Fly ball hitters post worse BABIPs, and ground ball hitters post better ones, so generally speaking, ground ball hitters hit for a better average. Line drives tend to be highly variable from year to year for most players.

Anyway, one of the key things about BABIP is that while it fluctuates a lot with luck, a career BABIP is usually a good indicator of a player's future BABIP. That is, unless he suddenly turns from a fly ball hitter into a ground ball hitter, or vice versa. As players get older and slower, their BABIP will also fall a little as they lose their speed. Young players are extremely hard to predict in terms of BABIP, and for them, I found this nifty BABIP calculator tool.

Anyway, to project a player's batting average, I first look at his career numbers (BABIP, K%, AVG). If the career numbers fall in line with what I would expect, then I go with the career batting average for my projection. If his strikeout rate has improved over the course of his career, then I will trend his batting average towards the upper end. Since young players can be extremely difficult to predict, I will actually try to predict their K% and BABIP (using the calculator), and then, using those two numbers, I hit the FanGraphs leaderboard page, find another player who posted similar numbers, and use his batting average.
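The leaderboard lookup for young players can be sketched as a nearest-match search. This is only a rough illustration: the function name is invented, the rows are placeholders and not real leaderboard data, and "similar" is assumed to mean the smallest squared difference in K% and BABIP on their raw decimal scales.

```python
def closest_comp(leaderboard, k_pct, babip):
    """Return the leaderboard row nearest to the target K% and BABIP.

    leaderboard -- iterable of (name, k_pct, babip, avg) tuples
    Similarity is the unweighted squared difference of the two rates,
    which is an assumption; other weightings are just as defensible.
    """
    return min(
        leaderboard,
        key=lambda row: (row[1] - k_pct) ** 2 + (row[2] - babip) ** 2,
    )


# Placeholder rows for illustration, NOT real player data.
leaderboard = [
    ("Comp A", 0.150, 0.300, 0.280),
    ("Comp B", 0.250, 0.330, 0.265),
    ("Comp C", 0.200, 0.310, 0.272),
]
name, _, _, avg = closest_comp(leaderboard, k_pct=0.24, babip=0.325)
print(name, avg)  # prints Comp B 0.265
```

Averaging the two or three closest matches instead of taking a single one would smooth out any oddball comparison.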

There you have it. There's definitely some subjectivity built into my system. I don't expect people to use my projections as the word of God; rather, I hope people are interested in, or learn from, the process I use.