Wednesday, February 8, 2012

Project Context Neutral Runs and RBIs

Projecting a player's runs and RBIs is a pain, largely because both are considered contextual. If you've got good hitters batting in front of you, you're going to get more RBI opportunities; if you've got good hitters batting behind you, you'll get more run opportunities.

The problem with this is that context changes often. A team that previously stunk may have a guy or two break out, and suddenly another player is thrust into a situation where he can generate more runs. A key guy might also get injured, traded, or simply moved around in the lineup.

For this reason, I've been working on a way to project a player's runs and RBIs based on his skills alone. I've tweaked my process a bit over the years, and here's what I've found to be the best method:

xRuns = -.218 + HR + .191 * (BB - .333 * CS) + .273 * (HBP + 1B - .666 * CS) + .363 * 2B + 1.366 * 3B + .505 * SB

To summarize what this means: every home run counts as a full run, since the batter scores himself; extra-base hits generate more runs, with triples generating bonus runs (because they indicate a speedy player who's going to score on more singles than other guys); and net stolen-base gains also improve your chance to score runs.
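As a quick illustration, here is the xRuns formula transcribed into a Python function. The function name, argument names, and the sample stat line are hypothetical, chosen only to show the arithmetic; the coefficients are the ones printed above.

```python
def x_runs(hr, bb, hbp, singles, doubles, triples, sb, cs):
    """Context-neutral expected runs using the xRuns coefficients above.

    All inputs are season counting stats. Caught stealing is charged
    against both the walk term and the singles/HBP term, as in the formula.
    """
    return (-0.218
            + hr                                   # every home run scores the batter
            + 0.191 * (bb - 0.333 * cs)
            + 0.273 * (hbp + singles - 0.666 * cs)
            + 0.363 * doubles
            + 1.366 * triples                      # triples carry a speed bonus
            + 0.505 * sb)


# Hypothetical stat line: 30 HR, 60 BB, 5 HBP, 100 1B, 30 2B, 5 3B, 20 SB, 5 CS
print(round(x_runs(30, 60, 5, 100, 30, 5, 20, 5)))  # 96
```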

xRBIs = 2 * HR + .640 + .004 * (BB + HBP) + .234 * 1B + .427 * (2B + 3B)

With this one, again we see hits generate RBIs, with extra-base hits generating more. Home runs generate bonus RBIs because you're knocking yourself in as well as anyone on base, and home run hitters also hit more sacrifice flies.
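The xRBIs formula translates the same way. Again, this is only a sketch: the function and argument names are made up for the example, and the stat line is hypothetical.

```python
def x_rbi(hr, bb, hbp, singles, doubles, triples):
    """Context-neutral expected RBIs using the xRBIs coefficients above."""
    return (0.640
            + 2 * hr                       # doubled weight: himself, runners on base, sac flies
            + 0.004 * (bb + hbp)           # walks and HBP barely move the needle
            + 0.234 * singles
            + 0.427 * (doubles + triples))


# Hypothetical stat line: 30 HR, 60 BB, 5 HBP, 100 1B, 30 2B, 5 3B
print(round(x_rbi(30, 60, 5, 100, 30, 5)))  # 99
```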

Let’s look at some sample results from my 2012 projections:

Name xRuns xRBI
Kemp 109 108
Ellsbury 103 97
Bautista 100 113
P Sandoval 89 105

Ellsbury is an interesting one on this list. Sitting in the #1 hole, he historically had very few RBIs, yet this system picked him for an RBI increase last year based on his budding power (and his lineup position changed to fit his new skill set).

Sandoval is also an interesting inclusion. He's shown budding HR and 2B power, and a change in his context could put him in line for a lot of RBIs.

Obviously this method is not perfect, and context does exist. I just find that context is so fluid throughout the year that it's fun to ignore it and project based on a batter's skills alone. I find this particularly satisfying in fantasy baseball, because it's a fun way to identify breakout players. A player with budding power (Ellsbury, Granderson) will eventually have his lineup position improved to take advantage of that power. These are two guys I specifically drafted last year based on my projections, and both had their context improve to match their skills.

Monday, January 31, 2011

2011 Pitcher BABIP Calculator

We all know that pitcher BABIP is a difficult thing to predict. In fact, we've got things like xFIP, FIP, and tRA to help mitigate its unpredictability. For certain uses, however, it's beneficial for us to pay attention to it. Fantasy baseball is one case where FIP doesn't necessarily do us a lot of good; there we'd rather get an idea of what a pitcher's real ERA is going to look like. To that end, I've developed a way to predict a pitcher's BABIP from a few other statistics that a pitcher has some control over.

Specifically, I've taken 3 years of team data to predict BABIP for the various batted ball types. This helps factor in things like a slow infield, a high outfield wall, and other park-based factors. It also factors in infield defense (on ground balls) and outfield defense (on outfield fly balls). Using 3 years of data isn't perfect, since teams change over time, but I think you'll find that it does a pretty good job. Another limitation of my implementation is that I'm assuming all IFFBs are outs, since I don't have a statistic for the BABIP of infield fly balls.

So when is this useful? It's extremely useful when you've got a pitcher with a small sample of data (or none at all) with a particular team. When a pitcher switches teams, this gives you a fairly good idea of how his BABIP will be affected. For instance, a ground ball pitcher will be helped greatly by moving to a more ground-ball-friendly environment (better park, better defense).

Let's use an example to illustrate. Let's say Ricky Nolasco gets traded to the Rays. Using his career batted ball profile, the calculator gives him a .305 BABIP with his current team. Switch his team to the Rays, and suddenly he's a .291 BABIP pitcher. Now let's delve into why this happened. Ground balls allowed by Rays pitchers have a .230 BABIP, while the Marlins' are at .252. Outfield fly balls with the Rays are at .127, while with the Marlins they're at .147. A lot of this is probably due to the Rays having a better defensive team, but park factors could come into play as well. The bottom line: Nolasco's batted ball luck should improve with a change in teams, and with the calculator we can take a good guess at how much.
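The calculator's internals aren't spelled out here, so the following is only a rough sketch of the idea under stated assumptions: a pitcher's batted-ball mix is weighted by his team's BABIP for each batted-ball type, infield flies count as outs, and home runs are removed from balls in play. The ground-ball and outfield-fly rates for the Rays and Marlins are the ones quoted above; the line-drive BABIP (.720 for both clubs) and the pitcher's profile are invented placeholders, so the results won't reproduce Nolasco's .305 and .291.

```python
def team_babip(ld, gb, iffb, hr_fb, team):
    """Rough BABIP estimate from a batted-ball profile and team BABIPs by type.

    ld, gb: share of all batted balls (fractions); fly balls are the rest.
    iffb, hr_fb: share of fly balls that are infield flies / home runs.
    team: dict of team BABIP for 'LD', 'GB', and outfield fly balls 'OFFB'.
    """
    fb = 1.0 - ld - gb
    of_fb = fb * (1.0 - iffb - hr_fb)       # outfield flies that stay in the park
    in_play = ld + gb + fb * (1.0 - hr_fb)  # home runs are not balls in play
    hits = (ld * team["LD"]
            + gb * team["GB"]
            + of_fb * team["OFFB"])         # infield flies assumed to be outs
    return hits / in_play


# GB and OFFB rates from the text; the LD rate is a placeholder assumption.
rays = {"LD": 0.720, "GB": 0.230, "OFFB": 0.127}
marlins = {"LD": 0.720, "GB": 0.252, "OFFB": 0.147}

profile = dict(ld=0.19, gb=0.43, iffb=0.10, hr_fb=0.10)  # hypothetical pitcher
print(f"Marlins: {team_babip(**profile, team=marlins):.3f}")
print(f"Rays:    {team_babip(**profile, team=rays):.3f}")
```

With identical line-drive rates, the Rays version comes out lower purely because of the ground-ball and outfield-fly differences, which is the same direction as the Nolasco example.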

For another quick and more relevant example, take Matt Garza. With the Rays, he shows up at about a .270 BABIP. With his move to the Cubs, he's shown as a .287 BABIP pitcher: still well below league average, but not quite as elite as it was with the Rays. Of course, this isn't the entire picture of the move to the Cubs. His strikeouts and walks will probably improve, and there could be a change in his HR/FB ratio as well. That's all beyond the scope of this particular article, but still important to keep in mind.

How to Use It:

Step 1: Download the spreadsheet using one of the links below.

Open Office Link:
Excel Link:

Step 2: Open the spreadsheet, and input the LD%, GB%, IFFB%, and HR/FB% for your pitcher (these stats are easily obtained from

Step 3: Set the pitcher's team using the following lookup table:

ARI -> Diamondbacks
ATL -> Braves
BAL -> Orioles
BOS -> Red Sox
CHC -> Cubs
CWS -> White Sox
CIN -> Reds
CLE -> Indians
COL -> Rockies
DET -> Tigers
FLA -> Marlins
HOU -> Astros
KCR -> Royals
LAA -> Angels
LAD -> Dodgers
MIL -> Brewers
MIN -> Twins
NYM -> Mets
NYY -> Yankees
OAK -> Athletics
PHI -> Phillies
PIT -> Pirates
SDP -> Padres
SEA -> Mariners
SFG -> Giants
STL -> Cardinals
TBR -> Rays
TEX -> Rangers
TOR -> Blue Jays
WSN -> Nationals

Friday, June 18, 2010

Please fire Lou Piniella

Fire Lou Piniella! He can't manage a bullpen. I can't count the number of times in the 7th and 8th innings that a starter gets into a jam, an opposite-handed batter comes to the plate, and Piniella has a same-handed pitcher up and ready in the bullpen, but he elects to stick with his starter, who has already faced this guy 3 times in the game, with predictable results. He has no idea how to evaluate the skills of his players, and he has no clue how to put together a lineup. Why in the hell are Theriot and Baker hitting at the top of the lineup (.284/.319 wOBA) while Soriano and Soto hit at or near the bottom (.383/.392)? Oh, and I can't wait until Ramirez gets back from the DL so he and his .231 wOBA can get slotted back into the 4 hole. He made a comment in an interview recently about how "maybe I shouldn't have kept throwing Lee and Ramirez out there in the middle of the lineup while they were struggling". I'm sitting there thinking to myself, "you think???" That day he sat all his regulars and played all the bench guys. Next day... Ramirez and Lee were back in the middle of the lineup. It's really getting to be laughable at this point.

Thursday, April 15, 2010

2010 Cubs Bullpen Roundup

I thought it would be fun to do a quick round of predictions for the 2010 Chicago Cubs Bullpen.

Tier 1

Marmol is in a tier of his own because of his insane strikeout rates. If he can improve his control (a lot) from last year, he can be one of the best relievers in the game. If he can't, he's at risk of being passed by the Tier 2 guys.

Carlos Marmol: Extreme fly ball pitcher with terrible control and incredible strikeout rates.
Keeping down the walks will be key for Marmol to be successful. Walks and extreme fly ball tendencies (home runs) are a very dangerous combination.

Tier 2

These guys are all fairly interchangeable. Any of them could outperform Marmol with some big strides (or continued control problems from Marmol). None of these guys has had consistent numbers; they all carry some risk.

Sean Marshall: Ground ball pitcher with somewhat poor control, but decent strikeouts. If he can lower his walks, he could be very effective; if he doesn't, he'll continue to be average. He should perform very similarly to Grabow, and whichever of the two can keep his walks in check will prevail as the better pitcher.

John Grabow: Ground ball pitcher with poor control and decent strikeouts. I'm noticing a pattern here: like most of the other Cubs relievers, he needs to keep down the walks. His ground ball rate has also been trending downward the last couple of years, something Grabow should hope to correct (or he'll be giving up more home runs).

Jeff Gray: Ground ball pitcher with low/moderate strikeouts and decent control. He showed very good control in 2009; if he can continue to limit the walks, he should be successful. He has the potential to be one of the Cubs' better relief pitchers this year.

Tier 3

The guys in this tier have some potential, but they're most likely still developing and may need another year. There's a good chance either of these guys ends up back in the minors.

Esmailin Caridad: Slight fly ball pitcher with questionable control and a decent strikeout rate. Caridad should be an interesting guy to watch. He was extremely good in his 19.1 innings in 2009 (5.67 K/BB), but his minor league numbers don't seem to imply that he'll continue that. If he can keep the walks down, he could be very good.

Jeff Samardzija: Slight groundball pitcher, with terrible control, and low to moderate strikeouts. He'll need to improve his strikeouts, and lower his walk rate to be effective.

Tier 4

I don't see these guys succeeding, most likely they will be demoted and replaced at some point.

James Russell: Fly ball pitcher with good control, but low strikeouts. He doesn't strike out enough people to compensate for his fly ball tendencies, so he won't be anything special.

Justin Berg: Extreme ground ball pitcher, with terrible control, and terrible strikeouts. Nothing to see here, he won't last long.

Thursday, February 4, 2010

Predicting HR/FB Rates

A big part of knowing how a pitcher should do the following year is knowing what his HR/FB rate will look like. It's understood in the sabermetrics community that a pitcher's HR/FB rate is mostly out of the pitcher's control. That is to say, it's mostly a product of luck and park-based factors.

There are equations out there that try to normalize ERA based on a league-average home run rate. This is not an accurate way to predict a future HR/FB rate: someone pitching in a home-run-friendly ballpark is obviously going to allow more home runs than someone pitching in a park that suppresses them.

Likewise, some equations take park-based factors into account as well. This is getting better, but it's still not perfect, because there are player-based factors to consider too. What I mean is the following: suppose Ryan Howard switches from the NL to the AL. Now a pitcher in the AL has to face Ryan Howard a few games a year, and Howard's likelihood of launching the ball over the fence is much higher than your average player's. Now consider that such a change is made to one of your division opponents, or better yet, consider that their roster is likely to change quite a bit. In reality, this is the case, and it probably accounts for a lot of the year-to-year variance in pitchers' HR/FB rates.

I've attempted to determine a HR/FB rate for each ballclub. Theoretically, plugging this estimate in for each pitcher on a given club should give you a good idea of what his HR/FB rate should look like next year.

I achieved this by putting together a sample of data and running some statistics against it. First, I determined that using weights of 100, 66, and 33 for the previous 3 years, respectively, yielded the best results (the relevance of HR/FB rate seems to fall off the further back you go). Then I took a group of pitchers who had a significant number of innings pitched and had played for the same club for the previous 3 years. Using this data, I attempted to determine what the most accurate way to predict 2009 HR/FB would have been, using 2008 and older data.
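Here's a minimal sketch of the 100/66/33 weighting, assuming it's applied as a weighted average of a club's HR/FB rates with the most recent season first. The function name and the sample rates are hypothetical, and whether the original weighting also accounts for each season's fly-ball volume isn't stated, so this treats every season's rate equally apart from the weights.

```python
def club_hr_fb(rates):
    """Weighted club HR/FB (%) from the three most recent seasons.

    rates: (most recent, one year back, two years back)
    """
    weights = (100, 66, 33)
    if len(rates) != len(weights):
        raise ValueError("expected exactly three seasons of HR/FB rates")
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


# Hypothetical club: 11.0% last year, 10.0% two years ago, 9.0% three years ago
print(f"{club_hr_fb((11.0, 10.0, 9.0)):.2f}")  # 10.34
```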

My conclusion was that using a pitcher's 2008 HR/FB as a predictor was poor, and using his previous three-year average was equally poor. Using my "club factor" proved to be significantly more accurate at predicting the 2009 HR/FB rate.

Now, without further ado, here are the predicted 2010 HR/FB rates (%) by ballclub:

Reds 11.82
Brewers 11.54
Yankees 11.46
Astros 11.34
Orioles 11.29
Nationals 11.13
Phillies 11.04
Blue Jays 10.95
Tigers 10.81
Rays 10.78
Rangers 10.51
Diamondbacks 10.30
Indians 10.26
Marlins 10.24
Rockies 10.21
White Sox 10.20
Twins 9.96
Mariners 9.91
Padres 9.88
Cubs 9.86
Royals 9.80
Cardinals 9.79
Pirates 9.71
Angels 9.62
Red Sox 9.36
Braves 9.10
Mets 9.06
A's 8.81
Giants 8.77
Dodgers 8.62

Tuesday, November 10, 2009

A New xBABIP Calculator

I've been a big fan of The Hardball Times xBABIP calculator over the last 6 months or so, but there were a couple of things I didn't like about it. The first was having to plug in exact counts for ABs, HRs, etc. When dealing with projections, I much prefer to work in percentages, because they make it much easier to see what a player's BABIP should be for a partial season, a span of several years, or a career. I'm also not so sure about the inclusion of stolen bases as an input.

I'm a big fan of the FanGraphs website, which provides a wide array of batted ball data for each player. I found that BABIP is very strongly determined by a combination of LD%, GB%, FB%, IFFB%, HR/FB%, and IFH%, that is, as strongly as BABIP can be determined by anything. This is right in line with what The Hardball Times uses, except that I'm dealing strictly with percentages, and I've substituted IFH% for SBs. It's worth noting that I'm not taking ballpark factors into account (which surely have some effect on BABIP as well).

I came up with my numbers by gathering a large amount of data (3 years' worth of individual player statistics) and running a multivariable regression analysis on it (I'm not sure if that's the right wording or not; I have no formal training in statistical analysis, just some stuff I've picked up).

Here's the equation I came up with:

xBABIP = 0.391597252 + (LD% * 0.287709436) + ((GB% - GB% * IFH%) * -0.151969035) + ((FB% - FB% * HR/FB% - FB% * IFFB%) * -0.187532776) + ((IFFB% * FB%) * -0.834512464) + ((IFH% * GB%) * 0.4997192)
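To check the arithmetic, here's the same equation as a Python function. Inputs are fractions rather than percentage points (e.g. 0.19 for a 19% LD rate), which is what the intercept and coefficients appear to expect. As on FanGraphs, IFFB% and HR/FB% are treated as shares of fly balls and IFH% as a share of ground balls. The function name and the sample profile are hypothetical.

```python
def x_babip(ld, gb, fb, iffb, hr_fb, ifh):
    """Expected BABIP from batted-ball rates, all given as fractions."""
    return (0.391597252
            + ld * 0.287709436
            + (gb - gb * ifh) * -0.151969035                 # grounders that aren't infield hits
            + (fb - fb * hr_fb - fb * iffb) * -0.187532776   # outfield flies left in play
            + (iffb * fb) * -0.834512464                     # infield flies
            + (ifh * gb) * 0.4997192)                        # infield hits


# Hypothetical hitter: 19% LD, 44% GB, 37% FB, 10% IFFB, 10% HR/FB, 7% IFH
print(f"{x_babip(0.19, 0.44, 0.37, 0.10, 0.10, 0.07):.3f}")  # 0.313
```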

Here's a published view of a spreadsheet showing it in action:

Here's a download of the spreadsheet in OpenOffice format (forgive the lame hosting service; I wasn't sure where to upload it):

I've been using this calculator (along with a number of other equations) to build my own projections for 2010, and here are a few of the interesting things I've noticed.

First off, LD% has a very strong correlation with BABIP (not exactly a revolutionary statement), but it also seems to be very hard to project. There seems to be a lot of luck built into it, so even career LD% rates still include some luck, and I tend to trend them closer to the league average (19.5%).
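One simple way to trend LD% toward the league average is a weighted blend of the observed rate and 19.5%. This is a sketch, not the exact adjustment described here: the function name is hypothetical, and the blend weight is an arbitrary placeholder. In practice you'd probably want the weight to grow with the number of batted balls in the sample.

```python
LEAGUE_AVG_LD = 0.195  # league-average LD% quoted above, as a fraction


def regress_ld(observed_ld, weight=0.5):
    """Blend an observed LD% with the league average.

    weight is the share given to the player's own rate; 0.5 is a
    placeholder assumption, not a fitted value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * observed_ld + (1.0 - weight) * LEAGUE_AVG_LD


print(f"{regress_ld(0.235):.3f}")  # 0.215
```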

GB% is a little easier to predict. Higher GB% tends to yield higher BABIPs, but that depends on your IFH% as well. A player who posts a high IFH% with a lot of ground balls will greatly increase his BABIP, while a slow player with a terrible IFH% and a lot of ground balls won't increase his BABIP nearly as much (which makes sense).

FB% is also typically easier to predict than LD%, and high FB% tends to yield lower BABIPs, since fly balls are more likely to be caught for outs. But you've got to look at HR/FB and IFFB% as well to get an accurate picture. A player who hits a ton of fly balls but has a very high HR/FB rate and a very low IFFB% (Ryan Howard) can post more respectable BABIPs (his fly balls have a better shot of landing if they're getting out of the infield).

HR/FB is also a little easier to predict, and it doesn't directly affect your BABIP; it's only used to take the home runs out of your fly balls (which in turn helps your BABIP). One thing that strikes me as problematic here is line-drive home runs.

IFFB% seems somewhat player controlled, but also has a large luck component to it from year to year (probably largely due to sample size). This has a definite impact on your BABIP, as fly balls on the infield are automatic outs.

IFH% seems very speed-dependent: the more infield hits you have, the higher your BABIP. This can vary from year to year with luck, but speedy players will generally post better numbers (there are a few notable exceptions, like Jason Bay's abnormally high IFH%, which I chalk up to some luck). I'm sure ballpark factors play a role here as well (which I'm not accounting for).

So in the end, what we get is a way to take numbers directly from FanGraphs (over the course of a career, a full season, or even a partial season) and get a decent idea of what a player's BABIP should be, and how lucky he has been.

I'm very interested in any feedback or critique that anyone has to offer, or any ideas on improving it. I've also got a number of other calculators (covering batting average, xHR, xR, xRBI, xSB, xAvg, xOBP, and xSLG) that I'd be willing to throw out there as well, but before I went through the trouble, I figured I'd see what kind of buzz I get from this one.

Wednesday, October 28, 2009

Predicting Runs and OBP

OK, so we've talked about batting average; now it's time to move on to OBP and runs. OBP is a measure of a hitter's ability to get on base, while runs measure the number of times he's crossed the plate. OBP is a product of a batter's batting average combined with his ability to draw walks. Runs are a product of a runner's ability to get on base, run the bases, and ultimately get some help from his teammates.

First, how do we project on-base percentage? Just like batting average, it will fluctuate a lot from year to year based on a hitter's luck with balls in play (BABIP). Since we've already determined batting average, the main thing to look at now is a player's walk percentage. Unlike batting average, this is more about the player's skills, and thus it's easier to predict. It's a stat that players tend to improve on as they develop, so it's reasonable to expect a player to repeat, or even improve upon, last year's walk% (and thus improve his OBP). If a player's walk% has remained relatively constant over his career, I'll use that walk%. If he has shown improvement in recent years, I'll tend toward those numbers. Rarely does a player's walk% actually decline (though it does happen).

Unfortunately, OBP also counts hit-by-pitches, and sacrifice flies count against it in the denominator (sacrifice bunts, on the other hand, are left out of OBP entirely). This makes it impossible to project a player's OBP from batting average and BB% alone, since HBPs and sac flies vary a lot from player to player, and even from year to year. So the way I project it is to pick a year in the player's career that best represents the walk% I predict for him (if one exists), and add or subtract from that year's OBP based on the batting average I projected. So if his batting average is 20 points better in my projection, I'll add 20 points to that year's OBP and use that as my projection.

For younger players this is more difficult, and I often find myself just taking an existing OBP (career, or even one year) and tweaking it upward. For young players, I'll also look at their minor league numbers as a point of reference, since they tend to move close to (and sometimes exceed) their minor league numbers as time goes on.
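The batting-average adjustment is simple enough to write down. This is a minimal sketch with hypothetical names and numbers: pick a reference season whose walk rate matches the projection, then shift that season's OBP point-for-point by the change in projected batting average.

```python
def project_obp(ref_obp, ref_avg, projected_avg):
    """Shift a reference season's OBP by the projected change in batting average."""
    return ref_obp + (projected_avg - ref_avg)


# Hypothetical reference season: .360 OBP on a .280 average; projected .300 average
print(f"{project_obp(0.360, 0.280, 0.300):.3f}")  # 0.380
```

The point-for-point shift is an approximation: an extra hit raises batting average by 1/AB but raises OBP by only 1/(AB + BB + HBP + SF), so a 20-point batting-average gain translates into slightly less than 20 points of OBP.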

Alright, so there's OBP. My methods aren't highly mathematical, but I think taking into account trends in BB% and taking out the batting average fluctuations makes for a fairly accurate OBP projection. Now it's on to predicting a player's runs, and this is where my research gets a little more interesting. Runs are based on a few things. Some of those things (OBP, speed, plate appearances) are statistical in nature, while others (where a player hits in the lineup, and how well the people behind him are knocking him in) are out of his control and difficult to project. So what I've done is throw out what's out of the player's control and figure out a way to predict a player's runs based on his skills alone. What you get is "skill runs", that is, the number of runs that a player's skills should allow him to score. In a better batting slot, he will outperform his skill runs, while in a worse one, he will fall short of them. But batting slot is very difficult to predict, so for the sake of our projections, let's throw it out entirely.

So how did I do it? I took a large sample of data (3 years' worth of player data from FanGraphs) and ran a statistical analysis of players' runs scored against their stolen bases, OBP, and PA. From this analysis, I came up with the following equation to predict a player's "skill runs":

Skill Runs = -90.241129 + (PA * -90.241129) + (OBP * 200.8088179) + (SB * 0.293131537)
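Here's the skill-runs equation as a function, with one caveat. As printed, the plate-appearance coefficient is identical to the intercept (-90.241129), and multiplying 700 PA by that would produce a hugely negative total, so it looks like a transcription slip. This sketch therefore takes the PA coefficient as a parameter; the 0.15 in the example is a placeholder assumption, not the value behind the results table that follows. The function name and sample hitter are also hypothetical.

```python
def skill_runs(pa, obp, sb, pa_coef):
    """Context-neutral 'skill runs' from plate appearances, OBP, and steals.

    pa_coef must be supplied by the caller: the printed PA coefficient
    duplicates the intercept, so the true value is unknown here.
    """
    return -90.241129 + pa * pa_coef + obp * 200.8088179 + sb * 0.293131537


# Hypothetical hitter: 700 PA, .400 OBP, 20 SB; pa_coef=0.15 is a placeholder.
print(round(skill_runs(700, 0.400, 20, pa_coef=0.15)))  # 101
```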

Using this equation and my projections, here's a sample of what I came up with for the 2010 runs-scored leaders, and I'm pretty happy with the results:

Player Skill Runs
Pujols 115
Ellsbury 111
Reyes 109
Figgins 108
Abreu 107

Now obviously there's a good chance that team factors will push these guys, and others, up and down the list, but given skills alone, this is where I project them to be. Note: I currently have everyone set to 700 plate appearances, which also skews the results; more accurate PA projections will change this.

At the bottom end of the spectrum is Bengie Molina with 80 runs. Remember, that's with him projected for 700 plate appearances, which he's not going to reach and hasn't reached at any point in his career. Interestingly, there isn't a huge difference in runs scored between the top and bottom players when everyone gets the same 700 PA, which shows that plate appearances are, by a large margin, the biggest factor in a player's ability to score runs (which makes sense).