Wednesday, October 28, 2009

Predicting Runs and OBP

OK, so we've talked about batting average; now it's time to move on to OBP and runs. OBP measures a hitter's ability to get on base, while runs counts the number of times he has crossed the plate. OBP is a product of a batter's batting average combined with his ability to take walks. Runs are a product of a runner's ability to get on base and run the bases, plus some help from his teammates.

First, how do we project on-base percentage? Just like batting average, it will fluctuate a lot from year to year, based on a hitter's luck with balls in play (BABIP). Since we've already projected batting average, the main thing to look at now is a player's walk percentage. Unlike batting average, this is more about the player's skills, and thus it's easier to predict. Players tend to improve their walk rate as they develop, so it's reasonable to expect a player to repeat, or even improve upon, last year's walk% (and thus improve his OBP). If a player's walk% has remained relatively constant over his career, I'll use that walk%. If he has shown improvement in recent years, I'll lean towards those numbers. Rarely does a player's walk% actually decline (though it does happen).

Unfortunately, built into OBP are a player's sacrifice flies and hit-by-pitches (sacrifice bunts are left out of the calculation). This makes it impossible to simply project a player's OBP from batting average and BB% alone, since those totals vary a lot from player to player, and even from year to year. So the way I project it is to pick the year in the player's career that best represents the walk% I predict for him (if one exists), and then add to or subtract from that year's OBP based on the batting average I projected. If his batting average is 20 points better in my projection than it was in that year, I'll add 20 points to that year's OBP and use the result as my projection.

For younger players this is more difficult, and I often find myself just taking an existing OBP (career, or even a single year) and tweaking it upwards. For young players I will also look at their minor league numbers as a point of reference, as they tend to move close to (and sometimes exceed) their minor league numbers as time goes on.
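To make the adjustment concrete, here is a minimal Python sketch of the "reference year plus batting average difference" step. The function name and the sample numbers are made up for illustration; choosing the reference season is still a judgment call that the code does not attempt.

```python
def project_obp(reference_obp, reference_avg, projected_avg):
    """Shift a reference season's OBP by the projected change in batting average.

    reference_obp / reference_avg come from the season whose walk% best
    matches the walk% being projected; projected_avg is the new projection.
    """
    return round(reference_obp + (projected_avg - reference_avg), 3)


# Hypothetical player: the reference season was .280 AVG / .360 OBP,
# and the batting average projection is .300 (20 points better).
projected = project_obp(0.360, 0.280, 0.300)
print(f"Projected OBP: {projected:.3f}")
```

The shift is a straight one-for-one move, which is the simplification described above: it assumes the extra hits change OBP by about the same number of points as they change batting average.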

Alright, so there's OBP. My methods aren't highly mathematical, but I think taking trends in BB% into account, and taking out the batting average fluctuations, makes for a fairly accurate OBP projection. Now it's on to predicting a player's runs, and this is where my research gets a little more interesting. Runs depend on a few things. Some of them (OBP, speed, plate appearances) are statistical in nature, while others (where he hits in the lineup, and how well the hitters behind him are knocking him in) are out of the player's control and difficult to project. So what I've done is throw out what's out of the player's control and figure out a way to predict a player's runs based on his skills alone. What you get is "skill runs", that is, the number of runs that a player's skills should allow him to score. In a better batting slot he will outperform his skill runs, while in a worse one he will fall short of them. But batting slot is very difficult to predict, so for the sake of our projections, let's throw it out entirely.

So how did I do it? I took a large sample of data (three years' worth of player data from FanGraphs) and ran a statistical analysis of players' runs scored against their stolen bases, OBP, and PA. That analysis produced the following equation to predict a player's "skill runs":

Skill Runs = -90.241129 + (Plate Appearances x -90.241129) + (OBP x 200.8088179) + (SB x 0.293131537)
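As a rough sketch, the equation can be written as a small Python function. One caveat: the plate appearance coefficient as printed is identical to the intercept, which looks like a copy error, so this sketch takes it as a required argument instead of guessing the fitted value. The 0.15 used in the example is purely a placeholder for illustration, as are the player inputs.

```python
def skill_runs(pa, obp, sb, pa_coef,
               intercept=-90.241129,
               obp_coef=200.8088179,
               sb_coef=0.293131537):
    """Linear 'skill runs' estimate from plate appearances, OBP and steals.

    pa_coef has no default on purpose: the value printed in the post
    duplicates the intercept, so the caller must supply the real one.
    """
    return intercept + pa_coef * pa + obp_coef * obp + sb_coef * sb


# Placeholder inputs: 700 PA, .400 OBP, 20 SB, and an ASSUMED
# per-PA coefficient of 0.15 (not the fitted value).
estimate = skill_runs(pa=700, obp=0.400, sb=20, pa_coef=0.15)
print(f"Skill runs: {estimate:.1f}")
```

Keeping the coefficients as parameters also makes it easy to plug in new values if the analysis is rerun on a different sample of seasons.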

Using this equation and my projections, here's a sample of what I came up with for the leaders in runs scored in 2010. I'm pretty happy with the results:


Player Skill Runs
Pujols 115
Ellsbury 111
Reyes 109
Figgins 108
Abreu 107


Now obviously there is a good chance that team factors will push these guys, and others, up and down the list, but given skills alone, this is where I project them to be. Note: I currently have everyone set to 700 plate appearances, which also skews the results; more accurate PA projections will change this.

At the bottom end of the spectrum is Bengie Molina with 80 runs. Remember, that's with him projected for 700 plate appearances, which he's not going to get, nor has he at any point in his career. Interestingly, there is not a huge difference in runs scored between the top and bottom players once everyone is given the same number of plate appearances. This shows that plate appearances are, by a large margin, the biggest factor in a player's ability to score runs (which makes sense).

Tuesday, October 27, 2009

Predicting Batting Average

I'm just coming off my third season of fantasy baseball, and I must say, I'm hooked! I haven't been this into baseball since I was a kid, collecting baseball cards and watching all the Cubs games on WGN. It's interesting: in retrospect I now think to myself, "Boy, those stats on the back of the card actually mean something".

Anyway, I've had enough of picking through other people's rankings of players. This year I decided not to let them have all the fun, so I'll do it myself. First up, I'm going to put together my own projections. I know that projection systems exist, but I think it's fun to do my own, making the numbers better match my opinions of players.

So first up is batting average. Batting average is one of the more variable stats, and it's based pretty heavily on luck (which is why it fluctuates from year to year). So I don't expect my projected batting averages to be spot on. Actual averages will always bounce around, so I will try to project something in the middle. I do not have a highly scientific way of calculating batting average, but I'll go over what I know about it and give a rough idea of how I make my projections.

Batting average is determined by a couple of things: BABIP and strikeout %. Strikeout % is almost completely within a player's control, so it's a good stat to look at. Strikeout rate is also something that players tend to improve on as they develop. I generally consider players under 27 to be still in development, and I'm more likely to believe in, or project, improved strikeout rates for those younger players.

BABIP is a highly complicated stat that takes a lot of things into account, so I'll just talk about it briefly. First off, speed is a factor: faster players can post a higher BABIP because they beat out more infield grounders. LD%, GB%, and FB% are all factors as well, as line drives have the best chance of landing for a hit (by a long shot), ground balls are second most likely, and fly balls are least likely. So what does this mean? Fly ball hitters post worse BABIPs, and ground ball hitters post better ones. Line drive rates tend to be highly variable from year to year for most players. Generally speaking, ground ball hitters hit for a better average. Anyway, one of the key things about BABIP is that while it fluctuates a lot with luck, a career BABIP is usually a good indicator of a player's future BABIP. That is, unless he suddenly turns from a fly ball hitter into a ground ball hitter, or vice versa. As players get older and slower, their BABIP will also fall a little as they lose their speed. Young players are extremely hard to predict in terms of BABIP, and for that, I found this nifty BABIP calculator tool.
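To show how strikeouts and BABIP combine into a batting average, here is a small Python sketch. It relies on the standard BABIP definition, (H - HR) / (AB - K - HR + SF), rearranged to solve for hits. Home runs and sacrifice flies are not discussed above but are needed for the arithmetic, and the season totals below are invented for illustration.

```python
def batting_average(ab, strikeouts, home_runs, sac_flies, babip):
    """Batting average implied by a BABIP and a set of season totals.

    Uses BABIP = (H - HR) / (AB - K - HR + SF), solved for hits.
    """
    balls_in_play = ab - strikeouts - home_runs + sac_flies
    hits = home_runs + babip * balls_in_play
    return hits / ab


# Invented season: 550 AB, 20 HR, 5 SF, .300 BABIP.
# The only thing that changes between the two lines is strikeouts.
high_k = batting_average(550, 100, 20, 5, 0.300)
low_k = batting_average(550, 70, 20, 5, 0.300)
print(f"100 K: {high_k:.3f}   70 K: {low_k:.3f}")
```

With BABIP held fixed, every strikeout avoided becomes another ball in play, which is why an improving strikeout rate pulls the projected average up.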

Anyway, to project a player's batting average, I first look at his career numbers (BABIP, K%, AVG). If his career numbers fall in line with what I would expect, I go with his career batting average for my projection. If his strikeout rate has improved over the course of his career, I will trend his batting average towards the upper end of his range. Since young players can be extremely difficult to predict, I will actually try to predict their K% and BABIP (using the calculator), and then, using those two numbers, I hit the FanGraphs leaderboard page, find another player who posted similar numbers, and use his batting average.
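The "find a similar player" step is done by eye, but it could be sketched in Python as a nearest-match lookup. Everything here is an assumption for illustration: the player names and numbers are invented, and the scaling factors (how many points of K% count the same as a point of BABIP) are arbitrary choices, not part of my actual process.

```python
from math import hypot


def closest_comp(target_k, target_babip, candidates,
                 k_scale=0.05, babip_scale=0.02):
    """Return the candidate whose K% and BABIP are nearest the targets.

    Each gap is divided by a scale so the two stats count roughly equally;
    the default scales are arbitrary assumptions.
    """
    def distance(player):
        return hypot((player["k_pct"] - target_k) / k_scale,
                     (player["babip"] - target_babip) / babip_scale)
    return min(candidates, key=distance)


# Invented leaderboard rows, not real players.
candidates = [
    {"name": "Player A", "k_pct": 0.12, "babip": 0.310, "avg": 0.295},
    {"name": "Player B", "k_pct": 0.20, "babip": 0.300, "avg": 0.268},
    {"name": "Player C", "k_pct": 0.27, "babip": 0.325, "avg": 0.262},
]

# Young hitter projected for a 19% strikeout rate and a .305 BABIP.
comp = closest_comp(0.19, 0.305, candidates)
print(f"{comp['name']} -> projected AVG {comp['avg']:.3f}")
```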

There you have it. There's definitely some subjectivity built into my system. I don't expect people to treat my projections as the word of God; rather, I hope people find the process I use interesting and maybe learn something from it.