Runs per game vs. runs per PA
jinaz
10-19-2007, 09:27 PM
I've been playing around with runs estimates and baselines lately. Tonight I was fiddling around with rate data, and I noticed some pretty large discrepancies between using runs per PA and runs per game (r/outs*25.5).
For example, comparing Jeff Keppinger and Scott Hatteberg:
Name        R/G   R/PA   AVG    OBP    SLG
Keppinger   6.7   0.154  0.310  0.400  0.477
Hatteberg   6.3   0.149  0.332  0.394  0.474
So that's a 0.4 r/g difference compared to a 0.005 r/pa difference. If you divide 0.4 by 0.005, you get an estimate of 80 PAs per game! Clearly the r/PA figure is a better match to their very similar OBP and SLG, so I tend to trust it... and worry about the meaning of the r/g figures. R/g on the whole seems like a much more artificial measure than r/pa because it uses outs in the denominator, which makes me want to use r/pa instead of r/g when calculating runs vs. average or vs. replacement.
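A quick sketch of the two rates and the "implied PA per game" check described above. The function names are made up for illustration; the 25.5 outs-per-game constant and the two players' rates come from the post.

```python
OUTS_PER_GAME = 25.5  # constant from the post: r/outs*25.5


def runs_per_game(runs, outs, outs_per_game=OUTS_PER_GAME):
    """Runs per out, scaled up to a game's worth of outs."""
    return runs / outs * outs_per_game


def runs_per_pa(runs, pa):
    """Runs per plate appearance."""
    return runs / pa


def implied_pa_per_game(rg_gap, rpa_gap):
    """How many PA per game would make the two gaps agree."""
    return rg_gap / rpa_gap


# Gaps between Keppinger and Hatteberg, taken from the table above.
rg_gap = 6.7 - 6.3
rpa_gap = 0.154 - 0.149
implied = implied_pa_per_game(rg_gap, rpa_gap)
print(round(implied, 1))
```

An implied figure that large is far more PA than a lineup actually gets in a game, which is the hint that the two rates were not built on consistent inputs.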
However, I'm running into a problem when looking at replacement level--runs per PA changes more slowly than runs per game, such that groups of individuals who were hitting at 73% of league average as measured by runs per game (i.e. replacement level) tend to now hit at ~77% of league average when you use runs per PA.
Since I'm sure that some of you folks have looked at this before, I'm interested in your feedback on these questions:
* Is there a compelling argument to use a runs per game estimate in these kinds of calculations, despite its tendency to behave oddly on some players?
* Should I go ahead and use the 77% figure when estimating replacement level with runs/pa data? I'd think that I would be best served by comparing to the same sets of players as r/g, and that's what the 77% figure does...
Thanks for your help.
Justin
jinaz
10-21-2007, 12:04 AM
Ok, been working on this some more. Here are my findings, and what I'm planning to do with my own work moving forward. Feel free to comment, or not. :)
The biggest part of the discrepancy I noted between the above two players in R/PA and R/G was the treatment of sacrifice hits in my calculations. I was calculating PA with SH's included. But I was calculating outs without including sacrifice hits. Including SH's in both calculations makes the difference between Hatteberg and Keppinger almost zilch. Thank goodness, that was really bothering me!
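To see how much the SH treatment alone can move r/g, here is a minimal sketch with a made-up stat line (not either player's real numbers). Outs are simplified to AB - H, plus SH when included; caught stealing, double plays and sacrifice flies are ignored, which is an assumption of this example only.

```python
OUTS_PER_GAME = 25.5  # constant from the first post


def outs_made(ab, h, sh, include_sh):
    """Simplified outs: AB - H, optionally counting sacrifice hits."""
    outs = ab - h
    return outs + sh if include_sh else outs


# Hypothetical hitter with a fair number of sacrifice bunts.
ab, h, bb, sh, runs = 500, 150, 50, 10, 75
pa = ab + bb + sh  # PA counted with SH included, as in the post

r_per_pa = runs / pa
rg_with_sh = runs / outs_made(ab, h, sh, True) * OUTS_PER_GAME
rg_without_sh = runs / outs_made(ab, h, sh, False) * OUTS_PER_GAME

# Leaving SH out of the outs total shrinks the denominator and inflates r/g,
# while r/pa (which already counts the SH) is unaffected.
print(round(r_per_pa, 3), round(rg_with_sh, 2), round(rg_without_sh, 2))
```

The more a player bunts, the bigger the gap, so two hitters with similar lines but different SH totals can look further apart in r/g than in r/pa.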
Nevertheless, I'm still having concerns about the use of r/g in the calculation of runs above average (or runs above replacement, for that matter), because players with high OBP get so much better ratings under r/g than under r/pa, since they make fewer outs. The differences can be huge. For example, I have Barry Bonds (0.480 OBP) at ~41 runs above average this season using r/g, but "just" 28 runs above average using r/pa. Someone like Jimmy Rollins (0.344 OBP), however, is rated almost equally by the two systems: ~27 RAA using r/g and 26 RAA using r/pa.
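A rough sketch of why the two approaches split for high-OBP hitters. All numbers here are hypothetical (a league at 0.12 runs per PA and 0.665 outs per PA, and two invented players); they are not the Bonds or Rollins figures. The per-out version is the same thing as the r/g version, since the 25.5 scaling cancels out.

```python
def raa_per_pa(runs, pa, lg_r_per_pa):
    """Runs above average with PA as the denominator."""
    return (runs / pa - lg_r_per_pa) * pa


def raa_per_out(runs, outs, lg_r_per_out):
    """Runs above average with outs as the denominator."""
    return (runs / outs - lg_r_per_out) * outs


# Hypothetical league context.
lg_r_per_pa = 0.12
lg_outs_per_pa = 0.665
lg_r_per_out = lg_r_per_pa / lg_outs_per_pa

# Hypothetical high-OBP hitter: few outs for his 600 PA.
high_obp_pa = raa_per_pa(120, 600, lg_r_per_pa)
high_obp_out = raa_per_out(120, 312, lg_r_per_out)

# Hypothetical hitter who makes outs at exactly the league rate.
avg_obp_pa = raa_per_pa(90, 600, lg_r_per_pa)
avg_obp_out = raa_per_out(90, 399, lg_r_per_out)

print(round(high_obp_pa, 1), round(high_obp_out, 1))
print(round(avg_obp_pa, 1), round(avg_obp_out, 1))
```

When a player's outs per PA match the league's, the two methods agree; the fewer outs he makes per PA, the more the per-out method credits him relative to the per-PA method.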
The question is, which approach is the most correct? wOBA is essentially just runs per PA. This seems like the most straightforward thing to do, as linear weights already include terms that penalize players for creating outs. Doesn't using runs per game reduce the cost of producing outs for players who make relatively few of them? I'm all for rewarding players for not making outs, but it seems like this is a similar issue to applying the classic RC formula to players instead of teams: you get within-player interactions on statistics that don't have a strong basis in reality.
If R/PA is the better way to go when estimating runs above average, this has implications for how we calculate replacement level too. As I mentioned above, using a spreadsheet I constructed while doing a study on how players differ in performance based on playing time, I've found that groups of players who hit at 73% of league average under r/g hit at 77% of league average under r/pa. Furthermore, players at 73% of league average under r/pa hit at 68-69% of league average under r/g. So it seems to me that we'd need to use 77% as our baseline so that we're comparing to the same set of players under r/pa. ... at least that's what I'm planning to do at this point.
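One way to see where a shift like 73% to 77% can come from is the identity R/PA = (R/O) * (O/PA): a group's percentage of league average per PA is its percentage per out times the ratio of its out rate to the league's. The out rates below are assumed values for illustration, not figures from my spreadsheet.

```python
def pct_of_league_per_pa(pct_per_out, group_outs_per_pa, lg_outs_per_pa):
    """Convert a percent-of-league rate from a per-out to a per-PA basis."""
    return pct_per_out * group_outs_per_pa / lg_outs_per_pa


def pct_of_league_per_out(pct_per_pa, group_outs_per_pa, lg_outs_per_pa):
    """The reverse conversion, from a per-PA to a per-out basis."""
    return pct_per_pa * lg_outs_per_pa / group_outs_per_pa


# Assumed out rates: weak hitters make outs more often than the league.
group_outs_per_pa = 0.70
lg_outs_per_pa = 0.665

per_pa = pct_of_league_per_pa(0.73, group_outs_per_pa, lg_outs_per_pa)
per_out = pct_of_league_per_out(0.73, group_outs_per_pa, lg_outs_per_pa)
print(round(per_pa, 3), round(per_out, 3))
```

Because replacement-level hitters make more outs per PA than average, the same group always looks a bit closer to league average per PA than per out.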
-j
Tango Tiger
10-21-2007, 08:41 AM
The best-fastest way is to do LWTS per PA. If you need to convert that to an RC-type figure, add in a league average Runs per PA constant for all players. So, a guy who is +60 runs above average with 600 PA, in a league of .12 runs per PA, would be +.10 LWTS per PA and .22 RC per PA.
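The arithmetic in that example, as a small sketch (function names are just for illustration):

```python
def lwts_per_pa(runs_above_avg, pa):
    """Linear weights runs above average, per plate appearance."""
    return runs_above_avg / pa


def rc_per_pa(runs_above_avg, pa, lg_runs_per_pa):
    """RC-type rate: LWTS per PA plus the league runs-per-PA constant."""
    return lwts_per_pa(runs_above_avg, pa) + lg_runs_per_pa


# The example from the post: +60 runs in 600 PA, league at .12 runs per PA.
lwts_rate = lwts_per_pa(60, 600)
rc_rate = rc_per_pa(60, 600, 0.12)
print(round(lwts_rate, 2), round(rc_rate, 2))
```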
Patriot
10-21-2007, 10:19 AM
As Jin implied, R/O is not the purely "correct" approach for individuals. However, R/PA is not either. R/PA does not incorporate the full negative effects of the out (what Tango has called the "inning killer" portion).
RAA/PA is the best rate stat from a linear perspective of offense. If you make the conversion to a RC-type number as Tango suggests, you will have what has also been called "RC +" in the past by Sibelius at the old fanhome.
However, for the purpose of calculating Runs Above Average, (RC+/PA - Lg(RC+/PA))*PA is equivalent to (RC/O - Lg(RC/O))*O. There will be minor discrepancies for non-average baselines, but the figures should be pretty close, nothing like the difference between using R/O or R/PA.
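A numerical check of that equivalence, as a sketch. It assumes RC+ is RAA plus league R/PA times PA, and that RC is the absolute linear weights figure, i.e. RAA plus league R/O times outs. The player and league numbers are hypothetical.

```python
def runs_above(rate, lg_rate, denom, baseline=1.0):
    """(rate - baseline * league rate) * denominator."""
    return (rate - baseline * lg_rate) * denom


# Hypothetical player and league.
raa, pa, outs = 30.0, 600, 360
lg_r_per_pa = 0.12
lg_r_per_out = lg_r_per_pa / 0.665  # assumed league outs per PA of 0.665

# Assumed definitions (see the note above the block).
rc_plus = raa + lg_r_per_pa * pa     # RC+ : per-PA constant added
rc_abs = raa + lg_r_per_out * outs   # RC  : absolute linear weights

# Versus average, the two forms give back the same RAA.
avg_pa = runs_above(rc_plus / pa, lg_r_per_pa, pa)
avg_out = runs_above(rc_abs / outs, lg_r_per_out, outs)

# Versus a 73% baseline, they drift apart a little.
repl_pa = runs_above(rc_plus / pa, lg_r_per_pa, pa, baseline=0.73)
repl_out = runs_above(rc_abs / outs, lg_r_per_out, outs, baseline=0.73)

print(round(avg_pa, 2), round(avg_out, 2))
print(round(repl_pa, 2), round(repl_out, 2))
```

The non-average baselines differ because the baseline runs are charged per PA in one form and per out in the other, so the gap grows with how far the player's out rate is from the league's.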
jinaz
10-21-2007, 10:25 AM
@Tango,
Thanks for the response. That all makes sense to me. I'm actually trying to use custom linear weights from base runs, rather than linear weights from your book, simply because I wanted to have the flexibility to use custom weights on a per-team basis. I also don't have info on times players reached base on errors, which causes me to undershoot total league runs by a bit using your numbers.
Nevertheless, the effect is mostly the same I think. And it sounds like you agree that assessing player value vs. league average on a per-PA basis, rather than a per-out basis, is the way to go? A per-PA basis makes a lot more sense to me than a per-out basis, but it seems to go against convention a bit.
-j
jinaz
10-21-2007, 10:39 AM
@Patriot,
Hmm, ok, thanks. I'm going to have to mess around with this some more. I had been under the impression that r/pa was (raa/pa)+constant. But it does make sense that it might not be, since the only real difference in the linear weights is the outs term...
-j
jinaz
10-21-2007, 02:08 PM
Ok, awesome, I think I get it. I think my rationale for using PA made sense, but I didn't completely understand the numbers I was working with. I ran the numbers to confirm what Patriot said (helps me to actually "see" it), and then went back and re-read Tango's second article on run creation. Everything now makes sense, at least on a qualitative level. :) Thanks to both of you for helping me figure this out.
-j