
Pythagorean formula usage question



weskelton
11-19-2007, 10:18 AM
I've recently been using the Pythagorean formula by substituting runs (for or against) per 27 outs into the equation. It dawned on me that the better a team is, the less likely it is to bat in the bottom of the ninth in home games. That would create a larger discrepancy between its runs/game and runs/27 outs figures, which in turn would produce a different expected winning percentage from Pyth depending on which one you use.

Now, I'm assuming that the Pythagorean exponent generators (i.e. Pythagenport and Pythagenpat) have been tuned to fit formulas that use runs/game (or runs/season, which yields the same results). If that's true, I should expect projections using runs/27 outs to be less accurate than those using runs/game.

Has anyone else given this any thought? Does this suggest that using runs/27outs in the Pythagorean formula is flat-out wrong or at least bad practice?

SABR Matt
11-19-2007, 10:45 AM
On the contrary...using the run scoring rate (per 27 outs) will give you better results if you take the time to rerun the regressions that produced the PythagenPat exponent function: ((RS + RA) / Game) ^ 0.285. Replace /Game with /27 Outs and run the analysis...you'll probably get a slightly different (but not hugely so) constant than .285.
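A minimal sketch of the refit Matt describes, in Python. Everything here is an assumption for illustration: the function names are hypothetical, the fit is a simple grid search minimizing squared error in W% (not necessarily how the PythagenPat constant was originally derived), and you would supply your own team-season data as (RS rate, RA rate, actual W%) tuples, with both rates per game or both per 27 outs.

```python
def pyth_win_pct(rs_rate, ra_rate, z):
    # PythagenPat: the exponent itself depends on the run environment
    x = (rs_rate + ra_rate) ** z
    return rs_rate ** x / (rs_rate ** x + ra_rate ** x)


def fit_pythagenpat_constant(seasons, lo=0.20, hi=0.35, step=0.001):
    """Grid-search the constant z that minimizes summed squared W% error.

    seasons: iterable of (rs_rate, ra_rate, actual_win_pct) tuples, where both
    rates use the same denominator (per game OR per 27 outs).
    """
    seasons = list(seasons)
    best_z, best_err = lo, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        z = lo + i * step
        err = sum((pyth_win_pct(rs, ra, z) - w) ** 2 for rs, ra, w in seasons)
        if err < best_err:
            best_z, best_err = z, err
    return best_z
```

Running the fit once with per-game rates and once with per-27-out rates from the same seasons would show how far the constant moves.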

weskelton
11-20-2007, 08:45 PM
Matt,

Thanks for the response. I thought I'd try to illustrate the effect that I'm talking about here. Then you and/or others can tell me whether or not you think the theory holds water.

I picked two teams from the 2007 season that should maximize the skew. The first is a very good team that plays well at home (Boston Red Sox, 96-66, 51-30 at home). The second comes from the other end of the food chain (Tampa Bay D-Rays, 66-96, 29-52 on the road).

Here are the details.

Red Sox
scored 867 runs in 4268 outs, RPG : 5.35, R/27 : 5.48
allowed 657 runs in 4316 outs, RPG : 4.06, R/27 : 4.11

D-Rays
scored 782 runs in 4328 outs, RPG : 4.83, R/27 : 4.88
allowed 944 runs in 4289 outs, RPG : 5.83, R/27 : 5.94
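For anyone checking the arithmetic, a short Python sketch that recomputes the rates from the season totals above. It assumes 162 games for both teams (consistent with the 96-66 and 66-96 records); the helper names are hypothetical.

```python
def runs_per_game(runs, games=162):
    return runs / games


def runs_per_27_outs(runs, outs):
    # Scale the run total to a nine-inning (27-out) game
    return 27 * runs / outs


# (runs, outs) totals as listed in the post
totals = {
    "Red Sox scored": (867, 4268),
    "Red Sox allowed": (657, 4316),
    "D-Rays scored": (782, 4328),
    "D-Rays allowed": (944, 4289),
}

for label, (runs, outs) in totals.items():
    print(f"{label}: RPG {runs_per_game(runs):.2f}, R/27 {runs_per_27_outs(runs, outs):.2f}")
```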

I then calculated the Pyth win expectancy from both RPG and R/27 using PythagenPat (exponent = (RS + RA) ^ .287, with RS and RA as rates):

Boston
RPG - Pyth Exp = 1.90, Pyth W% .6287, Pyth W-L 101.8-60.2
R/27 - Pyth Exp = 1.91, Pyth W% .6344, Pyth W-L 102.8-59.2

D-Rays
RPG - Pyth Exp = 1.97, Pyth W% .4083, Pyth W-L 66.1-95.9
R/27 - Pyth Exp = 1.98, Pyth W% .4035, Pyth W-L 65.3-96.7
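A sketch of the same PythagenPat calculation in Python, assuming exponent = (RS rate + RA rate) ^ .287 and 162 games. The function name is hypothetical. Because it works from unrounded rates, the win percentages may differ from the figures above in the fourth decimal place.

```python
def pythagenpat(rs_rate, ra_rate, z=0.287, games=162):
    """Return (exponent, win pct, projected wins) under PythagenPat.

    rs_rate and ra_rate must share a denominator (per game or per 27 outs).
    """
    x = (rs_rate + ra_rate) ** z
    wpct = rs_rate ** x / (rs_rate ** x + ra_rate ** x)
    return x, wpct, wpct * games


# Season totals from the post: (runs, outs) for, then (runs, outs) against
seasons = {
    "Boston": ((867, 4268), (657, 4316)),
    "D-Rays": ((782, 4328), (944, 4289)),
}

for team, ((rs, rs_outs), (ra, ra_outs)) in seasons.items():
    per_game = pythagenpat(rs / 162, ra / 162)
    per_27 = pythagenpat(27 * rs / rs_outs, 27 * ra / ra_outs)
    for label, (x, wpct, wins) in (("RPG", per_game), ("R/27", per_27)):
        print(f"{team} {label}: exp {x:.2f}, W% {wpct:.4f}, W-L {wins:.1f}-{162 - wins:.1f}")
```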

What we're looking at here is a difference in projection of close to a full win per season (about 1.0 for Boston, 0.8 for Tampa Bay) for two extreme but real teams when using R/27 vs. RPG. I'm assuming that the preferred and time-tested projection is the one using RPG. That would suggest that by using R/27 we risk overestimating good teams and underestimating bad teams.

Does that make sense?

SABR Matt
11-21-2007, 01:31 AM
But we know that good teams usually outperform PythagenPat win estimates and bad teams underperform. That suggests that a method that adds a win to the good teams and subtracts one from bad teams is probably making progress.

I believe that the .287 constant might need to be recalculated to maximize the method's accuracy, but the difference will be small and IMHO the results will be better.

Tango Tiger
11-21-2007, 03:56 AM
Bill, you are right. This simply means that you need a (slightly) different exponent.

This would actually explain something, as when I used the Tango Distribution (all teams using 27 outs, natch), I think the exponent that matched mine was .278, as opposed to the .287 when you use runs per game.

It's just a matter of using the empirical data, per 27 and per G, and seeing what best fits.

weskelton
11-21-2007, 05:29 AM
"But we know that good teams usually outperform PythagenPat win estimates and bad teams underperform."
Matt, that's interesting. Is this a systematic error? Do you have any links to work that would support this?

weskelton
11-21-2007, 05:43 AM
"This would actually explain something, as when I used the Tango Distribution (all teams using 27 outs, natch), I think the exponent that matched mine was .278, as opposed to the .287 when you use runs per game."
Cool. Now we're on the same page. My revelation actually came about as part of an attempt to provide some value-add to the soon-to-be-released Tango-WES win calculator. Knowing a team's W% and its R/27 for and against, I was backing into the implied Pythagorean exponents. I kept seeing calculated exponents in the 1.7 range for run environments that PythagenPat would typically put in the 1.9 range.
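The "backing into" step is just algebra: if W = RS^x / (RS^x + RA^x), then W / (1 - W) = (RS / RA)^x, so x = ln(W / (1 - W)) / ln(RS / RA). A minimal Python sketch with a hypothetical function name (it is not code from the Tango-WES calculator):

```python
import math


def implied_exponent(win_pct, rs_rate, ra_rate):
    """Solve W = RS^x / (RS^x + RA^x) for the exponent x.

    Uses W / (1 - W) = (RS / RA) ** x, so x = ln(W / (1 - W)) / ln(RS / RA).
    """
    if rs_rate == ra_rate:
        raise ValueError("exponent is undefined when RS equals RA")
    if not 0 < win_pct < 1:
        raise ValueError("win_pct must be strictly between 0 and 1")
    return math.log(win_pct / (1 - win_pct)) / math.log(rs_rate / ra_rate)
```

The result gets very unstable when RS and RA are close, since ln(RS / RA) approaches zero, and a single season's W% carries a lot of noise, so one team's implied exponent can sit well away from the PythagenPat value.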

Tango Tiger
11-21-2007, 05:52 AM
Well, there are other issues there. The Tango Distribution assumes that the runs scored and runs allowed in a game are independent. This is not true, chiefly due to the park. There is some relationship there. The "control value" in the Tango Distribution works better as .76ish if you are trying to model just one team scoring runs, but .85ish if you are trying to model two teams scoring runs in the same game.

SABR Matt
11-21-2007, 05:52 AM
I believe there have been articles written about the failure of Pythagoras at the extremes of performance. Tom Tango demonstrated the systematic error in Pythagorean W% estimators compared to the Tango distribution...he's got a nice chart on it.

And Tom said the same thing I did...a different exponent is required for different input data but the best solution is always the most accurate input data...which is the per-out data in this case.

Tango Tiger
11-21-2007, 05:55 AM
I'm not sure there is a bias at the quality of the team though. If there was, we could simply correct the exponent to remove that bias.

In other words, it is possible that a good team will overshoot what its random expectations are. I don't know. But, the pythag formula could "cheat" by changing the exponent to overcome this bias. All pythag does is try to best fit. It really doesn't tell you more than that.

But, if you looked at the underlying RS and RA distributions, and assumed those are random, THAT would tell you how many wins a team should have had, rather than how many it actually had. And, it's plausible that good teams will win more than expected by random. I don't know.
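A rough Python sketch of the approach Tango describes: build per-game distributions of runs scored and runs allowed and, treating them as independent, compute the chance of outscoring the opponent. The names are hypothetical, independence is exactly the assumption Tango flags above as not quite true (the park ties RS and RA together), and splitting ties evenly is a simplification of extra innings.

```python
from collections import Counter


def empirical_pmf(runs_by_game):
    """Turn a list of per-game run totals into {runs: probability}."""
    counts = Counter(runs_by_game)
    n = len(runs_by_game)
    return {runs: c / n for runs, c in counts.items()}


def win_prob_independent(rs_pmf, ra_pmf, tie_win_share=0.5):
    """P(win) for one game if runs scored and allowed are independent draws.

    Ties are credited at tie_win_share (an assumption standing in for extra innings).
    """
    p_win = p_tie = 0.0
    for r_for, p_for in rs_pmf.items():
        for r_against, p_against in ra_pmf.items():
            joint = p_for * p_against  # independence assumption
            if r_for > r_against:
                p_win += joint
            elif r_for == r_against:
                p_tie += joint
    return p_win + tie_win_share * p_tie
```

Multiplying the result by 162 gives the expected wins a team "should" have under those assumptions, which can then be compared with its actual total.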

SABR Matt
11-21-2007, 11:54 PM
Why would a good team win more games than the RS/RA terms say it should? Chemistry?