Correlation Between Stats and Runs, etc.
Mr. Red
08-10-2006, 02:51 PM
What are the best stats, in terms of their correlation to winning %, runs scored, or runs allowed? I.e., what batting stat most closely correlates with runs scored? What pitching stat most closely correlates with K rate, BB rate, HR rate, BF, etc.? What fielding stat (preferably individual) most closely correlates with balls in play converted to outs and thus runs allowed? I hope you understand where I'm going with this. Thanks in advance.
Ubiquitous
08-10-2006, 10:36 PM
Hopefully this will show
TB AVG OBP SLG SEC OPS SLOB AV+SE A*SEC OBP3+S OP1.8+S AV+SE/2 Eff RC OBP*1.8 ISO ISO+OBP R
TB 1.000
AVG 0.838 1.000
OBP 0.805 0.855 1.000
SLG 0.991 0.811 0.806 1.000
SEC 0.812 0.534 0.790 0.848 1.000
OPS 0.974 0.862 0.906 0.981 0.867 1.000
SLOB 0.969 0.862 0.915 0.975 0.868 0.999 1.000
AV+SE 0.913 0.747 0.901 0.933 0.961 0.964 0.965 1.000
A*SEC 0.908 0.732 0.894 0.929 0.966 0.959 0.962 0.999 1.000
OBP3+S 0.928 0.879 0.965 0.933 0.856 0.985 0.988 0.961 0.955 1.000
OP1.8+S 0.954 0.874 0.940 0.960 0.865 0.996 0.997 0.966 0.960 0.997 1.000
AV+SE/2 0.940 0.851 0.935 0.948 0.899 0.986 0.987 0.985 0.980 0.988 0.991 1.000
Eff 0.966 0.771 0.865 0.982 0.919 0.987 0.985 0.975 0.973 0.961 0.977 0.971 1.000
RC 0.977 0.885 0.907 0.970 0.837 0.992 0.993 0.948 0.944 0.981 0.990 0.979 0.970 1.000
OBP*1.8 0.805 0.855 1.000 0.806 0.790 0.906 0.915 0.901 0.894 0.965 0.940 0.935 0.865 0.907 1.000
ISO 0.920 0.580 0.653 0.947 0.888 0.893 0.884 0.888 0.892 0.816 0.856 0.853 0.944 0.864 0.653 1.000
ISO+OBP 0.958 0.752 0.863 0.976 0.930 0.983 0.981 0.978 0.977 0.957 0.973 0.968 0.999 0.964 0.863 0.946 1.000
R 0.929 0.834 0.904 0.926 0.853 0.960 0.962 0.944 0.940 0.960 0.964 0.963 0.944 0.962 0.904 0.831 0.941 1.000
Basically what you want to look at is the bottom row, which is the correlation of each stat to runs. This, by the way, was for all teams from 1962 to 2003 with strike-shortened seasons removed. If a full schedule wasn't scheduled or wasn't played in a given year, that season was removed.
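For anyone who wants to reproduce a single cell of a table like this outside Excel, here is a minimal Python sketch of the underlying calculation, the Pearson correlation between a team-level stat and runs. The `pearson_r` helper and the `teams` rows are illustrative assumptions: the numbers are made up, not taken from the Lahman database, and a real run would use one row per team-season.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical team-seasons (made-up numbers, for illustration only).
teams = [
    {"ops": 0.700, "runs": 650},
    {"ops": 0.740, "runs": 730},
    {"ops": 0.760, "runs": 760},
    {"ops": 0.800, "runs": 850},
]

r = pearson_r([t["ops"] for t in teams], [t["runs"] for t in teams])
print(f"r(OPS, R) = {r:.3f}")
```

Repeating the same call for every pair of columns fills in the whole lower triangle.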
Mr. Red
08-10-2006, 10:54 PM
What is SLOB? Thanks by the way.
Ubiquitous
08-10-2006, 10:57 PM
SLOB is slugging times on base percentage. SLOB is basically runs created.
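The "SLOB is basically runs created" remark can be checked with a short sketch. It assumes the simplified OBP of (H + BB) / (AB + BB), which ignores HBP and sacrifice flies, and the basic Runs Created formula (H + BB) * TB / (AB + BB). Under those assumptions SLOB times AB is exactly basic RC. The function names and the sample team line are hypothetical.

```python
def obp_simple(h, bb, ab):
    """Simplified on-base percentage (assumption: ignores HBP and SF)."""
    return (h + bb) / (ab + bb)


def slg(tb, ab):
    """Slugging percentage: total bases per at-bat."""
    return tb / ab


def slob(h, bb, ab, tb):
    """SLOB: slugging times on-base percentage."""
    return slg(tb, ab) * obp_simple(h, bb, ab)


def basic_runs_created(h, bb, ab, tb):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical team line (made-up numbers, for illustration only).
h, bb, ab, tb = 1450, 550, 5500, 2300

print(slob(h, bb, ab, tb) * ab)
print(basic_runs_created(h, bb, ab, tb))
```

With the full OBP definition (HBP and SF included) the two quantities are close but no longer identical.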
Tango Tiger
08-11-2006, 07:03 AM
Ub, *great* presentation.
And, interestingly, 1.8 * OBP + SLG has the highest correlation to runs scored. But, as we can see, even the plain old OPS works fine.
Mr. Red
08-11-2006, 07:11 AM
How do you make these charts? And can you make one for pitching statistics?
538280
08-11-2006, 08:34 AM
Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!
Ubiquitous
08-11-2006, 08:38 AM
How do you make these charts? And can you make one for pitching statistics?
Lahman database, Excel, and the code tag for here.
I looked into this a long time ago, I think in 2004 or so, and had saved it. I never looked into pitching, but I would guess the same would apply to pitching if the data is available: teams with the lowest OBP and SLG allowed would allow the fewest runs.
Ubiquitous
08-11-2006, 08:41 AM
Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!
SLG correlating higher than OBP surprises most, but if you think about it, it makes sense. SLG in part has an element of OBP in it, plus it tells you what kind of hits they were. Whereas OBP simply treats everything like a single and only counts hits and walks (plus HBP and so on).
OPS correlating so high is one of the reasons why OPS got pushed and became popular. It is extremely simple to find and, for the amount of work needed to obtain it, extremely accurate.
Tango Tiger
08-11-2006, 08:59 AM
Even if SLG correlates higher than OBP, it's 1.8 OBP + SLG that correlates the highest, not OBP + SLG.
Also note that there is a lack of understanding of correlation. If all the teams build their rosters so that team OBP is all around the .320 to .340 mark, the regression won't be able to approach 1.00 as well as it can with teams that are built so that HR are clustered.
The reason that K/PA or K/BFP has a high year-to-year correlation is because the true distribution is so high.
SABR Matt
08-11-2006, 09:04 AM
Your point about clustering is definitely true, Tango. I noticed when I was doing all of my work through correlations that if you had a distribution where there were clusters of data points at radically different places on your X/Y scatter, you'd get a line connecting your clusters with a very high correlation that meant nothing at all (because within any cluster the line tells you nothing).
Gotta watch out when you have huge ranges of possible outcomes...the range itself can determine your correlation.
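The clustering effect described above can be seen in a small sketch. The points are made up for illustration: two tight clusters in which x and y are unrelated inside each cluster, yet the pooled correlation is very high because the clusters sit far apart. The `pearson_r` helper is an assumed stand-in for whatever correlation function you normally use.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Two made-up clusters; within each, x carries no information about y.
low_x, low_y = [0, 1, 0, 1], [1, 0, 0, 1]
high_x, high_y = [10, 11, 10, 11], [11, 10, 10, 11]

r_low = pearson_r(low_x, low_y)      # correlation inside the low cluster
r_high = pearson_r(high_x, high_y)   # correlation inside the high cluster
r_all = pearson_r(low_x + high_x, low_y + high_y)  # pooled correlation

print(r_low, r_high, r_all)
```

Here both within-cluster correlations are zero while the pooled value is about 0.99, which is the "line connecting your clusters" problem in miniature.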
Mr. Red
08-11-2006, 12:34 PM
SLG correlating higher than OBP surprises most, but if you think about it, it makes sense. SLG in part has an element of OBP in it, plus it tells you what kind of hits they were. Whereas OBP simply treats everything like a single and only counts hits and walks (plus HBP and so on).
Wouldn't it make the most sense to make a stat that was total bases (including BB, etc.) per plate appearance? And what is Lahman's database?
Tango Tiger
08-11-2006, 12:50 PM
What makes sense is Linear Weights and BaseRuns. Everything else doesn't. Lahman database is a free historical baseball database that you can download at http://www.baseball1.com , or you can see its implementation at http://www.baseball-reference.com .
pizzacutter
08-14-2006, 08:43 PM
Given the high correlation values across the board (and the statistical and conceptual collinearity), has anyone ever done a stepwise regression to start isolating which of the factors need to be isolated?
Mariano_Rivera
08-15-2006, 05:37 AM
This might be worthy of a new thread, but what is the correlation between some pitching stats and ERA? (i.e., SO, BB, GB/FB/LD/IFFB%, HBP, etc.)
Tango Tiger
08-15-2006, 07:31 AM
pizzacutter, you should read more about Linear Weights.
Mariano_Rivera
09-15-2006, 02:04 PM
What is the correlation between runs and EQA, and between runs and VORP?
Tango Tiger
09-15-2006, 02:11 PM
They are all the same:
http://www.baseballprospectus.com/article.php?articleid=2596
Choose the smarter measure, or the easiest measure. Don't choose the one that gets you .001 more of r.
Mariano_Rivera
09-15-2006, 04:06 PM
So if you compare .928 to .962 EQA is not the most accurate?
SABR Matt
09-15-2006, 04:12 PM
EQA doesn't correlate to run scoring because it is not modeling the actual run scoring of a league. It is modeling an imaginary league in which (theoretically) a .260 EQA is the league average and a .280 EQA hitter in one league is just as productive as a .280 EQA hitter in another. It doesn't model the actual runs created by the hitter; it's what he would do in a league that is like the all-time average league.
Mariano_Rivera
09-15-2006, 04:44 PM
Okay. Am I reading this correctly when I say it can't be compared to the other stats mentioned in the table above?
SABR Matt
09-15-2006, 05:07 PM
I believe so yes.
Mariano_Rivera
09-15-2006, 05:13 PM
So in other words we can't figure out how accurate it is :rolleyes
SABR Matt
09-15-2006, 06:25 PM
Not by comparing it directly to run scoring. You can compare EQA to something more like relative RS Rate...but not to the other RC estimators.
Mariano_Rivera
09-15-2006, 06:41 PM
So is it the most accurate of all the statistics?
538280
09-15-2006, 07:16 PM
You can see how EqA correlates with runs. You just have to use raw EqA, or (H + TB + 1.5*(BB + HBP + SB) + SH + SF) divided by (AB + BB + HBP + SH + SF + CS + SB).
The EqA presented on the player pages at BP's website is then compared to league norms for each era and put through league quality adjustments. If you want to see how it correlates to run scoring though that is what you would do.
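The raw EqA formula quoted above translates directly into code. This is only a sketch of the unadjusted formula as written in the post, with none of BP's league or era adjustments; the function name, argument names, and the sample batting line are hypothetical.

```python
def raw_eqa(h, tb, bb, hbp, sb, sh, sf, ab, cs):
    """Raw EqA as given in the post, before any league or era adjustment."""
    numerator = h + tb + 1.5 * (bb + hbp + sb) + sh + sf
    denominator = ab + bb + hbp + sh + sf + cs + sb
    return numerator / denominator


# Hypothetical batting line (made-up numbers, for illustration only).
example = raw_eqa(h=150, tb=250, bb=60, hbp=5, sb=10, sh=2, sf=5, ab=550, cs=4)
print(round(example, 3))
```

Computing this per team-season and correlating it with runs is the comparison described in the post.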
PhillyA_man
09-15-2006, 07:29 PM
Hey, for all us new guys just getting into the stats analysis thing, could you define the different stats in the chart? We know what TB, AVG, OBP and such are, but most of the ones after SLOB are new. Thanks!
Mariano_Rivera
09-16-2006, 03:25 AM
You can see how EqA correlates with runs. You just have to use raw EqA, or (H + TB + 1.5*(BB + HBP + SB) + SH + SF) divided by (AB + BB + HBP + SH + SF + CS + SB).
The EqA presented on the player pages at BP's website is then compared to league norms for each era and put through league quality adjustments. If you want to see how it correlates to run scoring though that is what you would do.
So how does it correlate?
Tango Tiger
09-16-2006, 05:14 AM
Did you not see the chart in the BP article I linked? They are all the same.
Maybe where you are getting mixed-up is that you compare results from two different articles. Stick to the one.
Correl
OPS .922
Equivalent Average .928
BaseRuns .930
eXtrapolated Runs (per PA) .920
Runs Created (per PA) .928
Total Average .926
You see? Same.
(BP might be reporting r-squared, not r. But that's irrelevant for comparison purposes.)
Mariano_Rivera
09-16-2006, 06:03 AM
Okay, thanks. So basically there is very little difference between them? That's good.
Mariano_Rivera
09-23-2006, 01:39 PM
Ub, *great* presentation.
And, interestingly, 1.8 * OBP + SLG has the highest correlation to runs scored. But, as we can see, even the plain old OPS works fine.
Since OBP overvalues walks, couldn't you make it even better by changing it to ((OBP-BA) * 0.8 + BA + SLG) * 1.8
This would be an attempt to correct the overvaluing of BB/HBP by multiplying the non-BA parts of OBP by 0.8, because I think a BB is worth 80% of a hit. Correct me if I'm wrong, please.
Ubiquitous
11-01-2006, 07:15 PM
I added three more metrics to the test, one of which is the best I have tested so far. The highest one now is SLOB*PA. Originally I only looked at SLOB, which is only a rate stat, but coupling it with total PA ends up giving you the best picture. That is odd, since it is basically RC yet it returns a higher correlation.
TB AVG OBP SLG SEC OPS SLOB AV+SE A*SEC OBP3+S OP1.8+S AV+SE/2 Eff RC OBP*1.8 ISO ISO+OBP OBP*Tb SLOB*PA SOB*AB R
TB 1.0000
AVG 0.8384 1.0000
OBP 0.8050 0.8549 1.0000
SLG 0.9914 0.8111 0.8063 1.0000
SEC 0.8116 0.5338 0.7902 0.8484 1.0000
OPS 0.9744 0.8616 0.9059 0.9809 0.8671 1.0000
SLOB 0.9692 0.8624 0.9150 0.9752 0.8682 0.9989 1.0000
AV+SE 0.9126 0.7473 0.9012 0.9326 0.9608 0.9639 0.9650 1.0000
A*SEC 0.9083 0.7319 0.8935 0.9291 0.9655 0.9588 0.9616 0.9987 1.0000
OBP3+S 0.9283 0.8790 0.9652 0.9329 0.8561 0.9851 0.9881 0.9609 0.9547 1.0000
OP1.8+S 0.9540 0.8739 0.9402 0.9596 0.8646 0.9960 0.9970 0.9659 0.9602 0.9966 1.0000
AV+SE/2 0.9395 0.8507 0.9348 0.9483 0.8986 0.9861 0.9872 0.9850 0.9800 0.9883 0.9909 1.0000
Eff 0.9656 0.7707 0.8649 0.9820 0.9193 0.9873 0.9850 0.9751 0.9733 0.9606 0.9771 0.9714 1.0000
RC 0.9770 0.8847 0.9070 0.9698 0.8372 0.9924 0.9934 0.9479 0.9437 0.9809 0.9901 0.9795 0.9705 1.0000
OBP*1.8 0.8050 0.8549 1.0000 0.8063 0.7902 0.9059 0.9150 0.9012 0.8935 0.9652 0.9402 0.9348 0.8649 0.9070 1.0000
ISO 0.9200 0.5799 0.6530 0.9469 0.8882 0.8925 0.8841 0.8881 0.8917 0.8161 0.8560 0.8531 0.9440 0.8644 0.6530 1.0000
ISO+OBP 0.9578 0.7520 0.8628 0.9763 0.9303 0.9825 0.9808 0.9776 0.9766 0.9568 0.9728 0.9685 0.9994 0.9642 0.8628 0.9463 1.0000
OBP*Tb 0.9788 0.8788 0.9069 0.9729 0.8424 0.9946 0.9957 0.9501 0.9463 0.9822 0.9919 0.9796 0.9750 0.9985 0.9069 0.8720 0.9692 1.0000
SLOB*PA 0.9625 0.8587 0.9284 0.9609 0.8694 0.9931 0.9961 0.9647 0.9621 0.9900 0.9952 0.9860 0.9780 0.9939 0.9284 0.8663 0.9746 0.9964 1.0000
SOB*AB 0.9788 0.8788 0.9069 0.9729 0.8424 0.9946 0.9957 0.9501 0.9463 0.9822 0.9919 0.9796 0.9750 0.9985 0.9069 0.8720 0.9692 1.0000 0.9964 1.0000
R 0.9287 0.8339 0.9042 0.9260 0.8532 0.9601 0.9621 0.9439 0.9398 0.9598 0.9635 0.9631 0.9441 0.9616 0.9042 0.8313 0.9409 0.9630 0.9674 0.9630 1.0000
Mariano_Rivera
11-02-2006, 04:23 AM
SLOB*PA should be very useful because if it is basically RC it is in a runs form right? That can give you a better approximation of how much the player actually contributed.
Tango Tiger
11-02-2006, 07:51 AM
Ubi, can you also add: PA * (1.8*OBP+SLG) ?
Really, you can just run each of your metrics against:
R/PA
R/out (or, similarly, R/27 outs, or R/G)
The question is whether the metric in question is a "per PA"-type or "per out"-type metric, which of course makes a difference, especially in extreme cases. Likely, if a metric correlates higher with R/PA than R/out, then it's a "per PA"-type metric. Otherwise, it's likely a "per out"-type metric.
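The per-PA versus per-out check suggested above could be sketched as follows. The function `per_pa_or_per_out`, its argument layout, and the sample numbers are all assumptions made for illustration; the sample metric is deliberately built as R/PA so the expected classification is obvious.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def per_pa_or_per_out(metric, runs, pa, outs):
    """Compare a metric's correlation with R/PA against its correlation with R/out."""
    r_pa = pearson_r(metric, [r / p for r, p in zip(runs, pa)])
    r_out = pearson_r(metric, [r / o for r, o in zip(runs, outs)])
    label = "per-PA" if r_pa > r_out else "per-out"
    return label, r_pa, r_out


# Hypothetical team-seasons (made-up numbers, for illustration only).
runs = [700, 750, 800, 650]
pa = [6100, 6200, 6300, 6150]
outs = [4100, 4150, 4050, 4200]
metric = [r / p for r, p in zip(runs, pa)]  # a metric that is per-PA by construction

label, r_pa, r_out = per_pa_or_per_out(metric, runs, pa, outs)
print(label, round(r_pa, 4), round(r_out, 4))
```

With real data the two correlations will usually be close, so the label is a hint rather than a verdict.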
Ubiquitous
11-02-2006, 08:27 AM
I added it in and it comes back at .9645, about .001 more accurate than just 1.8*OBP+SLG. The only problem with multiplying by PA for this formula is that it spits out a number like 6091.
Tango Tiger
11-02-2006, 10:05 AM
No problem at all. It just means it needs a constant multiplier. After all, the 1.8*OBP+SLG just makes it "nice". The important thing is the ratio between the OBP and SLG being 1.8 to 1. I mean, something like (.216*OBP+.120*SLG)*PA will get you closer to actual runs, but the correlation will be exactly the same.
Thanks for running it.
Btw, one of the most interesting versions of combining OBP and SLG is on my site as:
.652 * pa * (obp ^ .85) * (slg ^ (1 - obp/2) )
It keeps the main part of RC, that is multiplying the OBP and SLG, but it changes the relative values between the two, as the OBP changes. Geeky stuff, but a few might be interested in it.
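The constant-multiplier point is easy to verify: multiplying a metric by a positive constant leaves its correlation with runs unchanged, and (.216*OBP + .120*SLG) is just 0.12 times (1.8*OBP + SLG). The sketch below uses made-up team numbers and an assumed `pearson_r` helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical team-seasons (made-up numbers, for illustration only).
obp = [0.320, 0.335, 0.310, 0.345]
slg = [0.400, 0.430, 0.390, 0.450]
pa = [6100, 6250, 6050, 6300]
runs = [700, 780, 660, 830]

# The "nice" version and the version rescaled toward actual runs.
nice = [(1.8 * o + s) * p for o, s, p in zip(obp, slg, pa)]
scaled = [(0.216 * o + 0.120 * s) * p for o, s, p in zip(obp, slg, pa)]

r_nice = pearson_r(nice, runs)
r_scaled = pearson_r(scaled, runs)
print(r_nice, r_scaled)
```

Only the 1.8-to-1 ratio between the OBP and SLG weights affects the correlation; the overall scale only affects how close the output is to actual run totals.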
Ubiquitous
11-02-2006, 10:35 AM
I plugged in your stat and it rises to the top at .968.
Tango Tiger
11-02-2006, 12:19 PM
Cool, very nice! It was a reader named "dq" on my site who had the idea to put in the exponents, and mine to make it "dynamic". dq figured the "best-fit" for the exponents.
Gubanich Plague
11-02-2006, 05:59 PM
.652 * pa * (obp ^ .85) * (slg ^ (1 - obp/2) )
Wow. That is freaking insane. I love it.
SABR Matt
11-02-2006, 06:09 PM
LOL...pretty geeky indeed...it's almost easier to find linear weights than to do that thing...:D
pizzacutter
11-02-2006, 06:39 PM
What's wrong with being geeky? NERD PRIDE!
pizzacutter
11-02-2006, 06:45 PM
Instead of running just zero-order correlations, has anyone tried a stepwise model to see if any of these predict any unique variance that the other doesn't cover? I'd be interested to see that one.
Ubiquitous
11-02-2006, 09:25 PM
If you show me how to set it up in Excel I can do it. Unfortunately for me, I never took a stat class in college, so everything I do has to be learned on the fly. For instance, I understand regression, but I would have to sit down and figure out how to actually do it on a computer, much like trying to figure out coefficients with the lowest RMS error. Also, I have been trying to figure out the coefficients of the S-curve modeling system in Paths to Glory, with no luck.
dquinn1575
11-03-2006, 05:38 AM
.652 * pa * (obp ^ .85) * (slg ^ (1 - obp/2) )
It really isn't that nerdy; it's just an improvement of On Base times slugging which is a close shortcut at normal levels. The exponents smooth out the extremes of the curve.
One of the cool things about it is you remove pa and you get a rate stat - one that correlates well with runs scored.
All you need are 3 stats, pa, obp & slg. It can easily be put on the spreadsheet, and it is much lower math than some of the things I see around here.
Another thing is that with an OBP of .300 both exponents are .85, which was where I was until Tango suggested making the slug exponent change with OBP. That made a lot of sense with the data I had, as the increase in OBP nullified a little of the base advancement factor of slug.
I called it OTSE - Onbase Times slugging exponential on TangoTiger's site. I guess you can call it the Quinn/Tango method if you need to.
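As a spreadsheet-free sketch of the formula discussed above, the functions below compute the OTSE run estimate and the rate version obtained by dropping PA. The function names are made up here; the constants are the ones quoted in the thread. The last helper shows the point about an OBP of .300 making both exponents .85.

```python
def slg_exponent(obp):
    """Dynamic exponent applied to SLG: 1 - OBP/2."""
    return 1 - obp / 2


def otse_rate(obp, slg):
    """OTSE as a rate stat (estimated runs per PA)."""
    return 0.652 * (obp ** 0.85) * (slg ** slg_exponent(obp))


def otse_runs(pa, obp, slg):
    """OTSE run estimate: .652 * PA * OBP^.85 * SLG^(1 - OBP/2)."""
    return pa * otse_rate(obp, slg)


# Hypothetical team inputs (made-up numbers, for illustration only).
print(round(otse_runs(6200, 0.330, 0.420), 1))
print(slg_exponent(0.300))
```

Only three inputs are needed (PA, OBP, SLG), which is what makes it practical to paste into a spreadsheet column as well.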
Tango Tiger
11-03-2006, 06:38 AM
OTSE, or Quinn is fine by me. I had little to do with it, and I try not to link my name to various methods.
pizzacutter
11-03-2006, 06:49 AM
If you show me how to set it up on excel I can do it. Unfortunately for me I never took a stat class in college so everything I do has to be learned on the fly.
Ubi, I actually have never really worked with Excel, although I have a program (SPSS) that I use professionally in my "real job" (psychology) that does stepwise beautifully. And if you can understand correlation, regression is easy. Take the correlation graph and shoot a line through the middle of it. Stepwise is a technique that controls for the fact that most of the measures we use are intercorrelated with each other. If you're familiar with partial correlation, it's related to that.
I'm not very familiar with S-curve modeling either (it's more useful in economics) and I haven't yet read Paths to Glory, so I'm no help there. Maybe I'll pick that up at some point.
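Short of a full stepwise regression, the partial correlation mentioned above can be computed straight from numbers already posted in this thread. The sketch below uses the standard first-order partial correlation formula with three values read off the four-decimal table: r(OBP, R) = 0.9042, r(SLG, R) = 0.9260 and r(OBP, SLG) = 0.8063. The function and variable names are illustrative, and this is a simpler stand-in for stepwise regression, not the same procedure.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Correlations taken from the four-decimal table posted earlier in the thread.
R_OBP = 0.9042    # runs vs OBP
R_SLG = 0.9260    # runs vs SLG
OBP_SLG = 0.8063  # OBP vs SLG

# OBP vs runs with SLG held constant, and SLG vs runs with OBP held constant.
obp_given_slg = partial_r(R_OBP, OBP_SLG, R_SLG)
slg_given_obp = partial_r(R_SLG, OBP_SLG, R_OBP)

print(round(obp_given_slg, 3), round(slg_given_obp, 3))
```

Both partial correlations stay well above zero (roughly 0.7 and 0.8), which suggests OBP and SLG each explain some variance in runs that the other does not.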