Historical WPA
Gubanich Plague
09-01-2006, 02:37 AM
Hey folks, I've been working on a project to compile historical WPA using retrosheet's pbp data. A couple weeks ago I put the site up with everything based on empirical WP values.
Thanks to some great conversations with tangotiger, I decided to revamp the whole thing with a new WP table, this time based on run frequency data derived from a one million inning Monte Carlo simulation.
http://www.gubanichplague.com/
Check it out, and enjoy. The numbers are almost exactly the same as Fangraphs, but with the added convenience of being highly searchable and sortable (and a couple extra years). Right now it has data for 2000-2005. I haven't gone back any further yet because I'm waiting for 1999 data from retrosheet, but soon I may just quit waiting and do the rest of the years back to 1974 and leave a gap at 1999.
Here's a preview of the batting leaders for 2000-2005:
PA WPA
1 Barry Bonds 3102 46.93
2 Todd Helton 4074 34.40
3 Lance Berkman 3707 29.70
4 Alex Rodriguez 4257 28.93
5 Carlos Delgado 3915 28.48
6 Gary Sheffield 3846 28.43
7 Jason Giambi 3581 28.34
8 Manny Ramirez 3662 27.10
9 Brian Giles 4000 26.04
10 Vladimir Guerrero 3762 25.02
Oh, and please leave comments. I'm looking for ways to make it better and for new ideas.
I hope you have as much fun with it as I have.
Gubanich Plague
09-02-2006, 08:01 PM
Let me say a little more about what I'm doing here. A couple months ago I went out looking for WPA for players throughout history, and couldn't find anything, so I decided to put it together myself. A little while later David at Fangraphs put up WPA for 2002-present, but I kept working on it anyway.
Here's how I decided to calculate the win expectancy table. Let me know what you think:
I parse through the play by play to get the 28*28=784 transition probabilities. If you don't know what I mean by that read this: http://www.pankin.com/markov/theory.htm.
Using those probabilities, for each of the 24 possible starting states I run a Monte Carlo simulation of a million innings and keep track of how many runs score. That gives me a run frequency table that I can apply to different scores in the bottom of the ninth to get win probability. Then I apply it again recursively, working backward from the top of the ninth to the top of the first.
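To make the bottom-of-the-ninth step concrete, here is a minimal sketch of how a run frequency table could be turned into a win probability. The function name `homeWinProbBottomNinth`, the `runDist` layout, and the 0.5 chance of winning a tied game in extra innings are all assumptions made for illustration, not something taken from the site's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// runDist[r] = probability that the batting team scores exactly r more runs
// in the rest of this half inning, from the current base/out state.
// deficit    = visitors' score minus home score (0 means tied).
// tieWinProb = assumed chance the home team wins if the ninth ends tied.
double homeWinProbBottomNinth(const std::vector<double>& runDist,
                              int deficit, double tieWinProb = 0.5) {
    assert(tieWinProb >= 0.0 && tieWinProb <= 1.0);
    if (deficit < 0) return 1.0;  // home team already ahead: game is won
    double win = 0.0;
    for (std::size_t r = 0; r < runDist.size(); ++r) {
        int runs = static_cast<int>(r);
        if (runs > deficit)
            win += runDist[r];               // scores enough to win outright
        else if (runs == deficit)
            win += runDist[r] * tieWinProb;  // ties it up, goes to extras
    }
    return win;
}
```

With a toy distribution of {0.7, 0.2, 0.1} for 0, 1, and 2 runs, a home team down by one would win 0.1 + 0.2 * 0.5 = 0.2 of the time under these assumptions. Earlier half innings would reuse the same idea, weighting each possible run total by the win probability of the state it leads to.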
I have not included non-PA plays, that is, baserunning plays. This is probably a flaw, but it would be difficult because you have to add 28 more states. I wonder if this actually makes much difference?
So I get a new win expectancy table for each league in each season, and then go back through the play by play to calculate WPA. Again throwing out baserunning plays and giving the hitter credit for reaching on an error.
And of course none of this is original. And everyone has their own opinion of WPA. I think it's a lot of fun to look at retrospectively, sort of a point of view of individual player production. Not much, if any, predictive value though.
Here are a few interesting things I've found in the numbers along the way.
In 2003, Bill Mueller hit .326 and won the batting title, but had a bad season by WPA standards (-0.42).
The ten worst batters by WPA for 2000-2005 are (PA, WPA):
1 Neifi Perez 3289 -13.44
2 Brad Ausmus 2969 -12.74
3 Royce Clayton 3202 -12.38
4 Doug Glanville 2256 -12.37
5 Alex Gonzalez 2799 -12.25
6 Rey Sanchez 2244 -10.64
7 Mike Matheny 2653 -9.47
8 Dan Wilson 1829 -8.70
9 Einar Diaz 1763 -8.39
10 Tony Batista 3289 -8.26
4 catchers, 4 shortstops, 1 center fielder, and 1 third baseman. No surprise that the important defensive positions show up there. (No explanation for Tony Batista though. :))
I think it's a lot of fun to look at the top single game performances, too. Best single game hitting performances for 2000-2005 (PA, date, WPA):
1 Brian Daubach BOS 8/21/2000 5 1.25
2 Ryan Langerhans ATL 9/7/2005 5 1.14
3 Brandon Inge DET 8/24/2003 5 1.09
4 Todd Helton COL 6/11/2004 5 1.04
5 Midre Cummings MIN 5/10/2000 2 1.03
6 Jose Guillen ANA 7/31/2004 6 1.01
7 Ken Griffey CIN 5/13/2000 5 1.00
8 Brian Giles PIT 7/28/2001 5 0.97
9 Matt Lawton CLE 5/14/2002 5 0.97
10 Carlos Beltran KCA 5/31/2002 6 0.95
Daubach went 3-5 with 4 RBIs in a 7-6 win over the Angels at Fenway.
http://retrosheet.org/boxesetc/B08210BOS2000.htm
He hit a game tying two run homer in the bottom of the 9th with two outs. And then in the bottom of the 11th with two outs, down by a run, he hit a game winning two run single.
And the top ten pitchers for 2000-2005 (PA, WPA):
1 Pedro Martinez 4555 25.75
2 Randy Johnson 5403 21.49
3 Roger Clemens 5158 19.85
4 Mariano Rivera 1707 19.36
5 Armando Benitez 1601 18.27
6 Tim Hudson 5387 17.93
7 John Smoltz 2050 17.58
8 Eric Gagne 2112 17.05
9 Curt Schilling 4901 15.62
10 Johan Santana 3480 15.09
I must admit I did not expect to see Benitez there.
Eric Gagne's numbers are amazing. He did all his damage in three seasons (2002-2004) and he's still 8th place in this six year span. He led all of baseball each of those three seasons.
So have at it.
SABR Matt
09-02-2006, 09:16 PM
Great work...glad you decided to post that here.
Could you explain to me what a Monte Carlo simulation is and how I might repeat your work if I choose to pursue potential improvements? I'm not a particularly skilled programmer so if you have working code, it might be helpful to the community (myself included) if you made the source code available...I'm sure many (not just myself) would find it useful to have access to that data.
I am fascinated by WPA and leverage index, but it's a little inaccessible to the poor idiots who can't code well enough to do recursive functions and object orientation. :\
Gubanich Plague
09-02-2006, 09:32 PM
I am not a programmer by any stretch of the imagination, either. I just basically know how to write loops in C, and that's pretty much all there is to it. And C chews through huge loops like it's nothing, which is great. I'm sure my code could be written in about a third the number of lines by a real programmer.
First you have to get all the transition probabilities, which you get simply by reading through all the play by play data.
And to do the simulation, first pick your starting state. Then assign the probabilities of all the 28 possible transitions from that state each a range between 0 and 1, and then pick a random number between 0 and 1 to determine the new state. Keep doing that until the inning is over. Keeping track of how many runs score after each play is easy because each transition carries with it a specific number of runs scored (assuming only PA-plays).
Do like a million innings, and now you know the run frequency from that starting state.
Now do that whole thing starting with the other 23 possible starting states, and you have all the run frequency information you need.
Then apply that recursively starting with the bottom of the ninth, and you have your win expectancy.
No need for object oriented programming, and no need to apply any functions recursively, you can just do it as a big loop.
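For anyone who wants to see the steps above as code, here is a minimal sketch of the simulation loop. It is not my actual C program: the names `pickNextState` and `runFrequency`, the `liveStates` and `maxRuns` parameters, and the use of `std::mt19937` instead of `rand()` are all assumptions made for illustration. It also assumes every transition carries a whole number of runs (PA plays only) and that states numbered `liveStates` and above end the inning.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // prob[start][end]
using RunTable = std::vector<std::vector<int>>;   // runs[start][end]

// Give each possible ending state a range between 0 and 1 and return the
// state whose range contains the draw u (u is in [0, 1)).
std::size_t pickNextState(const std::vector<double>& row, double u) {
    assert(!row.empty());
    double upper = 0.0;
    std::size_t lastPossible = 0;
    for (std::size_t n = 0; n < row.size(); ++n) {
        if (row[n] <= 0.0) continue;  // impossible transition, no range
        upper += row[n];
        lastPossible = n;
        if (u < upper) return n;
    }
    return lastPossible;  // row summed to a hair under 1 because of rounding
}

// Simulate `innings` half innings from `start`. Returns freq[r] = share of
// innings with exactly r runs (the last bucket collects maxRuns or more).
std::vector<double> runFrequency(const Matrix& prob, const RunTable& runs,
                                 std::size_t start, std::size_t liveStates,
                                 long innings, std::size_t maxRuns,
                                 unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<double> freq(maxRuns + 1, 0.0);
    for (long i = 0; i < innings; ++i) {
        std::size_t state = start;
        std::size_t scored = 0;
        while (state < liveStates) {  // keep going until the inning is over
            std::size_t next = pickNextState(prob[state], uniform(rng));
            scored += static_cast<std::size_t>(runs[state][next]);
            state = next;
        }
        freq[scored < maxRuns ? scored : maxRuns] += 1.0;
    }
    for (double& f : freq) f /= static_cast<double>(innings);
    return freq;
}
```

Calling `runFrequency` once per starting state (24 calls with the real 24-by-28 table) would give the full run frequency table.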
SABR Matt
09-02-2006, 09:49 PM
Hmm...that sounds surprisingly simple.
BTW including non-PA-changing plays wouldn't complicate your basic strategy much I don't think. I didn't even think of the idea of choosing a random number between 0 and 1...clever...
After play #1 though...you're in a new base/out state...I guess you save the active B/O state and pass it back to the same loop that does the random number generation...
Gubanich Plague
09-02-2006, 09:58 PM
> Hmm...that sounds surprisingly simple.
> BTW including non-PA-changing plays wouldn't complicate your basic strategy much I don't think. I didn't even think of the idea of choosing a random number between 0 and 1...clever...
The complication is that you have another 784 different transitions. And it was so tiring for me to go through and write down the number of runs scored for each of the first 784 transitions that I didn't want to do it again. Although now I realize that I could have just automated it by subtracting numbers of baserunners before and after and such.
But that's not the real problem. The real problem is that I'm not entirely sure that each of the 784 non-PA transitions happens often enough to give a big enough sample for accurate transition probabilities.
> After play #1 though...you're in a new base/out state...I guess you save the active B/O state and pass it back to the same loop that does the random number generation...
Right.
SABR Matt
09-02-2006, 10:02 PM
The way I handled non-PA plays in my markov matrices (I did markov matrices to calculate RE already...did it entirely in MySQL) was to include them in with PA changing plays and to calculate the average RS on each transition rather than using integer assumptions.
If runner at first and none out becomes runner at second and none out 15% of the time because of a SB and 85% of the time because of PA-changing plays (mostly doubles)...then the average RS for transition is 0.85...not 1.
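A minimal sketch of that averaging, written in C++ rather than the MySQL I actually used. The `TransitionTally` struct and its member names are made up for illustration: one tally per (start state, end state) pair, fed every play that makes that transition.

```cpp
#include <cassert>

// Running tally for one start-state/end-state transition.
struct TransitionTally {
    long plays = 0;  // how many plays made this transition
    long runs = 0;   // total runs scored on those plays

    void add(int runsOnPlay) {
        assert(runsOnPlay >= 0);
        ++plays;
        runs += runsOnPlay;
    }

    // Average RS on the transition (0 if it was never observed).
    double averageRuns() const {
        return plays == 0 ? 0.0 : static_cast<double>(runs) / plays;
    }
};
```

Feeding it 15 stolen bases (0 runs each) and 85 doubles (1 run each) gives an average of 0.85, matching the example above.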
Gubanich Plague
09-02-2006, 10:10 PM
That's a really good idea.
SABR Matt
09-03-2006, 07:09 AM
That works if you're doing whole leagues and not worried about things like line-up position and specific players. If you want to analyze a specific team, you need to know more about which players and line-ups slots are more prone to baserunning plays impacting the scoring...
Tango Tiger
09-03-2006, 08:56 AM
Gub's process is how I did the Batting Order in The Book. I did it Markov for the basic WE tables.
I did consider both PA and non-PA, and had a separate state-transition table for each. I also had a frequency rate for the PA and non-PA for each starting state.
And of course the number of runs scored can be calculated by the number of runners left on base plus outs generated.
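As a rough sketch of that bookkeeping: everyone accounted for before the play (runners, outs, plus the batter on a PA play) has to end up on base, out, or across the plate. The function name `runsOnPlay` and its parameters are hypothetical, and it assumes that on an inning-ending play `runnersAfter` counts the runners left stranded.

```cpp
#include <cassert>

// Runs scored on one play, derived from the before and after states.
// batterCameUp is true for PA plays and false for baserunning plays.
int runsOnPlay(int runnersBefore, int outsBefore,
               int runnersAfter, int outsAfter, bool batterCameUp) {
    int before = runnersBefore + outsBefore + (batterCameUp ? 1 : 0);
    int after = runnersAfter + outsAfter;
    assert(before >= after);  // nobody can appear from nowhere
    return before - after;    // whoever is unaccounted for has scored
}
```

For example, a double with a runner on first and none out that leaves only a runner on second gives (1 + 0 + 1) - (1 + 0) = 1 run, while a stolen base from the same state gives (1 + 0 + 0) - (1 + 0) = 0.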
Gubanich Plague
09-03-2006, 12:36 PM
I'm trying to figure out how to get run frequency only using Markov, and not doing any simulations. And I have to admit I haven't been able to wrap my brain around the matrix math well enough to figure that out.
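One way this might be done without matrix algebra is to push probability mass forward one play at a time, tracking both the base/out state and the runs scored so far. This is only a sketch under assumptions: the function name `exactRunDistribution` and its parameters are made up, runs per transition must be whole numbers, states numbered `liveStates` and above end the inning, and anything above `maxRuns` is lumped into the last bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // prob[start][end]
using RunTable = std::vector<std::vector<int>>;   // runs[start][end]

// Returns dist[r] = probability of exactly r runs in the rest of the inning.
std::vector<double> exactRunDistribution(const Matrix& prob, const RunTable& runs,
                                         std::size_t start, std::size_t liveStates,
                                         std::size_t maxRuns,
                                         double tolerance = 1e-12) {
    assert(start < liveStates && prob.size() >= liveStates);
    using Grid = std::vector<std::vector<double>>;
    // mass[s][r] = probability of being in live state s with r runs so far
    Grid mass(liveStates, std::vector<double>(maxRuns + 1, 0.0));
    std::vector<double> dist(maxRuns + 1, 0.0);
    mass[start][0] = 1.0;
    double unfinished = 1.0;  // probability the inning is still going
    for (int play = 0; play < 10000 && unfinished > tolerance; ++play) {
        Grid next(liveStates, std::vector<double>(maxRuns + 1, 0.0));
        unfinished = 0.0;
        for (std::size_t s = 0; s < liveStates; ++s) {
            for (std::size_t r = 0; r <= maxRuns; ++r) {
                if (mass[s][r] == 0.0) continue;
                for (std::size_t e = 0; e < prob[s].size(); ++e) {
                    double m = mass[s][r] * prob[s][e];
                    if (m == 0.0) continue;
                    std::size_t total = r + static_cast<std::size_t>(runs[s][e]);
                    if (total > maxRuns) total = maxRuns;
                    if (e >= liveStates) {
                        dist[total] += m;      // inning over: record the total
                    } else {
                        next[e][total] += m;   // inning continues
                        unfinished += m;
                    }
                }
            }
        }
        mass.swap(next);
    }
    return dist;
}
```

On a toy one-state chain where each play is either a solo home run or the end of the inning, each with probability 0.5, this gives 0.5, 0.25, 0.125 for 0, 1, and 2 runs, which is what you would expect by hand.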
Gubanich Plague
09-03-2006, 12:44 PM
> The way I handled non-PA plays in my markov matrices (I did markov matrices to calculate RE already...did it entirely in MySQL) was to include them in with PA changing plays and to calculate the average RS on each transition rather than using integer assumptions.
> If runner at first and none out becomes runner at second and none out 15% of the time because of a SB and 85% of the time because of PA-changing plays (mostly doubles)...then the average RS for transition is 0.85...not 1.
I just realized that this wouldn't actually work for calculating run frequency, because the runs that score in an inning are by definition an integer!
Actually, I could probably just round to the nearest integer and that would be just fine.
SABR Matt
09-03-2006, 02:30 PM
Yeah..if you want to count runs at the end of an inning, you would just round to the nearest whole run.
SABR Matt
09-04-2006, 05:10 PM
OK...here's what I have for the pseudocode that would outline the process for calculating Run Expectancy using a Monte Carlo simulation technique
Read in a struct[28] data structure (an array of 28 structs) containing for each starting base/out state the transition state (the "ending" BO State), the running probability of reaching that state in one play, and the average RS on that transition for every league in the PBP database (there are 96 PBP leagues...we need 96 arrays of structs...one for each league...for easy indexing the overall data structure might need to be an array of arrays of structs)
FOR EACH league {
- Get the array of structs that matches the league
FOR EACH starting BO State {
FOR EACH simulation cycle {
FOR EACH simulated inning {
- Get a random floating point number between zero and one
- Find the position of that random number on the running transition probability spectrum pertaining to the starting base/out state
- Add the average RS associated with that transition to a running RS tally for this simulated inning
- Reset the starting BO State variable
- re-zero the variable holding the random floating point
- EXIT simulated inning loop when starting BO State is a three-out state
}
- Add the simulated inning's RS count to a running RS count for the entire simulation
- re-zero the variable used for storing the inning RS count
- EXIT the simulation cycle after 1,000,000 repetitions
}
- Calculate RE from this simulation cycle (running RS / 1,000,000)
- Store RE in a struct containing leagueID, StartBO, RE variables
- re-zero the simulation cycle running RS total
- EXIT Starting BO Loop when all 24 non-three-out BO states have been simulated
}
- EXIT leagues loop when all leagues have been calculated
}
Output RE data structure to a csv file for later use
That sound right?
Gubanich Plague
09-04-2006, 06:04 PM
Yep, that should do it.
SABR Matt
09-05-2006, 04:30 PM
I promised myself I wasn't going to do function writing or object orientation...
I lied. LOL
With a little coaching from Randy Fiato (who works with C++ code for a living) to remind me of some of the rules I came up with a little function that does a nice streamlined binary search when passed a double (the random double between 0 and 1) and a vector of structs (the inner layer of a four dimensional array that will contain all of the transition data).
My primary data structure for holding info from the PBP database gathered on probability transitions looks like this:
typedef std::vector<std::vector<std::vector<trans>>> PBP;
trans is a struct data type containing four variables: the transition probability, the average RS on the transition, a running probability sum (the running sum of probabilities for all of the structs within each unique inner vector, i.e. each league and starting Base/Out state combination), and a pre-state running sum (the running sum of all of the states except the most recent one). I need the last two to establish a range of values associated with each ending base/out state (so that when I pick a random double between 0 and 1, it will fall in a range and get an ending state attached to it).
A vector (of leagues) of vectors (of starting base/out states) of vectors of structs (containing data on each ending base/out state). Tasty ain't it? :D
for those who might be interested in following along and duplicating my work...this is the function I wrote to do a binary search on the inner vector looking for the vector index containing the ending base/out state which will become the new starting base/out state within each simulation:
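int BOEndSearch (double key, const std::vector <trans> &range) {
    // Binary search for the entry whose [preProb, runningProb] range holds key
    int Min = 0;
    int Max = static_cast<int>(range.size()) - 1;
    int flag = 0;
    while (Min <= Max) {
        flag = (Min + Max) / 2;      // midpoint of the window still in play
        if (key < range[flag].preProb)
            Max = flag - 1;          // key lies below this entry
        else if (key > range[flag].runningProb)
            Min = flag + 1;          // key lies above this entry
        else
            break;                   // key falls inside this entry's range
    }
    return flag;
}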
Those of you with any experience in C++ will I'm sure find a typo in there...LOL I'll need to debug just in case I screwed up some logical point, but this should work.
Gubanich Plague
09-05-2006, 05:12 PM
Yeah, this looks like a real programmer's work, unlike mine. Here's the way I implemented it in C:
All of my transition probabilities are in an array: prob[28][28].
Say I'm currently in state: (int) i, and I've picked a random number between 0 and 1: (float) rand.
float x = 0;
int n = -1;
while (rand > x) {
    n++;
    x = x + prob[i][n];
}
And once the while loop exits, n is the new state. The only potential problem with this is if your probabilities sum to something like 0.999998 because of rounding, and your random number is 0.999999. Then it would try to throw you into state #29. You can correct for that easily if you want. It's not very elegant though. Whatever.
SABR Matt
09-05-2006, 05:51 PM
that may not be all that "elegant" but it is the general way in which I intend to do the simulator part of this program. The difference between my data structure and your 2D array is that I want to do all 96 leagues rather than just one at a time (hence the extra dimension), I want to store average RS and transition probability in the same data structure (hence the struct inside the 3D vector construction), and I want the data to be put on the heap (memory the program requests dynamically while it runs) and not the stack (the fixed-size region that holds a program's local variables), because there is far more room on the heap and a structure of the size I want to store and use might be too large for the stack (overflow = crash).
The standard vector object stores its elements on the heap...a local array gets stored on the stack. And no...I didn't know all of this when I started. Randy helped clarify things for me yesterday. :D
the general form of the simulator will be:
- FOR EACH league
- FOR EACH starting base/out state
- FOR EACH simulation run
- FOR EACH inning
- pick a random double, search the relevant portion of my data structure for which ending BO state that double keys to, reset the starting BO state to the new state, store the RS on the play
- punch out of the inning loop when there are three outs
- sum up all of the inning RS counts
- punch out of the simulation run after a million of the inning runs
- save the RE for the simulation to the output array
- punch out of the BO State loop after each starting BO has been simulated
- punch out of the league loop when all leagues are done
Gubanich Plague
09-05-2006, 05:56 PM
Cool. I've never bothered to take that step from C to C++, maybe I should. I wonder if you can do that kind of memory finagling and operations on data structures in C.
SABR Matt
09-05-2006, 07:32 PM
C does not have object oriented programming...you can't use things like the standard vector (an object that is part of the standard library for C++)
I highly HIGHLY recommend you take a little time to learn the things that are different about C++...it's infinitely more powerful and not that difficult to learn once you understand C...object oriented programming gave me fits, but the basics of C++ really help a lot.
SABR Matt
09-07-2006, 08:23 PM
Minor update...realized my previous version of the binary search had an important logical hole which is now fixed:
int BOEndSearch (double key, const std::vector <trans> &range) {
    int Min = 0;
    int Max = static_cast<int>(range.size()) - 1;
    bool check = false;
    int index = 0;
    do {
        index = (Max + Min) / 2;
        // key landed on a zero-width (impossible) state: step to a possible neighbor
        if (fabs(key - range[index].preProb) < 0.0000000001 && fabs(key - range[index].runningProb) < 0.0000000001) {
            if (range[index].runningProb > 0.9999999999) {
                do {
                    --index;
                } while (fabs(range[index].preProb - range[index].runningProb) < 0.0000000001);
                check = true;
            }
            else {
                do {
                    ++index;
                } while (fabs(range[index].preProb - range[index].runningProb) < 0.0000000001);
                check = true;
            }
        }
        else if (key < range[index].preProb)
            Max = index - 1;
        else if (key > range[index].runningProb)
            Min = index + 1;
        else
            check = true;
    } while (!check);
    return index;
}
The way preProb (the running probability before the current state) and runningProb (the running probability after the current state) are going to be calculated...impossible transitions will have preProb and runningProb values that are the same and if the random number generator happens to pick one of these repeated values that borders between a possible state and a series of impossible ones, there could be problems doing it the old way. Had to add some checks to plug that hole.
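For comparison, here is a sketch of the same lookup built on the standard library's `std::upper_bound`, which returns the first element greater than the key. This is an alternative technique, not the code above: it keeps only a vector of running sums, and the names `cumulativeSums` and `findEndState` are hypothetical. Because an impossible state has the same running sum as the state before it, it can never be the first element greater than the key, so the zero-width ranges get skipped without special checks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// cumulative[k] = prob[0] + ... + prob[k] (the runningProb values).
std::vector<double> cumulativeSums(const std::vector<double>& prob) {
    std::vector<double> cumulative(prob.size());
    double running = 0.0;
    for (std::size_t k = 0; k < prob.size(); ++k) {
        running += prob[k];
        cumulative[k] = running;
    }
    return cumulative;
}

// Index of the ending state whose range contains key (key in [0, 1)).
std::size_t findEndState(const std::vector<double>& cumulative, double key) {
    assert(!cumulative.empty());
    std::vector<double>::const_iterator it =
        std::upper_bound(cumulative.begin(), cumulative.end(), key);
    if (it == cumulative.end()) {
        // key is at or past the total (rounding): use the last possible state
        std::size_t k = cumulative.size() - 1;
        while (k > 0 && cumulative[k] == cumulative[k - 1]) --k;
        return k;
    }
    return static_cast<std::size_t>(it - cumulative.begin());
}
```

With probabilities {0, 0.25, 0, 0.75, 0}, a key of 0.25 sits exactly on the border between state 1 and the impossible state 2, and the search returns state 3.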
SABR Matt
09-10-2006, 01:06 PM
You guys ready for this?
This code has been thoroughly debugged and the results it generates are reasonable (and will be presented below).
This...is the entire Monte-Carlo simulation program. It takes a table of transition probabilities and expected run scoring from each transition and outputs run expectancy from each starting base/out state in each league in the PBP database.
#include <vector>
#include <cstdlib>
#include <time.h>
#include <fstream>
#include <cmath>
#include <iostream>
#include <string>
using namespace std;
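// One ending base/out state reachable from a given league and starting state
struct trans {
    trans();
    double prob;         // transition probability
    double RS;           // average runs scored on the transition
    double preProb;      // running probability before this state
    double runningProb;  // running probability including this state
};
// One row of the output: run expectancy for a league and starting state
struct output {
    output();
    size_t league;
    size_t BOStart;
    double RE;
};
output::output() {
    league = 0;
    BOStart = 0;
    RE = 0.0;
}
trans::trans() {
    prob = 0.0;
    RS = 0.0;
    preProb = 0.0;
    runningProb = 0.0;
}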
typedef std::vector<std::vector<std::vector<trans>>> PBP;
typedef std::vector<output> results;
size_t BOEndSearch (double key, const std::vector<trans> &range);
void reader (PBP &frame);
void writer (const results &holder);
int main() {
    // frame[league][starting BO state][ending BO state]
    PBP frame;
    frame.resize(96);
    for (size_t i(0); i < 96; ++i) {
        frame[i].resize(24);
        for (size_t j(0); j < 24; ++j)
            frame[i][j].resize(28);
    }
    cout << "Reading Data Source..." << endl;
    reader(frame);
    cout << "Data Source Read - Calculating Probabilities..." << endl;
    // Build the running probability range for every ending state
    double runProb;
    for (size_t i(0); i < 96; ++i) {
        for (size_t j(0); j < 24; ++j) {
            runProb = 0.0;
            for (size_t k(0); k < 28; ++k) {
                frame[i][j][k].preProb = runProb;
                runProb += frame[i][j][k].prob;
                frame[i][j][k].runningProb = runProb;
            }
        }
    }
    results holder;
    srand(static_cast <unsigned int> (time(NULL)));
    cout << "Beginning Simulation Run..." << endl;
    for (int league = 0; league < 96; ++league) {
        double simRS, inningRS, key;
        size_t simmedBOStart, BOEnd;
        for (size_t BOStart(0); BOStart < 24; ++BOStart) {
            cout << "Simulating League #" << league << " State #" << BOStart << endl;
            simRS = 0;
            for (int sim = 0; sim < 100000; ++sim) {
                inningRS = 0;
                simmedBOStart = BOStart;
                // Play out one inning; states 24 and up are three-out states
                do {
                    key = (rand() / (static_cast <double> (RAND_MAX) + 1));
                    BOEnd = BOEndSearch(key, frame[league][simmedBOStart]);
                    inningRS += frame[league][simmedBOStart][BOEnd].RS;
                    simmedBOStart = BOEnd;
                } while (simmedBOStart < 24);
                simRS += inningRS;
            }
            holder.resize(holder.size() + 1);
            holder.back().league = league;
            holder.back().BOStart = BOStart;
            holder.back().RE = (simRS / 100000);
        }
    }
    cout << "Writing Output File..." << endl;
    writer(holder);
    cout << "Program Complete!" << endl;
    return 0;
}
size_t BOEndSearch (double key, const std::vector <trans> &range) {
    size_t Min = 0;
    size_t Max = range.size() - 1;
    bool check = false;
    size_t index = 0;
    do {
        index = (Max + Min) / 2;
        // key landed on a zero-width (impossible) state: step to a possible neighbor
        if (fabs(key - range[index].preProb) < 0.00000001 && fabs(key - range[index].runningProb) < 0.00000001) {
            if (range[index].runningProb > 0.99999999) {
                do {
                    --index;
                } while (fabs(range[index].prob) < 0.00000001);
                check = true;
            }
            else {
                do {
                    ++index;
                } while (fabs(range[index].prob) < 0.00000001);
                check = true;
            }
        }
        else if (key < range[index].preProb)
            Max = index - 1;
        else if (key > range[index].runningProb)
            Min = index + 1;
        else
            check = true;
    } while (!check);
    return index;
}
void reader(PBP &frame) {
    ifstream infile("D:/My Documents/transitions.txt");
    int leagueIndex = -1;
    int BOStart = -1;
    int BOEnd = -1;
    double prob = 0.0;
    double RS = 0.0;
    std::string line;
    getline(infile, line);  // discard the first line of the file
    // Stop at the first failed read, so a missing file or a trailing
    // blank line cannot loop forever or index the frame with -1
    while (infile >> leagueIndex >> BOStart >> BOEnd >> prob >> RS) {
        frame[leagueIndex][BOStart][BOEnd].prob = prob;
        frame[leagueIndex][BOStart][BOEnd].RS = RS;
        getline(infile, line);  // skip the rest of the row
    }
    infile.close();
}
void writer (const results &holder) {
    ofstream outfile("D:/My Documents/RE.csv");
    outfile << "leagueID,BOStart,RE" << endl;
    for (size_t i(0); i < holder.size(); i++) {
        outfile << holder[i].league << "," << holder[i].BOStart << "," << holder[i].RE << endl;
    }
    outfile.close();
}
I'm very glad I did this project and that Gubanich Plague posted here and fired up my interest in pursuing it because now that I have written this program in C++ with some coaching and debugging help from Randy Fiato, I have a solid foundation upon which to build other baseball related programs...a better idea of what is required. I am working toward some more complicated problems in the future, but this was a good place to start.
Here is the output file (formatted in Excel)
lgID BO RE
0 0 0.479
0 1 0.258
0 2 0.093
0 3 0.833
0 4 0.496
0 5 0.203
0 6 1.046
0 7 0.682
0 8 0.327
0 9 1.281
0 10 0.949
0 11 0.389
0 12 1.452
0 13 0.896
0 14 0.416
0 15 1.672
0 16 1.078
0 17 0.460
0 18 1.843
0 19 1.293
0 20 0.546
0 21 2.446
0 22 1.571
0 23 0.707
1 0 0.480
1 1 0.259
1 2 0.102
1 3 0.855
1 4 0.495
1 5 0.213
1 6 1.130
1 7 0.690
1 8 0.341
1 9 1.316
1 10 0.946
1 11 0.366
1 12 1.455
1 13 0.915
1 14 0.398
1 15 1.776
1 16 1.207
1 17 0.494
1 18 1.891
1 19 1.334
1 20 0.568
1 21 2.152
1 22 1.592
1 23 0.702
2 0 0.465
2 1 0.249
2 2 0.095
2 3 0.825
2 4 0.516
2 5 0.215
2 6 1.063
2 7 0.658
2 8 0.315
2 9 1.274
2 10 0.903
2 11 0.381
2 12 1.444
2 13 0.915
2 14 0.482
2 15 1.735
2 16 1.119
2 17 0.487
2 18 1.863
2 19 1.364
2 20 0.610
2 21 2.247
2 22 1.488
2 23 0.760
3 0 0.498
3 1 0.274
3 2 0.103
3 3 0.846
3 4 0.528
3 5 0.219
3 6 1.060
3 7 0.695
3 8 0.322
3 9 1.348
3 10 0.913
3 11 0.356
3 12 1.453
3 13 0.927
3 14 0.409
3 15 1.739
3 16 1.193
3 17 0.533
3 18 1.958
3 19 1.383
3 20 0.571
3 21 2.433
3 22 1.561
3 23 0.698
4 0 0.489
4 1 0.258
4 2 0.094
4 3 0.871
4 4 0.512
4 5 0.224
4 6 1.130
4 7 0.690
4 8 0.328
4 9 1.390
4 10 0.944
4 11 0.390
4 12 1.529
4 13 0.948
4 14 0.431
4 15 1.811
4 16 1.153
4 17 0.489
4 18 1.961
4 19 1.363
4 20 0.611
4 21 2.366
4 22 1.607
4 23 0.792
5 0 0.499
5 1 0.267
5 2 0.107
5 3 0.880
5 4 0.508
5 5 0.231
5 6 1.099
5 7 0.696
5 8 0.352
5 9 1.328
5 10 0.958
5 11 0.383
5 12 1.530
5 13 0.897
5 14 0.468
5 15 1.746
5 16 1.061
5 17 0.478
5 18 1.983
5 19 1.382
5 20 0.608
5 21 2.420
5 22 1.521
5 23 0.739
6 0 0.488
6 1 0.266
6 2 0.102
6 3 0.857
6 4 0.524
6 5 0.216
6 6 1.108
6 7 0.685
6 8 0.345
6 9 1.324
6 10 0.913
6 11 0.400
6 12 1.459
6 13 0.925
6 14 0.425
6 15 1.738
6 16 1.183
6 17 0.540
6 18 1.889
6 19 1.435
6 20 0.642
6 21 2.355
6 22 1.580
6 23 0.797
7 0 0.476
7 1 0.248
7 2 0.095
7 3 0.832
7 4 0.497
7 5 0.215
7 6 1.067
7 7 0.683
7 8 0.329
7 9 1.238
7 10 0.915
7 11 0.331
7 12 1.460
7 13 0.938
7 14 0.444
7 15 1.712
7 16 1.188
7 17 0.525
7 18 1.924
7 19 1.415
7 20 0.520
7 21 2.242
7 22 1.533
7 23 0.713
8 0 0.510
8 1 0.271
8 2 0.104
8 3 0.889
8 4 0.528
8 5 0.220
8 6 1.110
8 7 0.705
8 8 0.333
8 9 1.429
8 10 0.989
8 11 0.374
8 12 1.571
8 13 1.010
8 14 0.431
8 15 1.757
8 16 1.160
8 17 0.478
8 18 2.037
8 19 1.431
8 20 0.642
8 21 2.465
8 22 1.694
8 23 0.753
9 0 0.508
9 1 0.274
9 2 0.110
9 3 0.891
9 4 0.520
9 5 0.233
9 6 1.140
9 7 0.710
9 8 0.336
9 9 1.386
9 10 0.977
9 11 0.425
9 12 1.582
9 13 0.922
9 14 0.485
9 15 1.845
9 16 1.159
9 17 0.475
9 18 2.056
9 19 1.454
9 20 0.704
9 21 2.497
9 22 1.624
9 23 0.827
10 0 0.498
10 1 0.264
10 2 0.103
10 3 0.879
10 4 0.523
10 5 0.224
10 6 1.081
10 7 0.677
10 8 0.325
10 9 1.304
10 10 0.913
10 11 0.421
10 12 1.567
10 13 0.946
10 14 0.431
10 15 1.765
10 16 1.202
10 17 0.497
10 18 2.003
10 19 1.396
10 20 0.634
10 21 2.463
10 22 1.545
10 23 0.754
11 0 0.505
11 1 0.267
11 2 0.101
11 3 0.875
11 4 0.526
11 5 0.220
11 6 1.142
11 7 0.701
11 8 0.328
11 9 1.353
11 10 0.928
11 11 0.404
11 12 1.519
11 13 0.937
11 14 0.444
11 15 1.776
11 16 1.193
11 17 0.534
11 18 2.014
11 19 1.416
11 20 0.623
11 21 2.414
11 22 1.635
11 23 0.682
12 0 0.455
12 1 0.239
12 2 0.091
12 3 0.829
12 4 0.476
12 5 0.194
12 6 1.053
12 7 0.642
12 8 0.302
12 9 1.287
12 10 0.914
12 11 0.372
12 12 1.390
12 13 0.869
12 14 0.413
12 15 1.723
12 16 1.153
12 17 0.491
12 18 1.843
12 19 1.303
12 20 0.622
12 21 2.274
12 22 1.451
12 23 0.724
13 0 0.428
13 1 0.230
13 2 0.088
13 3 0.782
13 4 0.472
13 5 0.199
13 6 1.014
13 7 0.653
13 8 0.316
13 9 1.254
13 10 0.878
13 11 0.358
13 12 1.424
13 13 0.870
13 14 0.423
13 15 1.677
13 16 1.113
13 17 0.492
13 18 1.980
13 19 1.264
13 20 0.538
13 21 2.317
13 22 1.372
13 23 0.723
14 0 0.452
14 1 0.250
14 2 0.095
14 3 0.793
14 4 0.495
14 5 0.210
14 6 1.017
14 7 0.646
14 8 0.326
14 9 1.233
14 10 0.872
14 11 0.318
14 12 1.380
14 13 0.902
14 14 0.419
14 15 1.706
14 16 1.163
14 17 0.492
14 18 1.896
14 19 1.327
14 20 0.606
14 21 2.255
14 22 1.504
14 23 0.733
15 0 0.452
15 1 0.232
15 2 0.082
15 3 0.821
15 4 0.498
15 5 0.202
15 6 1.069
15 7 0.670
15 8 0.307
15 9 1.310
15 10 0.931
15 11 0.410
15 12 1.454
15 13 0.916
15 14 0.415
15 15 1.746
15 16 1.107
15 17 0.491
15 18 1.977
15 19 1.302
15 20 0.586
15 21 2.483
15 22 1.526
15 23 0.728
16 0 0.436
16 1 0.241
16 2 0.094
16 3 0.775
16 4 0.493
16 5 0.215
16 6 1.012
16 7 0.644
16 8 0.325
16 9 1.239
16 10 0.868
16 11 0.375
16 12 1.387
16 13 0.897
16 14 0.431
16 15 1.620
16 16 1.067
16 17 0.502
16 18 1.845
16 19 1.257
16 20 0.502
16 21 2.226
16 22 1.367
16 23 0.678
17 0 0.452
17 1 0.233
17 2 0.091
17 3 0.821
17 4 0.494
17 5 0.199
17 6 1.043
17 7 0.650
17 8 0.313
17 9 1.249
17 10 0.881
17 11 0.370
17 12 1.478
17 13 0.871
17 14 0.377
17 15 1.757
17 16 1.163
17 17 0.521
17 18 1.934
17 19 1.334
17 20 0.571
17 21 2.281
17 22 1.502
17 23 0.675
18 0 0.435
18 1 0.231
18 2 0.083
18 3 0.792
18 4 0.480
18 5 0.197
18 6 1.030
18 7 0.630
18 8 0.296
18 9 1.206
18 10 0.860
18 11 0.305
18 12 1.435
18 13 0.890
18 14 0.416
18 15 1.682
18 16 1.131
18 17 0.454
18 18 1.847
18 19 1.274
18 20 0.483
18 21 2.248
18 22 1.608
18 23 0.732
19 0 0.455
19 1 0.241
19 2 0.088
19 3 0.819
19 4 0.494
19 5 0.203
19 6 1.044
19 7 0.664
19 8 0.338
19 9 1.248
19 10 0.856
19 11 0.400
19 12 1.435
19 13 0.899
19 14 0.442
19 15 1.690
19 16 1.122
19 17 0.456
19 18 1.902
19 19 1.295
19 20 0.576
19 21 2.375
19 22 1.446
19 23 0.681
20 0 0.418
20 1 0.219
20 2 0.079
20 3 0.771
20 4 0.456
20 5 0.191
20 6 0.989
20 7 0.622
20 8 0.311
20 9 1.247
20 10 0.855
20 11 0.378
20 12 1.389
20 13 0.863
20 14 0.424
20 15 1.691
20 16 1.096
20 17 0.501
20 18 1.871
20 19 1.306
20 20 0.611
20 21 2.200
20 22 1.493
20 23 0.793
21 0 0.425
21 1 0.221
21 2 0.083
21 3 0.796
21 4 0.471
21 5 0.194
21 6 1.041
21 7 0.648
21 8 0.330
21 9 1.222
21 10 0.891
21 11 0.396
21 12 1.415
21 13 0.839
21 14 0.424
21 15 1.644
21 16 1.086
21 17 0.450
21 18 1.886
21 19 1.275
21 20 0.605
21 21 2.228
21 22 1.317
21 23 0.678
22 0 0.384
22 1 0.195
22 2 0.073
22 3 0.724
22 4 0.422
22 5 0.156
22 6 0.921
22 7 0.573
22 8 0.281
22 9 1.129
22 10 0.811
22 11 0.314
22 12 1.277
22 13 0.824
22 14 0.361
22 15 1.608
22 16 1.023
22 17 0.417
22 18 1.689
22 19 1.198
22 20 0.532
22 21 2.006
22 22 1.339
22 23 0.664
23 0 0.379
23 1 0.198
23 2 0.072
23 3 0.736
23 4 0.431
23 5 0.182
23 6 0.963
23 7 0.580
23 8 0.279
23 9 1.170
23 10 0.842
23 11 0.342
23 12 1.316
23 13 0.779
23 14 0.385
23 15 1.615
23 16 1.088
23 17 0.478
23 18 1.731
23 19 1.231
23 20 0.530
23 21 2.106
23 22 1.382
23 23 0.673
24 0 0.458
24 1 0.241
24 2 0.093
24 3 0.819
24 4 0.487
24 5 0.205
24 6 1.041
24 7 0.646
24 8 0.324
24 9 1.233
24 10 0.884
24 11 0.336
24 12 1.464
24 13 0.895
24 14 0.438
24 15 1.677
24 16 1.097
24 17 0.452
24 18 1.866
24 19 1.294
24 20 0.519
24 21 2.269
24 22 1.482
24 23 0.780
25 0 0.450
25 1 0.243
25 2 0.091
25 3 0.839
25 4 0.498
25 5 0.202
25 6 1.019
25 7 0.646
25 8 0.332
25 9 1.298
25 10 0.892
25 11 0.361
25 12 1.490
25 13 0.910
25 14 0.436
25 15 1.741
25 16 1.133
25 17 0.481
25 18 1.914
25 19 1.355
25 20 0.621
25 21 2.415
25 22 1.516
25 23 0.739
26 0 0.466
26 1 0.247
26 2 0.089
26 3 0.839
26 4 0.488
26 5 0.196
26 6 1.038
26 7 0.675
26 8 0.323
26 9 1.244
26 10 0.891
26 11 0.364
26 12 1.484
26 13 0.864
26 14 0.399
26 15 1.705
26 16 1.115
26 17 0.433
26 18 1.938
26 19 1.359
26 20 0.532
26 21 2.315
26 22 1.505
26 23 0.687
27 0 0.507
27 1 0.264
27 2 0.099
27 3 0.892
27 4 0.538
27 5 0.218
27 6 1.109
27 7 0.709
27 8 0.350
27 9 1.334
27 10 0.933
27 11 0.414
27 12 1.541
27 13 0.950
27 14 0.439
27 15 1.752
27 16 1.180
27 17 0.498
27 18 1.993
27 19 1.409
27 20 0.597
27 21 2.464
27 22 1.610
27 23 0.716
28 0 0.435
28 1 0.234
28 2 0.085
28 3 0.798
28 4 0.453
28 5 0.192
28 6 1.025
28 7 0.638
28 8 0.323
28 9 1.242
28 10 0.878
28 11 0.311
28 12 1.450
28 13 0.846
28 14 0.411
28 15 1.663
28 16 1.091
28 17 0.416
28 18 1.898
28 19 1.236
28 20 0.532
28 21 2.287
28 22 1.470
28 23 0.603
29 0 0.433
29 1 0.231
29 2 0.084
29 3 0.817
29 4 0.481
29 5 0.199
29 6 1.017
29 7 0.635
29 8 0.318
29 9 1.242
29 10 0.858
29 11 0.361
29 12 1.424
29 13 0.861
29 14 0.429
29 15 1.730
29 16 1.075
29 17 0.411
29 18 1.976
29 19 1.276
29 20 0.576
29 21 2.375
29 22 1.457
29 23 0.733
30 0 0.384
30 1 0.203
30 2 0.072
30 3 0.742
30 4 0.429
30 5 0.170
30 6 0.946
30 7 0.595
30 8 0.286
30 9 1.146
30 10 0.834
30 11 0.353
30 12 1.336
30 13 0.761
30 14 0.381
30 15 1.571
30 16 1.011
30 17 0.401
30 18 1.824
30 19 1.225
30 20 0.542
30 21 2.174
30 22 1.371
30 23 0.626
31 0 0.433
31 1 0.228
31 2 0.082
31 3 0.804
31 4 0.465
31 5 0.190
31 6 1.000
31 7 0.599
31 8 0.292
31 9 1.247
31 10 0.882
31 11 0.349
31 12 1.435
31 13 0.837
31 14 0.390
31 15 1.671
31 16 1.105
31 17 0.454
31 18 1.871
31 19 1.295
31 20 0.596
31 21 2.362
31 22 1.446
31 23 0.601
32 0 0.482
32 1 0.258
32 2 0.098
32 3 0.849
32 4 0.517
32 5 0.215
32 6 1.055
32 7 0.684
32 8 0.340
32 9 1.269
32 10 0.881
32 11 0.384
32 12 1.445
32 13 0.927
32 14 0.449
32 15 1.734
32 16 1.180
32 17 0.532
32 18 1.917
32 19 1.368
32 20 0.650
32 21 2.365
32 22 1.582
32 23 0.852
33 0 0.457
33 1 0.245
33 2 0.089
33 3 0.844
33 4 0.493
33 5 0.205
33 6 1.041
33 7 0.668
33 8 0.329
33 9 1.268
33 10 0.915
33 11 0.389
33 12 1.539
33 13 0.892
33 14 0.421
33 15 1.725
33 16 1.128
33 17 0.475
33 18 1.926
33 19 1.317
33 20 0.570
33 21 2.391
33 22 1.435
33 23 0.676
34 0 0.460
34 1 0.250
34 2 0.090
34 3 0.835
34 4 0.490
34 5 0.218
34 6 1.041
34 7 0.654
34 8 0.327
34 9 1.296
34 10 0.934
34 11 0.366
34 12 1.432
34 13 0.881
34 14 0.437
34 15 1.686
34 16 1.135
34 17 0.483
34 18 1.849
34 19 1.312
34 20 0.639
34 21 2.356
34 22 1.495
34 23 0.820
35 0 0.467
35 1 0.236
35 2 0.088
35 3 0.853
35 4 0.498
35 5 0.204
35 6 1.050
35 7 0.641
35 8 0.303
35 9 1.275
35 10 0.867
35 11 0.358
35 12 1.541
35 13 0.911
35 14 0.401
35 15 1.758
35 16 1.166
35 17 0.467
35 18 1.976
35 19 1.380
35 20 0.613
35 21 2.321
35 22 1.610
35 23 0.763
36 0 0.485
36 1 0.257
36 2 0.093
36 3 0.847
36 4 0.509
36 5 0.218
36 6 1.086
36 7 0.683
36 8 0.335
36 9 1.268
36 10 0.893
36 11 0.375
36 12 1.440
36 13 0.921
36 14 0.458
36 15 1.729
36 16 1.177
36 17 0.514
36 18 1.882
36 19 1.355
36 20 0.612
36 21 2.326
36 22 1.514
36 23 0.738
37 0 0.463
37 1 0.250
37 2 0.089
37 3 0.831
37 4 0.504
37 5 0.205
37 6 1.028
37 7 0.627
37 8 0.312
37 9 1.214
37 10 0.869
37 11 0.348
37 12 1.445
37 13 0.874
37 14 0.434
37 15 1.747
37 16 1.134
37 17 0.450
37 18 1.883
37 19 1.287
37 20 0.576
37 21 2.306
37 22 1.371
37 23 0.725
38 0 0.445
38 1 0.232
38 2 0.080
38 3 0.817
38 4 0.489
38 5 0.199
38 6 1.045
38 7 0.632
38 8 0.291
38 9 1.249
38 10 0.902
38 11 0.353
38 12 1.445
38 13 0.898
38 14 0.422
38 15 1.714
38 16 1.128
38 17 0.508
38 18 1.908
38 19 1.394
38 20 0.624
38 21 2.206
38 22 1.548
38 23 0.769
39 0 0.445
39 1 0.226
39 2 0.079
39 3 0.831
39 4 0.487
39 5 0.193
39 6 1.020
39 7 0.652
39 8 0.301
39 9 1.293
39 10 0.932
39 11 0.354
39 12 1.456
39 13 0.892
39 14 0.409
39 15 1.728
39 16 1.168
39 17 0.470
39 18 1.899
39 19 1.324
39 20 0.549
39 21 2.297
39 22 1.592
39 23 0.742
40 0 0.510
40 1 0.277
40 2 0.103
40 3 0.878
40 4 0.540
40 5 0.232
40 6 1.127
40 7 0.707
40 8 0.332
40 9 1.328
40 10 0.929
40 11 0.389
40 12 1.484
40 13 0.959
40 14 0.434
40 15 1.766
40 16 1.214
40 17 0.524
40 18 1.998
40 19 1.422
40 20 0.577
40 21 2.234
40 22 1.584
40 23 0.780
41 0 0.494
41 1 0.261
41 2 0.098
41 3 0.875
41 4 0.528
41 5 0.225
41 6 1.097
41 7 0.677
41 8 0.320
41 9 1.288
41 10 0.916
41 11 0.391
41 12 1.463
41 13 0.926
41 14 0.417
41 15 1.776
41 16 1.186
41 17 0.493
41 18 1.966
41 19 1.389
41 20 0.618
41 21 2.267
41 22 1.612
41 23 0.724
42 0 0.474
42 1 0.253
42 2 0.089
42 3 0.849
42 4 0.500
42 5 0.198
42 6 1.064
42 7 0.657
42 8 0.316
42 9 1.299
42 10 0.917
42 11 0.370
42 12 1.512
42 13 0.880
42 14 0.440
42 15 1.716
42 16 1.135
42 17 0.487
42 18 1.978
42 19 1.350
42 20 0.536
42 21 2.383
42 22 1.553
42 23 0.749
43 0 0.450
43 1 0.230
43 2 0.086
43 3 0.832
43 4 0.487
43 5 0.194
43 6 1.032
43 7 0.645
43 8 0.313
43 9 1.224
43 10 0.917
43 11 0.367
43 12 1.394
43 13 0.877
43 14 0.387
43 15 1.672
43 16 1.140
43 17 0.461
43 18 1.856
43 19 1.328
43 20 0.567
43 21 2.226
43 22 1.458
43 23 0.733
44 0 0.530
44 1 0.280
44 2 0.104
44 3 0.917
44 4 0.548
44 5 0.233
44 6 1.144
44 7 0.717
44 8 0.331
44 9 1.409
44 10 0.982
44 11 0.394
44 12 1.557
44 13 0.996
44 14 0.468
44 15 1.844
44 16 1.200
44 17 0.532
44 18 2.114
44 19 1.453
44 20 0.624
44 21 2.451
44 22 1.719
44 23 0.810
45 0 0.475
45 1 0.252
45 2 0.094
45 3 0.841
45 4 0.524
45 5 0.222
45 6 1.045
45 7 0.673
45 8 0.322
45 9 1.307
45 10 0.888
45 11 0.375
45 12 1.475
45 13 0.931
45 14 0.435
45 15 1.709
45 16 1.164
45 17 0.517
45 18 1.929
45 19 1.365
45 20 0.638
45 21 2.250
45 22 1.606
45 23 0.803
46 0 0.499
46 1 0.271
46 2 0.100
46 3 0.877
46 4 0.528
46 5 0.230
46 6 1.115
46 7 0.679
46 8 0.342
46 9 1.410
46 10 0.957
46 11 0.360
46 12 1.497
46 13 0.906
46 14 0.466
46 15 1.811
46 16 1.201
46 17 0.517
46 18 2.048
46 19 1.404
46 20 0.627
46 21 2.424
46 22 1.616
46 23 0.803
47 0 0.453
47 1 0.235
47 2 0.086
47 3 0.827
47 4 0.491
47 5 0.200
47 6 1.039
47 7 0.635
47 8 0.323
47 9 1.259
47 10 0.933
47 11 0.336
47 12 1.406
47 13 0.858
47 14 0.392
47 15 1.696
47 16 1.155
47 17 0.470
47 18 1.844
47 19 1.274
47 20 0.549
47 21 2.322
47 22 1.504
47 23 0.733
48 0 0.453
48 1 0.232
48 2 0.088
48 3 0.837
48 4 0.481
48 5 0.207
48 6 1.051
48 7 0.647
48 8 0.308
48 9 1.253
48 10 0.929
48 11 0.374
48 12 1.478
48 13 0.905
48 14 0.419
48 15 1.739
48 16 1.162
48 17 0.476
48 18 1.989
48 19 1.390
48 20 0.594
48 21 2.412
48 22 1.551
48 23 0.725
49 0 0.442
49 1 0.223
49 2 0.080
49 3 0.793
49 4 0.468
49 5 0.200
49 6 1.022
49 7 0.637
49 8 0.314
49 9 1.350
49 10 0.891
49 11 0.348
49 12 1.395
49 13 0.850
49 14 0.409
49 15 1.681
49 16 1.131
49 17 0.488
49 18 1.913
49 19 1.276
49 20 0.533
49 21 2.178
49 22 1.522
49 23 0.749
50 0 0.497
50 1 0.274
50 2 0.099
50 3 0.894
50 4 0.528
50 5 0.225
50 6 1.099
50 7 0.699
50 8 0.328
50 9 1.391
50 10 0.952
50 11 0.406
50 12 1.474
50 13 0.913
50 14 0.432
50 15 1.832
50 16 1.256
50 17 0.531
50 18 1.954
50 19 1.434
50 20 0.616
50 21 2.285
50 22 1.528
50 23 0.795
51 0 0.457
51 1 0.239
51 2 0.083
51 3 0.840
51 4 0.497
51 5 0.196
51 6 1.047
51 7 0.661
51 8 0.311
51 9 1.238
51 10 0.902
51 11 0.382
51 12 1.455
51 13 0.904
51 14 0.396
51 15 1.683
51 16 1.198
51 17 0.490
51 18 1.869
51 19 1.326
51 20 0.606
51 21 2.174
51 22 1.527
51 23 0.727
52 0 0.502
52 1 0.267
52 2 0.100
52 3 0.898
52 4 0.527
52 5 0.229
52 6 1.134
52 7 0.698
52 8 0.339
52 9 1.381
52 10 0.975
52 11 0.377
52 12 1.515
52 13 0.910
52 14 0.444
52 15 1.840
52 16 1.206
52 17 0.505
52 18 2.005
52 19 1.433
52 20 0.579
52 21 2.467
52 22 1.679
52 23 0.824
53 0 0.460
53 1 0.236
53 2 0.087
53 3 0.837
53 4 0.475
53 5 0.201
53 6 1.030
53 7 0.636
53 8 0.308
53 9 1.278
53 10 0.897
53 11 0.356
53 12 1.405
53 13 0.884
53 14 0.405
53 15 1.670
53 16 1.161
53 17 0.484
53 18 1.912
53 19 1.351
53 20 0.589
53 21 2.192
53 22 1.487
53 23 0.713
54 0 0.497
54 1 0.272
54 2 0.103
54 3 0.881
54 4 0.527
54 5 0.224
54 6 1.125
54 7 0.709
54 8 0.333
54 9 1.422
54 10 0.980
54 11 0.355
54 12 1.487
54 13 0.920
54 14 0.447
54 15 1.751
54 16 1.193
54 17 0.511
54 18 1.933
54 19 1.409
54 20 0.616
54 21 2.417
54 22 1.614
54 23 0.781
55 0 0.458
55 1 0.235
55 2 0.084
55 3 0.829
55 4 0.499
55 5 0.210
55 6 1.084
55 7 0.653
55 8 0.304
55 9 1.348
55 10 0.936
55 11 0.373
55 12 1.426
55 13 0.865
55 14 0.401
55 15 1.681
55 16 1.147
55 17 0.484
55 18 1.939
55 19 1.370
55 20 0.621
55 21 2.246
55 22 1.563
55 23 0.722
56 0 0.517
56 1 0.273
56 2 0.105
56 3 0.906
56 4 0.537
56 5 0.230
56 6 1.158
56 7 0.715
56 8 0.342
56 9 1.404
56 10 0.990
56 11 0.382
56 12 1.495
56 13 0.947
56 14 0.462
56 15 1.789
56 16 1.205
56 17 0.523
56 18 2.035
56 19 1.416
56 20 0.655
56 21 2.329
56 22 1.600
56 23 0.806
57 0 0.457
57 1 0.234
57 2 0.088
57 3 0.817
57 4 0.483
57 5 0.199
57 6 1.103
57 7 0.673
57 8 0.306
57 9 1.341
57 10 0.966
57 11 0.398
57 12 1.381
57 13 0.883
57 14 0.382
57 15 1.761
57 16 1.175
57 17 0.491
57 18 2.049
57 19 1.373
57 20 0.642
57 21 2.267
57 22 1.546
57 23 0.738
58 0 0.519
58 1 0.281
58 2 0.108
58 3 0.882
58 4 0.552
58 5 0.233
58 6 1.127
58 7 0.707
58 8 0.329
58 9 1.383
58 10 0.971
58 11 0.394
58 12 1.512
58 13 0.963
58 14 0.465
58 15 1.805
58 16 1.181
58 17 0.517
58 18 2.037
58 19 1.397
58 20 0.658
58 21 2.524
58 22 1.604
58 23 0.811
59 0 0.468
59 1 0.241
59 2 0.088
59 3 0.833
59 4 0.500
59 5 0.211
59 6 1.076
59 7 0.650
59 8 0.316
59 9 1.312
59 10 0.889
59 11 0.357
59 12 1.462
59 13 0.858
59 14 0.415
59 15 1.726
59 16 1.159
59 17 0.496
59 18 2.043
59 19 1.377
59 20 0.596
59 21 2.382
59 22 1.485
59 23 0.758
60 0 0.546
60 1 0.299
60 2 0.111
60 3 0.941
60 4 0.574
60 5 0.249
60 6 1.181
60 7 0.742
60 8 0.347
60 9 1.427
60 10 1.009
60 11 0.400
60 12 1.544
60 13 1.012
60 14 0.481
60 15 1.827
60 16 1.260
60 17 0.532
60 18 2.071
60 19 1.481
60 20 0.662
60 21 2.347
60 22 1.603
60 23 0.850
61 0 0.507
61 1 0.273
61 2 0.101
61 3 0.889
61 4 0.528
61 5 0.236
61 6 1.115
61 7 0.688
61 8 0.324
61 9 1.381
61 10 0.959
61 11 0.406
61 12 1.437
61 13 0.910
61 14 0.434
61 15 1.697
61 16 1.179
61 17 0.524
61 18 1.956
61 19 1.376
61 20 0.594
61 21 2.224
61 22 1.568
61 23 0.709
62 0 0.485
62 1 0.262
62 2 0.098
62 3 0.890
62 4 0.523
62 5 0.213
62 6 1.154
62 7 0.697
62 8 0.331
62 9 1.381
62 10 0.994
62 11 0.370
62 12 1.482
62 13 0.935
62 14 0.411
62 15 1.751
62 16 1.191
62 17 0.511
62 18 1.998
62 19 1.469
62 20 0.585
62 21 2.342
62 22 1.636
62 23 0.728
63 0 0.432
63 1 0.222
63 2 0.084
63 3 0.804
63 4 0.475
63 5 0.201
63 6 1.054
63 7 0.652
63 8 0.311
63 9 1.287
63 10 0.918
63 11 0.351
63 12 1.376
63 13 0.810
63 14 0.419
63 15 1.662
63 16 1.117
63 17 0.458
63 18 1.937
63 19 1.346
63 20 0.594
63 21 2.282
63 22 1.477
63 23 0.762
64 0 0.490
64 1 0.253
64 2 0.091
64 3 0.872
64 4 0.520
64 5 0.207
64 6 1.104
64 7 0.686
64 8 0.311
64 9 1.348
64 10 0.937
64 11 0.355
64 12 1.518
64 13 0.917
64 14 0.422
64 15 1.778
64 16 1.164
64 17 0.503
64 18 1.973
64 19 1.384
64 20 0.547
64 21 2.380
64 22 1.518
64 23 0.738
65 0 0.438
65 1 0.229
65 2 0.084
65 3 0.821
65 4 0.481
65 5 0.196
65 6 1.066
65 7 0.653
65 8 0.303
65 9 1.298
65 10 0.949
65 11 0.359
65 12 1.426
65 13 0.859
65 14 0.391
65 15 1.771
65 16 1.173
65 17 0.430
65 18 1.965
65 19 1.374
65 20 0.534
65 21 2.099
65 22 1.542
65 23 0.707
66 0 0.490
66 1 0.264
66 2 0.095
66 3 0.858
66 4 0.509
66 5 0.220
66 6 1.113
66 7 0.681
66 8 0.317
66 9 1.346
66 10 0.954
66 11 0.402
66 12 1.458
66 13 0.924
66 14 0.425
66 15 1.733
66 16 1.126
66 17 0.518
66 18 1.977
66 19 1.401
66 20 0.586
66 21 2.266
66 22 1.607
66 23 0.723
67 0 0.473
67 1 0.243
67 2 0.090
67 3 0.868
67 4 0.505
67 5 0.210
67 6 1.097
67 7 0.653
67 8 0.304
67 9 1.353
67 10 0.944
67 11 0.369
67 12 1.469
67 13 0.888
67 14 0.410
67 15 1.731
67 16 1.174
67 17 0.513
67 18 1.898
67 19 1.392
67 20 0.539
67 21 2.260
67 22 1.573
67 23 0.684
68 0 0.505
68 1 0.273
68 2 0.101
68 3 0.885
68 4 0.534
68 5 0.235
68 6 1.165
68 7 0.693
68 8 0.332
68 9 1.403
68 10 0.963
68 11 0.390
68 12 1.514
68 13 0.966
68 14 0.471
68 15 1.839
68 16 1.188
68 17 0.541
68 18 2.019
68 19 1.411
68 20 0.571
68 21 2.456
68 22 1.667
68 23 0.780
69 0 0.459
69 1 0.232
69 2 0.085
69 3 0.837
69 4 0.488
69 5 0.208
69 6 1.120
69 7 0.647
69 8 0.319
69 9 1.360
69 10 0.937
69 11 0.353
69 12 1.469
69 13 0.903
69 14 0.427
69 15 1.798
69 16 1.188
69 17 0.454
69 18 1.893
69 19 1.351
69 20 0.556
69 21 2.278
69 22 1.555
69 23 0.746
70 0 0.482
70 1 0.253
70 2 0.097
70 3 0.863
70 4 0.506
70 5 0.214
70 6 1.129
70 7 0.669
70 8 0.300
70 9 1.354
70 10 0.959
70 11 0.365
70 12 1.509
70 13 0.921
70 14 0.441
70 15 1.804
70 16 1.175
70 17 0.481
70 18 1.985
70 19 1.393
70 20 0.575
70 21 2.334
70 22 1.554
70 23 0.785
71 0 0.432
71 1 0.219
71 2 0.078
71 3 0.808
71 4 0.464
71 5 0.187
71 6 1.029
71 7 0.625
71 8 0.287
71 9 1.334
71 10 0.862
71 11 0.315
71 12 1.429
71 13 0.839
71 14 0.365
71 15 1.659
71 16 1.079
71 17 0.427
71 18 1.871
71 19 1.320
71 20 0.499
71 21 2.279
71 22 1.470
71 23 0.638
72 0 0.528
72 1 0.279
72 2 0.101
72 3 0.926
72 4 0.554
72 5 0.235
72 6 1.184
72 7 0.725
72 8 0.341
72 9 1.403
72 10 0.998
72 11 0.388
72 12 1.582
72 13 0.981
72 14 0.470
72 15 1.838
72 16 1.240
72 17 0.541
72 18 2.057
72 19 1.463
72 20 0.661
72 21 2.471
72 22 1.657
72 23 0.795
73 0 0.507
73 1 0.263
73 2 0.095
73 3 0.904
73 4 0.534
73 5 0.222
73 6 1.135
73 7 0.701
73 8 0.323
73 9 1.352
73 10 0.974
73 11 0.366
73 12 1.505
73 13 0.954
73 14 0.455
73 15 1.744
73 16 1.224
73 17 0.531
73 18 2.008
73 19 1.412
73 20 0.617
73 21 2.280
73 22 1.653
73 23 0.800
74 0 0.582
74 1 0.315
74 2 0.118
74 3 1.002
74 4 0.607
74 5 0.259
74 6 1.252
74 7 0.783
74 8 0.371
74 9 1.504
74 10 1.038
74 11 0.431
74 12 1.657
74 13 1.033
74 14 0.515
74 15 1.917
74 16 1.273
74 17 0.592
74 18 2.255
74 19 1.538
74 20 0.696
74 21 2.499
74 22 1.799
74 23 0.879
75 0 0.522
75 1 0.282
75 2 0.106
75 3 0.895
75 4 0.541
75 5 0.238
75 6 1.134
75 7 0.708
75 8 0.343
75 9 1.376
75 10 0.997
75 11 0.392
75 12 1.500
75 13 0.920
75 14 0.446
75 15 1.760
75 16 1.113
75 17 0.532
75 18 2.001
75 19 1.428
75 20 0.687
75 21 2.316
75 22 1.421
75 23 0.724
76 0 0.574
76 1 0.300
76 2 0.116
76 3 0.990
76 4 0.568
76 5 0.247
76 6 1.230
76 7 0.740
76 8 0.346
76 9 1.493
76 10 1.010
76 11 0.404
76 12 1.669
76 13 0.995
76 14 0.477
76 15 1.885
76 16 1.256
76 17 0.576
76 18 2.099
76 19 1.463
76 20 0.656
76 21 2.530
76 22 1.673
76 23 0.902
77 0 0.517
77 1 0.275
77 2 0.107
77 3 0.900
77 4 0.536
77 5 0.242
77 6 1.156
77 7 0.720
77 8 0.337
77 9 1.394
77 10 0.957
77 11 0.387
77 12 1.542
77 13 0.947
77 14 0.488
77 15 1.859
77 16 1.141
77 17 0.513
77 18 2.068
77 19 1.368
77 20 0.593
77 21 2.353
77 22 1.550
77 23 0.799
78 0 0.600
78 1 0.324
78 2 0.123
78 3 1.029
78 4 0.612
78 5 0.258
78 6 1.252
78 7 0.759
78 8 0.372
78 9 1.480
78 10 1.035
78 11 0.406
78 12 1.680
78 13 1.042
78 14 0.512
78 15 1.932
78 16 1.284
78 17 0.581
78 18 2.205
78 19 1.479
78 20 0.670
78 21 2.578
78 22 1.763
78 23 0.879
79 0 0.524
79 1 0.281
79 2 0.104
79 3 0.929
79 4 0.551
79 5 0.244
79 6 1.177
79 7 0.729
79 8 0.326
79 9 1.419
79 10 0.985
79 11 0.385
79 12 1.555
79 13 0.976
79 14 0.433
79 15 1.804
79 16 1.217
79 17 0.507
79 18 2.047
79 19 1.412
79 20 0.558
79 21 2.357
79 22 1.707
79 23 0.768
80 0 0.560
80 1 0.301
80 2 0.117
80 3 0.956
80 4 0.567
80 5 0.243
80 6 1.182
80 7 0.742
80 8 0.339
80 9 1.421
80 10 1.012
80 11 0.400
80 12 1.610
80 13 1.007
80 14 0.468
80 15 1.888
80 16 1.231
80 17 0.540
80 18 2.060
80 19 1.419
80 20 0.579
80 21 2.537
80 22 1.678
80 23 0.816
81 0 0.515
81 1 0.269
81 2 0.104
81 3 0.896
81 4 0.536
81 5 0.226
81 6 1.124
81 7 0.705
81 8 0.320
81 9 1.415
81 10 0.982
81 11 0.375
81 12 1.499
81 13 0.927
81 14 0.432
81 15 1.802
81 16 1.174
81 17 0.501
81 18 1.980
81 19 1.389
81 20 0.546
81 21 2.328
81 22 1.604
81 23 0.735
82 0 0.571
82 1 0.304
82 2 0.114
82 3 0.968
82 4 0.585
82 5 0.254
82 6 1.192
82 7 0.723
82 8 0.336
82 9 1.446
82 10 0.980
82 11 0.376
82 12 1.614
82 13 1.008
82 14 0.471
82 15 1.883
82 16 1.261
82 17 0.565
82 18 2.058
82 19 1.423
82 20 0.634
82 21 2.459
82 22 1.660
82 23 0.834
83 0 0.513
83 1 0.271
83 2 0.104
83 3 0.906
83 4 0.542
83 5 0.238
83 6 1.146
83 7 0.689
83 8 0.340
83 9 1.417
83 10 0.931
83 11 0.370
83 12 1.484
83 13 0.934
83 14 0.451
83 15 1.843
83 16 1.186
83 17 0.531
83 18 2.080
83 19 1.386
83 20 0.553
83 21 2.346
83 22 1.593
83 23 0.746
84 0 0.602
84 1 0.316
84 2 0.125
84 3 1.017
84 4 0.605
84 5 0.260
84 6 1.219
84 7 0.752
84 8 0.345
84 9 1.494
84 10 1.030
84 11 0.425
84 12 1.678
84 13 1.046
84 14 0.499
84 15 1.929
84 16 1.314
84 17 0.546
84 18 2.186
84 19 1.526
84 20 0.687
84 21 2.595
84 22 1.781
84 23 0.811
85 0 0.565
85 1 0.304
85 2 0.120
85 3 0.961
85 4 0.585
85 5 0.251
85 6 1.201
85 7 0.733
85 8 0.345
85 9 1.400
85 10 1.012
85 11 0.400
85 12 1.605
85 13 0.992
85 14 0.464
85 15 1.851
85 16 1.257
85 17 0.515
85 18 2.106
85 19 1.458
85 20 0.570
85 21 2.431
85 22 1.670
85 23 0.809
86 0 0.539
86 1 0.298
86 2 0.115
86 3 0.941
86 4 0.572
86 5 0.248
86 6 1.212
86 7 0.706
86 8 0.349
86 9 1.464
86 10 1.012
86 11 0.369
86 12 1.577
86 13 0.958
86 14 0.461
86 15 1.856
86 16 1.267
86 17 0.525
86 18 1.976
86 19 1.402
86 20 0.625
86 21 2.492
86 22 1.690
86 23 0.866
87 0 0.531
87 1 0.282
87 2 0.107
87 3 0.919
87 4 0.546
87 5 0.227
87 6 1.159
87 7 0.700
87 8 0.323
87 9 1.441
87 10 0.989
87 11 0.374
87 12 1.540
87 13 0.948
87 14 0.426
87 15 1.830
87 16 1.230
87 17 0.533
87 18 2.075
87 19 1.421
87 20 0.574
87 21 2.354
87 22 1.591
87 23 0.750
88 0 0.547
88 1 0.285
88 2 0.105
88 3 0.957
88 4 0.568
88 5 0.241
88 6 1.211
88 7 0.724
88 8 0.326
88 9 1.443
88 10 0.973
88 11 0.380
88 12 1.610
88 13 1.002
88 14 0.472
88 15 1.877
88 16 1.248
88 17 0.561
88 18 2.058
88 19 1.467
88 20 0.631
88 21 2.453
88 22 1.688
88 23 0.796
89 0 0.492
89 1 0.268
89 2 0.100
89 3 0.859
89 4 0.518
89 5 0.210
89 6 1.107
89 7 0.682
89 8 0.322
89 9 1.405
89 10 0.945
89 11 0.350
89 12 1.437
89 13 0.918
89 14 0.420
89 15 1.765
89 16 1.160
89 17 0.493
89 18 1.948
89 19 1.344
89 20 0.599
89 21 2.236
89 22 1.489
89 23 0.755
90 0 0.547
90 1 0.295
90 2 0.114
90 3 0.954
90 4 0.570
90 5 0.252
90 6 1.195
90 7 0.731
90 8 0.343
90 9 1.399
90 10 1.032
90 11 0.393
90 12 1.604
90 13 1.022
90 14 0.483
90 15 1.841
90 16 1.271
90 17 0.546
90 18 2.065
90 19 1.498
90 20 0.605
90 21 2.433
90 22 1.748
90 23 0.844
91 0 0.514
91 1 0.277
91 2 0.105
91 3 0.899
91 4 0.530
91 5 0.230
91 6 1.144
91 7 0.699
91 8 0.324
91 9 1.395
91 10 0.956
91 11 0.397
91 12 1.551
91 13 0.914
91 14 0.443
91 15 1.808
91 16 1.155
91 17 0.504
91 18 1.996
91 19 1.408
91 20 0.547
91 21 2.401
91 22 1.491
91 23 0.739
92 0 0.563
92 1 0.302
92 2 0.113
92 3 0.969
92 4 0.585
92 5 0.245
92 6 1.202
92 7 0.748
92 8 0.345
92 9 1.475
92 10 1.009
92 11 0.378
92 12 1.591
92 13 1.030
92 14 0.497
92 15 1.893
92 16 1.284
92 17 0.573
92 18 2.072
92 19 1.459
92 20 0.675
92 21 2.410
92 22 1.696
92 23 0.885
93 0 0.520
93 1 0.283
93 2 0.108
93 3 0.886
93 4 0.532
93 5 0.241
93 6 1.126
93 7 0.687
93 8 0.331
93 9 1.414
93 10 0.974
93 11 0.389
93 12 1.490
93 13 0.932
93 14 0.443
93 15 1.750
93 16 1.122
93 17 0.489
93 18 2.002
93 19 1.391
93 20 0.593
93 21 2.308
93 22 1.530
93 23 0.763
94 0 0.530
94 1 0.277
94 2 0.104
94 3 0.932
94 4 0.560
94 5 0.240
94 6 1.186
94 7 0.728
94 8 0.341
94 9 1.435
94 10 1.008
94 11 0.379
94 12 1.510
94 13 0.970
94 14 0.448
94 15 1.824
94 16 1.252
94 17 0.542
94 18 2.067
94 19 1.424
94 20 0.613
94 21 2.477
94 22 1.622
94 23 0.796
95 0 0.505
95 1 0.275
95 2 0.107
95 3 0.888
95 4 0.533
95 5 0.238
95 6 1.121
95 7 0.698
95 8 0.337
95 9 1.405
95 10 0.980
95 11 0.377
95 12 1.468
95 13 0.915
95 14 0.452
95 15 1.810
95 16 1.148
95 17 0.501
95 18 2.004
95 19 1.360
95 20 0.604
95 21 2.369
95 22 1.545
95 23 0.736
With the simulation number set to 100,000, this program took less than 2 minutes (!) to run and compute all of those REs. What I need to do now is calculate the transition probabilities for every pitcher/season (using their defense independent statistical lines)...if I have that, I can Monte-Carlo simulate REs for every pitcher/season in the database in no time flat and if I have those REs I can find custom linear weights for pitchers. THAT would make this project a valuable one.
SABR Matt
09-10-2006, 09:43 PM
A further update...I increased the simulated inning count to 1,000,000 (requiring the program to simulate approximately 2.3 billion innings) and it ran in 14 minutes and 8 seconds.
The answers it output were pretty much the same...maybe one or two changes were as much as .001
I'm going to modify the code you see above to have the program also store a vector of structs containing counts of the number of times an inning resulted in 0,1,2,3...runs scoring...this will give me MonteCarlo probabilities of run scoring which can be used to start the WPA project.
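For anyone following along, here is a minimal Python sketch of the tallying step described above: count how many simulated innings ended with 0, 1, 2, 3... runs, then divide by the number of innings. The `toy_inning` function is a made-up placeholder, not the transition-matrix simulator from this thread, and all names here are hypothetical.

```python
import random
from collections import Counter

def toy_inning(rng, p_out=0.67):
    """Placeholder inning model (an assumption, NOT the thread's simulator).

    Each batter is out with probability p_out; otherwise he reaches base and
    every runner moves up exactly one base, so the 4th and every later
    baserunner forces in one run.
    """
    outs = 0
    reached = 0
    while outs < 3:
        if rng.random() < p_out:
            outs += 1
        else:
            reached += 1
    return max(0, reached - 3)

def run_distribution(n_innings, seed=0):
    """Simulate n_innings and return (freq, prob), both indexed by runs scored."""
    rng = random.Random(seed)
    counts = Counter(toy_inning(rng) for _ in range(n_innings))
    max_runs = max(counts)
    freq = [counts.get(r, 0) for r in range(max_runs + 1)]
    prob = [f / n_innings for f in freq]
    return freq, prob

if __name__ == "__main__":
    freq, prob = run_distribution(100_000)
    print("RS Frequency Probability")
    for runs, (f, p) in enumerate(zip(freq, prob)):
        print(f"{runs} {f} {p:.5f}")
```

Swapping `toy_inning` for a real base/out simulator would leave the counting logic unchanged.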
SABR Matt
09-11-2006, 12:32 PM
Even though I'm practically talking to myself, it seems...I'll continue posting progress in case someone out there might have interest...I want to make whatever work I do available whenever possible.
I successfully added run scoring probabilities to the MonteCarlo program last night...in a new 1,000,000-inning trial set (which took 15 minutes and 3 seconds to run)...implying 2.3 billion simulated innings, I get this as the frequency and probability distribution for all states combined:
RS Frequency Probability
0 1249441554 0.54229
1 506331221 0.21976
2 269204013 0.11684
3 145219672 0.06303
4 77414270 0.03360
5 33048902 0.01434
6 13931535 0.00605
7 5688346 0.00247
8 2272014 0.00099
9 892886 0.00039
10 344676 0.00015
11 132016 0.00006
12 49097 0.00002
13 18653 0.00001
14 6964 0.00000
15 2726 0.00000
16 877 0.00000
17 367 0.00000
18 130 0.00000
19 54 0.00000
20 21 0.00000
21 4 0.00000
22 1 0.00000
23 0 0.00000
24 1 0.00000
There is a probability, no matter how remote, that some day...some time...a team will score 24 runs in one inning. The most I can remember seeing was 17 back in a game in like 1995 between the Rangers and Orioles (27-10 Baltimore I believe...17 runs in the 8th)...but recall...in the entire PBP database there are only about 1.6 million innings...in the history of the game...maybe 3 million innings...so we haven't gotten nearly as large a sample as my simulation.
Granted...this is with all starting scenarios...the 24 run inning probably started with bases loaded and no one out or something. :) Breaking that information up into starting states all time:
BOS RS Freq prob
0 0 70624931 0.73568
0 1 13857835 0.14435
0 2 6401468 0.06668
0 3 2911379 0.03033
0 4 1284351 0.01338
0 5 544992 0.00568
0 6 225503 0.00235
0 7 90949 0.00095
0 8 35852 0.00037
0 9 14028 0.00015
0 10 5446 0.00006
0 11 2069 0.00002
0 12 758 0.00001
0 13 277 0.00000
0 14 107 0.00000
0 15 29 0.00000
0 16 17 0.00000
0 17 6 0.00000
0 18 2 0.00000
0 19 1 0.00000
1 0 81472203 0.84867
1 1 8675060 0.09037
1 2 3590498 0.03740
1 3 1398070 0.01456
1 4 540262 0.00563
1 5 204513 0.00213
1 6 76259 0.00079
1 7 27567 0.00029
1 8 9905 0.00010
1 9 3644 0.00004
1 10 1309 0.00001
1 11 451 0.00000
1 12 169 0.00000
1 13 57 0.00000
1 14 24 0.00000
1 15 7 0.00000
1 16 1 0.00000
1 17 1 0.00000
2 0 90007882 0.93758
2 1 4027508 0.04195
2 2 1351560 0.01408
2 3 416452 0.00434
2 4 133778 0.00139
2 5 42832 0.00045
2 6 13604 0.00014
2 7 4330 0.00005
2 8 1394 0.00001
2 9 432 0.00000
2 10 148 0.00000
2 11 50 0.00000
2 12 22 0.00000
2 13 4 0.00000
2 14 2 0.00000
2 15 1 0.00000
2 16 1 0.00000
3 0 55982880 0.58316
3 1 17611351 0.18345
3 2 11750088 0.12240
3 3 5832509 0.06076
3 4 2738331 0.02852
3 5 1209976 0.01260
3 6 518689 0.00540
3 7 214027 0.00223
3 8 86322 0.00090
3 9 34339 0.00036
3 10 13381 0.00014
3 11 5030 0.00005
3 12 1894 0.00002
3 13 748 0.00001
3 14 266 0.00000
3 15 109 0.00000
3 16 37 0.00000
3 17 14 0.00000
3 18 6 0.00000
3 19 2 0.00000
3 20 1 0.00000
4 0 70939390 0.73895
4 1 11335861 0.11808
4 2 8150742 0.08490
4 3 3360416 0.03500
4 4 1360879 0.01418
4 5 531237 0.00553
4 6 202534 0.00211
4 7 75777 0.00079
4 8 27429 0.00029
4 9 10133 0.00011
4 10 3641 0.00004
4 11 1264 0.00001
4 12 469 0.00000
4 13 141 0.00000
4 14 61 0.00000
4 15 18 0.00000
4 16 5 0.00000
4 17 3 0.00000
5 0 85147750 0.88696
5 1 4923668 0.05129
5 2 4126692 0.04299
5 3 1220558 0.01271
5 4 394763 0.00411
5 5 127000 0.00132
5 6 40706 0.00042
5 7 12870 0.00013
5 8 4124 0.00004
5 9 1276 0.00001
5 10 393 0.00000
5 11 142 0.00000
5 12 34 0.00000
5 13 15 0.00000
5 14 4 0.00000
5 15 4 0.00000
5 16 1 0.00000
6 0 37582056 0.39148
6 1 33546396 0.34944
6 2 13043707 0.13587
6 3 6552773 0.06826
6 4 3009762 0.03135
6 5 1321684 0.01377
6 6 560732 0.00584
6 7 231270 0.00241
6 8 92374 0.00096
6 9 36580 0.00038
6 10 14129 0.00015
6 11 5350 0.00006
6 12 1963 0.00002
6 13 787 0.00001
6 14 274 0.00000
6 15 91 0.00000
6 16 43 0.00000
6 17 21 0.00000
6 18 7 0.00000
6 19 1 0.00000
7 0 57534100 0.59931
7 1 23264286 0.24234
7 2 8818191 0.09186
7 3 3899078 0.04062
7 4 1534019 0.01598
7 5 594264 0.00619
7 6 224816 0.00234
7 7 83597 0.00087
7 8 30557 0.00032
7 9 10976 0.00011
7 10 3932 0.00004
7 11 1375 0.00001
7 12 522 0.00001
7 13 199 0.00000
7 14 50 0.00000
7 15 25 0.00000
7 16 6 0.00000
7 17 5 0.00000
7 18 2 0.00000
8 0 75069090 0.78197
8 1 14518208 0.15123
8 2 4206796 0.04382
8 3 1508553 0.01571
8 4 474501 0.00494
8 5 151782 0.00158
8 6 48417 0.00050
8 7 15455 0.00016
8 8 4886 0.00005
8 9 1589 0.00002
8 10 486 0.00001
8 11 166 0.00000
8 12 45 0.00000
8 13 15 0.00000
8 14 3 0.00000
8 15 4 0.00000
8 16 3 0.00000
8 17 1 0.00000
9 0 16426377 0.17111
9 1 53340080 0.55563
9 2 13850534 0.14428
9 3 6846254 0.07132
9 4 3140992 0.03272
9 5 1393352 0.01451
9 6 594249 0.00619
9 7 246044 0.00256
9 8 98677 0.00103
9 9 38867 0.00040
9 10 15200 0.00016
9 11 5799 0.00006
9 12 2243 0.00002
9 13 823 0.00001
9 14 306 0.00000
9 15 132 0.00000
9 16 43 0.00000
9 17 13 0.00000
9 18 11 0.00000
9 19 3 0.00000
9 20 1 0.00000
10 0 32641666 0.34002
10 1 47580121 0.49563
10 2 9101633 0.09481
10 3 4059472 0.04229
10 4 1611470 0.01679
10 5 626652 0.00653
10 6 239413 0.00249
10 7 88611 0.00092
10 8 32538 0.00034
10 9 11854 0.00012
10 10 4211 0.00004
10 11 1562 0.00002
10 12 515 0.00001
10 13 186 0.00000
10 14 66 0.00000
10 15 20 0.00000
10 16 6 0.00000
10 17 2 0.00000
10 18 2 0.00000
11 0 70077142 0.72997
11 1 19423085 0.20232
11 2 4277729 0.04456
11 3 1521621 0.01585
11 4 476128 0.00496
11 5 152891 0.00159
11 6 48710 0.00051
11 7 15430 0.00016
11 8 4994 0.00005
11 9 1574 0.00002
11 10 444 0.00000
11 11 172 0.00000
11 12 54 0.00000
11 13 17 0.00000
11 14 8 0.00000
11 15 1 0.00000
12 0 35163565 0.36629
12 1 21336282 0.22225
12 2 16217227 0.16893
12 3 12073462 0.12577
12 4 6155965 0.06412
12 5 2866500 0.02986
12 6 1273513 0.01327
12 7 541139 0.00564
12 8 224296 0.00234
12 9 90203 0.00094
12 10 35497 0.00037
12 11 13935 0.00015
12 12 5196 0.00005
12 13 2061 0.00002
12 14 723 0.00001
12 15 287 0.00000
12 16 85 0.00000
12 17 37 0.00000
12 18 17 0.00000
12 19 4 0.00000
12 20 5 0.00000
12 24 1 0.00000
13 0 54706841 0.56986
13 1 16327871 0.17008
13 2 11093352 0.11556
13 3 8270177 0.08615
13 4 3383411 0.03524
13 5 1369016 0.01426
13 6 532303 0.00554
13 7 200845 0.00209
13 8 73944 0.00077
13 9 27075 0.00028
13 10 9782 0.00010
13 11 3507 0.00004
13 12 1205 0.00001
13 13 422 0.00000
13 14 145 0.00000
13 15 67 0.00000
13 16 16 0.00000
13 17 14 0.00000
13 18 3 0.00000
13 19 4 0.00000
14 0 73576806 0.76643
14 1 11022757 0.11482
14 2 5555444 0.05787
14 3 4032220 0.04200
14 4 1232591 0.01284
14 5 396136 0.00413
14 6 125601 0.00131
14 7 40061 0.00042
14 8 12425 0.00013
14 9 4027 0.00004
14 10 1317 0.00001
14 11 417 0.00000
14 12 135 0.00000
14 13 44 0.00000
14 14 11 0.00000
14 15 8 0.00000
15 0 13634795 0.14203
15 1 41585742 0.43318
15 2 16856029 0.17558
15 3 12366290 0.12882
15 4 6301740 0.06564
15 5 2972212 0.03096
15 6 1325234 0.01380
15 7 566267 0.00590
15 8 235533 0.00245
15 9 94937 0.00099
15 10 37588 0.00039
15 11 14594 0.00015
15 12 5597 0.00006
15 13 2114 0.00002
15 14 848 0.00001
15 15 310 0.00000
15 16 109 0.00000
15 17 44 0.00000
15 18 9 0.00000
15 19 4 0.00000
15 20 3 0.00000
15 21 1 0.00000
16 0 33855601 0.35266
16 1 36782969 0.38316
16 2 11522570 0.12003
16 3 8087205 0.08424
16 4 3474509 0.03619
16 5 1398646 0.01457
16 6 547507 0.00570
16 7 209260 0.00218
16 8 77328 0.00081
16 9 28305 0.00029
16 10 10280 0.00011
16 11 3721 0.00004
16 12 1381 0.00001
16 13 468 0.00000
16 14 166 0.00000
16 15 57 0.00000
16 16 12 0.00000
16 17 7 0.00000
16 18 5 0.00000
16 19 2 0.00000
16 20 1 0.00000
17 0 70145918 0.73069
17 1 13700297 0.14271
17 2 6294050 0.06556
17 3 3989414 0.04156
17 4 1271563 0.01325
17 5 407385 0.00424
17 6 130636 0.00136
17 7 41435 0.00043
17 8 13230 0.00014
17 9 4193 0.00004
17 10 1280 0.00001
17 11 409 0.00000
17 12 128 0.00000
17 13 45 0.00000
17 14 9 0.00000
17 15 7 0.00000
17 17 1 0.00000
18 0 14314542 0.14911
18 1 25457899 0.26519
18 2 29218007 0.30435
18 3 13563675 0.14129
18 4 7398729 0.07707
18 5 3424064 0.03567
18 6 1524334 0.01588
18 7 650071 0.00677
18 8 269440 0.00281
18 9 109208 0.00114
18 10 42913 0.00045
18 11 16924 0.00018
18 12 6259 0.00007
18 13 2464 0.00003
18 14 925 0.00001
18 15 363 0.00000
18 16 110 0.00000
18 17 46 0.00000
18 18 13 0.00000
18 19 9 0.00000
18 20 4 0.00000
18 21 1 0.00000
19 0 31247744 0.32550
19 1 26467919 0.27571
19 2 21743510 0.22649
19 3 8769673 0.09135
19 4 4711615 0.04908
19 5 1871790 0.01950
19 6 740111 0.00771
19 7 282450 0.00294
19 8 105010 0.00109
19 9 38513 0.00040
19 10 13972 0.00015
19 11 5001 0.00005
19 12 1691 0.00002
19 13 640 0.00001
19 14 242 0.00000
19 15 71 0.00000
19 16 25 0.00000
19 17 14 0.00000
19 18 4 0.00000
19 19 3 0.00000
19 20 2 0.00000
20 0 69608988 0.72509
20 1 5905432 0.06151
20 2 14023256 0.14608
20 3 4031229 0.04199
20 4 1674497 0.01744
20 5 513969 0.00535
20 6 165462 0.00172
20 7 52859 0.00055
20 8 16609 0.00017
20 9 5172 0.00005
20 10 1690 0.00002
20 11 559 0.00001
20 12 192 0.00000
20 13 62 0.00000
20 14 16 0.00000
20 15 5 0.00000
20 16 1 0.00000
20 17 2 0.00000
21 0 12741119 0.13272
21 1 23759580 0.24750
21 2 20408932 0.21259
21 3 14892406 0.15513
21 4 12614966 0.13141
21 5 6305161 0.06568
21 6 2982574 0.03107
21 7 1330694 0.01386
21 8 570111 0.00594
21 9 237078 0.00247
21 10 95292 0.00099
21 11 38128 0.00040
21 12 14716 0.00015
21 13 5639 0.00006
21 14 2213 0.00002
21 15 902 0.00001
21 16 310 0.00000
21 17 113 0.00000
21 18 43 0.00000
21 19 18 0.00000
21 20 3 0.00000
21 21 1 0.00000
21 22 1 0.00000
22 0 31843472 0.33170
22 1 23763975 0.24754
22 2 16092455 0.16763
22 3 10384799 0.10817
22 4 8288539 0.08634
22 5 3376911 0.03518
22 6 1386431 0.01444
22 7 539364 0.00562
22 8 204303 0.00213
22 9 76051 0.00079
22 10 28136 0.00029
22 11 10043 0.00010
22 12 3508 0.00004
22 13 1302 0.00001
22 14 447 0.00000
22 15 190 0.00000
22 16 43 0.00000
22 17 21 0.00000
22 18 6 0.00000
22 19 2 0.00000
22 20 1 0.00000
22 21 1 0.00000
23 0 65096696 0.67809
23 1 8117039 0.08455
23 2 11509543 0.11989
23 3 5231987 0.05450
23 4 4206909 0.04382
23 5 1245937 0.01298
23 6 404197 0.00421
23 7 127974 0.00133
23 8 40733 0.00042
23 9 12832 0.00013
23 10 4209 0.00004
23 11 1348 0.00001
23 12 401 0.00000
23 13 123 0.00000
23 14 48 0.00000
23 15 18 0.00000
23 16 3 0.00000
23 17 2 0.00000
23 19 1 0.00000
The 24 run inning happened with first and second and no one out.
The top chunk represents the probability at the start of an inning, which is the most interesting part of this (though it's interesting to see how getting the lead-off guy in scoring position or moving runners over changes things, etc.)...in any given inning throughout the PBP era, you have a 73.6% chance of not scoring.
With runner at first and none out it's a 58.3% chance of not scoring and with a runner at second and one out it's a 59.9% chance of failing to score.
With a runner at third and no one out, you have merely a 17% chance of not scoring (state 9 in the table, 0.17111)...etc
Pretty interesting.
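To show where those percentages come from, here is a small Python sketch that turns the zero-run frequencies in the table into probabilities. The state labels (0 = bases empty and none out, 3 = runner on first and none out, 7 = runner on second and one out) follow the examples in the post. The 96,000,000 innings per state is an inference from the frequency-to-probability ratios in the table, not something stated outright, and the function names are hypothetical.

```python
# Frequency of scoreless innings, copied from the RS = 0 rows of the table.
ZERO_RUN_FREQ = {
    0: 70624931,  # bases empty, none out
    3: 55982880,  # runner on first, none out
    7: 57534100,  # runner on second, one out
}

# Assumption: every base/out state was simulated the same number of times.
# 70624931 / 0.73568 is roughly 96 million, so that is the total used here.
INNINGS_PER_STATE = 96_000_000

def p_scoreless(state):
    """Probability that no runs score from the given starting state."""
    return ZERO_RUN_FREQ[state] / INNINGS_PER_STATE

def p_score(state):
    """Probability that at least one run scores from the given starting state."""
    return 1.0 - p_scoreless(state)

if __name__ == "__main__":
    for state in sorted(ZERO_RUN_FREQ):
        print(state, f"{p_scoreless(state):.5f}", f"{p_score(state):.5f}")
```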
Gubanich Plague
09-11-2006, 02:59 PM
Great work, Matt. You've rendered my work more than obsolete. I absolutely love this stuff.
I've gotten really busy at work lately, so I haven't had any time to play with any of this stuff, which sucks, and there really isn't any letup in sight.
SABR Matt
09-11-2006, 03:24 PM
I haven't done WPA yet. :) You're still ahead of me on that...I'm bad at recursive logic too so I'm going to struggle with that one.
I do highly hope the code I've written gets used by some of the folks around here though...I'm quite proud of how efficient it is. Granted...I have a very powerful computer (Athlon64, dual-core 3500+, 400 MHz FSB, 2 GB SDRAM (PC3200 with enhanced onboard dual accessing capabilities)), but I've monitored system usage and the code never uses more than a quarter of a gig of RAM, so although it might run slower if you have a lesser motherboard and chipset, it is not RAM intensive despite the enormous number of calculations. I've been able to run several other programs while calculating simulations and never had a problem. :)
I'm sure a professional coder would find ways to squeeze a little more efficiency and functionality out of my little application, but I'm just a layman and did most of this work without outside help aside from the debugging.
I hope you get some free time in the near future to return to this stuff because if you enjoy working with the data as much as I do...it would suck not to be able to spend a little time with it.
SABR Matt
09-11-2006, 03:27 PM
I also want to calculate leverage indices with the PBP data too...it's going to be important for the careful evaluation of bullpens as they relate to impacting pythagorean W% and actual W%...so that's two more major tasks ahead of me...
Tango Tiger
09-13-2006, 07:41 AM
Matt, just want to say I appreciate your efforts! The worst part about programming is debugging, and it's not fun spending hours looking for something that looks so benign, but is critical.
SABR Matt
09-13-2006, 10:49 AM
An example...
My searching algorithm was going into an infinite loop for high random doubles (high within the range of 0 to 1) and for a while I couldn't figure out why. I was fixated on what happens within the IF statement responsible for handling edge cases near the top of the probability ranks (the high base/out states...2 outs bases loaded for example) when it turned out the problem was that the way I selected the next index to search was by doing this:
(Max + Min) / 2
Max is the highest index remaining in the range that could possibly contain the right index...Min is the lowest. They're both integers, and the integer division makes it impossible to move on with the search once Max is one greater than Min unless, each time you unsuccessfully search an index, you remove it from the range (it and everything beyond it, either above or below, that cannot possibly contain the key).
I was re-setting the range by saying:
IF the key is greater than the highest number within the probability range of the search index, set the min EQUAL to the search index. I needed to set the min to the integer ABOVE the index.
This was extremely hard to find. Debugging is a royal pain when you've got sneaky logical errors hiding in there.
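For anyone following along, here is a minimal sketch of that kind of search with the fix applied. It is not Matt's actual code: the function name findOutcome and the upperBound layout (each outcome owns a slice of the 0 to 1 range, stored as its cumulative upper bound) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: upperBound[i] is the top of the cumulative probability
// range owned by outcome i, so outcome i covers [upperBound[i-1], upperBound[i]).
// The last entry is assumed to be 1.0 and key is assumed to lie in [0, 1).
std::size_t findOutcome(const std::vector<double>& upperBound, double key) {
    assert(!upperBound.empty());
    std::size_t minIndex = 0;
    std::size_t maxIndex = upperBound.size() - 1;
    while (minIndex < maxIndex) {
        std::size_t searchIndex = (minIndex + maxIndex) / 2;  // rounds down
        if (key >= upperBound[searchIndex]) {
            // Key is above this range, so the index itself is ruled out too.
            // Writing minIndex = searchIndex here never terminates once
            // maxIndex == minIndex + 1.
            minIndex = searchIndex + 1;
        } else {
            // Key is inside or below this range; keep searchIndex as a candidate.
            maxIndex = searchIndex;
        }
    }
    return minIndex;
}
```

With four equally likely outcomes (upper bounds 0.25, 0.5, 0.75, 1.0), a high key such as 0.99 lands on the last index instead of looping forever.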
SABR Matt
09-13-2006, 11:02 AM
I had a bit of an epiphany today when I was thinking about how to do WPA with a recursive function and hating the whole idea of trying to code recursively.
What if, instead of simulating each play and in so doing calculating the probability of run scoring in an inning or Run Expectancy...we simulated (using our RS probabilities) each half inning and in so doing calculated a simulated game RS total or probability of winning the game?
I can adapt the logic of my MonteCarlo simulator for RE(Inning), use almost exactly the same structure and come up with a MonteCarlo simulator for RE(Game) that accepts a starting HalfInning/Run Differential/Base/Out state, calculates simulated RS totals based on my RS probabilities for the active halfInning and all of the halfInnings that follow, tracks the visitor and home team's score through each game, and tallies wins for each side over say a million simulated games. Would that not give me a powerful and very stable simulated win probability from that starting state?
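A rough sketch of what such a game-level simulator could look like, assuming the run-scoring probabilities are already available as simple tables. The function name, the parameter layout, and the use of std::discrete_distribution are all assumptions for illustration, not the actual MonteCarlo program described in this thread. Half innings are numbered 0-17 with the road team batting in the even ones.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical sketch: runFreq[r] = probability of scoring exactly r runs in the
// rest of a half inning. halfInning is 0-17 (even = road team batting) and
// homeLead is home runs minus road runs before the half inning starts.
double simulateHomeWinProb(const std::vector<double>& startStateRunFreq,
                           const std::vector<double>& basesEmptyRunFreq,
                           int halfInning, int homeLead,
                           int games, unsigned seed) {
    assert(halfInning >= 0 && halfInning < 18 && games > 0);
    std::mt19937 rng(seed);
    std::discrete_distribution<int> firstDraw(startStateRunFreq.begin(),
                                              startStateRunFreq.end());
    std::discrete_distribution<int> freshDraw(basesEmptyRunFreq.begin(),
                                              basesEmptyRunFreq.end());
    double homeWins = 0.0;
    for (int game = 0; game < games; ++game) {
        int lead = homeLead;
        // Finish the current half inning from the given base/out state.
        int runs = firstDraw(rng);
        lead += (halfInning % 2 == 1) ? runs : -runs;
        // Play the remaining regulation half innings from bases empty, none out.
        for (int h = halfInning + 1; h < 18; ++h) {
            if (h == 17 && lead > 0) break;  // home team skips the bottom of the 9th
            runs = freshDraw(rng);
            lead += (h % 2 == 1) ? runs : -runs;
        }
        // Extra innings: keep playing full innings until the tie is broken.
        // The cap only guards against a degenerate all-zero run table.
        for (int extra = 0; lead == 0 && extra < 100; ++extra) {
            lead -= freshDraw(rng);
            lead += freshDraw(rng);
        }
        if (lead > 0) homeWins += 1.0;
        else if (lead == 0) homeWins += 0.5;
    }
    return homeWins / games;
}
```

Because the same table is used for both teams here, a tied game at the top of the first should come out close to .500, with the usual sampling noise.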
Tango Tiger
09-13-2006, 12:11 PM
The recursiveness is really simple, and you shouldn't try to avoid it. Instead of going from half-inning 1 to half-inning 18, you work backwards, from half-inning 18 to half-inning 1.
You start with half-inning 18, with run differentials of -14 to +14 (or whatever range you want). Since you have the run frequency distribution for each of the 24 base-out states, figuring out if the team won, lost, or goes to extra innings is a snap. (For now, assume extra innings is half a win).
So, now you have an array based on inning, run diff, base/out, and a win probability for each node in the index (for half-inning 18 only, and all other values are still null).
Then, you start at half-inning 17, and again use the same run frequency distribution. You will now know the frequency of each run differential at the end of every 17th half-inning, from -14 to +14. Since you know the win probability for each of these end-states (i.e., the start state of the 18th half-inning), you now know the win prob for each starting state of the 17th half-inning.
And, you just keep going backwards until you get to the 1st half-inning. You know you are doing it right if the bases empty/0 outs, tie score state in the 1st, 3rd, 5th... 17th half inning gives you a win prob of exactly .500.
You can then change this to do two things:
1 - HFA. Have a separate run frequency distribution for the home and away team, so that it mimics reality. You can simply get this by looking at the empirical data, and tracking home and away separately.
2 - Two different teams. You can get a win prob table for any two teams, given their runs scored and runs allowed.
You now have to worry about the extra innings scenario not being exactly .500 for each side.
But, you've done 90% of the hard work already. The rest should be a walk for you.
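The backward pass described above can be sketched as follows. This is a simplified illustration under stated assumptions: it only fills in the start-of-half-inning states (bases empty, none out), it uses one run frequency table for both teams, and the names (WinProbTable, leadIndex, buildStartOfInningTable) are made up for the example. Other base/out states would use their own run frequency row against the same next-half-inning values.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kHalfInnings = 18;   // 0 = top 1st ... 17 = bottom 9th
constexpr int kMaxLead = 14;       // track the home lead from -14 to +14
constexpr int kLeads = 2 * kMaxLead + 1;

using WinProbTable = std::array<std::array<double, kLeads>, kHalfInnings>;

// Clamp a home lead into the tracked range and turn it into an array index.
int leadIndex(int homeLead) {
    if (homeLead > kMaxLead) homeLead = kMaxLead;
    if (homeLead < -kMaxLead) homeLead = -kMaxLead;
    return homeLead + kMaxLead;
}

// Home win probability at the START of each half inning (bases empty, none out).
// runFreq[r] = chance of exactly r runs in a half inning; a tie after nine
// innings is counted as half a win.
WinProbTable buildStartOfInningTable(const std::vector<double>& runFreq) {
    assert(!runFreq.empty());
    WinProbTable wp{};
    const int maxRuns = static_cast<int>(runFreq.size()) - 1;
    for (int half = kHalfInnings - 1; half >= 0; --half) {
        const bool homeBats = (half % 2 == 1);
        for (int lead = -kMaxLead; lead <= kMaxLead; ++lead) {
            double total = 0.0;
            for (int runs = 0; runs <= maxRuns; ++runs) {
                const int newLead = homeBats ? lead + runs : lead - runs;
                double value;
                if (half == kHalfInnings - 1) {
                    // After the bottom of the 9th the game is decided or tied.
                    value = newLead > 0 ? 1.0 : (newLead == 0 ? 0.5 : 0.0);
                } else {
                    value = wp[half + 1][leadIndex(newLead)];
                }
                total += runFreq[runs] * value;
            }
            wp[half][leadIndex(lead)] = total;
        }
    }
    return wp;
}
```

The sanity check from the post applies directly: wp[0][leadIndex(0)], wp[2][leadIndex(0)] and so on should come out at .500, up to floating-point rounding and the clamping at the edge of the tracked range.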
Gubanich Plague
09-13-2006, 12:18 PM
Yeah, you don't even need to do any recursion, either. You can just do the bottom of the ninth manually, and then write a loop to take care of the rest.
SABR Matt
09-13-2006, 12:35 PM
OK...what is confusing me is the interplay between home half and road half of each inning...and exactly how you connect each side and each inning...also WPA is about the starting state...not just the probabilities at the start of the inning.
It's the bottom of the 6th inning with the home team batting and trailing by one. They have runners at the corners and two outs...what is their chance of winning? How can I start with the 18th half inning and work backward to answer that question when I'm starting with known information for half inning #12?
It seems to me that the logic is inherently easier to follow and less likely to go awry if you go forward through the game than if you go backward. Start at half inning #12 as described above. If this is 2005 NL, I know that your chance of scoring 0 runs in this base/out state in this half inning is 74%...1 run is 12% etc. Given the initial condition I can directly forecast a RS total for that half inning...for all innings that follow that half inning I can forecast RS totals based on the starting state of bases empty none out. Given the probabilities of each RS total occurring in each half inning...random simulation should produce reasonable game final scores that proceed THROUGH extra innings when necessary at a realistic rate of tie games and return results that are actually based on the likelihood of either side winning in XI rather than just assuming that XI games are 50/50 (I suspect that the home team wins a little more than half of XI games and not just because they have HFA but because there is strategic benefit to being last to hit in sudden death).
The recursive solution artificially stops the game in the 9th inning...
Also given that I've already written a functional MonteCarlo...there is less difficulty in adapting that model to a new job (MonteCarlo game simulation) than to writing whole new logic to handle WPA.
The MonteCarlo solution can't really do team vs team because it requires a lot of input data and there's not enough of a sampling in any one season of how teams fare against each other (in terms of RS and RA), but I wouldn't simulate for that anyway really...I would use something like the log5 method to make pretty close approximations of the likelihood of one team beating another, except rather than using W% I'd use pythagorean offensive and defensive W%s and project RS and RA for each match-up.
However MonteCarlo can handle HFA...I would simply need to break the PBP data into home scoring probabilities and road scoring probabilities.
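For reference, a small sketch of the two formulas mentioned above. The exponent of 2 in the Pythagorean estimate is the classic value and is an assumption here, as are the function names. This feeds plain Pythagorean W% into log5 and does not attempt the separate offensive and defensive W%s.

```cpp
#include <cassert>

// Classic Pythagorean expectation; the exponent of 2 is an assumption here.
double pythagoreanWinPct(double runsScored, double runsAllowed) {
    assert(runsScored > 0.0 && runsAllowed > 0.0);
    const double rs2 = runsScored * runsScored;
    const double ra2 = runsAllowed * runsAllowed;
    return rs2 / (rs2 + ra2);
}

// log5: chance that team A beats team B, given each team's W% against the league.
double log5(double winPctA, double winPctB) {
    const double aBeatsB = winPctA * (1.0 - winPctB);
    const double bBeatsA = winPctB * (1.0 - winPctA);
    assert(aBeatsB + bBeatsA > 0.0);
    return aBeatsB / (aBeatsB + bBeatsA);
}
```

A .600 team against a .400 team comes out at 0.36 / (0.36 + 0.16), or about .692.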
SABR Matt
09-13-2006, 12:37 PM
I'm not understanding how the bottom of the 9th can be used to project the top of the 9th...or any part of the game before the bottom of the 9th.
If I understand correctly, the idea is to take the probability of winning in the bottom of the ninth given a starting state...and then for earlier half innings estimate the probability of eventually reaching a starting state in the bottom of the 9th and do some kind of weighted average to find WPA for earlier half innings. That sounds needlessly bloated logically.
Gubanich Plague
09-13-2006, 01:12 PM
Yeah, that's exactly right. It's not nearly as bad as you make it sound, either. I'll try to walk you through my code for this.
double wp[18][24][31];
This is where all the win probability information will accumulate. It's a 3D array, with the first index for the 18 half innings, the second index for the 24 base/out states, and the third index for the 31 different possible scores (I did it for -15 to +15), where -15 is 0, -14 is 1 ... +14 is 29, and +15 is 30.
Note that that is the declaration of the size, but the index starts at 0, so the highest score index is 30, the bottom of the ninth is 17, etc.
Now on to do the bottom of the ninth.
rf[][] is a 2D array that contains the run frequency information. The first index is the base/out state, and the second index is the number of runs (up to 15 runs).
The innermost loop (over c) here loops over all the possible number of runs that would win the game and adds the probabilities up. And the next statement gives half credit for the case where the game is tied. And the next statement puts the answer in the correct place in the wp[][][] array.
The loop around that (over b) loops over the possible scores. And the loop around that (over a) loops over the base/out states.
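#include <stdlib.h>  /* for abs() */
double pct;
int a, b, c;  /* a = base/out state, b = score index, c = runs scored */
for (a = 0; a < 24; a++) {
    /* Home team tied or trailing (score index 0-15 means -15 to 0). */
    for (b = 0; b < 16; b++) {
        pct = 0;
        /* Scoring 16-b runs or more wins the game outright. */
        for (c = abs(b - 16); c < 16; c++) {
            pct += rf[a][c];
        }
        /* Scoring exactly 15-b runs ties it: half credit for extra innings. */
        pct += (0.5 * rf[a][abs(b - 15)]);
        wp[17][a][b] = pct;
    }
    /* Home team already ahead: the game is over. The backward loop reads
       these entries, so set them here if they are not initialized elsewhere. */
    for (b = 16; b < 31; b++) {
        wp[17][a][b] = 1.0;
    }
}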
And here's the loop that takes care of the rest of the half innings. I don't have time to explain it now, but it's similar in structure to the bottom of the ninth, just more general, and needs to reference all the win probabilities at the start of the next half inning (which is why you need to do it backwards).
I'll try to come back a little later and explain it.
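int next_sc, g, h;
/* g is the half inning being read; results are written to half inning g-1. */
for (g = 17; g > 0; g--) {
    for (a = 0; a < 24; a++) {
        for (b = 0; b < 31; b++) {
            pct = 0;
            /* c = runs scored in the rest of half inning g-1 */
            for (c = 0; c < 16; c++) {
                /* g even: home team batting in g-1, so the score index goes up. */
                if (g % 2 == 0) { h = b + c; }
                else { h = b - c; }
                /* Clamp blowouts to the +/-15 range of the table. */
                if (h > 30) { next_sc = 30; }
                else if (h < 0) { next_sc = 0; }
                else { next_sc = h; }
                /* Weight the next half inning's starting win prob (base/out state 0). */
                pct += rf[a][c] * wp[g][0][next_sc];
            }
            wp[g - 1][a][b] = pct;
        }
    }
}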
Tango Tiger
09-13-2006, 01:44 PM
I'm not going to code-inspect, but Guban's methodology is as simple as it looks.
Matt, the reason to do it this way is because it's the right thing to do. I understand your point, but, this process gives you an exact answer, while the Monte Carlo process will give you an estimate. I can guarantee you that the mathematical process will get you a win prob of .5000000 to start the game. You'll be "close" with Monte Carlo.
If it makes you feel better, I went through what you've gone through. And I decided to discard that process for this one.
Tango Tiger
09-13-2006, 01:47 PM
We can also get an estimate of the Monte Carlo fairly easily. 1 SD = sqrt(.5*.5/1,000,000) = .0005.
So, 95% of the time, you'll be within .001 wins. However, that's just for a single state-to-state transition.
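The same arithmetic as a small sketch, so the trial count can be varied. The function names are made up for the example, and "two standard errors" is used as the rough 95% band, as in the post above.

```cpp
#include <cassert>
#include <cmath>

// Standard error of a simulated proportion after `trials` independent games.
double simulationStdError(double winProb, double trials) {
    assert(trials > 0.0 && winProb >= 0.0 && winProb <= 1.0);
    return std::sqrt(winProb * (1.0 - winProb) / trials);
}

// Trials needed so that two standard errors fit inside `halfWidth`.
double trialsForHalfWidth(double winProb, double halfWidth) {
    assert(halfWidth > 0.0);
    const double se = halfWidth / 2.0;
    return winProb * (1.0 - winProb) / (se * se);
}
```

simulationStdError(0.5, 1000000) gives .0005, and getting the band down from .001 to .0001 would take about 100 million games per state, since the error only shrinks with the square root of the trial count.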
SABR Matt
09-13-2006, 02:22 PM
I still think it's a very bad decision to cut off a game at the bottom of the ninth, wave your hands and say "50/50"...because I doubt that's true.
And I can force the logic to begin at .5000000 for top of the first with a 0 base/out state...or .540 for that matter (the home team has that advantage from the first pitch).
SABR Matt
09-13-2006, 02:51 PM
Don't get me wrong...I understand your point...random generation will always have a very small error associated with it...I do understand why in an ideal world the solution begins with the mathematical rather than the empirical derivation. I think we can do better than calling XI games a coin flip, but if I could implement the recursive version of this...perhaps I could simulate (the empirical way) the extra inning results or something...it just strikes me as inherently logically wrong to presume that the home team doesn't have an advantage in sudden death.
SABR Matt
09-13-2006, 02:59 PM
Gubanich...in the "recursive" part of your program...the part that goes back through a game...what is the c loop counting?
Tip for anyone writing code...please give your variables descriptive names.
You'll notice in my posted code for the MonteCarlo RE simulator that all of my variables have names that do about as good a job as I could manage describing to you what they mean...some of them could probably be improved I'm sure, but variable names help people make sense of your code.
SABR Matt
09-13-2006, 03:06 PM
Let me logic this out a bit with some english pseudocode again...
FOR EACH half inning (in descending order and other than the 18th)
FOR EACH BO State
FOR EACH Run Differential
And here's where I'm stuck...I don't see how you can pick out which RS counts matter...you need to know how many runs are going to score in all of the innings that follow...
skyking162
09-13-2006, 03:27 PM
You know the win-probabilities for every run-differential going into the bottom of the ninth. For any state in the top of the ninth, you know the expected distribution of runs scoring. Your win-probability for any of those states is the sum of each probability of X runs scoring times the win-probability in the bottom of the ninth if that many runs score. In other words, it's the expected value of the win-probability of the current situation (which sounds kind of redundant, but that's exactly what it is both in English and statistics-speak.) WPnow=SUM[P(x runs score)*WPnextInn(if x runs score)]
You run that calculation for every possible state in the top of the ninth, which conveniently includes all the states going into the top of the ninth. Repeat for all the states in the bottom of the eighth using beginning of top of ninth as your baseline, then use those to compute top of eighth and so on.
Then yeah, if you want to get tricky, you can start implementing modified run-expectancy tables that use score and inning as input parameters (mostly to account for smallball tactics in close games or to represent increased performance-levels by home-team players).
Tango Tiger
09-13-2006, 06:24 PM
The XI component is probably easily incorporated, and you should not use that as a reason not to do it this way. I never needed it for my purposes, which is why I use 50/50.
I'm sure what we can do is use the tied in the top of the 9th state to get the win probability. That value can also be estimated as (win prob at top of the 1st + .500) divided by 2.
So, if the home team has a .540 chance of winning in the top of the 1st, they have a .520 chance of winning in the top of the 9th, tied, and therefore, would have a .520 chance of winning in the top of any extra inning.
Regardless, I'm sure there's an easy solution.
Gubanich Plague
09-13-2006, 08:06 PM
I still think it's a very bad decision to cut off a game at the bottom of the ninth, wave your hands and say "50/50"...because I doubt that's true.
And I can force the logic to begin at .5000000 for top of the first with a 0 base/out state...or .540 for that matter (the home team has that advantage from the first pitch).
You have to realize though that by choosing to use this particular model, by definition the start of every inning in a tie game is 50/50. It's not hand-waving, it's inherent in the model. Why? Because you have chosen to treat every single half inning, bottom or top, close or blowout, late or early, with the same run frequency table.
Like Tango said, if you decide to calculate different RF tables for home and away, then you can go ahead and give the home team 54% (or whatever it is) credit for tying the game after the bottom of the ninth. But as long as you're using the same RF table for every half inning, you have no choice but to say it's 50/50 at the start of every inning in a tie game.
Gubanich Plague
09-13-2006, 08:13 PM
Gubanich...in the "recursive" part of your program...the part that goes back through a game...what is the c loop counting?
It's looping over all the different numbers of runs that can score in the inning (0 through 15, since that's as far as my RF table goes).
Tip for anyone writing code...please give your variables descriptive names.
Yeah, sorry about that. I just cut and pasted, I didn't really have time to go make it nice and readable. When I code for myself I tend not to give my variables descriptive names because I like to use as few characters as possible so it's easier to look at. Helps me while debugging, but it obviously makes it pretty impossible for anyone else to read.
Tango Tiger
09-13-2006, 08:17 PM
Yup, I'm the same as gub. I rarely publish my code, since doing so means I have to explain and support my code. More power to Matt for being more descriptive, but, that's not what most programmers do.
Tango Tiger
09-13-2006, 08:22 PM
And just to backup on what gub said: yes, the 50/50 rule is inherent if you use the same run frequency table.
However, if you don't use the same run frequency table, it is a snap to figure out the chance of one team beating another, if there's only one inning to go. So, you can predetermine the chance of a team beating another in XI.
Rather than starting with the 18th half-inning, you instead start with the 17th-half inning, bases empty, 0 outs, and you just need the run frequency table for that state, for both teams. Simple math will tell you how often each team will win. Save that as your XI probabilities.
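One way to read that "simple math", sketched under the assumption that every extra inning is a fresh, identical inning: the home team wins the inning if it outscores the road team, loses if it is outscored, and an equal inning just repeats the situation. The function name and table layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// roadRunFreq[r] / homeRunFreq[r] = chance of exactly r runs in one half inning
// from bases empty, none out. Returns the home team's chance of winning once
// the game reaches extra innings, treating every extra inning as identical.
double homeExtraInningWinProb(const std::vector<double>& roadRunFreq,
                              const std::vector<double>& homeRunFreq) {
    double homeWinsInning = 0.0;
    double roadWinsInning = 0.0;
    for (std::size_t road = 0; road < roadRunFreq.size(); ++road) {
        for (std::size_t home = 0; home < homeRunFreq.size(); ++home) {
            const double p = roadRunFreq[road] * homeRunFreq[home];
            if (home > road) homeWinsInning += p;
            else if (road > home) roadWinsInning += p;
            // Equal runs: still tied, so the same situation simply repeats.
        }
    }
    assert(homeWinsInning + roadWinsInning > 0.0);
    return homeWinsInning / (homeWinsInning + roadWinsInning);
}
```

With identical tables this returns .500, which matches the point made earlier in the thread that 50/50 is inherent when one run frequency table is used for both sides.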
***
I would bet that the result will be similar to what I said. Bill James looked at this once I believe, and came up with a rather simple win prob: RS/(RS+RA). Not sure how "real" that is, but, if you don't want to do what I just said, using this simple estimate should suffice.
Gubanich Plague
09-13-2006, 08:30 PM
Okay, so here's the meat of that "recursive" part. The "c" loop that loops over all possible number of runs that could score (0 through 15).
pct = 0;
for (c=0;c<16;c++) {
if (g%2 == 0) {h=b+c;}
else {h=b-c;}
if (h>30) {next_sc=30;}
else if (h<0) {next_sc=0;}
else {next_sc=h;}
pct += rf[a][c]*wp[g][0][next_sc];
}
wp[g-1][a][b] = pct;
if (g%2 == 0) {h=b+c;}
else {h=b-c;}
The variable g is the half inning (0 through 17). If g is even, the home team is up, and if g is odd, the away team is up.
The variable b is the score diff. (0 through 30, where 0 means -15 and 30 means +15). And c is the number of runs scoring in the half inning. So then h is the new score diff. So h=b+c if the home team is batting, and h=b-c if the away team is batting.
if (h>30) {next_sc=30;}
else if (h<0) {next_sc=0;}
else {next_sc=h;}
This just sets next_sc=h, unless the new score diff. is out of range (I'm only keeping track of up to 15 run leads), in which case I tweak it a little to get it back into range. This only occurs when there are huge blowouts anyway, so it isn't gonna affect the win probabilities really at all.
pct += rf[a][c]*wp[g][0][next_sc];
And the win probabilities for a given half inning for a given base/out state is the sum (over all possible run outcomes) of the run frequency times the win probability at the start of the next half inning for whatever the new score is.
wp[g-1][a][b] = pct;
And once you've summed them all up, throw the result in the wp table.
Ack, I think I probably did a horrendous job explaining all that.
SABR Matt
09-13-2006, 08:43 PM
Yup, I'm the same as gub. I rarely publish my code, since doing so means I have to explain and support my code. More power to Matt for being more descriptive, but, that's not what most programmers do.
Not really true. Most programmers have to hand their code to other programmers and therefore follow naming conventions that make the code readable. If you want others to follow in your research footsteps, you want them to use your code, and if you want them to use your code...you need to make it legible. Don't get me wrong...I understand the impulse to keep the labels short and simple to debug, and I'm not trying to ride Gubanich or Tango here...just telling you what I was always told in C++ classes and by professional computer programmers I know.
SABR Matt
09-13-2006, 08:45 PM
If your half-inning variable index goes from 0 to seventeen...isn't the home team up when g is ODD? 0 is the first half of the first inning (road team up).
Gubanich Plague
09-13-2006, 08:52 PM
Sorry. See that in this loop I decided to go from g=17 (top 9) back to g=1 (top 1). So the home team is up when g is even.
I probably should have gone from 16 to 0 instead of 17 to 1 to stay consistent with the other definition, but apparently I didn't. Oh well, still works this way.
SABR Matt
09-13-2006, 08:54 PM
Oh OK...I got you now...sorry for the confusion. :)
SABR Matt
09-13-2006, 08:58 PM
Ack, I think I probably did a horrendous job explaining all that.
No, you did a good job with the explanation...that makes things a lot clearer...
I can probably adapt this general approach to C++ in an hour or two of coding and another hour or so of debugging. :)
Logically, every extra inning behaves like a tie game in the ninth, so it makes sense that whatever rules you set for the 9th should continue for the 10th.
Gubanich Plague
09-13-2006, 08:59 PM
Ha, yeah actually that's why when I put the data in the wp table:
wp[g-1][a][b] = pct;
I had to write the index as g-1 to make it consistent with my old definition. Basically I'm an idiot is what it amounts to.
SABR Matt
09-13-2006, 09:03 PM
LOL...if you look closely at any code I write that hasn't been extensively cleaned and debugged, I'm sure you can find silly redundancies and extra steps that aren't necessary. :) Even the professionals can relate (I know a few pro coders and they are well aware of the problem with imperfect humans writing all of the code on which we all rely)
Tango Tiger
09-14-2006, 04:15 AM
Not really true. Most programmers have to hand their code to other programmers ...just telling you what I was always told in C++ classes and by professional computer programmers I know.
Ah, hahahaha. That's funny. "Do as we say, not as we do."
Most others' programming logic is horrible to debug. The way it works in the real world, and I've worked at around 8 companies, all diverse, all of them huge corporations, is you do your work, and hope in heck that you do not have to alter someone else's code.
I was at one company that insisted on proper naming conventions, and, those were perfectly fine to debug. They did code inspections, etc. But, that's the exception, not the rule.
***
Also remember that each of those 8 companies is made up of programmers who themselves came from 5-10 different companies each, and so on. It's a huge quiet conspiracy, that the less you document and make this clear, the more secure your job.
SABR Matt
09-14-2006, 09:24 AM
Then I must have had some incredibly good fortune meeting only programmers who actually live by the rules they're supposed to.
Tango Tiger
09-14-2006, 10:05 AM
Yes, selective sampling. Get back to me in 10 years and 5 companies, and let me know what you find.
Tango Tiger
09-14-2006, 10:26 AM
I'm looking at the code that guban and Matt have provided in this thread. I find them both typical of what I come across, and fairly the way I've coded, when I wasn't too concerned about documentation.
SABR Matt
09-14-2006, 10:28 AM
I didn't leave comments...as far as I know...that's the only thing I did not do that should be done if I decide to bring my code over to one like...the retrolist group. The variable names are descriptive and the syntax and formatting of the code are correct and readable.
Tango Tiger
09-14-2006, 10:35 AM
Documentation in terms of self-documenting with variables and functions.
Sorry Matt, but they are not up to standards. If you want to learn, you can send me a private PM.
SABR Matt
09-18-2006, 12:26 AM
OK...I've added WPA to my MonteCarlo simulator program...haven't tested it yet, but that's on the agenda shortly.
Next task is leverage index.
This is how I would calculate leverage index...Tango...correct me if I'm missing something.
Rather than focusing on events (per se)...since I have transition probabilities already embedded in the MonteCarlo program, why not ask the question "if transition X occurs (given the average run scoring rate on transition X and the current run differential), how does that affect the win probability?"
my formula would be:
SUM(probability of any possible transition from this state * absolute value of WPA for transition from this state) / SUM(probability of any transition * absolute value of WPA for any transition at any time)
Is that right?
Tango Tiger
09-18-2006, 05:18 AM
Right.
That denominator will be around .03-.04, each year. (Correctly, it should be for each run environment, not each league or each year. i.e., park matters).
For those who want to learn more, see here:
http://www.hardballtimes.com/main/article/crucial-situations
Matt, I sent Fangraphs.com a generic Win prob and LI charts for a 5 RPG environment. However, it would be better to have one for each league-year. If you are going to generate such a beast, please let me know, so I won't bother. Fangraphs has incredible data, going back to 2002, but he uses just the one win prob table.
SABR Matt
09-18-2006, 08:15 AM
I'm generating the full beast, Tango...I'll be happy to provide you with the data when I am finished.
So I should be breaking it down into run-environment subgroups? (for both WPA and LI?) Park factoring is not a simple matter here...I'd rather not use a ratio factor (it doesn't work and makes no logical sense)...I could use park and league factors from the FSIA database, but that doesn't have 2005 yet. :\
Perhaps it can be used to fill in the fangraphs data back as far as necessary to pick up every active player's full career (sans 1999...dangit, I hope retrosheet fills in that last missing year soon).
I like the "Clutchiness" toy at fangraphs...gives some interesting data.
SABR Matt
09-18-2006, 08:33 AM
Funny thing is...I don't think it's going to be that hard for me to add LI to my monte carlo program.
I would just calculate the denominator for each league first (calculate all WPAs for any given league, take the absolute value and find the average)..store that figure for each league...and then loop through each league/inning/baseOutStart/runDifferential and multiply (abs(WPA) * prob of transition)...surprisingly simple logic.
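A minimal sketch of that logic for a single state, assuming the transition probabilities and the win probabilities before and after each transition are already available. The Transition struct and both function names are invented for the example, and the league-average swing is simply passed in rather than computed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One possible transition out of a game state (hypothetical layout).
struct Transition {
    double probability;   // chance this transition happens from the state
    double winProbAfter;  // win probability once it has happened
};

// Expected absolute win probability swing from one state.
double expectedSwing(double winProbBefore, const std::vector<Transition>& transitions) {
    double swing = 0.0;
    for (const Transition& t : transitions) {
        swing += t.probability * std::fabs(t.winProbAfter - winProbBefore);
    }
    return swing;
}

// Leverage index: this state's expected swing over the league-average swing.
double leverageIndex(double stateSwing, double averageSwing) {
    assert(averageSwing > 0.0);
    return stateSwing / averageSwing;
}
```

For example, a state at .501 that goes to .450 with probability 0.7 and to .620 with probability 0.3 has an expected swing of .0714. Against a made-up league-average swing of .035, that is a leverage index of about 2.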
Tango Tiger
09-18-2006, 08:38 AM
It is! That's why I like it.
Personally, what I was going to do for Fangraphs was provide generic Win prob charts (at 3.0 to 7.0, at intervals of 0.5). And then, Fangraphs could have used those to extrapolate into whatever environment they needed. Say, if a league-year-park one year was 4.315 RPG, then take parts of the 4.0 chart and part of the 4.5 chart, and get the win prob and LI.
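"Take parts of the 4.0 chart and part of the 4.5 chart" could be as simple as a straight-line blend of the two chart values for the same game state. That is an assumption about what was intended, and the function name is hypothetical.

```cpp
#include <cassert>

// Linear blend between two chart values computed at neighbouring run environments.
double interpolateByRunEnvironment(double rpg, double lowRpg, double lowValue,
                                   double highRpg, double highValue) {
    assert(highRpg > lowRpg && rpg >= lowRpg && rpg <= highRpg);
    const double weightHigh = (rpg - lowRpg) / (highRpg - lowRpg);
    return lowValue + weightHigh * (highValue - lowValue);
}
```

At 4.315 RPG the weight on the 4.5 chart is 0.63, so illustrative win probs of .600 (4.0 chart) and .580 (4.5 chart) blend to .5874.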
But, if you are going to generate something league-year specific (sans park), that'd be great. Ideally, you can generate something like this:
http://www.insidethebook.com/li.shtml
for every league-year.
The reason I don't do it, is because (real) work gets in the way. How do you find time as a student? I remember those years, and, you put all hobbies aside for a few years.
SABR Matt
09-18-2006, 08:49 AM
How do I find time as a student? Simple...I skip classes. :D No, in all seriousness, I tell the guys I hang out with ahead of time when I want a quiet room for a few hours to do some research work and I MAKE the time because it is a very high priority for me. I still find a good several hours a week to go out with friends, but rather than partying all night every single weekend day...I try to limit it to one day per weekend. :) I am approaching sabermetrics as a worthwhile endeavor because I hope to one day work in baseball and because all of the skills I am learning to help me answer baseball-related questions keep popping up in meteorology as well (you need to be good with programming languages, statistics, calculus, linear algebra, logic, and such to be a meteorologist too...and of course both sabermetrics and the weather require a lot of patience and an understanding of the limitations of the available data).
The chart I produce probably won't have all those cool colors and will cover more than just +/- 3 runs of differential (the monte-carlo simulator goes out to 30 run innings, though that has never occurred in any simulation, so all of the rest of the logic that follows handles +/- 30 runs of differential)...if you want though, I can limit the data I send to you to something a little less cumbersome (+/- 5 runs maybe?)...just name your conditions and I'll set something up.
Tango Tiger
09-18-2006, 09:16 AM
Actually, I have it at +/-14 runs, and I only limit it to whatever I did on the website for ease of reading. For Fangraphs purposes, +/-14 runs should suffice. After all, anything outside of that, and the LI will be well under 0.1. Best thing is to supply a csv file that looks like:
year,league,halfinning,score,base,out,winprob,LI
If you merge base,out into one, that's fine too. If you split halfinning into inning,homeaway, that's fine too.
SABR Matt
09-18-2006, 09:17 AM
Have you done any studies on leverage indices for part time offensive players? Are there managers who do significantly better jobs leveraging their reserve players well? I'm convinced after a season watching Hargrove butcher the Mariners that he is the worst in recent memory at handling his bench...but I'd like to see that studied.
SABR Matt
09-18-2006, 09:20 AM
Actually, I have it at +/-14 runs, and I only limit it to whatever I did on the website for ease of reading. For Fangraphs purposes, +/-14 runs should suffice. After all, anything outside of that, and the LI will be well under 0.1. Best thing is to supply a csv file that looks like:
year,league,halfinning,score,base,out,winprob,LI
If you merge base,out into one, that's fine too. If you split halfinning into inning,homeaway, that's fine too.
I merge base/out (I have state labels 0-23), but use halfInning (0-17...it's 1-18 in the MySQL database but 0-17 in the output file...I'm now thinking I should tweak that so it outputs halfInning in a form compatible with the database).
year/league are merged presently into one leagueID (0-95...covering the 96 leagues in the PBP database), but I can unmerge that once I get results and give you year/league/halfinning/differential/baseOut/W%Home/LI without much difficulty.
SABR Matt
09-18-2006, 09:23 AM
That would be killing two birds with one stone too...WPA and LI in one table...one pretty huge table (96*18*29*24 or 1202688 rows!) but one table.
Tango Tiger
09-18-2006, 09:24 AM
Actually, whatever you have is fine. Just also provide a "readme" file that tells me how to understand the codes.
SABR Matt
09-18-2006, 09:26 AM
I'll do that...wow...I started work on the PBP database in late July and I've already gotten a lot done and am close on other items...feeling rather accomplished right now. :)
Tango Tiger
09-18-2006, 09:38 AM
Yes, it's pretty cool, isn't it?
I also agree that working on baseball-stuff is a great way to learn stuff you wouldn't learn otherwise (or with as much interest). Best way to learn something is to work on a project that interests you.
As for working in baseball eventually, I'd set my hopes a lot higher, and work in corporate America. MLB is nickel and dime in comparison.
SABR Matt
09-18-2006, 09:42 AM
Corporate America doesn't interest me. Baseball does. Corporate America is like the ultimate nightmare scenario for what my line of work would be for me. Working your ass off to climb the ladder and make a decent living doing something you hate for years and years so that you can retire at the age 75 when your kids' student loans are paid off and die never having done anything you consider worth your efforts.
I'd gladly take a 25-30 thousand dollar a year job working as a statistician for a baseball franchise over a 6 figure job in corporate america. UGH
SABR Matt
09-18-2006, 09:44 AM
On the other hand...I'm getting a degree in meteorology and it's just as likely I will end up being a research meteorologist and college professor (ack! anything but that!) with all of the skills I've picked up for doing advanced research...and I'd be relatively pleased if I ended up a career meteorologist as long as I never stopped with the baseball research in my spare time. :)
Tango Tiger
09-18-2006, 10:04 AM
I'd gladly take a 25-30 thousand dollar a year job working as a statistician for a baseball franchise over a 6 figure job in corporate america. UGH
I'll assume you don't have a mortgage, a kid to feed, or need for health benefits?
Being a professor with tenure would be an excellent goal.
skyking162
09-18-2006, 10:14 AM
Best way to learn something is to work on a project that interests you.
And yet we wonder what's wrong with the American educational system...
SABR Matt
09-18-2006, 10:33 AM
SK...LOL!! Indeed...the way we teach kids these days is pathetic. I hated...HATED...school all through childhood because they never let me learn MY way...never let me do things I wanted to do to learn. It's especially bad for boys now.
The educational establishment is run by women now...the teachers union is female dominated and they can only conceive of how best to teach children in the ways that are best for their perspective. The result is that boys aren't getting to read literature that appeals to them...they don't get to do hands-on scientific learning...they're learning math in the abstract very female way (and before I get yelled at for being sexist...I'm speaking in biological truths here...men and women learn in totally different ways...it's been more or less proven empirically) and they're falling way behind.
SABR Matt
09-18-2006, 10:37 AM
I'll assume you don't have a mortgage, a kid to feed, or need for health benefits?
Being a professor with tenure would be an excellent goal.
I understand why people choose corporate america...the family concerns force them to consider finances first...but I'll be blunt here. If having a family means I have to be miserable for forty years, I have no interest whatsoever in doing that. It's weird because philosophically I'm a capitalist, but personally, I'm not interested in grubbing for capital unless it comes from doing something I find rewarding. The most realistic goal probably is some kind of higher educational or think tank position.
Tango Tiger
09-18-2006, 10:41 AM
Right, according to the brain segregation, 75% of the course material for boys should involve cars, sports or sex in some way. Girls' course material should involve shoes and cute-bad-boys.
Seriously, I've never been through the American education system, so I have no opinion. I would be very skeptical of Matt's points in his post #79 though.
Tango Tiger
09-18-2006, 10:43 AM
If having a family means I have to be miserable for forty years, I have no interest whatsoever in doing that. It's weird because philosophically I'm a capitalist, but personally, I'm not interested in grubbing for capital unless it comes from doing something I find rewarding. The most realistic goal probably is some kind of higher educational or think tank position.
Isn't that what Comic Book Guy says? Believe me, every man who gets married gets told by the other married guys "Don't do it!", only to dismiss the suggestion, and then be the one front and center to tell the next guy. This has been going on since forever. Matt, get back to us when you are 35.
What is the world's longest sentence?
SABR Matt
09-18-2006, 10:52 AM
Right, according to the brain segregation, 75% of the course material for boys should involve cars, sports or sex in some way. Girls' course material should involve shoes and cute-bad-boys.
Seriously, I've never been through the American education system, so I have no opinion. I would be very skeptical of Matt's points in his post #79 though.
Um...did I ever say anything about specifically what material should be included?
No...no I didn't. Men and women do learn in different ways...it requires a different approach to teach a boy than a girl (no...we don't have to focus on stereotypically boyish subjects...we do however have to recognize that boys have a tendency to be overall more "proactive"...more impatient....more inclined toward learning through doing than girls, who do better with verbal (written or spoken) learning...especially at younger ages).
SABR Matt
09-18-2006, 10:54 AM
Isn't that what Comic Book Guy says? Believe me, every man who gets married gets told by the other married guys "Don't do it!", only to dismiss the suggestion, and then be the one front and center to tell the next guy. This has been going on since forever. Matt, get back to us when you are 35.
What is the world's longest sentence?
Obviously, if I get with the brain fever that is love, the equations change. :)
Tango Tiger
09-18-2006, 11:26 AM
Um...did I ever say anything about specifically what material should be included?
No...no I didn't....
I was being facetious, since I started my next sentence with "seriously...".
BillyF29
09-18-2006, 12:56 PM
Has anyone figured out how to determine defensive value in WPA?
Tango Tiger
09-18-2006, 01:34 PM
This is not a problem specific to WPA. But, the process is rather simple, once you've cracked the overall nut.
On a PBP basis, you have to determine the chance of a play resulting if you had an average fielder out there, and you credit that to the pitcher. Then, the difference between what "should" have happened, and what did happen, goes to the fielder.
Practically speaking, if you have a runner on base with two outs, the home team at bat in the bottom of the 9th, down by 1 run, the win prob is something like .100. The batter hits a long fly ball that is about to clear the fence, but Gary Pettis scales the wall to make the final out of the game. Win prob gives the defense a total of +.100 wins (making it go from .100 to .000 for the home team).
To break that up into pitching/fielding, the pitcher gets -.900, and the fielder gets +1.000.
If you don't have it that detailed in the PBP, then you make estimates. If it's marked as "long fly to deep center", then maybe the average fielder makes it 30% out, and 30% single, 30% double, 10% HR. So, you apply the likely win probability of each of those events, and you credit that to the pitcher.
If you have it in even less detail, say "flyball to OF", then again, you'd have a different set of breakdowns, which you credit to the pitcher.
The less detail, the less sure you are of any single play. But, over several hundred plays, things start to even up.
If you wanted to add even other details, like "flyball to OF, from a FB pitcher to a FB hitter", you can do that too.
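The pitcher/fielder split described above can be sketched as follows. This is only an illustration under stated assumptions: the struct and function names are made up, win probability is expressed from the defense's point of view, and the outcome list for an "average fielder" is whatever estimate you have for that trajectory.

```cpp
#include <vector>

// One possible result of a batted ball with an average fielder (hypothetical type).
struct Outcome {
    double probability;     // chance of this result with an average fielder
    double defenseWpAfter;  // defense's win probability if it happens
};

struct Credit {
    double pitcher;
    double fielder;
};

// Pitcher gets (expected WP with an average fielder - WP before the play);
// fielder gets (actual WP after the play - that same expectation).
Credit splitCredit(double defenseWpBefore, double defenseWpActual,
                   const std::vector<Outcome>& averageFielder) {
    double expected = 0.0;
    for (const Outcome& o : averageFielder) {
        expected += o.probability * o.defenseWpAfter;
    }
    return Credit{expected - defenseWpBefore, defenseWpActual - expected};
}
```

For the Pettis play, the defense starts at .900, the ball "should" have been a game-ending home run (one outcome, probability 1.0, defense WP 0.0), and the actual result is 1.000. That gives the pitcher -.900 and the fielder +1.000, which sum to the +.100 the defense earned on the play.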
Tango Tiger
09-18-2006, 01:36 PM
I can also tell you that I've run WPA from 1999-2002, where I first gave 100% of the credit to the pitcher, and 0 to the fielder, and then compared that to what I'd get if I had given zero credit to the pitcher, and 100% to the fielder.
Fans of DIPS will not be surprised by the differences in results.
SABR Matt
09-18-2006, 01:46 PM
That is precisely how I would have attacked it...calculate the average WPA that occurs for each trajectory type. Multiply that WPA by the leverage index of the situation. Give that credit to the pitcher. Give the difference to the fielder.
SABR Matt
09-18-2006, 01:48 PM
And considering I am a "fan of DIPS" in that I believe it is correct that pitchers control only the trajectory types and the DIPS events by and large (and that pitchers have only minor influence on the probabilities associated with each trajectory type), it should surprise no one here that that is how I would approach it.
BillyF29
09-18-2006, 01:58 PM
OK, that's pretty much what I was thinking so thanks :clapping
Tango Tiger
09-18-2006, 02:02 PM
That is precisely how I would have attacked it...calculate the average WPA that occurs for each trajectory type. Multiply that WPA by the leverage index of the situation. Give that credit to the pitcher. Give the difference to the fielder.
That part in bold is not quite right. Each EVENT has its own LI.
Go down to the section titled "Birnbaum":
http://www.hardballtimes.com/main/article/crucial-situations-part-2/
So, you would in fact calculate the new WPA for each trajectory type for that situation. This WPA divided by the average WPA will give you a number that is close to the LI for that situation, but it won't be exact. Please read the above link.
SABR Matt
09-18-2006, 02:19 PM
Tango...we're not dealing with one event though are we? I didn't consider the problem of events having different leverages serious for this purpose because we're asking "how likely is every possible event in this situation" aren't we?
A deep flyball could be an out, a SF, a DP, a HR, a 3B, 2B, or if it's Edgar Martinez running, a 1B...it could be an error that results in an out or does not result in an out...it could be a broad range of possible outcomes.
Even a sac bunt attempt could be a single, a SH, a force out, an error that allows various different advances or a double play.
How will that range of events impact the leverage of the situation?
SABR Matt
09-18-2006, 02:22 PM
I guess I understand the logical stickiness of using a number that is calculated based on all events occurring to get an expected WPA for a play which has a range of events that does not include ALL events...calculating LI event by event though is a pain...way more complex than general LI.
SABR Matt
09-18-2006, 03:45 PM
I'm trying to wrap my head around how to calculate (for every play, quickly, and with logic that is feasible to code) a "new WPA expectation" for a given game situation and trajectory. You would need to figure out the probability of each transition occurring given a trajectory and a starting base/out state...multiply that by the implied WPA given the inning/RD/etc and add them all up...but...I don't think there's enough data even in a whole league for unique trajectory/base/out combinations to return data that makes sense.
How often does a line drive to left field occur with two outs and the bases loaded? How often does that result in a HR, 1B, 2B, 3B, E, Out, etc...the number of times the LD to left specifically occurs is not that great...it's even less frequent when combined with a rare base/out state, and the data on what % of those rare events result in each event type will be meaninglessly small.
Tango Tiger
09-18-2006, 04:58 PM
I'd use overall data first as a base. So, look at how often all events happen at each of the 24 base-out state. If the % of doubles is the same, regardless of base-out state, then don't use the base/out as a parameter. However, if there is a link, then use that link to adjust the LD rates.
So, start with your base LD rates, and adjust them in the same direction as your overall rates. Hope that makes sense.
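One simple way to read "adjust them in the same direction as your overall rates" is a multiplicative adjustment followed by renormalizing. That reading is an assumption, not something spelled out in the post, and the function and parameter names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ldBase[i]       : league-wide P(outcome i | line drive)
// overallAll[i]   : league-wide P(outcome i) over all plays
// overallState[i] : P(outcome i) within one base/out state
std::vector<double> adjustRates(const std::vector<double>& ldBase,
                                const std::vector<double>& overallAll,
                                const std::vector<double>& overallState) {
    assert(ldBase.size() == overallAll.size());
    assert(ldBase.size() == overallState.size());
    std::vector<double> adjusted(ldBase.size(), 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < ldBase.size(); ++i) {
        // Scale by how much more (or less) common the outcome is in this state.
        double ratio = overallAll[i] > 0.0 ? overallState[i] / overallAll[i] : 1.0;
        adjusted[i] = ldBase[i] * ratio;
        total += adjusted[i];
    }
    // Renormalize so the adjusted rates still sum to 1.
    if (total > 0.0) {
        for (double& r : adjusted) r /= total;
    }
    return adjusted;
}
```

If an outcome is just as common in the base/out state as it is overall, every ratio is 1 and the base LD rates come back unchanged, which matches the "don't use base/out as a parameter" case.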
SABR Matt
09-18-2006, 06:33 PM
I'd bet good money there is a link between base/out and LD rates or even if not line drive rates...rates of base hits on balls in play. Worse...I'd bet better money that there's a substantial link for BOTH of those things with BO...(not only does the BO state impact the disposition of trajectories because it changes how pitchers pitch, but it impacts the success rates on the same ball in play)
In which case, I would want to find the average rate of all events in each base/out state and the average rate of each trajectory type on BIP and then try to form a mathematical link (regression perhaps) between them so that I would not need empirical data to form my samples.
Tango Tiger
09-18-2006, 07:02 PM
Well, there's always a link, since nothing outside of a coin flip is random. The question is how much of it is there, and how much time do you want to spend to adjust for it.
SABR Matt
09-18-2006, 09:10 PM
Well but this is a case where there is a logical link...not just an empirical one. With guys on base, pitchers change to the slide step and it totally changes the distribution of balls in play for many of them. With guys on base, the defense is also forced to change how it reacts to a ball in play, which is borne out in the BABIP increasing with runners on.
antihipster
09-18-2006, 09:53 PM
Um...did I ever say anything about specifically what material should be included?
No...no I didn't. Men and women do learn in different ways...it requires a different approach to teach a boy than a girl (no...we don't have to focus on stereotypically boyish subjects...we do however have to recognize that boys have a tendency to be overall more "proactive"...more impatient....more inclined toward learning through doing than girls, who do better with verbal (written or spoken) learning...especially at younger ages).
Is politics involved in historical wpa? LOL.
SABR Matt
09-18-2006, 09:55 PM
No...but the subject came up and I tend to be very gregarious online...I speak when I have something to say wherever the subject comes up.
SABR Matt
09-18-2006, 11:02 PM
I think I can avoid some of the small-sample-size issues if instead of breaking things down into event types (SH, 1B, SB..etc) I break things down by transition probabilities...specifically transition probabilities on trajectory type X.
If we're in BO #23 (bases loaded two outs), and a line drive to the outfield is hit (probably happens at least one or two hundred times per league), there are only a few transitions that are possible (all of the two out states plus the third out states) and we can probably get a halfway decent ballpark estimate of the likelihood of getting each transition on this specific trajectory type.
I haven't looked at the trajectory data from early in the PBP database but I suspect fielding analysis will become more and more crude the further back you go...we'll have to resort to just knowing who fields the ball and going from there.
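Counting transitions per trajectory type could look something like the sketch below. The class name, the integer trajectory codes, and the use of a state number above 23 for "inning over" are all assumptions for illustration; the real PBP table will have its own codes.

```cpp
#include <map>
#include <tuple>
#include <utility>

// Key: (starting base/out state, trajectory code, ending state).
using Key = std::tuple<int, int, int>;

class TransitionCounts {
public:
    // Record one observed play.
    void add(int startState, int trajectory, int endState) {
        ++counts_[Key(startState, trajectory, endState)];
        ++totals_[std::make_pair(startState, trajectory)];
    }

    // Observed P(end | start, trajectory); 0 if the pair was never seen.
    double probability(int startState, int trajectory, int endState) const {
        auto total = totals_.find(std::make_pair(startState, trajectory));
        if (total == totals_.end()) return 0.0;
        auto hit = counts_.find(Key(startState, trajectory, endState));
        if (hit == counts_.end()) return 0.0;
        return static_cast<double>(hit->second) / total->second;
    }

private:
    std::map<Key, long> counts_;
    std::map<std::pair<int, int>, long> totals_;
};
```

Because only a handful of ending states are reachable from any one starting state, each (start, trajectory) bucket spreads its sample over far fewer cells than an event-type breakdown would.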
SABR Matt
09-19-2006, 03:22 AM
Got WPA done and tested (there is one minor logical error in Gub's code that I'm sure he has a fix for because he got answers that made sense...I'm just missing some of the initial code I think...the win probability of the run differential states that give the home team the lead in the bottom of the ninth is 1...Gub never specified that and I initialize my arrays to all zeroes by default so that was causing problems but I figured it out).
I'll post that code tomorrow...
Leverage Index will be its own separate script and I think it can (and should) be done almost entirely in MySQL...
Script 1 will load the WP data into my main database
Script 2 will insert all of the WP values into the play by play super-table (all 7.4 million lines of it) for the starting and ending base/out/halfInning/score states for each play and calculate WPA for each play
Script 3 will calculate the denominator (average of the absolute value of WPA for every event in a league) of LI
I'd then need to write a small C++ script (because I need to loop through leagues and starting base/out/inning/runDifferential states quickly) that takes in the denominator of LI for each league, the transition probabilities from any starting base/out state, and the win probability data and multiplies the probability of a transition occurring by the implied WPA for that transition given conditions set by the loops...and adds them up for each starting state to get the numerator...and then spits out a table in the form Tango would like.
Sounds like fun. :)
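The numerator/denominator arithmetic for LI described above can be sketched like this. The names are hypothetical, and taking the absolute value of each transition's WPA in the numerator is an assumption made to match the absolute-value denominator (without it, the probability-weighted WPA would sum to roughly zero).

```cpp
#include <cmath>
#include <vector>

// One possible transition out of a starting state (hypothetical type).
struct Transition {
    double probability;  // chance of this transition from the starting state
    double wpAfter;      // home win probability after the transition
};

// Leverage index = expected |WPA| in this state / league-average |WPA|.
double leverageIndex(double wpBefore, const std::vector<Transition>& transitions,
                     double leagueAvgAbsWpa) {
    double expectedSwing = 0.0;
    for (const Transition& t : transitions) {
        expectedSwing += t.probability * std::fabs(t.wpAfter - wpBefore);
    }
    return expectedSwing / leagueAvgAbsWpa;
}
```

As a toy check: a state at .500 that moves to .600 or .400 with equal chance has an expected swing of .100, so with a league-average swing of .050 its LI comes out at about 2.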
Gubanich Plague
09-19-2006, 09:53 AM
Yes, of course you have to do that. I did not post that section of my code here, before. Sorry if that cost you a lot of time to debug:
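// wp[halfInning][baseOut][scoreIndex]: start with everything at zero
for (a=0;a<18;a++) {
  for (b=0;b<24;b++) {
    for (c=0;c<31;c++) {
      wp[a][b][c] = 0;
    }
  }
}
// bottom of the ninth (a=17) with the home team ahead (c=16..30): game over
for (b=0;b<24;b++) {
  for (c=16;c<31;c++) {
    wp[17][b][c] = 1.0;
  }
}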
SABR Matt
09-19-2006, 02:44 PM
That's OK Gub...I'm used to having to do some debugging and that wasn't the only problem with my code...I had some conflicting variable types and a logical mistake with where I was resetting my counter of probabilities that give you a win too...total debugging time was about 140 minutes...that's pretty normal for something I write.
The code I've written thus far is not what I would call EEE compliant yet (Tango and I spoke about what I could do to improve the form and readability of it), but it works and is still quite fast (WP only added like half a minute to the total running time!).
SABR Matt
09-19-2006, 09:00 PM
Hmm...Slight technical problem.
I have a very powerful computer and I don't think even this machine is capable of merging a 2.5 million row table with a 7.5 million row table on 4 join conditions (even with proper indexing) in MySQL. The win probability table is about a third the size of the PBP table...that's a huge problem.
Tango Tiger
09-20-2006, 07:44 AM
Can you do a describe on the two tables, as well as the primary keys. Show also your SQL.
You can also try downloading Oracle. They have different versions, from lite to enterprise-level. Some are free with no restrictions, and others are free as long as it's for personal use.
SABR Matt
09-20-2006, 10:26 AM
Oracle? That's a whole new (and from what I understand, significantly more complicated) language isn't it? I mean correct me if I'm wrong and it's just as simple to learn as MySQL was because I know for a fact that it's more powerful and more widely used in the corporate world.
I'll do the explain command shortly if my next test doesn't solve the problem and see what's going on.
Tango Tiger
09-20-2006, 11:08 AM
No, they all use the same core SQL syntax. (Oracle, SQL Server, DB2, MySQL). They each have their own little flavor in some little things, like CASE/DECODE, etc.
If you are writing stored procedures, they are all different. But, you are, I think, only using regular SQL.
The "complication" might be in the installation. Oracle is far more powerful than MySQL, though, I don't know if you'll feel that power with your single-CPU desktop machine.
SABR Matt
09-20-2006, 11:32 AM
Yeah, my understanding of Oracle was that it was designed to optimize networks of servers and the like for the functioning of a major database handling high traffic volume. This, BTW, is the query I would attempt to run to get the WP information (a 2.5 million record table) tacked on to the allpbp table (a 7.4 million record table)
UPDATE allpbp INNER JOIN WP
ON (allpbp.lgCode = WP.lgCode AND allpbp.halfInning = WP.halfInning AND allpbp.PreDiff = WP.score AND allpbp.BOS = WP.baseOut)
SET allpbp.PreWP = WP.WP
That does the win probability before each event, and the next one will do the win probability after each event...for which I would just change PreDiff to PostDiff, BOS to BOE and PreWP to PostWP...plus I'd need to make some special case handling in either case for halfInnings greater than 18 (XI...each extra frame is treated like the 9th inning) and for ending baseOut states larger than 23 (inning over...we would need to recognize that that has the same WP as the beginning of the next inning).
The problem is the four join conditions. I've run large queries joining two huge tables including the 7.4 million record PBP table that worked fine with two or even three join conditions, but as soon as I've hit four in the past it's been too much.
SABR Matt
09-20-2006, 12:02 PM
You know what...this is stupid.
Why am I trying to do this in MySQL...it has no built in tools for reading a table as though it were a data structure...the best way to calculate WPA is to read the current situation and the next situation...not treat each play with a before and after question because doing it the B&A way, you have to handle things like "did the game end on this play?" and "did the inning end on this play?" and whatnot.
No...this needs to be done in C++...I need to read in the entire play by play table as another object.
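Done in C++, the four-condition join turns into an array lookup. Below is a minimal sketch for one league's table, assuming 0-based half-innings (0-17), a +/-14 run differential range, and hypothetical names throughout; the clamping of run differentials beyond +/-14 is also an assumption.

```cpp
#include <vector>

// Flat WP table for one league: [halfInning 0-17][runDiff -14..+14][baseOut 0-23].
struct WpTable {
    std::vector<double> values = std::vector<double>(18 * 29 * 24, 0.0);

    double& at(int halfInning, int runDiff, int baseOut) {
        return values[(halfInning * 29 + (runDiff + 14)) * 24 + baseOut];
    }
};

// Map extra half-innings onto the ninth: even index = top, odd = bottom.
int clampHalfInning(int halfInning) {
    if (halfInning < 18) return halfInning;
    return (halfInning % 2 == 0) ? 16 : 17;
}

// Clamp the run differential to the range the table covers.
int clampRunDiff(int runDiff) {
    if (runDiff > 14) return 14;
    if (runDiff < -14) return -14;
    return runDiff;
}

// Win probability for a starting state, with extra innings treated as the ninth.
double lookupWp(WpTable& table, int halfInning, int runDiff, int baseOut) {
    return table.at(clampHalfInning(halfInning), clampRunDiff(runDiff), baseOut);
}
```

This only covers starting states. Ending states past 23 (inning over) are not handled here and would still need to be mapped to the start of the next half-inning.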
Tango Tiger
09-20-2006, 12:40 PM
In Oracle, you can create a "join index". I don't know about MySQL. You can also write a stored procedure. Working inside the database is preferable to anything else, because the most costly event is the disk i/o. But, maybe C++'s processing power can overcome that. Try it both ways!
SABR Matt
09-20-2006, 06:42 PM
There are stored procedures in MySQL too...I am not familiar with how to use them though...haven't quite gotten that far in my database learning yet. :)
Now...I'm not sure what you mean by a "join index"...perhaps you could elaborate?
Tango Tiger
09-20-2006, 07:00 PM
Typically, an index is on a single table. However, a join index sets an index across multiple tables. So, if you have a rosters table, and an event table, you can create a join index on say "rosters, event join on r.playerid e.playerid" or some such.
SABR Matt
09-20-2006, 07:05 PM
To be read in by C++ I need these tables in the form of a tab delimited text file (I tried comma delimited, but the istream object does not automatically skip over or stop at the commas when you want to read in the data, which makes it impossible to use correctly), and I have no way of getting MySQL to export the 2.5 million row table or the 7.5 million row table in a tab delimited format, nor do I have a program that can read 7.5 million rows of a comma delimited format and re-export it as a tab delimited format. I couldn't get the data into C++ even if I wanted to.
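For what it's worth, comma delimited input can be read in C++ by pulling each line in whole and then splitting it with std::getline and a ',' delimiter, rather than relying on operator>> to stop at commas. A minimal sketch (the function name is made up, and it assumes no quoted fields containing commas):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one comma-delimited line into fields (no quoted-field handling).
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(field);
    }
    // getline drops a trailing empty field, so restore it if the line ends in a comma.
    if (!line.empty() && line.back() == ',') {
        fields.push_back("");
    }
    return fields;
}
```

Each field comes back as a string, so numeric columns can be converted afterwards with std::stoi or std::stod.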
Gubanich Plague
09-20-2006, 07:52 PM
I believe you can query a mysql database in perl, so you can probably just do it record by record that way. It might take a few minutes to loop over all the records, but you only need to do it once, right?
SABR Matt
09-20-2006, 09:18 PM
Yes well...I haven't the slightest idea how to do that...I'm not familiar with perl beyond the remotest of basics.
Tango Tiger
09-21-2006, 07:02 AM
I think your best bet is to stick to the database.
SABR Matt
09-21-2006, 02:38 PM
My best bet if I'm doing the DB internal calculations would be to attach a win probability to the starting condition before a play and ONLY to that starting condition...there are too many ways the ending b/o/i/s state can be messed up that I need to account for.
Once I have that though...how do I get WPA for each event? I was thinking of sticking in an auto_increment variable and then saying WP(after) = WP(counter + 1)...but I don't think that's possible in MySQL.
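Outside the database, the "WP(after) = WP(counter + 1)" idea is a single pass over the plays in order. The sketch below assumes the plays are already sorted by game and by sequence within the game, that WP is from the home team's point of view, and that the final result of each game is known; the struct and field names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Play {
    int gameId;       // identifies the game the play belongs to
    double wpBefore;  // home win probability before the play
    bool homeWon;     // final result of the game
};

// Plays must be sorted by game and then by order within the game.
std::vector<double> computeWpa(const std::vector<Play>& plays) {
    std::vector<double> wpa(plays.size(), 0.0);
    for (std::size_t i = 0; i < plays.size(); ++i) {
        bool lastOfGame = (i + 1 == plays.size())
                          || (plays[i + 1].gameId != plays[i].gameId);
        // After the last play of a game the win probability is the actual result.
        double wpAfter = lastOfGame ? (plays[i].homeWon ? 1.0 : 0.0)
                                    : plays[i + 1].wpBefore;
        wpa[i] = wpAfter - plays[i].wpBefore;
    }
    return wpa;
}
```

Reading the next play's starting WP sidesteps the inning-over and extra-inning special cases for ending states, since the only boundary left to handle is the end of a game.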
cobra2341
11-18-2007, 11:10 AM
Does anyone think this could be done with the minor leagues? The game logs are on Milb