Baseball Hacks?
Chicks Dig the Smallball
03-12-2006, 12:41 AM
Anybody check out "Baseball Hacks," the new O'Reilly book? I've got it and am struggling through a few things. There are some errata on the publisher's site, and there are a few places where you need to be creative to look at data through 2005 instead of 2004, which (I guess because of the publishing schedule) is what a lot of the samples use.
So I thought I'd start a new thread to see what everybody else's experiences with the book are like.
The weirdest problem I've had so far is in hack #14. When I follow the instructions (I'm sure I'm doing something wrong), I end up downloading a ton of zip files that I can't open: when I click on them to unzip, I get a message that they're unreadable and may be corrupt.
Ubiquitous
03-12-2006, 01:29 PM
From the preview on Amazon it sounds like an interesting book. A primer for how to manipulate databases using baseball stats. Always a troublesome thing. I might have to check it out if I can find it on the cheap.
SABR Matt
03-12-2006, 03:14 PM
Wow...sounds worth a purchase...I've ordered a copy just to see if he can teach me something new/useful.
I got it last week, it's a good book. Worth a purchase.
In terms of hack #14, you can just manually get the files from retrosheet.
There are a few sample hacks on O'Reilly's website (there are 75 of them in the book):
http://www.oreilly.com/catalog/baseballhks/chapter/index.html
SABR Matt
03-21-2006, 08:27 AM
yeah...just so you guys know...I have the book now...reading through it first before I attempt anything...but I gotta say...
AWESOME!
VERY important book for anyone wanting to get and make use of baseball data of all forms. This coming from a guy who has spent his time self-teaching in the ways of manipulating the Baseball Databank.
Ubiquitous
03-31-2006, 11:21 AM
I've started reading the book, and while the overall idea of it is good, it is in fact peppered with mistakes. If you are not fluent with these systems, these mistakes will cost you time figuring things out on your own.
For instance, in Hack #11 there are at least 3 mistakes that I spotted. He uses runs instead of hits for average and TB. His code for AVG looks like this: [R]/[AB], when obviously it should look like this: [H]/[AB]. The biggest mistake in this hack, though, is in step three. He has you picking "First" in the position category. This won't return the position with the most games per season; instead it shows the first available position based on number assignment (3 is first base, 4 is second base, and so on). So if a player played 4 games at second and 125 games in the OF, this step will return 2B as the primary position. What you actually have to do is set up a query with playerID, yearID, and G and then select Max for games. This will return the most games played at a single position in that season. You can then do another query in which you attach the position and everything else he describes in that hack.
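The two-query fix above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the book's code: it assumes Baseball Databank-style Fielding and Batting tables with playerID, yearID, POS, G, H and AB columns, and the sample rows are made up to mirror the "4 games at second, 125 in the OF" example. The subquery finds each player-season's maximum G, and the join attaches the position that matches it; AVG is computed as H/AB.

```python
import sqlite3

# Made-up sample rows shaped like Baseball Databank Fielding/Batting tables.
FIELDING = [
    ("doejo01", 2005, "2B", 4),
    ("doejo01", 2005, "OF", 125),
    ("roeri01", 2005, "1B", 150),
]
BATTING = [
    ("doejo01", 2005, 150, 500),
    ("roeri01", 2005, 160, 550),
]


def primary_positions(conn):
    """Return {(playerID, yearID): POS} for the position with the most games."""
    rows = conn.execute(
        """
        SELECT f.playerID, f.yearID, f.POS
        FROM Fielding AS f
        JOIN (SELECT playerID, yearID, MAX(G) AS maxG
              FROM Fielding
              GROUP BY playerID, yearID) AS m
          ON f.playerID = m.playerID
         AND f.yearID = m.yearID
         AND f.G = m.maxG
        """
    ).fetchall()
    # A tie in games yields two rows; with a dict, the last row seen wins.
    return {(pid, yr): pos for pid, yr, pos in rows}


def batting_averages(conn):
    """Return {(playerID, yearID): AVG} computed as H / AB (not R / AB)."""
    rows = conn.execute(
        "SELECT playerID, yearID, CAST(H AS REAL) / AB FROM Batting WHERE AB > 0"
    ).fetchall()
    return {(pid, yr): avg for pid, yr, avg in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fielding (playerID TEXT, yearID INTEGER, POS TEXT, G INTEGER)")
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, H INTEGER, AB INTEGER)")
conn.executemany("INSERT INTO Fielding VALUES (?, ?, ?, ?)", FIELDING)
conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)", BATTING)

positions = primary_positions(conn)
averages = batting_averages(conn)
for key in sorted(positions):
    print(key[0], key[1], positions[key], round(averages[key], 3))
```

With these rows, doejo01 comes out as an OF rather than a 2B. Real Fielding data can also split a season across teams (stints), which this sketch does not sum.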
SABR Matt
03-31-2006, 11:49 AM
Yes...there are mistakes and problems...some of them are crippling. It took me a LONG time to figure out that some of his commands are UNIX SPECIFIC and you have to download a series of programs for WINDOWS if you want to be able to use his hacks...for example. I still think it's worth having.
Ubi... there is an errata page on their website that explains all the typos.... it's useful to have.
Matt..... what UNIX specific tools are you talking about? I don't remember any.
SABR Matt
03-31-2006, 03:49 PM
the unzip command-line utility is NOT native to Win XP...you have to get it...it usually comes only with UNIX systems.
The rm command is part of the GnuWin32 package, which is NOT part of Windows...there were a couple of other commands in the set I downloaded to correct the problem that don't leap out at me.
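For anyone on Windows without an unzip command, Python's standard zipfile module can stand in for it. This is a minimal sketch, assuming the Retrosheet .zip files are already downloaded into one folder; extract_all is a hypothetical helper name, not something from the book. It also reports files that are not valid zip archives, which may help diagnose downloads that Windows calls "unreadable and may be corrupt".

```python
import zipfile
from pathlib import Path


def extract_all(zip_dir, out_dir):
    """Extract every .zip in zip_dir into out_dir.

    Returns the names of files that could not be extracted because they are
    not valid zip archives or fail the CRC check.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    bad = []
    for path in sorted(Path(zip_dir).glob("*.zip")):
        if not zipfile.is_zipfile(path):
            # e.g. an HTML error page that was saved with a .zip name
            bad.append(path.name)
            continue
        try:
            with zipfile.ZipFile(path) as zf:
                if zf.testzip() is not None:
                    # testzip() names the first member whose CRC does not match
                    bad.append(path.name)
                    continue
                zf.extractall(out)
        except zipfile.BadZipFile:
            bad.append(path.name)
    return bad
```

Usage would be something like `extract_all("C:/retrosheet/zips", "C:/retrosheet/events")`; the folder names are only placeholders.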
Also...his code to make sense of PBP event files doesn't translate correctly outside a native UNIX environment...you have to change a couple of lines of code, or it converts plain line feeds (\n) to carriage return/line feed pairs (\r\n), which adds to the file size and makes the files unusable.
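One generic workaround, assuming the event files ended up with Windows CRLF line endings, is to rewrite them in binary mode so no newline translation happens. This is a minimal sketch, not the book's fix; to_unix_newlines is a hypothetical helper name.

```python
from pathlib import Path


def to_unix_newlines(path):
    """Rewrite a file in place so CRLF pairs (and stray CRs) become bare LFs.

    Works on raw bytes, so no text-mode newline translation is applied.
    Returns how many bytes shorter the file got (one per CRLF pair).
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    p.write_bytes(fixed)
    return len(data) - len(fixed)
```

Looping it over every event file in a folder (for example with `Path(folder).glob("*.EV*")`) would undo the extra carriage returns.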
ah, I see. Being tailored to a UNIX environment makes sense, since O'Reilly is the publisher, and they do a lot of UNIX stuff.
Try http://cygwin.com/
CultofCubs
04-07-2006, 08:43 PM
I've tried to do the hack with MySQL, but I cannot get it to install correctly under Windows XP. Does anyone have any tricks for installing this thing?
I'm actually very computer savvy; I have a computer science degree (well, an associate's). But I still cannot get the thing running.
You can't get MySQL installed, or is it something associated with the MySQL hack?
Tango Tiger
04-10-2006, 08:24 AM
Cult: try installing InstantRails. That'll give you a complete install of Apache, MySQL, PHP.
Ubiquitous
04-25-2006, 03:07 PM
Not from the Hacks book, but a tidbit I thought I would share with you all.
Let's say you are perusing Retrosheet and you see some data on a page you wish to play with, but you are not all that familiar with manipulating the raw data, or you simply do not wish to jump through all those hoops when all you want is to play with numbers somebody else has already jumped through hoops for. Well, Microsoft Excel and Internet Explorer allow you to import most data directly into Excel with a right click of the mouse. This works for most pages, and it puts all the data in nice little boxes that you can play with. Some of the data doesn't translate so well, though. For instance, let's say you wish to use Todd Helton's career record at each ballpark. If you try to simply import the data, it won't work. What you have to do is copy the data into Microsoft Word, save it as a text document, and then open that file in Excel; Excel will translate the data into its own format, and presto, you can once again play with the data to your heart's content.
I know it's nothing major, but it does allow one to use the data presented on the internet without having to know how to manipulate databases and PbP code, or when one simply wishes to play with useful data that wasn't in a file format.
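The copy-to-text-then-open-in-Excel trick can also be scripted. This is a rough sketch, assuming the copied data is a plain-text table whose columns are separated by at least two spaces (single spaces inside a name such as a ballpark stay in one cell); text_table_to_csv is a hypothetical helper, and pages laid out differently would need a different split rule.

```python
import csv
import re


def text_table_to_csv(text, csv_path, min_gap=2):
    """Split each non-blank line on runs of min_gap or more whitespace
    characters and write the rows to a CSV file that Excel can open.

    Returns the parsed rows.
    """
    splitter = re.compile(r"\s{%d,}" % min_gap)
    rows = [splitter.split(line.strip()) for line in text.splitlines() if line.strip()]
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

Pasting the page text into a string and calling `text_table_to_csv(copied_text, "helton_parks.csv")` would give a file that opens straight into Excel cells.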
SABR Matt
04-25-2006, 06:29 PM
actually...one of the hacks in the book is essentially that.
Getting data from web pages using Microsoft Excel Internet Utilities.
Ubiquitous
04-25-2006, 08:20 PM
Yes, but try using their hack on retrosheet.org.
skyking162
04-25-2006, 09:34 PM
I just started reading through the book and I think it's exactly what I need (perhaps with some additional SQL coding tutorials).
One issue I foresee is running the retrosheet files through BEVENT before importing them into a database (as one of the hacks suggests), since I can't run BEVENT on my Mac. Anyone know a way around that? Anyone want to send me the BEVENTed files?
What are the files that work with Ray Kirby's ASS software? Are they raw retrosheet files? Post-BEVENT retrosheet files? Some other type of file?
Also, has anyone used the Chadwick (http://chadwick.sourceforge.net/) GUI for the retrosheet files? Thoughts?
Ubiquitous
04-25-2006, 11:22 PM
Kirby's files are in a special format that he no longer supports. You can only use the files he has already created. There are about 15 or so.
SABR Matt
04-26-2006, 12:55 AM
The MySQL isn't what's holding me up...I'm held up trying to understand perl and/or C++ enough to write original code to do more complex mathematical operations that MySQL can't do.
As for getting this to work on a Mac...I could have sworn he wrote about a method for working the hack on a Mac some other way...
Empty_One
04-26-2006, 07:17 AM
At the end of hack 15, he mentions a program called Chadwick at http://chadwick.sourceforge.net
There are pre-built binaries for Windows, and source files for POSIX-like systems (Mac OS X, Linux).
I haven't taken a look at the program yet, but supposedly it can work directly with the files from retrosheet.org. He used the DiamondWave programs instead because they are currently more stable. Chadwick is actively in development, however, while the tools in the book are not.
Ubiquitous
04-26-2006, 09:01 AM
I've got Chadwick, and in terms of getting the data it is pretty easy. All you need is the Retrosheet zip files: point Chadwick at them and poof, you've got it. But I don't know enough about Chadwick to manipulate the data like a database can. Right now the GUI is only set up to show you all grand slams along with normal seasonal stats, box scores, and PbP narrative. It can do more, but you have to program it to do it, and that is the part I don't know how to do.
cactusmitch
05-27-2008, 07:29 AM
I've dropped out of an applied statistics course, and spend all available time doing the Hacks.
My approach is all Linux. I couldn't get Ubuntu to load R properly, but SuSE with YaST (Yet another Setup Tool) works well on the cheapo laptop I got from Wally World.
Adler's work is pioneering. He is to sabermetrics as Eric Clapton is to the guitar.
Cactusmitch:)
lephio
07-03-2008, 05:56 PM
hi guys! i read about linux.. so here's a question.
is there any unix software capable of compiling scorecards and generating basic stats?
(a printable computer-written scorecard would also be appreciated..)
thank you in advance
lephio
(italy)