Pardon my Bully Pulpit - Call for Volunteers
SABR Matt
08-11-2008, 09:29 PM
After completing a fourth team today for retrosheet.org, in its quest to convert the PDF versions of the daily summary pages provided by the HOF into a digital database, I thought I should speak up here and make my own call for volunteers.
David W. Smith is a generous and brilliant man and his life's work (in the baseball sense)...to bring high quality data to the masses...has produced tremendous results thus far. The daily summary project is gaining momentum, but there's really no limit on the number of volunteers they can use over there to get this done in a timely manner.
I should be preaching to the choir when I talk about how critical it is to the future of sabermetric research, and to the integrity of our data, that we get access to this treasure trove of game-by-game data that goes back into the 19th century.
I'm asking anyone here who would like to use this data when it becomes available to e-mail David (dwsmith at retrosheet dot org) and volunteer your time to enter it for him. You don't have to spend many hours every day doing this work...you can take more time to do each team...even an hour a day would really help move the project forward (each team takes about 7-8 hours to enter if you're at all adept with Excel and a keyboard).
Please...put your figurative money where your mouth is...help us bring this critical project to fruition so you can benefit from the data more quickly and we can all learn something about the game.
honus14
08-12-2008, 08:24 AM
Do you need to own Excel specifically? (I don't.)
SABR Matt
08-12-2008, 10:03 AM
He prefers Excel format, but if you have another spreadsheet program like Lotus, you can convert its files to Excel format, so he could probably take that. Ask him what formats he can handle.
Tango Tiger
08-12-2008, 12:52 PM
You can use Google docs I would think. It lets you export too, I believe.
SABR Matt
08-12-2008, 01:04 PM
What is "Google docs", Tom? I'm just curious.
Colin Wyers
08-12-2008, 03:23 PM
Google Spreadsheets is a browser-based spreadsheet - it's not as functional as Excel, but it's handy for some work. I prefer EditGrid, which is a bit more functional and a lot more familiar to Excel users. I know EditGrid lets you export as Excel files.
What I would caution is that, as long as these files sound, you run up against practical limits on length with browser-based spreadsheet programs. As an alternative, users can download one of several free spreadsheet programs that are compatible with Excel (gnumeric and OpenOffice Calc come to mind).
SABR Matt
08-12-2008, 03:54 PM
Each team isn't that long. Each team is about 2-2.5 thousand rows long and about 25 columns wide.
But yes...I whole-heartedly recommend downloading OpenOffice. Very good program that makes .xls files.
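For a sense of scale, here is a rough sketch of what those row and column figures come to in cells per team. The variable and function names are made up for illustration, and the numbers are just the approximate ones quoted in this post, not exact file sizes.

```python
# Approximate sheet dimensions quoted in this post (not exact file sizes).
ROWS_LOW, ROWS_HIGH = 2000, 2500
COLUMNS = 25


def cell_count(rows, columns=COLUMNS):
    """Number of cells in a sheet with the given rows and columns."""
    return rows * columns


low = cell_count(ROWS_LOW)
high = cell_count(ROWS_HIGH)
print(f"{low:,} to {high:,} cells per team")
```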
Tango Tiger
08-12-2008, 04:28 PM
Matt, go here:
http://docs.google.com/
Lets you export into xls, csv, html, pdf...
And right, you can also download OpenOffice
SABR Matt
08-12-2008, 04:57 PM
Gracias.
That's a neat program (Google Docs)...I hadn't heard of it, but it's a very good idea...probably the tip of the iceberg when it comes to direct web publishing of documents.
Tango Tiger
08-13-2008, 08:10 AM
The good thing is that you can share the document. It seems to me ideal for what Retrosheet is doing.
SABR Matt
08-13-2008, 08:36 AM
I don't know if David has the time or energy to organize a web-file-share effort. And retrosheet.org wants a hard database they can save and distribute in retrosheet formats, both to validate seasonal and PBP data as they come by those data and to standardize the format of the player daily summaries for popular consumption.
Colin Wyers
08-13-2008, 06:46 PM
The point of Google Docs isn't in sharing the data in general, but in collaborating on the collection of data. That way instead of one person handling a team record, you can send the same record to three or four people, and they can all edit the same spreadsheet at the same time.
Again, you can use it instead of Excel as a standalone spreadsheet, but for that you're probably better off with OpenOffice or Gnumeric. (And 2.5k rows is a lot for a JavaScript-based browser app.)
SABR Matt
08-13-2008, 07:14 PM
That's an interesting idea, but how do you coordinate multiple people updating a file simultaneously...how do you keep them doing only a prescribed chunk of the work?
Colin Wyers
08-13-2008, 09:03 PM
Everybody sees the same document all the time. (Within the limits of the laws of physics and network latency.)
Example: Let's say you log into the spreadsheet and see that I'm already logging the June 3rd game off the PDF. You decide to start logging June 4th. We both see each other editing the appropriate cells, and there's a chat room button if we want to talk to each other.
I'm sure there's an upper practical limit to how many people you can have working on the same spreadsheet at once, but there's no reason you couldn't have 4, or 5 or even 10 people logging games into the same spreadsheet at the same time. And everybody will be able to see what everyone's working on and talk to each other if there's any confusion about it.
SABR Matt
08-13-2008, 10:25 PM
That's...cool....though I don't think it's any faster than handing out one team to each volunteer.
The speed of this project is determined by how fast the volunteers can type and how long they can work. That will be true whether we each work one team at a time or several of us finish each team faster but are all working the same team.
IOW...you can either have five guys taking 5 days to do 1 team each or 5 guys ganging up to do one team in one day instead of five...it makes no difference.
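A quick back-of-the-envelope check of that claim, as a sketch only. It assumes 7.5 hours per team (the midpoint of the 7-8 hours mentioned in the first post) and 1.5 hours of typing per volunteer per day, and it assumes splitting a team among several people adds no coordination overhead. The names are hypothetical.

```python
# Assumption: midpoint of the 7-8 hours per team quoted in the first post.
HOURS_PER_TEAM = 7.5


def days_to_finish(teams, volunteers, hours_per_day):
    """Days for `volunteers` to enter `teams`, assuming the work splits evenly."""
    total_hours = teams * HOURS_PER_TEAM
    return total_hours / (volunteers * hours_per_day)


# Plan A: five volunteers each work alone on their own team, in parallel.
one_team_each = days_to_finish(teams=1, volunteers=1, hours_per_day=1.5)

# Plan B: all five gang up on one team at a time, five teams in a row.
ganging_up = 5 * days_to_finish(teams=1, volunteers=5, hours_per_day=1.5)

print(one_team_each, ganging_up)
```

Both plans come out to the same number of days, because the total typing hours are the same either way. The only thing that would change the answer is overhead (or savings) from coordinating, which this sketch ignores.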
SABR Matt
08-14-2008, 11:38 AM
One should also note...I don't think David is physically allowed to simply post the daily summary files online for public viewing. The HOF has the rights to them...retrosheet.org is doing the hall a favor by converting these to digital for them...we get to keep the resulting digital database, but we can't retransmit the HOF images to the web.
BTW...evidently I picked a bad time to call for people to contact David...he's got jury duty this week (d'oh!)...but I promise anyone who volunteers (and thanks to Tom's blog linking to this thread, there've been several from his community already...keep it coming!) that he is usually very prompt with answering e-mails and handing out team assignments. I turned out 4 teams in 9 days...once you get going, it's not that hard to do...you just do it while you're watching a ballgame or in between actual work, on breaks, or whatever...you'll be surprised how fast it can go.
west coast orange and black
08-14-2008, 10:18 PM
i spend a fair amount of my bb-f time here at sa&a.
and though i have no idea what you guys are talking about much of the time and i do not contribute post-wise, i do enjoy my time here.
for the enlightenment and entertainment that you sa&a guys give to me > i just now e'd david to offer my services.
SABR Matt
08-15-2008, 11:44 AM
SkyKing was right about one thing...I do have a tendency to quote metrics that are not public knowledge (primarily because I have no means for publishing them for easy access) and I don't think I'm unique in this regard. Sabermetric users tend to throw a lot of stats and terms around without taking the time to make their points accessible to the uninitiated and non-math-focused.
I blame that kind of communication failure for good baseball fans like WCO&B not knowing half the time what I'm talking about...LOL
Thanks for volunteering! Tango's blog got a few volunteers as well...keep it up!
skyking162
08-16-2008, 01:12 PM
"I do have a tendency to quote metrics that are not public knowledge (primarily because I have no means for publishing them for easy access) and I don't think I'm unique in this regard."
Really? I (and I assume others) would love to see your data.
How about publishing it as Google Docs or EditGrid documents?
SABR Matt
08-16-2008, 07:57 PM
Until this thread, I wasn't aware of the existence of either.
I'm not sure I'd want to publish as a google doc unless I can securely lock it so it can't be edited. But yeah...once my computer upgrades are complete I'll be going back to work on the PBP database (with a machine that can actually handle it) and that may be one of the ways I publish that data. Not sure.
skyking162
08-17-2008, 09:47 AM
You can publish as read only.