Welcome everyone, let's start discussing

Chris St. John

Apr 9, 2013, 5:32:27 PM
to baseball-sq...@googlegroups.com
I'm excited to get a bunch of people together who are interested in using SQL for baseball. We have a very diverse group, from people with no experience to experts. I think a good first discussion may be to describe your experience level and start talking about how to get into using Retrosheet from the ground floor. I posted some links on the group page: Beyond the Box Score, Hardball Times and I have some info for those looking to get started. You should be able to start your own discussions from the group page as well.

Welcome and please introduce yourself...I'll start:

I'm Chris St. John, I write at Beyond the Box Score. I have no formal training in SQL but just downloaded it one day and started messing around. I can do basic things but definitely am not an expert. I haven't even looked at retrosheet in about a year but definitely want to get back into it. I have questions about updating the database as new data comes in.

Alex Kienholz

Apr 9, 2013, 5:39:55 PM
to baseball-sq...@googlegroups.com
Hey all, I'm Alex Kienholz and I also write at BtBS. I don't have much experience, but can write some basic stuff in the Lahman database as well as the PITCHf/x database. I hope to get more involved once college gets out for the summer.

Ben Horrow

Apr 9, 2013, 5:48:27 PM
to baseball-sq...@googlegroups.com
Hey guys, my name is Ben Horrow (the applause must just be in my head), I'm basically an SQL novice. Never had any formal training, just picked it up a little on my own, using the Hardball Times piece Chris posted written by Colin Wyers. I can write some simple queries, but honestly can't seem to figure out joins even with the help of Sir Wyers. I'm interested in getting better because SQL is a tool to discover more about baseball and that is always my goal. 
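
Since joins come up here, a minimal sketch of one may help. It uses Python's built-in sqlite3 module so it runs without installing a database server. The two toy tables borrow Lahman-style names and columns (master, batting, playerID, yearID, HR), but the schema and the three rows are a cut-down assumption for illustration, not the real database.

```python
import sqlite3

# In-memory SQLite database standing in for a Lahman-style setup (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master  (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT);
CREATE TABLE batting (playerID TEXT, yearID INTEGER, HR INTEGER);
INSERT INTO master  VALUES ('aaronha01', 'Hank', 'Aaron'), ('ruthba01', 'Babe', 'Ruth');
INSERT INTO batting VALUES ('aaronha01', 1971, 47), ('ruthba01', 1927, 60), ('ruthba01', 1921, 59);
""")

# The join matches each batting row to the one master row sharing its playerID,
# so the season stats can be shown next to the player's name.
rows = conn.execute("""
SELECT m.nameFirst, m.nameLast, b.yearID, b.HR
FROM batting AS b
JOIN master  AS m ON m.playerID = b.playerID
WHERE b.HR >= 50
ORDER BY b.HR DESC
""").fetchall()

for row in rows:
    print(row)
```

The ON clause is the whole trick: it states which column in one table identifies the matching row in the other. The same SELECT should carry over to MySQL largely unchanged.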

Alan Torres

Apr 9, 2013, 5:51:52 PM
to Alex Kienholz, baseball-sq...@googlegroups.com
I'm Alan and I write for Athletics Nation (@cuppingmaster there and on Twitter).   I'm at a point where some of the things I want to write about I need a database to do.  I have a Mac, and have little to no experience programming.  That said, I'm a quick learner and would consider myself a power user in general.

I'm hoping to be able to gain knowledge from all of you.

Dan Rozenson

Apr 9, 2013, 5:56:35 PM
to baseball-sq...@googlegroups.com
I'm Dan. I just began a part-time gig at Baseball Prospectus and I also write for Big Leagues Mag and BtBS.

I have zippo in the way of formal database management or programming education. I know a couple of bare necessities in SQL syntax that allow me to do some PITCHf/x queries, but I frequently hit a wall in my research when I know there's a way to dig something out that I just don't know how to get.

Sky Kalkman

Apr 9, 2013, 6:08:23 PM
to Dan Rozenson, baseball-sq...@googlegroups.com
I'm Sky and I've mostly contributed to BtB. I have a little SQL experience, mostly with the Lahman database and one-off datasets, but it's been a while. My day job is managing a small analytics department for a market research company where we mostly use SPSS but are transitioning to SAS. I have less stats and programming experience than that might imply -- I think I'm better at big picture stuff than the nitty gritty. Not sure how much I'll participate. But I'd definitely like to hear about others' successes, both big and small. And see what sort of code y'all produce. Nice idea, Chris.

Sean Ahmed

Apr 9, 2013, 6:13:00 PM
to baseball-sq...@googlegroups.com
Hi, I'm Sean and have been both an editor and now writer for the Cubs' monthly magazine, Vine Line. My college background is in Economics, and I love analyzing baseball for work and extracurricular projects. Most of my research is done in Stata since I'm better at programming in that environment, but I'll turn to R from time to time. I'm able to get things done with MySQL queries but generally have to trudge through some trial and error whenever hitting a new concept (like selecting last pitch of a plate appearance).

Looking forward to learning with all of you.
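
For the "last pitch of a plate appearance" case mentioned above, one common pattern is to find the highest pitch sequence number per plate appearance in a subquery and join back to the pitch table. The sketch below assumes a hypothetical table pitches(game_id, ab_num, pitch_seq, pitch_type); real PITCHf/x schemas vary by loader, so the names need adjusting. It runs on Python's built-in sqlite3, and the SQL itself is plain enough that it should also work in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified pitch table: one row per pitch, numbered within each plate appearance.
conn.executescript("""
CREATE TABLE pitches (game_id TEXT, ab_num INTEGER, pitch_seq INTEGER, pitch_type TEXT);
INSERT INTO pitches VALUES
  ('g1', 1, 1, 'FF'), ('g1', 1, 2, 'SL'), ('g1', 1, 3, 'CH'),
  ('g1', 2, 1, 'FF'),
  ('g2', 1, 1, 'CU'), ('g2', 1, 2, 'FF');
""")

# Subquery l: the largest pitch_seq in each (game_id, ab_num) group.
# Joining back on all three columns keeps only the final pitch of each plate appearance.
last_pitches = conn.execute("""
SELECT p.game_id, p.ab_num, p.pitch_seq, p.pitch_type
FROM pitches AS p
JOIN (SELECT game_id, ab_num, MAX(pitch_seq) AS last_seq
      FROM pitches
      GROUP BY game_id, ab_num) AS l
  ON l.game_id = p.game_id
 AND l.ab_num = p.ab_num
 AND l.last_seq = p.pitch_seq
ORDER BY p.game_id, p.ab_num
""").fetchall()

for row in last_pitches:
    print(row)
```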

Mike Mulvenna

Apr 9, 2013, 8:59:31 PM
to Sean Ahmed, baseball-sq...@googlegroups.com
My name is Mike and I write for BtBS. I basically have no experience in SQL outside of BtBS's "How to Saberize Your Mac." I tried teaching myself but couldn't really make much progress. I'm hoping to learn how to navigate the Lahman database and some useful commands and tricks. Looking forward to learning more about SQL with everyone.

--
Mike Mulvenna
Drexel University
Lebow College of Business
International Business & Marketing Student
mb...@drexel.edu
(609) 226-4828

Blake Murphy

Apr 10, 2013, 12:02:47 AM
to Mike Mulvenna, Sean Ahmed, baseball-sq...@googlegroups.com
I'm Blake. I write for 50% of the sites on the internet across multiple sports. I have no SQL background but am hoping to dive in, both for baseball and basketball/hockey (where that skill would be a huge advantage).

Ken Woolums

Apr 10, 2013, 1:27:12 AM
to Blake Murphy, Mike Mulvenna, Sean Ahmed, baseball-sq...@googlegroups.com
I'm Ken.  I have no real experience actually using SQL, but I know the language pretty well.  One of my best friends in college was a Management Information Systems (basically database management and building) major.  I spent last summer learning bits from him, and he and I expect to talk more about it in the future.  I'm really just looking for a place to get started.  Once I get going, I look forward to bouncing ideas off of everyone.

Max Weinstein

Apr 10, 2013, 11:44:16 AM
to baseball-sq...@googlegroups.com, Blake Murphy, Mike Mulvenna, Sean Ahmed
Hey Everyone,

I am Max and I write for BtBS and Big Leagues Mag. I have pretty good experience with BDB and Lahman in SQL, but I am hoping to get into Retrosheet. I am self-taught with no formal training, so it takes me a lot of trial and error, but I eventually get it done. I would say my expertise is year-to-year data mining so that I can create models for the data; other than that, my SQL skills are pretty weak.

Looking forward to some hardcore SQL training with you guys!

Max

Colin Wyers

Apr 10, 2013, 12:18:27 PM
to baseball-sq...@googlegroups.com, Blake Murphy, Mike Mulvenna, Sean Ahmed
Hi, my name is Colin. I am.

James Gentile

Apr 10, 2013, 12:24:47 PM
to Colin Wyers, baseball-sq...@googlegroups.com, Blake Murphy, Mike Mulvenna, Sean Ahmed
I'm James and everything I learned came from Colin's Statistically Speaking tutorial and Basql codes.

Matt Hunter

Apr 10, 2013, 12:33:58 PM
to baseball-sq...@googlegroups.com
Let's get to the good stuff. I asked James yesterday but I'll ask everyone here. How would you go about making a LI/WPA/RE database? It would be more amazing than I have words to describe to have access to that kind of data. Anyone have resources or advice for how to go about doing that?

Oh yeah, and I'm Matt, I write for BtB and THT, and I have a 35 SQL tool.
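
As a starting point for the RE part of that question, a run expectancy table is usually built by grouping every play-by-play event by its base-out state and averaging the runs scored from that point to the end of the half-inning. The sketch below assumes a hypothetical table events(outs, bases, runs_rest_of_inning) and uses a handful of made-up rows, so the numbers it prints are not real run values. LI and WPA need win expectancy on top of this, which is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical event table: one row per plate appearance, with the base-out state
# before the play and the runs scored from then until the half-inning ended.
# bases is a 3-character code, e.g. '1__' = runner on first, '123' = bases loaded.
conn.executescript("""
CREATE TABLE events (outs INTEGER, bases TEXT, runs_rest_of_inning INTEGER);
INSERT INTO events VALUES
  (0, '___', 0), (0, '___', 1), (0, '___', 0), (0, '___', 1),
  (1, '1__', 0), (1, '1__', 2),
  (2, '123', 0), (2, '123', 0), (2, '123', 3);
""")

# Run expectancy = average runs scored after reaching each of the base-out states.
run_exp = conn.execute("""
SELECT outs, bases, COUNT(*) AS n, AVG(runs_rest_of_inning) AS run_exp
FROM events
GROUP BY outs, bases
ORDER BY outs, bases
""").fetchall()

for row in run_exp:
    print(row)
```

With real Retrosheet data the hard part is producing the runs_rest_of_inning column; once it exists, the table is just this GROUP BY.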

James Gentile

Apr 10, 2013, 12:34:24 PM
to Colin Wyers, baseball-sq...@googlegroups.com, Blake Murphy, Mike Mulvenna, Sean Ahmed
So are we gonna dilly-dally all week or are we gonna crunch some data already!

Colin Wyers

Apr 10, 2013, 12:35:35 PM
to baseball-sq...@googlegroups.com

Carson Sievert

Apr 10, 2013, 12:42:08 PM
to baseball-sq...@googlegroups.com
Hello all,

My name is Carson Sievert and I'm a 2nd year statistics PhD student. I wrote an R package, pitchRx, which simplifies collection and visualization of PITCHf/x data. Even if you don't have "computing chops", you should be able to use it for basic stuff. I joined this group to share my experiences and also learn from others who know the data better. I don't consider myself a baseball or SQL expert, but I'm fairly proficient in both.

Visit the pitchRx demo page (which also comes with instructions for querying MySQL within R): http://cpsievert.github.io/pitchRx/demo/

I also created a web app for visualizing PITCHf/x (if you can't install R for some reason) - http://glimmer.rstudio.com/cpsievert/pitchRx

I'd love to hear some feedback and possible extensions for pitchRx! I'd also like more twitter followers @cpsievert

Ken Woolums

Apr 10, 2013, 12:52:29 PM
to Carson Sievert, baseball-sq...@googlegroups.com
I may be more of a novice than was previously thought!  Anyone care to help me with the absolute basics? (aka, what programs to install, etc).  Like I said, I know the language, and I even know how to do unions and stuff (my roommate was able to teach me all of that).  I just don't know how to physically get started with it all!

Ken

Matt Hunter

Apr 10, 2013, 1:09:25 PM
to baseball-sq...@googlegroups.com
Ken, Colin's tutorial at THT was helpful for me: http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/

The SQLyog link in that tutorial now points towards a paid program, so you can go here: https://code.google.com/p/sqlyog/downloads/list and select the most recent Community edition for the free version.

As far as building your database, you can find Retrosheet and PITCHf/x here: http://www.baseballheatmaps.com/retrosheet-database-download/

Matthew Bultitude

Apr 10, 2013, 1:09:53 PM
to baseball-sq...@googlegroups.com
I am Matt.  My educational background is Economics/Math/CS/Law, and I have dabbled in baseball analysis over the past five years.  I consider myself to be a power computer user with hobbyist-level knowledge of C++, Visual Basic, Matlab, R, and MySQL, and I am a constant lurker on Tango's blog.  The formation of this group was fortuitous for me, as I had set aside this week to reestablish my databases.

My primary computer is a Mac, so I will be attempting to do most of my analysis there.  Back in 2009 I participated in the Saberizing a Mac series that Sky & VivaElPujols led and got a pretty solid database set up, but a recent hard drive failure has me starting over on a brand new machine.

Carson, this is the first time I have come across your pitchRx tool and it looks awesome.  I will give it a spin today or tomorrow and let you know if I have questions.

garik16

Apr 10, 2013, 1:30:42 PM
to baseball-sq...@googlegroups.com
Hey Yall,

I'm Josh, I've written at BtBscore and THT Fantasy and Amazin' Avenue for baseball and Lighthouse Hockey for hockey.  Mainly I do pitchf/x stuff.  My dirty secret is that, well, I don't really know any coding.  I download my pitchf/x data with an old R script I found (not the pitchRx one) and I use JMP to analyze the data.  Of course, this means I have ginormous files (1.2 GB per season), and my access to JMP ends in two months when my school license runs out.  Yikes!

So yeah, learning R and SQL would be very useful as a replacement! 

Chris St. John

Apr 10, 2013, 1:34:07 PM
to garik16, baseball-sq...@googlegroups.com
Everyone who is interested in starting to discuss actual issues: You should be able to go to the actual group page: https://groups.google.com/forum/?hl=en&fromgroups#!forum/baseball-sql-discussion and start a new topic. That would make it easier to keep track of everything. I say we start putting ~BSQL~ in all topic headings just for email filtering purposes.

Andrew Koo

Apr 10, 2013, 1:38:01 PM
to baseball-sq...@googlegroups.com
Hi everyone. I'm Andrew (@xAndrewKoo), intern at Baseball Prospectus, and university student. No formal training, but I've familiarized myself with basic MySQL at BP and am working a day job now that's heavily dependent on SQL. Looking forward to learning more with everyone here.

Isaac Hall

Apr 10, 2013, 1:40:59 PM
to baseball-sq...@googlegroups.com
Hi Everyone,

I'm Ike Hall, and I'm getting back into playing around with pitchf/x.  As I had never used SQL before, I previously used a hierarchical tree structure that I had worked with in other capacities as well.  This time around I decided to go the more standard way and use SQL.  As I still have never used SQL for non-pitchf/x purposes, I am very much a novice.  Right now I simply outsource all my SQL calls to either the SQLAlchemy or peewee Python libraries.

I regularly write numerically oriented software in C/C++, Python, Fortran, and Java.  Looking forward to learning a lot more.

Chris Cwik

Apr 10, 2013, 2:41:57 PM
to baseball-sq...@googlegroups.com
Gentlemen,

I'm Chris Cwik. I write for FanGraphs and CBSSports. I've done a little bit with SQL, but I'm still learning some of the commands/language. I've found both Colin's and Sky's posts on setting yourself up with a database to be pretty helpful. I've also picked up quite a bit from Jeff Zimmerman, who might be helpful here. I'm using the FanGraphs database mainly, but I'm not sure how different that is from Lahman's or Retrosheet. I do have access to those if need be. 

I'm interested in starting conversations with all of you, since I think there's a lot you guys can teach me. I would do more with SQL if I had a group of people to bounce ideas off/confirm data I've found, so this is the perfect place for that. 

Let's do this.

Doron Barbalat

Apr 10, 2013, 3:00:59 PM
to baseball-sq...@googlegroups.com
Hey everyone! Great idea for this group!

My name is Doron, and I think I'm the least involved in baseball of anyone in here! I work in marketing for a software company in Toronto, and love what I do, though I'd love to add baseball research and writing as a hobby to get me more involved with the game.

I don't have much SQL experience besides trying to play around with it in my spare time, so I hope this will be a good avenue to allow me to learn an important new skill, relevant to both my career and my love of baseball.

Thanks!

Harry Pavlidis

Apr 10, 2013, 4:15:13 PM
to baseball-sq...@googlegroups.com
Hi, I'm Harry. I like data. I use SQL a lot, have experience going back to late 90s. Not like I've become a master DBA since, so this old dog is looking for new tricks and  is happy to share any useful knowledge.

garik16

Apr 10, 2013, 4:23:46 PM
to baseball-sq...@googlegroups.com
"Hi I'm Harry, I like data." 

Best.  Introduction.  Ever. 

<3

Alex Kienholz

Apr 10, 2013, 7:12:19 PM
to garik16, baseball-sq...@googlegroups.com
Hey guys, is it possible for my PITCHf/x database to update daily by itself? I get my data from Jeff Zimmerman's site. Maybe it isn't possible.

garik16

Apr 10, 2013, 7:27:42 PM
to baseball-sq...@googlegroups.com, garik16
Alex, I run mine manually each time I want an update. So I don't know. 

Matt Bandi

Apr 11, 2013, 12:26:28 PM
to baseball-sq...@googlegroups.com
Hey guys, I'm Matt and I write for piratesprospects.com (although I am taking a big step back from writing this year). I don't really know anything about SQL, but I've learned some bits and pieces over the past few years by reading the various online tutorials by Mike Fast, Colin, Sky, etc. I've managed to get a PITCHf/x database and a few other smaller databases set up. I usually have to go right to Google anytime I try to do something new, so I'm excited about this group. Will be nice to have a place to bounce ideas/questions off some other people.

Matt Bandi

Apr 11, 2013, 12:42:15 PM
to baseball-sq...@googlegroups.com, garik16
Alex, the second half of this tutorial talks about setting up a daily update. I had to tinker with Jeff's scripts a little to get everything working correctly, but overall it wasn't too painful.

http://blog.stealingfirst.com/2008/03/07/how-to-link-pitchfx-to-retrosheet
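
Not the scripts from that tutorial, but the general idea behind a safe daily update can be sketched in a few lines: give the pitch table a primary key and insert with a statement that skips rows already present, so re-running the load never duplicates data. The table layout and the load_new_pitches helper below are hypothetical. The example uses SQLite's INSERT OR IGNORE through Python's sqlite3 module; MySQL spells the same idea INSERT IGNORE.

```python
import sqlite3

def load_new_pitches(conn, rows):
    """Insert rows, skipping any whose primary key is already loaded; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM pitches").fetchone()[0]
    conn.executemany("INSERT OR IGNORE INTO pitches VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM pitches").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the primary key is what makes a repeated load harmless.
conn.execute("""
CREATE TABLE pitches (
  game_id TEXT, ab_num INTEGER, pitch_seq INTEGER, pitch_type TEXT,
  PRIMARY KEY (game_id, ab_num, pitch_seq)
)
""")

day1 = [("g1", 1, 1, "FF"), ("g1", 1, 2, "SL")]
day2 = day1 + [("g2", 1, 1, "CU")]  # yesterday's rows again, plus one new pitch

added_first = load_new_pitches(conn, day1)
added_second = load_new_pitches(conn, day2)
print(added_first, added_second)
```

Once the load is safe to repeat, the operating system's scheduler (cron on a Mac or Linux, Task Scheduler on Windows) can run it once a day.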

Alex Kienholz

Apr 11, 2013, 12:42:59 PM
to Matt Bandi, baseball-sq...@googlegroups.com, garik16
Awesome. Thanks, Matt!

Chris St. John

Apr 11, 2013, 1:16:30 PM
to baseball-sq...@googlegroups.com
Awesome, that was a big question I had too, Matt. I'll take a look at that.

Matt Filippi

Apr 11, 2013, 2:09:47 PM
to baseball-sq...@googlegroups.com
Hi everyone, I'm Matt and I write for THT and used to write for BtB. I don't really have much experience with SQL, but I want to learn and get more involved.

Mark McC

Apr 11, 2013, 3:17:19 PM
to baseball-sq...@googlegroups.com
Hey everyone, 

My name is Mark McCluskey.  Like many of the newly enlightened, I became interested in Sabermetrics after reading Moneyball by Michael Lewis, and the rest is history.  A very expensive and time-consuming history, but I probably don't have to tell you that.

I have been using SQL for roughly 10 years in various data management systems (Oracle, MySQL, even Access if you want to call it that).  I had no formal training, picking up things from books and Google, so there is hope for everyone, I guess.  I also have a background in Math & Statistics, having received a B.S. and M.S. in Mathematics from Illinois State University.  I have learned to use SAS, and more recently S-Plus/R to a much lesser degree.  My collegiate studies also included Abstract Algebra, Queueing Theory, and Graph Theory, which oddly all lend themselves to baseball analytics in some way, shape or form.

Unlike many of you, I don't write about baseball for a blog or website.  Like Nate Silver in his early days, I work for an accounting firm, and this requires *most* of my attention during the day.  I also have a very energetic two-year-old daughter that takes up much of the evenings and weekends.  In what spare time I do have, I've amassed quite a bit of MLBAM and Retrosheet data.  I have to thank Mike Fast for help with the former, and the Zimmerman brothers for the latter.  But, now that I'm here, I need to figure out what to do with it.

I hope to be able to exchange some ideas I've had, as well as contribute to the general wealth of knowledge.  I'm pretty excited about the list, clearly.  :)

Joe21

Apr 12, 2013, 1:22:48 PM
to baseball-sq...@googlegroups.com
Hi everyone. I just became an author over at Minor League Ball, thanks to John Sickels. I'm still learning, so I look forward to learning from all of you.

Joseph Menke

Apr 14, 2013, 10:37:11 PM
to baseball-sq...@googlegroups.com
Hi all. My name is Joseph, and I'm an IT technician interested in both baseball and databases mostly as a hobbyist. I've been doing my best to rediscover baseball over the past few years after paying almost no attention to the sport for the past decade or so. Trying to cultivate a sabermetrician's perspective on the sport has greatly increased my enjoyment of the sport, and I look forward to deepening my understanding of the sport and statistical analysis.

I've taken a couple SQL courses at community college. I'm fairly comfortable doing simple queries, but often wrack my brain conceptualizing certain joins. I've followed along with some of the Saberizing a Mac posts and Colin Wyers' Hardball Times SQL introduction, but am more or less still an SQL noob. While I have some SQL knowledge, I totally lack a background in statistics, which is a fairly significant sabermetrics hurdle. I've been slowly working my way through a statistics course at Udacity, which is helping me understand proper analysis of statistics.

Looking forward to seeing what everyone has to bring to the conversation.

Spencer Schneier

Apr 14, 2013, 10:41:41 PM
to Joseph Menke, baseball-sq...@googlegroups.com
Hey everyone, my name is Spencer, and I write for Beyond the Box Score and am an Associate Scout for the St. Louis Cardinals. I (obviously) am involved in both the scouting and analytical sides of baseball, and hope to hone my SQLing skills. I intend to major in Computer and Information Sciences in college, but would love to get started on learning the basics now.

I understand the programs well enough to have a database set up with some data in it, but I have not been able to do much with it to this point.

So I'm basically at square 1.
--
Best Regards,
Spencer Schneier

Independent Scout
Contributor at Beyond the Box Score, MLB Daily Dish, and Amazin' Avenue
I help head the prospect coverage at Big League Mag
Follow me on Twitter @BaseballSpencer

KiNG KoNG

Apr 14, 2013, 11:13:14 PM
to baseball-sq...@googlegroups.com
Hey everybody...excited to get the discussion going.  My name is Matt Koenig.  I am the founder of the prospect website 80Grade.com.  I've worked with a number of different databases...Oracle, Access, SQL Server, and MySql.  I'm generally pretty solid with the complex joins and group bys that are required for baseball data.  I've found that the two most important things are backing up your data and creating appropriate indexes so that your queries don't take 20 minutes to run.  Most of my baseball work is done using MySql.  I've worked with Pitch/Fx, Lahman, and Retrosheet data.  Most of my focus now is with PitchFx (and cursing those with access to HitFx and FieldFx data).  I currently spend my days as a derivatives trader and head of technology for a small options trading firm in Chicago.

Matt
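
To make the point about indexes concrete, here is a small sketch that asks the database how it plans to run a query before and after an index is added. It uses SQLite through Python's sqlite3 module as a stand-in (the post is about MySQL, where the equivalent tool is EXPLAIN), and the pitches table, its columns, and the index name idx_pitches_pitcher are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pitch table filled with synthetic rows.
conn.execute("CREATE TABLE pitches (pitcher_id INTEGER, game_id TEXT, start_speed REAL)")
conn.executemany(
    "INSERT INTO pitches VALUES (?, ?, ?)",
    [(i % 50, f"g{i % 200}", 90.0 + (i % 10)) for i in range(5000)],
)

query = "SELECT AVG(start_speed) FROM pitches WHERE pitcher_id = ?"

def plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (7,)).fetchall()
    return " ".join(row[3] for row in rows)  # the 4th column holds the readable detail

plan_before = plan(query)  # no index yet: every row has to be scanned
conn.execute("CREATE INDEX idx_pitches_pitcher ON pitches (pitcher_id)")
plan_after = plan(query)   # the planner can now look rows up through the index

print(plan_before)
print(plan_after)
```

The rule of thumb is to index the columns that appear in WHERE and JOIN conditions of the queries you run most often.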

mattdennewitz

May 11, 2013, 2:30:56 PM
to baseball-sq...@googlegroups.com
Hey everyone, my name is Matt Dennewitz. I head up engineering at Pitchfork (http://pitchfork.com). I've got a lot of experience with Retrosheet and MLBAM data, the Lahman DB, running MySQL and Postgres (and SQL in general), designing and running large data crunching ecosystems. My programming language of choice is Python. Right now, I'm working with Eno Sarris to build BeerGraphs, and on Saber Archive, an index/archive for modern Sabermetrics research. I'm excited for this group, and always around to help out.

Ryan Kasperbauer

Sep 2, 2013, 10:43:23 PM
to baseball-sq...@googlegroups.com
Hello all, my name is Ryan Kasperbauer

I am very new to SQL. I am taking an online SQL class. What got me into SQL is geographic information systems, or GIS for short.

And yes I am a MLB fan; Arizona Diamondbacks

Question: is this the correct place to download the data from? http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/

Bryan Cole

Sep 9, 2013, 7:41:58 PM
to baseball-sq...@googlegroups.com
Yes, that link will tell you how to set up the (now slightly out-of-date) Baseball Database.  If you want to use Retrosheet, this tutorial (www.hardballtimes.com/main/blog_article/building-a-retrosheet-database-the-short-form/) walks you through how to set that up.  Be advised, though, the scripts in the tutorial are only set up to get 1952-2009, so to get additional years, you're going to have to edit the batch files yourself.

(If it sounds like I'm complaining, by the way, I'm not: I love my Retrosheet SQL database and I can't praise it highly enough.  Thank you, Colin.)

jonath...@gmail.com

Nov 30, 2013, 11:18:54 AM
to baseball-sq...@googlegroups.com
Hello, my name is Jonathan Cram and I learned about the Retrosheet db via the 'Saberizing a Mac' series at BtB. I have a moderate amount of experience with MSSQL and will be happy to help if I can.

I installed the Retrosheet db a few days ago, so I'm still getting comfortable with the table structure and data. I'm using the skills I gain from building and querying the baseball databases to extend my knowledge of query logic and analysis. I'm looking forward to learning from and sharing with this group. @JonCram on Twitter