The imaging isn't ideal, and the OCR output isn't as good as I would have liked (it left out a lot of pages I expected to be included), but it's a pretty darn good place to start.
I'm going to try running the OCR text through NLTK.
David Riordan | Product Manager, NYPL Labs | @NYPL_Labs