So, wouldn't it be nice if you could ask a computer to put the text from each page of a PDF into a list of pages? I think so, which is why I created slate this morning[1].
You might know PDFMiner and other tools, but be slightly confused that it takes about 20 lines to get anything working[2]. Even then, you may not end up with a Python object, because PDFMiner almost always works with files.
slate is a small Python module that simplifies PDFMiner's API so that you can do what you actually want: process the PDF's text. How does it work?
>>> import slate
>>> with open('example.pdf', 'rb') as f:  # PDFs are binary files
...     doc = slate.PDF(f)
...
>>> doc[0]  # text of the first page
'Yay, some example text...'
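Because the result is meant to be a plain list of pages, ordinary list operations should be all you need to work with the text. Here is a minimal sketch of that idea, assuming `doc` behaves like a list of strings with one entry per page (as the `doc[0]` lookup suggests). The sample `pages` list and the `pages_containing` helper are hypothetical, chosen so the snippet runs without slate or a real PDF:

```python
# Hypothetical stand-in for the list slate.PDF(f) returns:
# one string of extracted text per page.
pages = [
    'Yay, some example text...',
    'Page two talks about invoices.',
    'Page three mentions invoices again.',
]

def pages_containing(pages, word):
    """Return the 1-based page numbers whose text contains `word` (case-insensitive)."""
    word = word.lower()
    return [number for number, text in enumerate(pages, 1) if word in text.lower()]

print(pages_containing(pages, 'invoices'))  # [2, 3]
```

With a real document you would pass `doc` in place of the sample list.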
Slate has been manually tested in Python 2.6 and seems to work fine. I'll add automated tests if people seem interested in the module. My setup.py wizardry hasn't been tested yet, however, so beware.
Tim
@timClicks