Ruby PDF text extractor

Kevin Olbrich

Aug 13, 2005, 1:01:06 PM
I notice that Ruby has lots of tools for creating PDF files, are there any
that let you extract text from a PDF file?

_Kevin

Austin Ziegler

Aug 13, 2005, 1:45:10 PM
On 8/13/05, Kevin Olbrich <kevin....@duke.edu> wrote:
> I notice that Ruby has lots of tools for creating PDF files, are there any
> that let you extract text from a PDF file?

Not yet. PDF::Writer will be refactored a little bit for version 2.0
(coming out later this year) so that it will be three separate
components: PDF::Core (the core objects representing a PDF object in
memory, as well as rendering), PDF::Writer (the writer/layout code),
and PDF::Reader (read a PDF object into an in-memory representation).
Much of the code to do PDF::Core is already in place (it's currently
called PDF::Writer::Object or PDF::Writer::Objects), but there's
nothing explicitly present to represent this.

PDF::Reader will probably be released in early 2006, depending on how
long it takes to refactor the code that already exists, properly
extend it, and get the necessary PDF::Writer code finished.

-austin
--
Austin Ziegler * halos...@gmail.com
* Alternate: aus...@halostatue.ca


Kevin Olbrich

Aug 13, 2005, 1:59:13 PM
Thanks, I'll keep my eyes open for it.

_Kevin

Andreas Schrafl

Aug 16, 2005, 7:53:12 PM
I once wrote a Ruby PDF text extractor while working at ywesee.

I thought they released it on RubyForge, but I can't find it anymore.
Perhaps if you contact them they can help you.
www.ywesee.com

Greetings
Andy

Martin DeMello

Aug 17, 2005, 6:18:54 AM
Austin Ziegler <halos...@gmail.com> wrote:
>
> PDF::Reader will probably be released in early 2006, depending on how
> long it takes to refactor the code that already exists, properly
> extend it, and get the necessary PDF::Writer code finished.

I'd be interested in helping with this.

martin
