Is it possible to use PDF::Writer to open an existing PDF and edit it?

Krippa

Jan 26, 2008, 5:53:29 PM
to PDF::Writer
Hi,

I've been searching for a PDF library in Ruby that's capable of more
or less the same things as the PDF::API2 module for Perl.

I like the functionality of the PDF::API2 module, so why don't I stick
to it? Because I don't like Perl. I'll use it if I really have to, but
I'm still kinda hoping I don't.

So I've been looking at PDF::Writer, but it all seems to revolve around
creating new PDFs from scratch.

What I'd like to do is to start off with a template and just add the
unique parts of the contents.

Can PDF::Writer handle that? If yes, how? If no, are you aware of any
other Ruby library that can?

Cheers,


/Christer

David Richardson

Jan 29, 2008, 7:36:45 PM
to PDF::Writer
A new gem is available for parsing PDFs - PDF::Reader. You should be
able to read and parse a PDF using this gem, then write some or all of
the parsed objects into a new PDF using PDF::Writer.

I will be using this approach but don't have time to produce a working
example until next week at the earliest.

cheers,
david

Gregory Brown

Jan 29, 2008, 9:45:37 PM
to pdf-w...@googlegroups.com
On Jan 29, 2008 7:36 PM, David Richardson wrote:

If you figure out how to do this, please post a code example here.

Also, I've been meaning to start up a dialog with the lead developer
of that project. It'd be interesting to have the two projects under
one roof, even if they don't share code or other resources for the
time being....

James Healy

Jan 29, 2008, 9:37:24 PM
to pdf-w...@googlegroups.com
David Richardson wrote:
> A new gem is available for parsing PDFs - PDF::Reader. You should be
> able to read and parse a PDF using this gem, then write some or all of
> the parsed objects into a new PDF using PDF::Writer.

I'm the current active developer of PDF::Reader, and I'd be interested
in seeing code that does this.

I suspect it would be an interesting challenge without some further work
to Reader. In its current state it's pretty good at extracting vector
image commands, as well as text (provided a standard encoding is used -
advanced encoding support is coming). However advanced information like
embedded raster images, fonts and metadata are currently inaccessible.

-- James Healy <jimmy-at-deefa-dot-com> Wed, 30 Jan 2008 13:30:45 +1100

Helder Ribeiro

Jan 30, 2008, 2:57:05 AM
to pdf-w...@googlegroups.com
This could also be a good way to test pdfs written with PDF::Writer,
at least for simple stuff like "does it contain such text?", "does it
have the right number of pages?", etc.

2008/1/30, James Healy <ji...@deefa.com>:

James Healy

Jan 30, 2008, 5:25:50 AM
to pdf-w...@googlegroups.com
Helder Ribeiro wrote:
> This could also be a good way to test pdfs written with PDF::Writer,
> at least for simple stuff like "does it contain such text?", "does it
> have the right number of pages?", etc.

That's exactly what I'm using it for. The shipped README includes a
simple rspec example, but I'd love to see/ship further examples if
people have them.

The current work on adding advanced encoding support is driven by my
desire to unit test the contents of Unicode encoded PDFs generated by
cairo.

-- James Healy <jimmy-at-deefa-dot-com> Wed, 30 Jan 2008 21:22:48 +1100

Gregory Brown

Jan 30, 2008, 6:22:05 AM
to pdf-w...@googlegroups.com
On Jan 30, 2008 5:25 AM, James Healy <ji...@deefa.com> wrote:
>
> Helder Ribeiro wrote:
> > This could also be a good way to test pdfs written with PDF::Writer,
> > at least for simple stuff like "does it contain such text?", "does it
> > have the right number of pages?", etc.
>
> That's exactly what I'm using it for. The shipped README includes a
> simple rspec example, but I'd love to see/ship further examples if
> people have them.

We'll definitely be looking into this for possible limited PDF::Writer specs.

-greg

glennswest

Mar 5, 2008, 1:39:01 AM
to PDF::Writer
Yes, I stumbled onto PDF::Reader today.
I've actually started working on a "pdftoruby" generator that uses
"pdf-reader" as input, since I have reasonably complex forms that I
don't want to hand-code.
Sounds like a few people are thinking the same way.

Anyone want to share their examples?

Anything I do on the generator will go back into open source.

glennswest

Mar 5, 2008, 8:30:56 AM
to PDF::Writer
While I'm fighting a "bug" on the main target pdf, I went
ahead with the converter. I was actually able to extract the text
from a pdf and write a ruby class that regenerates it on the fly using
pdf-writer.

So I use:
pdf-reader which parses the pdf
a set of "filters" or "callbacks" that issue "code" into a .rb file

When you're done, you can "require" the generated file and call the
class, and it will regenerate the original pdf.
Right now it handles just text and position. Tomorrow I'll add fonts,
basic blocks and graphics.
If all goes as planned, I'll release the .rb on my blog, and if
the "maintainers" of pdfwriter/reader wish, I don't mind contributing
it into the tree.
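
For readers following along, here is a rough sketch of the "callbacks that issue code into a .rb file" idea. It is not the pdftoruby code. RubyEmitter, on_position and on_text are hypothetical names standing in for whatever parser callbacks are used, positions are assumed to be absolute, and the font and size (Helvetica, 12) are placeholders. The emitted source assumes PDF::Writer's add_text(x, y, text, size), select_font and save_as calls.

```ruby
# Hypothetical sketch: collects text events and emits Ruby source that would
# redraw them with PDF::Writer. The event methods are illustrative stand-ins,
# not PDF::Reader's real callback API.
class RubyEmitter
  def initialize(class_name)
    @class_name = class_name
    @x = 0
    @y = 0
    @lines = []
  end

  # Called when the parser reports a text position (assumed absolute here).
  def on_position(x, y)
    @x = x
    @y = y
  end

  # Called when the parser reports a run of text at the current position.
  def on_text(string)
    # String#inspect produces a safely quoted Ruby string literal.
    @lines << "    pdf.add_text(#{@x}, #{@y}, #{string.inspect}, 12)"
  end

  # Returns the generated Ruby source as a single string.
  def source
    [
      "require 'pdf/writer'",
      "",
      "class #{@class_name}",
      "  def render(path)",
      "    pdf = PDF::Writer.new",
      "    pdf.select_font 'Helvetica'",
      *@lines,
      "    pdf.save_as(path)",
      "  end",
      "end",
      ""
    ].join("\n")
  end
end

# Feed a few events by hand and print the generated class.
emitter = RubyEmitter.new("InvoiceForm")
emitter.on_position(72, 700)
emitter.on_text("Invoice")
emitter.on_position(72, 680)
emitter.on_text('Customer: "ACME"')
puts emitter.source
```

Writing emitter.source to a file gives a class that can be edited by hand, for example to swap a literal string for a value looked up elsewhere, before it is required and rendered.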

James Healy

Mar 5, 2008, 8:50:18 AM
to pdf-w...@googlegroups.com
Hi Glenn,

This sounds like an interesting project, but you may run into a few
hurdles.

glennswest wrote:
> Right now just for text and position. Tomorrow I'll add font, and
> basic blocks and graphics.

This will work for 7-bit text (ie. US-ASCII chars), but anything else
will have unpredictable results. PDF::Reader returns all text encoded
as UTF-8, and PDF::Writer (by default) assumes all input text is encoded
as cp-1252 (the Windows charset). For the first 128 code points (bytes
0 to 127) these will match, but above that all hell breaks loose.
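
To make the mismatch concrete, here is a small sketch in plain Ruby (1.9 or later, no PDF libraries involved). misread_as_cp1252 is a hypothetical helper that imitates a consumer treating UTF-8 bytes as cp-1252. It is only an illustration of the byte-level problem, not what PDF::Writer does internally.

```ruby
# Imitate a consumer that assumes cp-1252 but is handed UTF-8 bytes.
def misread_as_cp1252(utf8_text)
  # Relabel the raw bytes as Windows-1252, then convert to UTF-8 for display.
  utf8_text.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
end

ascii = "Invoice 42"
euro  = "\u20AC" # euro sign, three bytes (E2 82 AC) in UTF-8

puts misread_as_cp1252(ascii) # ASCII bytes mean the same thing in both encodings
puts misread_as_cp1252(euro)  # each of the three bytes becomes its own cp-1252 character
```

A few byte values (0x81, 0x8D, 0x8F, 0x90 and 0x9D) have no cp-1252 mapping in Ruby, so this helper can raise an Encoding::UndefinedConversionError on other inputs.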

The ideal solution is for PDF::Writer to support UTF-8 input, but for
the moment it's not available. You might be able to use iconv or
something to convert the utf-8 back to cp-1252 before passing it into
PDF::Writer, but any characters that aren't representable in the
destination character set will be lost.
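
As a sketch of that conversion step: the example below uses String#encode, which is built into Ruby 1.9 and later, rather than the iconv library mentioned above. to_cp1252 is a hypothetical helper name, and replacing unrepresentable characters with "?" is just one possible policy.

```ruby
# Convert UTF-8 text to cp-1252 bytes. Characters with no cp-1252
# equivalent are replaced with "?" instead of raising an error.
def to_cp1252(utf8_text)
  utf8_text.encode(Encoding::Windows_1252,
                   invalid: :replace, undef: :replace, replace: "?")
end

p to_cp1252("\u20AC").bytes       # euro sign maps to the single byte 0x80
p to_cp1252("\u2013").bytes       # en dash maps to the single byte 0x96
p to_cp1252("\u65E5\u672C").bytes # two CJK characters, each replaced by "?"
```

The result is a string tagged as Windows-1252, so it can be handed to code that expects that encoding. The "?" substitutions are the lost characters.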

At this stage, the majority of PDF::Reader is focussed on correct
extraction of text, so there isn't a great amount of detailed access to
information on metadata, fonts, embedded raster images etc.

I'm happy to start looking at that stuff in due course if there's a
need, but patches are always welcome if you need it sooner :)

> If all goes as planned, I'll release the .rb on my blog, and if
> the "maintainers" of pdfwriter/reader wish, I don't mind contributing
> it into the tree.

Regardless of what I've said above, I'm still keen to see your code if
you feel like releasing it and/or proving me wrong.

-- James Healy <jimmy-at-deefa-dot-com> Thu, 06 Mar 2008 00:42:07 +1100

glennswest

Mar 5, 2008, 9:08:43 PM
to PDF::Writer
For me, almost all of my data is "English" and ASCII is fine.
If you really need it in another language, it could always be an
embedded image.
(Sometimes it's an "image" anyway.)

I'll let you know after I play with it for a few days.

I thought pdf writer was getting closer to utf8 support.

James Healy

Mar 5, 2008, 8:55:03 PM
to pdf-w...@googlegroups.com
glennswest wrote:
> For me, almost all of my data is "English" and ASCII is fine.
> If you really need it in another language, it could always be an
> embedded image.
> (Sometimes it's an "image" anyway.)

Some common-ish non-alphanumeric characters will appear above byte 127 -
things like the euro symbol, windows "smart quotes", some hyphens, etc.
How often these show up depends on what generated the PDF file and the
locale of the system at the time.

> Let you know after I play a few days.

Sounds good.

> I thought pdf writer was getting closer to utf8 support.

I've been hacking at a patch, but it still needs a little work. The
text encoding is working fine, but mapping the character codes to font
glyphs is taking some time, as I'm not particularly familiar with how
fonts work.

-- James Healy <jimmy-at-deefa-dot-com> Thu, 06 Mar 2008 12:51:10 +1100

glennswest

Mar 6, 2008, 1:44:00 AM
to PDF::Writer
It's interesting to see how the format "changes" depending on which
"tool" touches the pdf.
Now I just keep a pdfdump.rb standing by so I can pretty-print the
callbacks.

But at least at the moment, the text is coming through, things are
working as I expect, and I'm just translating callbacks from the real
form into pdf-writer calls.

Looks like I need to keep the PDF reference handy.

glennswest

Mar 14, 2008, 9:11:17 PM
to PDF::Writer
In response to the original question,

You can now take a "pdf" file and convert it to ruby code,
edit the code to add/change/delete items on the page,
and then run the ruby code to regenerate the form.
You can even replace items with data from a database.

I've already done it for a complex form. The code
is available in the pdftoruby project. See the announcement.

