Problem with content stream parsing


Dave Cole

Oct 25, 2012, 10:27:14 PM
to origa...@googlegroups.com
I picked up Ruby for the first time this week so that I could modify some PDF files with Origami.

To work out how to do what I need, I started from the code in pdfdecompress, pared it back to the simplest version that still did something, and ran it on one of the PDF files from a sample of those I need to modify. The first file I rewrote produced errors when I opened it in Evince.


It turned out that reading and rewriting the stream objects reversed the order of the operands to the Tf operator. For example, the input stream contained:

/F1 1 Tf

The output stream contained:

1 /F1 Tf


I tracked it down to http://code.google.com/p/origami-pdf/source/browse/lib/origami/graphics/instruction.rb#82


I posted a patch that changes that line. I am not sure it is the correct fix, but it removed the problem for me.
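PDF content streams are postfix: each operator follows its operands, so a rewriter has to emit the operands in the order it read them. The toy round trip below illustrates that invariant for a tiny subset of the syntax (names, numbers, and bare operators only; no strings, arrays, or dictionaries). `Instruction` and `parse_content` are hypothetical names, not Origami's classes, and this is a sketch rather than the actual fix from the patch.

```ruby
# Toy model of a content-stream instruction (hypothetical, not Origami's class):
# operands are kept in source order and written out before the operator.
Instruction = Struct.new(:operator, :operands) do
  def to_s
    (operands + [operator]).join(" ")
  end
end

NUMBER = /\A[+-]?(\d+\.?\d*|\.\d+)\z/

# Parses only names (/F1), numbers, and bare operators separated by
# whitespace; strings, arrays, and dictionaries are out of scope here.
def parse_content(src)
  instructions = []
  operands = []
  src.split.each do |tok|
    if tok.start_with?("/") || tok.match?(NUMBER)
      # Appending keeps source order. Popping operands off a stack one at a
      # time when the operator arrives would return them reversed instead.
      operands << tok
    else
      instructions << Instruction.new(tok, operands)
      operands = []
    end
  end
  instructions
end

stream = "BT /F1 1 Tf 72 712 Td ET"
rebuilt = parse_content(stream).map(&:to_s).join(" ")
puts rebuilt # => BT /F1 1 Tf 72 712 Td ET
```

If the rebuilt stream ever differs from the input on simple cases like this one, the operand order is the first thing to check.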

Then I noticed another problem in the stream parsing and rewriting. Numbers appear to be formatted with the default Float#to_s, which behaves roughly like %g (I know very little Ruby; I only started learning it this week). As a result, small numbers such as 0.000001 are written to the output content stream as 1.0e-06. Evince does not accept numbers in that form, and PDF syntax does not allow exponential notation for real numbers in any case.
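To see where the switch to exponential notation happens, the snippet below compares Ruby's default `Float#to_s` with `format("%f", ...)` for a few small values. It is a sketch assuming standard MRI behaviour, where `to_s` moves to exponential form once a value drops below 0.0001; `exponent_notation?` is a hypothetical helper, not part of Origami.

```ruby
# Hypothetical helper (not Origami API): true when Ruby's default
# Float#to_s would print x in exponential form, e.g. "1.0e-06",
# which is not a valid PDF number token.
def exponent_notation?(x)
  x.to_f.to_s.match?(/e/i)
end

[0.001, 0.0001, 0.00001, 0.000001].each do |x|
  # Columns: default to_s, fixed-point %f, and whether to_s used an exponent.
  puts format("%-10s %-10s %s", x.to_s, format("%f", x), exponent_notation?(x))
end
```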

I posted another patch that forces numbers to be formatted with %f. The change is here: http://code.google.com/p/origami-pdf/source/browse/lib/origami/numeric.rb#185

The problem with %f is that it always prints six digits after the decimal point, so 1 becomes 1.000000. The posted patch strips those extraneous trailing zeroes.
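A minimal sketch of the same idea, assuming six decimal places are precise enough for the content being rewritten: format with fixed-point notation, then drop the padding zeros (and a bare trailing decimal point). `pdf_real` is a hypothetical name, and this is not the code from the posted patch.

```ruby
# Hypothetical helper (not Origami API): format a number as a PDF real
# using fixed-point notation, then strip the zeros that %f pads with.
def pdf_real(x, precision = 6)
  s = format("%.#{precision}f", x)
  # "1.000000" -> "1", "0.500000" -> "0.5"; only touch the fractional part.
  s = s.sub(/\.?0+\z/, "") if s.include?(".")
  # A tiny negative value can round to "-0"; write it as plain "0".
  s = "0" if s == "-0"
  s
end

puts pdf_real(0.000001) # => 0.000001
puts pdf_real(1.0)      # => 1
puts pdf_real(-2.25)    # => -2.25
```

One trade-off of fixed precision: values much smaller than 0.000001 round to 0. Passing a larger `precision` keeps more of them at the cost of longer tokens.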


Again, I am not sure if this is the correct fix, but It Works For Me (TM).

- Dave