Is there some way in ghostscript (or any other free tool, for that
matter) to do something along the lines of:
gs -sDEVICE=pdfwrite \
-dDemarkup=2,3 \
-dNOPAUSE -dBATCH -dSAFER \
-dFirstPage=1 -dLastPage=48 \
-sOutputFile=1SlidePerPage.pdf NSlidesPerPage.pdf
where the second line has the hypothetical command-line option needed
to accomplish this, and Demarkup gives the layout of the input as the
number of subpages in X and Y. Any other selection syntax would be fine, for
instance, specifying a rectangular select region, with everything
outside of that ignored. The only way I know of to do this now
involves printing to images, then using rectangular cut operations,
and then back to PDF, which changes the PDFs to images and isn't a
good solution.
I know, the best solution is to start from the source document and
print it the right way in the first place, but sometimes that is not
an option.
Thanks,
David Mathog
Currently, Ghostscript does not support this, and I do not know if there are
plans to implement this (or a similar feature).
But, if you have access to a system where CUPS is running, set up a print
queue (if not already done) which "prints" to a PDF file and then print your
source PDF with the option "-o number-up=4" (or 6, respectively).
Helge
I think maybe you understood the question backwards.
That would take a PDF with 4 slides per page and make it into one with
16 or 24 slides per page.
I want to go from 4 slides per page to 1 slide per page.
As far as I know there is no "number-down" option in CUPS.
Regards,
David Mathog
Oops, obviously I understood your issue the wrong way round.
Helge
Can a free tool simply *crop* your four-up or six-up pdf document several times, once to each of the "pages" you'd like to isolate? If not, even old (paid) Adobe Acrobat v. 4 can do that for you (as it has done for us, too, on occasion).
HTH. Cheers, -- tlvp
--
Before replying, throw out the trash, please
I've got some whims of an idea.
You could define a little procedure to translate scale and clip
to a specific frame. Call it explicitly on page one,
and hook it into showpage for the rest.
And then if you don't mind a little collating (I don't know how
big these are), just print a copy, modify the procedure, print, etc.
If they're too big to manually collate, then it's probably time
for perl. Then you repeat a copy of each page with the modifications
at the start of each one.
> I've got some whims of an idea.
> You could define a little procedure to translate scale and clip
> to a specific frame. Call it explicitly on page one,
> and hook it into showpage for the rest.
> And then if you don't mind a little collating (I don't know how
> big these are), just print a copy, modify the procedure, print, etc.
Uh sure, but what is the correct syntax to do this? I searched a bit
more and found some references to /CropBox and /UseCropBox, which
looks like it might do the right thing, but the examples found so far
result in empty output files or errors when I run them. (The input
pdf does not have the string "crop" in it, but it might be there
within a compressed object.) Here is one of the references (which
didn't work for me).
http://article.gmane.org/gmane.comp.printing.ghostscript.devel/50
Just for the sake of argument, if the rectangle bounding the original
page is [ 0 0 1000 1000 ]
and there are 4 slides/page, then a new bounding rectangle [ 0 0 500
500 ] should make a separate page out of one of these slides. What is
the syntax to do this? I can handle the scripting around ghostscript
to pull out each slide to a page, but only once I know the proper
ghostscript command to do that operation in the first place.
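
The arithmetic in this example can be sketched in a few lines of Python. This only illustrates the geometry and is not Ghostscript syntax; the function name `cell_rects` and the reading-order (top-left first) numbering of the slides are assumptions made for the sketch.

```python
def cell_rects(page, cols, rows):
    """Split a page rectangle [x0, y0, x1, y1] into cols x rows equal cells.

    Cells are returned in reading order (top-left first), each as
    [x0, y0, x1, y1] in the page's own units, origin at the lower left.
    """
    x0, y0, x1, y1 = page
    w = (x1 - x0) / cols
    h = (y1 - y0) / rows
    rects = []
    for r in range(rows):              # r = 0 is the top row
        top = y1 - r * h
        for c in range(cols):
            left = x0 + c * w
            rects.append([left, top - h, left + w, top])
    return rects


for rect in cell_rects([0, 0, 1000, 1000], 2, 2):
    print(rect)
```

With a 2 x 2 layout on the [ 0 0 1000 1000 ] page, the third rectangle returned is the [ 0 0 500 500 ] one from the example.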
Thanks,
David Mathog
I don't really know how to hack the pdf directly, so I would convert
to ps first. Then the issue isn't really about clipping, but translating
and scaling.
I'd add something like this to the prologue:
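/zoomtoslide {
  0 0 translate          % change to pick a different slide
  2 2 scale              % a quarter-page slide fills the whole page
  0 0 500 500 rectclip   % clip to the slide, in the scaled coordinates
} def
% bind is meant to resolve the inner showpage to the original operator
/showpage { showpage zoomtoslide } bind def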
By hooking into showpage, you only need to add one call to zoomtoslide
at the top of page one.
The downside is we've damaged the document structure, so the file is
crippled for the purposes of any fancy DSC restructuring. Another way
would be to add zoomtoslide to an existing procedure in the document;
one that already gets called at the start of each page. But this would
require more investigating, and might not be necessary if the showpage
hook works.
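
As a rough aid for filling in the numbers, here is a small Python sketch that works out the translate and scale operands for an arbitrary slide rectangle. It assumes the same operator order as the procedure above (translate first, then scale), so the translation is expressed in unscaled page units. The names `zoom_params` and `zoomtoslide_ps` are made up for this illustration.

```python
def zoom_params(cell, page_size):
    """Return (tx, ty, s) so that 'tx ty translate  s s scale' maps cell onto the page.

    cell is [x0, y0, x1, y1]; page_size is (width, height) of the output page.
    """
    x0, y0, x1, y1 = cell
    page_w, page_h = page_size
    # Uniform scale so the slide keeps its aspect ratio.
    s = min(page_w / (x1 - x0), page_h / (y1 - y0))
    # translate comes before scale, so the offset is in unscaled units.
    return (0.0 - x0 * s, 0.0 - y0 * s, s)


def zoomtoslide_ps(cell, page_size):
    """Format a zoomtoslide procedure for one slide rectangle."""
    tx, ty, s = zoom_params(cell, page_size)
    x0, y0, x1, y1 = cell
    return (
        "/zoomtoslide {\n"
        f"  {tx:g} {ty:g} translate\n"
        f"  {s:g} {s:g} scale\n"
        f"  {x0:g} {y0:g} {x1 - x0:g} {y1 - y0:g} rectclip\n"
        "} def"
    )


print(zoomtoslide_ps([500, 0, 1000, 500], (1000, 1000)))
```

For the lower-right slide of a 2 x 2 layout on a 1000 x 1000 page this gives a translation of -1000 0 and a scale of 2 2.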
Yes - this could be done directly in
PDF::API2, importing each page 6 times (as a form)
with suitable translate/scale/crop parameters.
BugBear
Well, I tried it with a little test program, based on page 9 from
here:
http://pdfapi2.sourceforge.net/pdfapi2_for_fun_and_profit_APW2005.pdf
use PDF::API2;
$pdf=PDF::API2->new;
$pdf->mediabox('A4');
$page=$pdf->page;
$page->mediabox('A4');
$page->rotate(90); #landscape
$pdf->info(
'Author' => '',
'Title' => '',
);
$imp=PDF::API2->open('test.pdf');
$xo=$pdf->importPageIntoForm($imp,1);
$page=$pdf->page;
$gfx=$page->gfx;
$gfx->formimage($xo, 0, 0, 2);
$pdf->saveas('new.pdf');
__END__
And the resulting PDF had images and not text (never mind the screwy
scaling and positioning, it was just a test). This won't work - it has
to remain as text, with each page one of the original slides from the
"marked up" input, scaled up to fit.
Can that be done with PDF::API2?
Thanks,
David Mathog
Why not? Provided you use a sufficiently high resolution, the images
will be perfectly readable. If it's a presentation, the resolution
won't have to be very high. I'd use something like
pdftk something.pdf burst
convert -density 100 something_001.pdf something_001.png
convert -crop ... something_001.png something_001.1.png
convert -crop ... something_001.png something_001.2.png
etc.
where convert is from the ImageMagick package. You can use optipng
to maximally compress the images.
Then put the pieces together again using pdflatex.
Bob T.
> Why not?
No search or text extraction. (Unless the whole thing is OCR'd.)
Regards,
David Mathog
> Handouts for classes are frequently printed to a PDF with 4 or 6
> slides per page. These end up online, where they are not so easy to
> work with, since moving from page to page requires a good deal of
> scrolling and zooming. The internal geometry of these is very well
> defined - every page has the same layout.
Is each of the 'subpages' the same size? I can conceive of a solution
but if the smaller cells are different sizes it may not work.
Ken
Here's a PostScript program which is a bare skeleton of a method for
this. This particular program just draws two possible 4 inch square
pages from each page of the source document; it's readily extensible. I
would need to know more about the requirements before offering any
further improvements.
8<--------------------------8<----------------------8<-----------------
%!PS
% Copyright (C) 2011 Artifex Software, Inc. All rights reserved.
%
% This software is provided AS-IS with no warranty, either express or
% implied.
%
% This software is distributed under license and may not be copied,
% modified or distributed except as expressly authorized under the terms
% of the license contained in the file LICENSE in this distribution.
%
% For more information about licensing, please refer to
% http://www.ghostscript.com/licensing/. For information on
% commercial licensing, go to http://www.artifex.com/licensing/ or
% contact Artifex Software, Inc., 101 Lucas Valley Road #110,
% San Rafael, CA 94903, U.S.A., +1(415)492-9861.
% Slice up a PDF file
% usage: gs -sFile=____.pdf pdf_slice.ps
/File where not {
(\n *** Missing input file name \(use -sFile=____.pdf\)\n) =
( usage: gs -dNODISPLAY -q -sFile=____.pdf pdf_slice.ps\n) =
() =
flush
quit
} if
pop % discard the dict from where
/QUIET true def % in case they forgot
() =
File dup (r) file runpdfbegin pop
/PDFPageCount pdfpagecount def
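1 1 PDFPageCount {
  dup                    % second copy of the page number, one per slice
  % first slice
  <</PageSize [288 288] /PageOffset [-72 -72]>> setpagedevice
  pdfgetpage             % page number -> page dictionary
  dup /Page exch store   % pdfshowpage_finish expects /Page to be defined
  pdfshowpage_finish
  % second slice
  <</PageSize [288 288] /PageOffset [-72 -144]>> setpagedevice
  pdfgetpage
  dup /Page exch store
  pdfshowpage_finish
} for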
8<--------------------------8<----------------------8<-----------------
Save the file as pdf_slice.ps then you can feed it to Ghostscript. If
you use the pdfwrite device then it will create a new PDF file for you.
For example:
./gs -sDEVICE=pdfwrite -sOutputFile=new.pdf -sFile=input.pdf pdf_slice.ps
With my test file (a 2 page PDF) that created a 4 page PDF where each of
the pages was a cropped area of the original page.
Ken
> Is each of the 'subpages' the same size? I can conceive of a solution
> but if the smaller cells are different sizes it may not work.
Depends on what you mean by size. Geometrically each is exactly the
same size Width X Height rectangle, just with different positions on
the page. In terms of content size, of course, they are all very
different sizes.
Your idea?
Thanks,
David Mathog
Ah, the content size (width and height) was what I meant. Well the
PostScript program I posted separately will do the job I think, but you
would need some pretty exact idea about the size and position of each of
the subpages you want extracted. If these vary considerably then just
supplying the information could be tedious.
Ken
That's great! Given (almost) complete absence of documentation of
these operators, could you comment it a little bit?
E.g.: is it the first `dup' of the for-loop which makes it produce two
pages of output for any page of input? What is the purpose of
defining /Page?
Moreover: maybe you also have a bit of a similar snippet which would
print a 4-page letter-size document 2-up on 2 pages?
Thanks,
Ilya
Oops, that was a particularly stupid question - I did not notice that the
for-loop operates on NUMBERS, not "objects". The argument dup'ed is
just passed as a parameter to pdfgetpage. This part is clear to me now.
> What is the purpose of defining /Page?
> Moreover: maybe you also have a bit of a similar snippet which would
> print a 4-page letter-size document 2-up on 2 pages?
These questions still stand...
[The real problem I would like to circumvent is that pdf2dsc output is
not scalable/croppable/shiftable - I suspect this is due to
pdf_PDF2PS_matrix resetting the matrix at the start...]
The third question is: can one modify the code above to set /BBox of
the pages to some particular value?
Thanks,
Ilya
This defines QUIET in userdict. Is there a reason that the '-d' option
defines in systemdict instead? Is it irrelevant for QUIET but not for
all names?
AFAIK it's searched for in either (by using where), but it's a
Ghostscript-specific feature.
Ken
Well you aren't really supposed to use these routines (they are
PostScript routines, not operators) stand-alone like this.
> E.g.: is it the first `dup' of the for-loop which makes it produce two
> pages of output for any page of input?
Umm, no, that just duplicates the loop counter. The counter is used as
the page index.
> What is the purpose of
> defining /Page?
The pdfshowpage_finish routine expects it to be defined and uses it ;-)
It's a dictionary which contains a bunch of specific stuff.
> Moreover: maybe you also have a bit of a similar snippet which would
> print a 4-page letter-size document 2-up on 2 pages?
You could probably do that more easily by defining a custom BeginPage
and EndPage. The BeginPage would alter the CTM to scale the page to 1/2
size and translate it depending on the count of showpages, the EndPage
simply returns true on even pages and false for odd pages (NB one of the
operands to both BeginPage and EndPage is the count of showpages).
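
The page-counting logic described here can be modelled in Python, purely as an illustration of the bookkeeping. The sketch assumes the count handed to the hooks is the number of showpages already executed (so it starts at 0), and it ignores the reason code that a real EndPage procedure also receives. The function names are hypothetical.

```python
def two_up_slot(showpage_count):
    """Which half of the output sheet (0 or 1) the next logical page goes in."""
    return showpage_count % 2


def end_page_emits(showpage_count):
    """Model of the EndPage decision: True means the sheet is output now.

    The 2nd, 4th, ... logical pages arrive with an odd count, because the
    count only includes showpages executed before the current one.
    """
    return showpage_count % 2 == 1


for count in range(4):
    print("page", count + 1, "slot", two_up_slot(count),
          "emit", end_page_emits(count))
```

For a 4-page input the model emits a sheet after pages 2 and 4, which gives the 2 output pages asked about.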
Ken
> These questions still stand...
>
> [The real problem I would like to circumvent is that pdf2dsc output is
> not scalable/croppable/shiftable - I suspect this is due to
> pdf_PDF2PS_matrix resetting the matrix at the start...]
You could use the ps2write device in Ghostscript, which emits DSC
compliant level 2 PostScript. I know that the output works with Angus's
psnup so I imagine it will be OK with what you want.
NB you need Ghostscript 9.01 or better for this.
> The third question is: can one modify the code above to set /BBox of
> the pages to some particular value?
I don't think I understand the question, pages don't have a /BBox. Form
XObjects, Shading dictionaries and type 1 patterns do, but not pages.
Pages have MediaBox, CropBox, ArtBox etc.
The size of the page (and hence the MediaBox) is, unsurprisingly, given
by the /PageSize.
Ken
Ilya Zakharevich wrote:
> On 2011-03-10, ken <k...@spamcop.net> wrote:
> > File dup (r) file runpdfbegin pop
I'm guessing that runpdfbegin both loads the file and pushes
a dictionary representing the document.
> > /PDFPageCount pdfpagecount def
This variable appears unnecessary. Unless it's setting the pagecount
for the output device. It is defining into the pdf dictionary, right?
> > 1 1 PDFPageCount {
> > dup
As you already guessed, dup makes a copy of the integer pushed
by the for.
> > <</PageSize [288 288] /PageOffset [-72 -72]>> setpagedevice
> > pdfgetpage
I'm guessing this takes an integer to load a specific page.
I'm guessing it returns an opaque object (maybe a dict) that represents
the page.
> > dup /Page exch store
Then we save the structure. No, wait. that can't be right. That would
leave an extra dup of this "thing". There must be no opaque object
and pdfgetpage just takes an integer and passes the page into the
framebuffer or whatever output device. Then we're just saving
the integer into /Page for some reason. Maybe pdfshowpage_finish
needs to have the right page number in the dict for some reason.
> > pdfshowpage_finish
Then it looks like we just move the offset and repeat.
> > <</PageSize [288 288] /PageOffset [-72 -144]>> setpagedevice
> > pdfgetpage
> > dup /Page exch store
> > pdfshowpage_finish
> > } for
>
> That's great! Given (almost) complete absence of documentation of
> these operators, could you comment it a little bit?
>
> E.g.: is it the first `dup' of the for-loop which makes it produce two
> pages of output for any page of input? What is the purpose of
> defining /Page?
I think defining Page must help in certain circumstances like when
rendering to dsc postscript, it can write the %%Page comments
properly. Ditto for the PDF equivalent of a page number label.
> Moreover: maybe you also have a bit of a similar snippet which would
> print a 4-page letter-size document 2-up on 2 pages?
4/2 == 2/1 !
This program should work for that. Just change the PageSize and
Offsets.
Copy and paste the business for more slices.
It's a very nice program. I imagine it could be modified to rewrite a pdf
to add decorative frames around each page. Or even more general
formulaic compositing.
> Thanks,
> Ilya
> I'm guessing that runpdfbegin both loads the file and pushes
> a dictionary representing the document.
Opens the file and interprets parts of it, begins a dictionary with some
variables and so on. So the dictionary is on the dict stack (and is in
fact the current dictionary).
> > > /PDFPageCount pdfpagecount def
>
> This variable appears unnecessary.
It is not required. Some of this was copied from another program, and I
realised that I simply didn't know enough about the requirements to
finish the job, so I stopped when I had something working without
bothering to tidy it up.
> > > <</PageSize [288 288] /PageOffset [-72 -72]>> setpagedevice
> > > pdfgetpage
>
> I'm guessing this takes an integer to load a specific page.
> I'm guessing it returns an opaque object (maybe a dict) that represents
> the page.
It's a dictionary.
> > > dup /Page exch store
>
> Then we save the structure. No, wait. that can't be right. That would
> leave an extra dup of this "thing".
We need it on the operand stack, and also defined in the dictionary.
Like I said, you're not really supposed to do what I'm doing here...
> Then it looks like we just move the offset and repeat.
With option to alter the page size :-) Which is actually the bit that
causes most grief, without that the program could be a lot simpler and
use better defined routines.
When the OP gets back to me I'll do a better job, honest.....
Ken
> > > > dup /Page exch store
>
> > Then we save the structure. No, wait. that can't be right. That would
> > leave an extra dup of this "thing".
>
> We need it on the operand stack, and also defined in the dictionary.
> Like I said, you're not really supposed to do what I'm doing here...
>
> > Then it looks like we just move the offset and repeat.
>
> With option to alter the page size :-) Which is actually the bit that
> causes most grief, without that the program could be a lot simpler and
> use better defined routines.
>
> When the OP gets back to me I'll do a better job, honest.....
No complaints. If the size is constant, aren't we all overlooking
"posterizing" programs? There's got to already be a program to zoom
and chop these things. Maybe not as configurable as this creature is
becoming, but something!
> No complaints. If the size is constant, aren't we all overlooking
> "posterizing" programs? There's got to already be a program to zoom
> and chop these things. Maybe not as configurable as this creature is
> becoming, but something!
I'm not sure that such a thing will take a PDF and produce multiple
smaller PDFs though, they are usually used for printing....
Ken
> > Moreover: maybe you also have a bit of a similar snippet which would
> > print a 4-page letter-size document 2-up on 2 pages?
>
> You could probably do that more easily by defining a custom BeginPage
> and EndPage. The BeginPage would alter the CTM to scale the page to 1/2
> size and translate it depending on the count of showpages, the EndPage
> simply returns true on even pages and false for odd pages (NB one of the
> operands to both BeginPage and EndPage is the count of showpages).
Sadly the BeginPage/EndPage doesn't work, because the PDF interpreter
emits a setpagedevice between each page, which erases the page.....
My code would actually get around that, because it calls bits of the PDF
interpreter directly (which one is not supposed to do) and can avoid the
page being erased.
It would be a very different piece of code though, can't use
setpagedevice to set the page offsets so you would need to run each page
manually. Maybe I'll see if it works. I'm sure there must be a better
way to do this.
Ken
> Handouts for classes are frequently printed to a PDF with 4 or 6
> slides per page. These end up online, where they are not so easy to
> work with, since moving from page to page requires a good deal of
> scrolling and zooming. The internal geometry of these is very well
> defined - every page has the same layout.
>
> Is there some way in ghostscript (or any other free tool, for that
> matter) to do something along the lines of:
>
> gs -sDEVICE=pdfwrite \
> -dDemarkup=2,3 \
> -dNOPAUSE -dBATCH -dSAFER \
> -dFirstPage=1 -dLastPage=48 \
> -sOutputFile=1SlidePerPage.pdf NSlidesPerPage.pdf
Hi!
Since you seem to use MS Windows, what about
http://www.noliturbare.com/pdf-tools/pdf-tiler
If you need to rescale the resulting pdf,
http://www.noliturbare.com/pdf-tools/pdf-rotate-and-more
Or you can use the free pdf tools from http://www.pdfill.com
Regards
--
Wilfried Hennings
please reply in the newsgroup, the e-mail address is invalid
> Since you seem to use MS Windows
Linux/Unix tools are fine too.
> what about http://www.noliturbare.com/pdf-tools/pdf-tiler
Tried it on a 2x2 marked up document and it converted the first page
(only) into two pages consisting of the left and right columns. No
controls to change that. So thanks, but it did not work.
> If you need to rescale the resulting pdf, http://www.noliturbare.com/pdf-tools/pdf-rotate-and-more
That looks like it has enough control, but it also looks like it will
require punching in coordinates for each subpage, kind of tedious.
>
> Or you can use the free pdf tools from http://www.pdfill.com
I will look around on there later.
Thanks,
David Mathog
> Here's a PostScript program which is a bare skeleton of a method for
> this.
That's exactly the sort of thing I was looking for.
One question, it isn't clear to me where in this program lies the part
that does "next page". Is that built into ghostscript somehow, so
that this postscript program will run on each input page
automatically?
Thanks!
David Mathog
> > Here's a PostScript program which is a bare skeleton of a method for
> > this.
>
> That's exactly the sort of thing I was looking for.
It really is no more than an outline; as pointed out elsewhere, it's not
commented and does some stuff it doesn't need to. I can clean it up and
make it more flexible, but I just don't understand the problem well
enough at the moment.
If you could share an example file that would help. It's easier to
discuss with reference to a concrete example.
> One question, it isn't clear to me where in this program lies the part
> that does "next page". Is that built into ghostscript somehow, so
> that this postscript program will run on each input page
> automatically?
Yes, and no ;-)
The program runs in a for loop. In PS, 'for' executes a procedure, and it
puts the index of the for on the operand stack before each invocation of
the procedure.
In this case, the procedure first makes a copy of that index on the
stack, because we're going to use it twice. Then it uses the topmost
index to do 'pdfgetpage', which returns a dictionary which (sort of)
contains the page. It then does 'pdfshowpage_finish' which, well,
finishes the page.
Then we do it again using the second copy of the index from the stack.
Then we go back to 'for' which increments the counter, sticks a copy on
the stack and calls the procedure again.
Now the terminator of the for loop is given by the number we extracted
right at the start of the program, the number of pages in the PDF file.
This is what pdfpagecount does.
So in summary, we open the PDF file using a routine which does some
special stuff. Then we call a routine which counts the number of pages
in the file. Then we execute a for loop as many times as there are pages
in the file. For each invocation we draw the page twice, using different
positioning commands.
So the program will run each page automatically, but it's in the program,
not Ghostscript.
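
The operand-stack bookkeeping described above can be traced with a small Python model. It only mirrors the explanation in this post (one index pushed by 'for', one extra copy per 'dup', one index consumed per 'pdfgetpage'); it does not reproduce the real Ghostscript routines, and the function name `run_loop_body` is invented for the sketch.

```python
def run_loop_body(page_index, num_dups, num_slices):
    """Trace the operand stack through one iteration of the for loop.

    Returns (slices_drawn, remaining_stack). Raises IndexError where the
    PostScript program would hit a stackunderflow.
    """
    stack = [page_index]                     # 'for' pushes the loop index
    stack.extend([page_index] * num_dups)    # each 'dup' adds one more copy
    drawn = 0
    for _ in range(num_slices):
        if not stack:
            raise IndexError("stackunderflow: no page index left for pdfgetpage")
        stack.pop()                  # pdfgetpage consumes a page index ...
        stack.append("pagedict")     # ... and pushes the page dictionary
        stack.append("pagedict")     # dup
        stack.pop()                  # '/Page exch store' consumes one copy
        stack.pop()                  # pdfshowpage_finish consumes the other
        drawn += 1
    return drawn, stack


print(run_loop_body(1, num_dups=1, num_slices=2))
```

In this model a single dup supplies exactly the two page indices needed for two slices and leaves the stack empty; drawing N slices per page would need N - 1 dups.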
NB the use of pdfshowpage_finish is *not* a good idea, I'm using it here
because I want to be able to change the media size on every page of
output. It may be that is not required, in which case a simpler and more
robust program could be written.
Ken
I see that the scaling is going into Install; and Install is using the
"previous Install subroutine" (see pdfshowpage_setpage - which you do
not use in your snippet). If one could force the "previous Install
subroutine" which uses the CTM, at least the shifting/scaling/rotating
part of pstools would start to work...
Thanks,
Ilya
Since when? In 9.00 all I see is %%BoundingBox: 0 0 612 792... (Now
I see the answer below...)
Currently I'm forced to use pswrite, which is painfully slow (one
needs up to 3 passes through pswrite to make pstools AND printers to
be satisfied...). It may take up to a minute per (scanned) page on
1GHz Athlon... And temporary files are HUGE (I discussed this here
already - it is due to REPEATED calls to pswrite).
> NB you need Ghostscript 9.01 or better for this.
AHA!
>> The third question is: can one modify the code above to set /BBox of
>> the pages to some particular value?
>
> I don't think I understand the question, pages don't have a /BBox. Form
> XObjects, Shading dictionaries and type 1 patterns do, but not pages.
> Pages have MediaBox, CropBox, ArtBox etc.
>
> The size of the page (and hence the MediaBox) is, unsurprisingly, given
> by the /PageSize.
Sorry, http://www.prepressure.com/pdf/basics/page_boxes claims they
have. And, at least, AcroRead is using SOMETHING to display in "fit
visible" mode. My question is: WHAT should one modify BETWEEN calls
to pdfgetpage and pdfshowpage to modify this "something"? (I assume
one uses pdfwrite device, so changes have a chance to be saved
somewhere...)
Thanks,
Ilya
> > I don't think I understand the question, pages don't have a /BBox. Form
> > XObjects, Shading dictionaries and type 1 patterns do, but not pages.
> > Pages have MediaBox, CropBox, ArtBox etc.
> >
> > The size of the page (and hence the MediaBox) is, unsurprisingly, given
> > by the /PageSize.
>
> Sorry, http://www.prepressure.com/pdf/basics/page_boxes claims they
> have.
Not exactly:
"Errors referring to the BBox
Within PDF files there is another box, the bounding box or BBox, that is
used. The bounding box is a rectangular frame that determines the
dimensions of an object (such as a graphic, font or pattern) that is
placed inside a PDF document. As such, this box has nothing to do with
the page boxes"
Pages don't have BBox entries, they have different kinds of *Box
entries. This is clear from the PDF 1.7 reference and ISO document.
> And, at least, AcroRead is using SOMETHING to display in "fit
> visible" mode.
Yes, one of the boxes, generally the CropBox, but if that is not present
then the MediaBox, unless you select one of the others from the menu.
> My question is: WHAT should one modify BETWEEN calls
> to pdfgetpage and pdfshowpage to modify this "something"? (I assume
> one uses pdfwrite device, so changes have a chance to be saved
> somewhere...)
Change the PageSize and call setpagedevice, which will result in a
different MediaBox. You can also use pdfmark to set the other Box
entries if you have some need for them, pdfwrite does not set them
unless you send a pdfmark directing it to.
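
For reference, a CropBox pdfmark is a single line of PostScript, and a small Python helper can format it. This is a sketch only: the helper name `cropbox_pdfmark` is hypothetical, and where exactly the line has to go in a given job, and how a particular pdfwrite version treats it, are not covered here.

```python
def cropbox_pdfmark(rect, all_pages=False):
    """Format a pdfmark line that sets the CropBox.

    rect is [llx, lly, urx, ury] in points. /PAGE targets a single page,
    /PAGES the whole document.
    """
    if len(rect) != 4:
        raise ValueError("rect must be [llx, lly, urx, ury]")
    coords = " ".join(f"{v:g}" for v in rect)
    target = "/PAGES" if all_pages else "/PAGE"
    return f"[/CropBox [{coords}] {target} pdfmark"


print(cropbox_pdfmark([30, 50, 300, 500]))
```

The printed line is `[/CropBox [30 50 300 500] /PAGE pdfmark`.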
Ken
Well, this would be fine with my own pdf workflow, but not with ps2pdf
(or enhancements). This:
(somefile.pdf) (r) file runpdfbegin
3 pdffindpage
dup /CropBox pget { % stack: <pagedict> [xs-ys]
oforce_elems
% process /CropBox
} if
% do other things with the page
is almost documented ;-). I'm thinking about something like
(somefile.pdf) (r) file runpdfbegin
3 pdffindpage
dup /CropBox [30 50 300 500] put
pdfshowpage
Would it work (meaning device=pdfwrite would store the info in the
output file), or do I need to iterate through parents, like pget does?
Thanks,
Ilya
> (somefile.pdf) (r) file runpdfbegin
> 3 pdffindpage
> dup /CropBox [30 50 300 500] put
> pdfshowpage
All that would do would be to change the CropBox read from the file and
stored in the dictionary into a different one. Since pdfwrite doesn't
get/use that CropBox information, the resulting PDF file would not be
different.
> Would it work (meaning device=pdfwrite would store the info in the
> output file), or do I need to iterate through parents, like pget does?
Neither. You need to send a pdfmark to pdfwrite.
Ken
I do not think what you proposed is in any way helpful. Here is my try.
(This is from reading the sources, with practically absent
understanding of PostScript and PDF. Add your grains of salt as
needed... Based on v9.00 of gs.)
========= The standard way to pass through a PDF page N
As used in pdf2dsc:
N
dup /Page# exch store
pdfgetpage % puts pdfpagedict on stack
% pdfshowpage is expanded to:
dup /Page exch store
pdfshowpage_init % only increments DSCPageCount
pdfshowpage_setpage % calculates and calls setpagedevice
pdfshowpage_finish % actually shows data from pdfpagedict
======== The posted example
omits pdfshowpage_init, replaces pdfshowpage_setpage by a simplified
call to setpagedevice, then calls pdfshowpage_finish.
======== The original pdfshowpage_setpage does setpagedevice with:
/UseCIEColor true
/Orientation 0
/PageSize [A B] % based on -dPDFFitPage, -dNoUserUnit, -dUseTrimBox
% -dUseCropBox, and TrimBox CropBox MediaBox UserUnit
% properties of the page
/PageSpotColors N % a calculated value
/PageUsesTransparency N % a calculated value
/Install I_proc % a calculated "procedure"
Apparently, I_proc is: if a nontrivial /Install "with previous value"
is defined, arrange for the previous value to be called, then apply a matrix
calculated by pdf_PDF2PS_matrix. This matrix is calculated (later! ???) by
pdf_PDF2PS_matrix - which ignores the CTM.
========= the calculated matrix
is based on -dPDFFitPage, -dNoUserUnit, -dUseTrimBox, -dUseCropBox,
the TrimBox CropBox MediaBox UserUnit Rotate properties of the page,
and .HWMargins and PageSize of the currentpagedevice (I suspect that this
is done AFTER the call to pdfshowpage_setpage exits!). The intent is, I
suspect, to move the Trim/Crop/MediaBox to the origin with appropriate
scaling and rotation.
============================================
This looks too baroque to be the intended implementation; looks more
like a fix on top of a fix on top of a fix... E.g., I have no clue
how one would modify this to respect the CTM at the moment of the call, and/or
not to do showpage at the end...
Puzzled,
Ilya
How can you write a phrase without thinking it?
What you imply is that invoking the pair pdffindpage/pdfshowpage would
lose the cropbox information. Did not sound plausible to me...
>> Would it work (meaning device=pdfwrite would store the info in the
>> output file), or do I need to iterate through parents, like pget does?
>
> Neither. You need to send a pdfmark to pdfwrite.
Vgrep of pdfshowpage_finish shows a chunk of code which reads CropBox,
and issues a pdfmark command. I suspect your analysis is wrong...
Ilya
Easy. I do not think in phrases.
Ilya
The pdfmark is only issued *if* .writepdfmarks returns true. This
returns true if either DOPDFMARKS is defined (normally only if this is
specified as a command line parameter) or the current device implements
a /pdfmark device parameter.
You should not rely on the current device implementing the /pdfmark
device parameter, as this is liable to change without notice. DOPDFMARKS
being true depends entirely on the command line. If you are certain that
this will be the case, then you can go ahead and not bother to emit a
CropBox.
However my reply is intended to be general, not specific to the use of a
command line parameter and in general, if you want a specific CropBox,
then emit the pdfmark, don't rely on the PDF interpreter doing it for
you.
Ken
Sorry Ken, this is a very useful explanation of what this piece of
code REALLY does, but I'm still confused about the general logic:
a) in pdfwrite, I assume that /pdfmark is implemented, right?
b) if the device DOES NOT IGNORE cropbox in the result of
pdfgetpage, then my code would STILL effectively perform the same
way as if the cropbox in the read document was actually modified.
Right?
So it looks like: either the logic following pdfgetpage does not care
about CropBox (and then my code performs as intended), or it reads it
using pget (and then my code performs as intended)... Am I missing something?
Ilya
> Sorry Ken, this is a very useful explanation of what this piece of
> code REALLY does, but I'm still confused about the general logic:
>
> a) in pdfwrite, I assume that /pdfmark is implemented, right?
Currently, yes.
> b) if the device DOES NOT IGNORE cropbox in the result of
> pdfgetpage, then my code would STILL effectively perform the same
> way as if the cropbox in the read document was actually modified.
> Right?
Assuming you took care to replace the CropBox in the appropriate sub-
object of the dictionary returned from pdfgetpage, then yes. I don't
actually know exactly where this is stored.
> So it looks like: either the logic following pdfgetpage does not care
> about CropBox (and then my code performs as intended), or it reads it
> using pget (and then my code performs as intended)... Am I missing something?
At present, and specifically for using the pdfwrite device as it is
currently configured, and presuming you correctly located and modified
the CropBox in the returned dictionary, then it should work.
But don't count on this working with other devices, or always being the
case, we might change the way pdfwrite works. Or the PDF interpreter,
come to that.
Ken
1 1 PDFPageCount {
dup
% 1 of 4
<</PageSize [330 248] /PageOffset [-42 326]>> setpagedevice
pdfgetpage
dup /Page exch store
pdfshowpage_finish
% 2 of 4
<</PageSize [330 248] /PageOffset [-420 326]>> setpagedevice
pdfgetpage
dup /Page exch store
pdfshowpage_finish
% 3 of 4
<</PageSize [330 248] /PageOffset [-42 38]>> setpagedevice
pdfgetpage
dup /Page exch store
pdfshowpage_finish
% 4 of 4
<</PageSize [330 248] /PageOffset [-420 38]>> setpagedevice
pdfgetpage
dup /Page exch store
pdfshowpage_finish
} for
And it blows up on the 3rd one. The odd thing is that any 2 of the
subpages will work, but if there are 3 (or more) of them it crashes.
It will crash on the 3rd instance of the same subpage (ie, do 1 of 4
three times in a row). In the following example the first two
>>showpage were the successful displays, and what follows is what
transpired after the second (return):
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
>>showpage, press <return> to continue<<
>>showpage, press <return> to continue<<
Error: /stackunderflow in --index--
Operand stack:
--nostringval-- 1
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --
nostringval-- 2 %stopped_push --nostringval-- --
nostringval-- --nostringval-- false 1 %stopped_push 1910
1 3 %oparray_pop 1909 1 3 %oparray_pop 1893 1 3
%oparray_pop 1787 1 3 %oparray_pop --nostringval--
%errorexec_pop .runexec2 --nostringval-- --nostringval-- --
nostringval-- 2 %stopped_push --nostringval-- 2 1 16 --
nostringval-- %for_pos_int_continue --nostringval-- --
nostringval--
Dictionary stack:
--dict:1153/1684(ro)(G)-- --dict:1/20(G)-- --dict:83/200(L)--
--dict:83/200(L)-- --dict:108/127(ro)(G)-- --dict:295/300(ro)
(G)-- --dict:23/30(L)--
Current allocation mode is local
Current file position is 1716
GPL Ghostscript 9.02: Unrecoverable error, exit code 1
This is on Solaris 8 (Sparc), using gs 9.02 from SunFreeware. It did
the same thing with the PDF device:
gs -sFile=test.pdf -sDEVICE=pdfwrite -sOutputFile=killme.pdf
pdf_slice.ps
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
>>showpage, press <return> to continue<<
>>showpage, press <return> to continue<<
Error: /stackunderflow in --index--
Operand stack:
--nostringval-- 1
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --
nostringval-- 2 %stopped_push --nostringval-- --
nostringval-- --nostringval-- false 1 %stopped_push 1910
1 3 %oparray_pop 1909 1 3 %oparray_pop 1893 1 3
%oparray_pop 1787 1 3 %oparray_pop --nostringval--
%errorexec_pop .runexec2 --nostringval-- --nostringval-- --
nostringval-- 2 %stopped_push --nostringval-- 2 1 16 --
nostringval-- %for_pos_int_continue --nostringval-- --
nostringval--
Dictionary stack:
--dict:1155/1684(ro)(G)-- --dict:1/20(G)-- --dict:83/200(L)--
--dict:83/200(L)-- --dict:108/127(ro)(G)-- --dict:295/300(ro)
(G)-- --dict:23/30(L)--
Current allocation mode is local
Last OS error: 2
Current file position is 1716
GPL Ghostscript 9.02: Unrecoverable error, exit code 1
The resulting pdf wasn't viewable, but that's not surprising since it
was probably corrupted by the crash.
Is there a way to work around this 2 page limit in ghostscript?
Thanks,
David Mathog
It doesn't seem like this works when written to the PDF or PS device
even if the former bug is avoided. I edited the pdf_slice.ps back
down to just two entries per page, did:
gs -sFile=test.pdf -sDEVICE=pdfwrite -sOutputFile=killme.pdf
pdf_slice.ps
and hit return 40 times (until it started showing GS> prompts)
then quit.
The resulting killme.pdf file had the expected number of pages, but
they were
all blank when viewed with either xpdf or PDF XChange Viewer. Same
thing with the pswrite device when viewed with gv. Using the default
X11 device every page had something on it. The pdf and ps files
produced were quite large, they just are not displaying correctly.
Regards,
David Mathog
> I finally got around to this and your test program worked great - when
> extracting two pages per page. Unfortunately the PDFs I am trying to
> convert are marked up 4 to a page. So I made this change (the
> coordinates match my input):
> [...]
> GPL Ghostscript 9.02: Unrecoverable error, exit code 1
>
> The resulting pdf wasn't viewable, but that's not surprising since it
> was probably corrupted by the crash.
That's a gracefully handled error condition rather than a crash :-) But
yes, if the interpreter aborts the job then the PDF file is not written
properly (at all in fact).
> Is there a way to work around this 2 page limit in ghostscript?
Well, sadly the code is making use of undocumented internals in the PDF
interpreter, so it's not entirely surprising that it doesn't work 100%.
The error message is saying that the stack was empty (or contained too
few objects) when trying to manipulate it, so something I did has had
unexpected side effects on the stack.
It's been rather a while since I looked at this, but I seem to remember
saying that if the sub-pages were all the same size then I could get
away without the nasty hack of using undocumented code. So... if I can
remember what I had in mind I may be able to fix it (assuming that the
sub-pages are all the same size, as per your example).
I'll give it a bash over the weekend.
Ken
> It doesn't seem like this works when written to the PDF or PS device
> even if the former bug is avoided. I edited the pdf_slice.ps back
> down to just two entries per page, did:
>
> gs -sFile=test.pdf -sDEVICE=pdfwrite -sOutputFile=killme.pdf
> pdf_slice.ps
>
> and hit return 40 times (until it started showing GS> prompts)
> then quit.
You could use -dBATCH and -dNOPAUSE which avoid the 'press any key'
between pages prompt and also exits at the end of the job.
> The resulting killme.pdf file had the expected number of pages, but
> they were
> all blank when viewed with either xpdf or PDF XChange Viewer. Same
> thing with the pswrite device when viewed with gv. Using the default
> X11 device every page had something on it. The pdf and ps files
> produced were quite large, they just are not displaying correctly.
Well, I only checked it with a simple PDF file I had here, not a real
example. If you could let me have a real example I could be more certain
that the code works. In any event, I'm going to have a play with it
based on your last message.
Ken
Real example, a little under 13MB.
http://saf.bio.caltech.edu/bi170c/BMB170c_2011_04_21_LECTURE.pdf
In every PDF I have seen marked up like this the subpages (or whatever
the right
term is) are all exactly the same geometry.
Thanks,
David Mathog
Fetching it now.
> In every PDF I have seen marked up like this the subpages (or whatever
> the right
> term is) are all exactly the same geometry.
I don't know if there's a 'right' term, it's the best description my
bear-like brain could come up with ;-)
If they are all the same size, then we can set up the page size in
advance, which makes everything much easier, changing page size in mid-
stream is the tricky bit.
I'll try and get back to you tomorrow.
Ken
> convert are marked up 4 to a page. So I made this change (the
Silly me, it's quite obvious when I think about it.
The pdf_slice.ps program is intended to deal with 2 sub-pages, and
pdfgetpage takes the page number from the stack so it knows which page
to use. So this:
1 1 PDFPageCount {
dup
...
...
} for
takes the current index of the loop counter, which is pushed on the
stack each time round the loop by the 'for' operator, and copies it. So
we have 2 copies of the page index on the stack.
Obviously if we then call pdfgetpage 3 times, we will be short of one
copy :-)
So if you change 'dup' to 'dup dup dup' (or as many as are required for
the number of sub pages -1) then it will work better :-)
Ken
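The stack bookkeeping Ken describes can be modelled in a few lines of
Python. This is only a toy model, not Ghostscript code: it assumes each
pdfgetpage / pdfshowpage_finish pair nets out to consuming exactly one
copy of the page number, and the function name run_loop_body is made up
for the illustration.

```python
def run_loop_body(page_index, dup_count, subpages):
    """Toy model of the operand stack during one pass of the 'for' loop.

    Assumption: every sub-page drawn consumes one copy of the page number
    (pdfgetpage pops it; the dictionary it returns is used up afterwards).
    """
    stack = [page_index]            # 'for' pushes the loop counter
    for _ in range(dup_count):      # each 'dup' copies the top of the stack
        stack.append(stack[-1])
    for _ in range(subpages):       # one pdfgetpage call per sub-page
        if not stack:
            raise IndexError("stackunderflow")
        stack.pop()
    return stack


if __name__ == "__main__":
    print(run_loop_body(1, 1, 2))   # one 'dup', two sub-pages: stack ends empty
    print(run_loop_body(1, 3, 4))   # 'dup dup dup', four sub-pages: also empty
    try:
        run_loop_body(1, 1, 3)      # one 'dup', three sub-pages
    except IndexError as err:
        print("third sub-page fails:", err)
```

In this model N sub-pages need N-1 'dup's, which is the rule given
above.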
> So if you change 'dup' to 'dup dup dup' (or as many as are required for
> the number of sub pages -1) then it will work better :-)
Ok, that fixes the "crash", but how about the blank PDF and PS output?
Thanks,
David Mathog
I believe you have the page offsets incorrect, I changed them and was
able to get decent output. I'm working on a better PostScript program,
now I have a good idea of what's required.
I should also ask what version of Ghostscript you are using ? Also, if
you want PostScript output, do please use the ps2write device, not the
deprecated pswrite device.
Ken
> I believe you have the page offsets incorrect, I changed them and was
> able to get decent output. I'm working on a better PostScript program,
> now I have a good idea of what's required.
And below is the PostScript program, I hope UseNet doesn't mess up the
line endings. If cut&paste doesn't work let me know an address I can
mail the file to.
I used the example file you posted a link to, with the following command
line (entered as a single line):
c:\gs-scratch\bin>gswin32c -dBATCH -dNOPAUSE
    -sFile=\temp\BMB170c_2011_04_21_LECTURE.pdf
    -dSubPagesX=2 -dSubPagesY=2 -dFIXEDMEDIA
    -dDEVICEHEIGHTPOINTS=288.36 -dDEVICEWIDTHPOINTS=378.36
    -sDEVICE=pdfwrite -sOutputFile=\temp\out.pdf \temp\pdf_slice.ps
Now the way this works is that you start by setting the page size to the
size of the individual slides (or sub-pages). You also set the
-dFIXEDMEDIA switch, which prevents the PDF interpreter from changing
the page size.
The program uses this size, the original page size from the PDF file and
the number of sub-pages in each direction to calculate any margins
surrounding the sub-page group (your file has a 17 point margin all the
way round).
Then, using the SubPageOrder it draws each of the sub pages in the
required order, shifting the page as it does so. The page size set at
the command line acts like a window, anything which is visible
underneath it is what gets drawn.
For me the command line above produced 64 pages with content. I have to
warn you, however, that the original PDF file seems to use clipping
paths to remove some content from some pages. Unfortunately this is not
working reliably and some pages contain content that I believe should
have been clipped.
This does not happen if I simply pass the original file through
pdfwrite, so it's a consequence of the PageOffset being applied. I guess
that there are some conditions where the PDF interpreter is not moving
the clip region in line with the PageOffset. Feel free to open a bug
report on it at
But I should probably warn you that it may be some time before it gets
addressed, as the engineers are very busy at the moment.
8<-------------------8<------------------------8<-------------------8<
%!PS
% Copyright (C) 2011 Artifex Software, Inc. All rights reserved.
%
% This software is provided AS-IS with no warranty, either express or
% implied.
%
% This software is distributed under license and may not be copied,
% modified or distributed except as expressly authorized under the terms
% of the license contained in the file LICENSE in this distribution.
%
% For more information about licensing, please refer to
% http://www.ghostscript.com/licensing/. For information on
% commercial licensing, go to http://www.artifex.com/licensing/ or
% contact Artifex Software, Inc., 101 Lucas Valley Road #110,
% San Rafael, CA 94903, U.S.A., +1(415)492-9861.
%
% Slice up a PDF file
%
% usage: gs -sFile=____.pdf -dSubPagesX= -dSubPagesY= [-dSubPageOrder=]
%        [-dVerbose=] pdf_slice.ps
%
% SubPageOrder is a bit field;
% Default = 0
% Bit 0 - 0 = top to bottom
% 1 = bottom to top
% Bit 1 - 0 = left to right
% 1 = right to left
% Bit 2 - 0 = increase x then y
% - 1 = increase y then x
%
% 0 - page 1 at top left, increasing left to right, top to bottom
% 1 - page 1 at bottom left increasing left to right, bottom to top
% 2 - page 1 at top right, increasing right to left, top to bottom
% 3 - page 1 at bottom right increasing right to left, bottom to top
% 4 - page 1 at top left, increasing top to bottom, left to right
% 5 - page 1 at bottom left increasing bottom to top, left to right
% 6 - page 1 at top right, increasing top to bottom, right to left
% 7 - page 1 at bottom right increasing bottom to top, right to left
%
% Check the parameters to see they are present and of the correct type
%
/Usage {
( usage: gs -dNODISPLAY -q -sFile=____.pdf \n) =
( -dSubPagesX= -dSubPagesY= [-dSubPageOrder=] pdf_slice.ps \n) =
(Please see comments in pdf_slice.ps for more details) =
flush
quit
} bind def
/Verbose where not {
/Verbose false def
}{
pop /Verbose true def
} ifelse
/File where not {
(\n *** Missing source file. \(use -sFile=____.pdf\)\n) =
Usage
} {
pop
}ifelse
/SubPagesX where not {
(\n *** SubPagesX not integer! \(use -dSubPagesX=\)\n) =
Usage
} {
Verbose { (SubPagesX ) print } if
SubPagesX type
Verbose { dup == } if
/integertype eq not {
(\n *** SubPagesX not integer! \(use -dSubPagesX=\)\n) =
Usage
} if
pop
}ifelse
/SubPagesY where not {
(\n *** SubPagesY not integer! \(use -dSubPagesY=\)\n) =
Usage
} {
Verbose { (SubPagesY ) print } if
SubPagesY type
Verbose { dup == } if
/integertype eq not {
(\n *** SubPagesY not integer! \(use -dSubPagesY=\)\n) =
Usage
} if
pop
}ifelse
/SubPageOrder where not {
/SubPageOrder 0 def
} {
Verbose { (SubPageOrder ) print } if
SubPageOrder type
Verbose { dup == } if
dup ==
/integertype eq not {
(\n *** SubPageOrder not integer! \(use -dSubPageOrder=\)\n) =
Usage
} if
pop
}ifelse
%
% Turns off most messages
%
/QUIET true def % in case they forgot
%() =
%
% Open the PDF file and tell the PDF interpreter to start dealing with
% it
%
File dup (r) file runpdfbegin pop
/PDFPageCount pdfpagecount def
%
% Set up our bookkeeping
%
% First get the size of the page from page 1 of the PDF file
% We assume that all PDF pages are the same size.
%
1 pdfgetpage currentpagedevice
1 index get_any_box
exch pop dup 2 get exch 3 get
/PDFHeight exch def
/PDFWidth exch def
%
% Now get the page size of the current device. We are assuming that
% this is the size of the individual sub-pages in the original PDF. NB
% This assumes no margins between sub-pages, all sub-pages the same
% size.
%
currentpagedevice /PageSize get
dup 0 get /SubPageWidth exch def
1 get /SubPageHeight exch def
%
% Calculate the margins. This is the margin between the page border and
% the enclosed group of sub-pages, we assume there are no borders
% between sub pages.
%
/TopMargin PDFHeight SubPageHeight SubPagesY mul sub 2 div def
/LeftMargin PDFWidth SubPageWidth SubPagesX mul sub 2 div def
Verbose {
(PDFHeight = ) print PDFHeight ==
(PDFWidth = ) print PDFWidth ==
(SubPageHeight = ) print SubPageHeight ==
(SubPageWidth = ) print SubPageWidth ==
(TopMargin = ) print TopMargin ==
(LeftMargin = ) print LeftMargin ==
} if
%
% This routine calculates and sets the PageOffset in the page device
% dictionary for each subpage, so that the PDF page is 'moved' in such
% a way that the required sub page is under the 'window' which is the
% current page being imaged.
%
/NextPage {
SubPageOrder 2 mod 0 eq {
/H SubPagesY SubPageY sub SubPageHeight mul TopMargin add
def
}{
/H SubPageY 1 sub SubPageHeight mul TopMargin add def
} ifelse
SubPageOrder 2 div floor cvi 2 mod 0 eq {
/W SubPageX 1 sub SubPageWidth mul LeftMargin add def
}{
/W SubPagesX SubPageX sub SubPageWidth mul LeftMargin add
def
} ifelse
<< /PageOffset [W neg H neg]>> setpagedevice
Verbose {
(SubPageX ) print SubPageX ==
(SubPageY ) print SubPageY ==
(X Offset ) print W ==
(Y Offset ) print H == flush
} if
PDFPage
} bind def
%
% The main loop
% For every page in the original PDF file
%
1 1 PDFPageCount
{
/PDFPage exch def
% Do the gross ordering here rather than in
% NextPage. We either process rows and then
% columns, or columns then rows, depending on
% Bit 2 of SubPageOrder
SubPageOrder 3 le {
1 1 SubPagesY {
/SubPageY exch def
1 1 SubPagesX {
/SubPageX exch def
NextPage
pdfgetpage
pdfshowpage
} for
} for
} {
1 1 SubPagesX {
/SubPageX exch def
1 1 SubPagesY {
/SubPageY exch def
NextPage
pdfgetpage
pdfshowpage
} for
} for
} ifelse
} for
8<-------------------8<------------------------8<-------------------8<
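For anyone who wants to check the numbers before running the script,
the margin and PageOffset arithmetic from NextPage and the main loop can
be reproduced in Python. This is a sketch for checking values only, not
a replacement for the PostScript. The function name page_offsets is
invented here, and the 792 x 612 point page size used in the example is
an assumption (US Letter landscape); use the real size of your input
page.

```python
def page_offsets(pdf_w, pdf_h, sub_w, sub_h, nx, ny, order=0):
    """Return the (x, y) PageOffset for each sub-page, in output order.

    Mirrors the arithmetic of the PostScript above: the margins are what
    is left over after nx * ny sub-pages, split evenly on both sides.
    """
    left_margin = (pdf_w - sub_w * nx) / 2
    top_margin = (pdf_h - sub_h * ny) / 2

    def offset(x, y):
        # Low bit of order: 0 = top to bottom, 1 = bottom to top
        if order % 2 == 0:
            h = (ny - y) * sub_h + top_margin
        else:
            h = (y - 1) * sub_h + top_margin
        # Next bit: 0 = left to right, 1 = right to left
        if (order // 2) % 2 == 0:
            w = (x - 1) * sub_w + left_margin
        else:
            w = (nx - x) * sub_w + left_margin
        return (-w, -h)

    if order <= 3:   # x varies fastest, then y
        cells = [(x, y) for y in range(1, ny + 1) for x in range(1, nx + 1)]
    else:            # y varies fastest, then x
        cells = [(x, y) for x in range(1, nx + 1) for y in range(1, ny + 1)]
    return [offset(x, y) for x, y in cells]


if __name__ == "__main__":
    # Assumed page size 792 x 612; sub-page size from the command line above.
    for off in page_offsets(792, 612, 378.36, 288.36, 2, 2):
        print("PageOffset [%.2f %.2f]" % off)
```

With those assumed inputs the margins come out at 17.64 points, which
is in line with the roughly 17 point margin mentioned above.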
I think the following postings on Super User (superuser.com) will help
you solve your problem:
http://superuser.com/questions/54054/convert-pdf-2-sides-per-page-to-1-side-per-page/189109#189109
http://superuser.com/questions/235074/freeware-to-split-a-pdfs-pages-down-the-middle/235401#235401
They don't exactly match your problem, but the method described there
can easily be modified for your case.
Cheers,
pipitas
Oh, I didn't get to see the long thread after the original post when
first visiting this Google Groups URL today.
Of course, what Ken posted up there earlier today is much more
powerful. Hopefully this utility will end up in Ghostscript's SVN/Git
repositories in their "toolbin/" subdirectory so it is easier to find
and point to should one need it in the future :-)
Very nice Easter present, Ken. Thank you very much! :-)
Anybody? Is it maybe only the Solaris/Sparc 9.02 that has that
problem?
Thanks,
David Mathog
Are you using the latest code I posted ? If so can you post the
resulting PDF file somewhere we can look at ?
Ken
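I am now, having missed the post of 4/25. With that one there is
output on all pages but there are problems, including every page
emitting:

**** File did not complete the page properly and may be damaged.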
I see the clipping you mentioned, but also in my hands pages
3,4 are not centered (way up at top, and cut in half), and none of the
pages are scaled well (the desired subpage is situated in the bottom
1/3 or so of a landscape page). There were some line wraps when the
text was extracted from the newsgroup, and I don't know if I found them
all. Before proceeding it would be good to get a copy of the script
which definitely does not have this issue. Perhaps you could email it
to me? last name at CALifornia institute of TECHnolgy, a DepOT for
EDUcation. ;-).
Thanks,
David Mathog
> I am now, having missed the post of 4/25. With that one there is
> output on all pages but there are problems, including every page
> emitting:
>
> **** File did not complete the page properly and may be damaged.
Basically that is saying that there is 'stuff' on the operand stack
after running the page, and there shouldn't be. It may be related to the
clipping problem, I haven't investigated, but it seems to be benign.
> I see the clipping you mentioned, but also in my hands pages
> 3,4 are not centered (way up at top, and cut in half), and none of the
> pages are scaled well (the desired subpage is situated in the bottom
> 1/3 or so of a landscape page).
Sounds like you may have forgotten or mis-spelled the -dFIXEDMEDIA and/or
-dDEVICEWIDTHPOINTS= -dDEVICEHEIGHTPOINTS= parameters. The code doesn't
do any scaling.
> There were some line wraps when the
> text was extracted from the newsgroup, and I don't know if I found them
> all. Before proceeding it would be good to get a copy of the script
> which definitely does not have this issue. Perhaps you could email it
> to me? last name at CALifornia institute of TECHnolgy, a DepOT for
> EDUcation. ;-).
I've mailed it to your address, along with a copy of the output I get
from your sample and the command line (all on one line):
gswin32c -dBATCH -dNOPAUSE -sFile=\temp\BMB170c_2011_04_21_LECTURE.pdf
    -dSubPagesX=2 -dSubPagesY=2 -dFIXEDMEDIA
    -dDEVICEHEIGHTPOINTS=288.36 -dDEVICEWIDTHPOINTS=378.36
    -sDEVICE=pdfwrite -sOutputFile=\temp\out.pdf \temp\pdf_slice.ps
Obviously that's a Windows invocation so you will need to change the
path names and so on (well, you would need to anyway ;-) But the sizes
shuold be OK.
Ken
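Since the paths and the executable name differ between platforms, one
way to avoid retyping the long command is to assemble it in a small
Python helper. The sketch below uses only the switches from the command
line above; the helper name slice_command and its parameters are
invented for the illustration, and nothing is executed unless you pass
the list on to subprocess.run yourself.

```python
def slice_command(gs, src, out, nx, ny, width_pt, height_pt,
                  script="pdf_slice.ps"):
    """Build the argument list for a pdf_slice.ps run.

    gs is the executable name ('gs' on Unix, 'gswin32c' on Windows);
    width_pt and height_pt are the size of ONE sub-page in points.
    """
    return [
        gs, "-dBATCH", "-dNOPAUSE",
        f"-sFile={src}",
        f"-dSubPagesX={nx}", f"-dSubPagesY={ny}",
        "-dFIXEDMEDIA",
        f"-dDEVICEHEIGHTPOINTS={height_pt}",
        f"-dDEVICEWIDTHPOINTS={width_pt}",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={out}",
        script,
    ]


if __name__ == "__main__":
    cmd = slice_command("gs", "BMB170c_2011_04_21_LECTURE.pdf", "out.pdf",
                        2, 2, 378.36, 288.36)
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list sidesteps shell quoting and the
line-wrap damage that news readers inflict on long command lines.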
Very useful link. That method was easy to convert to a script, which
is here:
http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/unup4.sh
Here is a test file
http://saf.bio.caltech.edu/bi170c/BMB170c_2011_04_21_LECTURE.pdf
and it works pretty well.
The only thing I can't quite figure out is why the quality of images
degrades. Text is fine, but even with /prepress there are more
compression artifacts visible in the pictures in the repacked file
than in the original. With /screen the pictures in the output are
horrible. The file size also decreases, from 12M to 9.5M, again with
/prepress. See for instance page 16 in the output (4th subframe of 4th
page in the original.) My best guess is that this is some sort of
(re)sampling error which results from this
-g${NWIDTH}x${NHEIGHT}
In the script NWIDTH=3300, NHEIGHT=2476 were determined by trial and
error, as being just large enough to fit the bounding rectangle line
in each of the original subframes.
Is there a gs switch I could add to minimize this issue?
Thanks,
David Mathog
>
> The only thing I can't quite figure out is why the quality of images
> degrades.
Here are two screen dumps taken at (about) the same resolution showing
the image degradation:
http://saf.bio.caltech.edu/pub/pickup/platonic_original.PNG
http://saf.bio.caltech.edu/pub/pickup/platonic_flate.PNG
The first is from the test.pdf, and the latter from the script using:
gs -o $FNAME -sDEVICE=pdfwrite \
-dAutoFilterGrayImages=false \
-dGrayImageFilter=/FlateEncode \
-dMonoImageFilter=/FlateEncode \
-dAutoFilterColorImages=false \
-dColorImageFilter=/FlateEncode \
-dFirstPage=$PAGE -dLastPage=$PAGE -g${NWIDTH}x${NHEIGHT} \
-c "<</PageOffset [$XOFF $YOFF]>> setpagedevice" -q -f test.pdf
Using /prepress instead of /FlateEncode results in essentially the
same image as platonic_flate, just
a pixel here and there changed. Is it possible that the image is
degraded in ghostscript BEFORE it is encoded to be stored in the PDF
file?
Thanks,
David Mathog
I don't see image 'degradation', the shapes of the letters look the
same. What I do see is a difference in the colours of the pixels, which
suggests either a decoding difference (the image looks to have been a
JPEG initially), or a difference in colour conversion.
Colour conversion should not be taking place for a PDF output, so the
most likely difference is something to do with JPEG decoding.
You could try using Ghostscript to render the original PDF file to CMYK
and RGB, and look at the output to see if the result looks the same.
You haven't said which version of Ghostscript you are using either, if
you are using an older version you could try upgrading.
Ken
To me it looks like ringing caused by frequency domain filtering. For
instance, on the leading "P" in platonic there are vertical lines to
the left, and horizontal lines above it (not perfect lines, just
linear groups of pixels, orange/yellow in color) that are absent from
the original.
> Colour conversion should not be taking place for a PDF output, so the
> most likely difference is something to do with JPEG decoding.
Agreed.
> You could try using Ghostscript to render the original PDF file to CMYK
> and RGB, and look at the output to see if the result looks the same.
Which command, exactly, would be appropriate? Ideally it would
unpack the image in exactly the same way the gs commands in the
preceding posts did, but save it immediately in an image format.
> You haven't said which version of Ghostscript you are using either, if
> you are using an older version you could try upgrading.
9.02
Thanks.
David Mathog
> > You could try using Ghostscript to render the original PDF file to
> > CMYK and RGB, and look at the output to see if the result looks the
> > same.
>
> Which command, exactly, would be appropriate?
gs -r720 -sDEVICE=tiff32nc -sOutputFile=out.tif input.pdf
If this isn't a single page document then add -dFirstPage= and
-dLastPage=
720 is the default resolution for pdfwrite, but you can fiddle with it
until you see something correctly sized for your output.
> Ideally it would
> unpack the image in exactly the same way the gs commands in the
> preceding posts did, but save it immediately in an image format.
There's no way to save immediately in an image format I'm afraid. But
that shouldn't matter. One thing that just occurred to me is that it's
possible that the original is a JPEG image, and you have colour images
set for JPEG compression, which would apply a second set of DCT losses.
Ah, no, I see you are using Flate.
> > You haven't said which version of Ghostscript you are using either,
> > if you are using an older version you could try upgrading.
>
> 9.02
Well, not likely to be different in the working code.
Can you post a sample PDF somewhere public ?
Ken
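The 720 dpi figure also explains the -g values in the script earlier
in the thread: -g is given in device pixels, so at pdfwrite's default
resolution the page size in points is the pixel count times 72/720.
A small Python sketch of the conversion follows (the function names are
made up, and the 720 default only applies if -r is not given):

```python
def points_to_pixels(points, dpi=720):
    """Convert a length in PostScript points (1/72 inch) to device pixels."""
    return round(points * dpi / 72)


def pixels_to_points(pixels, dpi=720):
    """Convert device pixels back to PostScript points."""
    return pixels * 72 / dpi


if __name__ == "__main__":
    # The -g3300x2476 from the script, at the default 720 dpi
    print(pixels_to_points(3300), pixels_to_points(2476))
    # A 378.36 x 288.36 point sub-page expressed as a -g value
    print("-g%dx%d" % (points_to_pixels(378.36), points_to_pixels(288.36)))
```

So -g3300x2476 describes a page of about 330 x 247.6 points, which is
close to the [330 248] PageSize used in the first attempt.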
OK, here goes:
gs -r720 -sDEVICE=tiff32nc -sOutputFile=out.tif -dFirstPage=4 \
    -dLastPage=4 test.pdf
Which produced a 190MB tiff. Cut and pasted the "platonic" part (in
Windows XP Paint) and made a side by side comparison with the original
and the flate versions, here:
http://saf.bio.caltech.edu/pub/pickup/platonic_3_versions.PNG
Interesting. The pixel pattern is the same in the from_tif and
original versions, but the colors are very slightly different. (I
didn't say this before, the original is a screen shot taken from
PDF-XChange Viewer, version 2.5, build 190.0. In all cases the screen
shots were made on Windows XP and then saved by pasting into Paint,
trimming, and saving as PNG.)
The flate image is quite different, with (newly) colored pixels all
around the originals. Using the tool in gimp to show the value of
each pixel
http://groups.google.com/group/comp.graphics.apps.gimp/browse_thread/thread/dad232c8c2937a8a
I can see that the "white" in the from_Tif version is 253,253,253 but
in the original it is 255,255,255. Similarly all the other colors are
slightly changed. Also I checked and the flat white areas in the
original are also flat in the from_tif, and are not flat in the Flate
version.
> Can you post a sample PDF somewhere public ?
http://saf.bio.caltech.edu/pub/pickup/platonic.pdf
The "Platonic" image is the "slide" from the lower right corner of
page 4.
Thanks,
David Mathog
> > Can you post a sample PDF somewhere public ?
>
> http://saf.bio.caltech.edu/pub/pickup/platonic.pdf
>
> The "Platonic" image is the "slide" from the lower right corner of
> page 4.
Got the file, will look more closely tomorrow.
Ken
> gs -r720 -sDEVICE=tiff32nc -sOutputFile=out.tif -dFirstPage=4 \
>     -dLastPage=4 test.pdf
>
> Which produced a 190MB tiff. Cut and pasted the "platonic" part (in
> Windows XP Paint) and made a side by side comparison with the original
> and the flate versions, here:
>
> http://saf.bio.caltech.edu/pub/pickup/platonic_3_versions.PNG
>
> Interesting. The pixel pattern is the same in the from_tif and
> original versions, but the colors are very slightly different.
Yes, that was what I was attempting to say in my earlier post. To me
there doesn't appear to be any corruption per se, but the colours do
look different. This is possible in JPEG where the high and low
frequency components (edges and colours) are separate.
I tried the latest code here, at 300 dpi on Windows, and the result is
most similar to the 'Platonic_original.PNG' in your screenshot. Not
quite the same, but then PNG is potentially a lossy format too, which is
why I suggested TIFF.
I also redid this using the tiff24nc device so that we don't get
RGB->CMYK->RGB conversion. I also zoomed in to 2400% in Acrobat with image
smoothing turned off, again the 'Platonic_original.PNG' is the best
match to Acrobat.
In fact at 600 dpi, tiff24nc (for RGB output) the result is
indistinguishable from Acrobat's output.
This means it's not the decompression of the JPEG file, so on to
pdfwrite.
Here I do see the kind of colour changes you describe, and to my eye
they look to be the sort of artefacts caused by reapplying JPEG
compression to data which has previously been JPEG compressed. And
indeed if I decode the file I see that all 3 images within it are
compressed with DCT. (NB I was using a very simple command line here).
I then used this command line (all on one line):
gswin32c -sDEVICE=pdfwrite -dCompressPages=false
    -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode
    -sOutputFile=out.pdf -dFirstPage=4 -dLastPage=4 platonic.pdf
Checking the (uncompressed) PDF file I see that all three images are now
compressed using Flate. Opening the file in Acrobat I see that the
colour shifting has disappeared, and the area in question looks OK. In
fact, comparing the two in Acrobat at 2400% zoom and image smoothing
turned off they are again indistinguishable.
So it seems to me that the problem is that your output is still being
DCT compressed. Can I ask you to try the command line above and see if
that improves matters for you ?
Obviously on Linux you'll want to use 'gs' instead of gswin32c.
Ken
>
> Here I do see the kind of colour changes you describe, and to my eye
> they look to be the sort of artefacts caused by reapplying JPEG
> compression to data which has previously been JPEG compressed. And
> indeed if I decode the file I see that all 3 images within it are
> compressed with DCT. (NB I was using a very simple command line here).
>
> I then used this command line:
>
> gswin32c -sDEVICE=pdfwrite -dCompressPages=false
>     -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode
>     -sOutputFile=out.pdf -dFirstPage=4 -dLastPage=4 platonic.pdf
That was it.
There are two gs runs in the script, one pass to break out all the
frames into separate files, and a second to put them back together. I
see now how the image processing in these was interacting. So in the
first step it now uses a command similar to yours above so that the
images stored contain the decompressed originals with no further
processing, and in the second it uses -dPDFSETTINGS=/printer to
restore compression. Once. I did try using CompressPages=false and
flate encoding at the final step. The input and output images were
identical, but the resulting pdf was > 3X larger than the original.
Since the original in this case was (apparently) compressed with
/printer, recompressing it ONCE at the final stage with the same
setting (nearly) recreates the original image. The differences
between the original and the final are now tiny, just a few pixels
having slightly different shades, which isn't visible except at very
high magnifications.
The revised script is again here:
http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/unup4.sh
Does this suggest a need for a "pass through" setting for images?
It would certainly be simpler than the decompress, (carefully) store
at full decompressed resolution for all intermediate steps, recompress
method. It might be something like
dColorImageFilter=/asis
Naively it seems like this should be possible, since the PDF is
essentially just a bag of objects, and I can't think of a reason why
an image object couldn't be copied from one bag to the other, albeit
possibly wrapped in another object if the image scale needs to be
changed.
Thanks!
David Mathog
16520653 2011-06-27 10:12 out_prepress3.pdf
13073094 2011-06-27 09:53 out_printer4.pdf
6408552 2011-06-27 09:34 out_screen2.pdf
13024632 2011-04-22 08:28 test.pdf
The /prepress one is even closer to the original than the /printer
version, but again, the difference is only visible at high magnification.
The /screen version looked dreadful, as did /ebook (not shown). Apparently
recompressing (further) a once compressed image results in an image
very much inferior to compressing that image to the final compression
directly from the original. Not surprising I suppose.
I will now go through the /printer and original file versions again
carefully and see if there are any other issues.
Regards,
David Mathog
> Does this suggest a need for a "pass through" setting for images?
>
> It would certainly be simpler than the decompress, (carefully) store
> at full decompressed resolution for all intermediate steps, recompress
> method. It might be something like
>
> dColorImageFilter=/asis
>
> Naively it seems like this should be possible, since the PDF is
> essentially just a bag of objects, and I can't think of a reason why
> an image object couldn't be copied from one bag to the other, albeit
> possibly wrapped in another object if the image scale needs to be
> changed.
It can't be done because, basically, Ghostscript and pdfwrite aren't
copying things from one 'bag' to another at all.
What actually happens is that Ghostscript fully interprets the PDF file,
converting all the instructions into graphics primitives (things which
make marks, like text, images, linework etc).
These are passed into the rendering pipeline, where pdfwrite intercepts
them before they are rendered, and re-emits them as PDF objects,
suitably wrapped up in a PDF file with all the furniture required.
So the image you are talking about is converted from a DCT-compressed
PDF object into an uncompressed sequence of bytes representing the
samples of an image. These are then recompressed back into a PDF object.
There is no way in the existing scheme to have a 'pass-through' mode,
because the graphics library needs to see the image in that decompressed
fashion, not as a DCT compressed stream.
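
As a practical footnote to the explanation above: because pdfwrite always
decodes and then re-encodes image data, the one thing that can be
controlled is the encoder used on output. The sketch below is an
illustration, not part of the original post. It assembles the standard
pdfwrite distiller switches that select lossless Flate encoding and turn
off downsampling for colour and grey images. The helper name
lossless_image_flags and the file names in the comment are made up for
the example, and nothing here actually runs gs.

```shell
#!/bin/sh
# Print the pdfwrite switches that make image re-encoding lossless
# (Flate instead of DCT, no downsampling). This only builds the option
# string; it does not invoke Ghostscript.
lossless_image_flags() {
    flags=""
    for kind in Color Gray; do
        flags="$flags -dAutoFilter${kind}Images=false"
        flags="$flags -d${kind}ImageFilter=/FlateEncode"
        flags="$flags -dDownsample${kind}Images=false"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Example use (file names are placeholders):
# gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite $(lossless_image_flags) \
#    -sOutputFile=out_flate.pdf test.pdf
```

The output file will usually be much larger than a DCT-compressed one,
since the samples are stored losslessly.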
In a former life I did do some work on this and it looks to me like it
is possible to reverse the decoding of a DCT image. If you know the
exact way the image was originally compressed (which is stored in the
JPEG information, as it is required for decompression) then it should be
possible to rerun the DCT compression and get the same JPEG data out,
with no additional artefacts.
I never got finished with it though; there always seems to be something
more important...
Ken
>
> So it seems to me that the problem is that your output is still being
> DCT compressed. Can I ask you to try the command line above and see if
> that improves matters for you ?
I posted a response yesterday but it seems to have gone into the bit
bucket somewhere. Apologies if this ends up as a double post.
You were correct, that was the way to go. The script runs gs twice
(effectively, the first time is in a loop), once to extract each
subframe into a separate file, and once to reassemble into a single
file. The key was to use the Flate nocompress settings for the first
one, so that the intermediate PDF is stored at full resolution without
compression. At the final step there is a tradeoff of size versus
quality. Here is how size turns out:
Size (MB)  Output setting
13         original
41         /flateencode
16.5       /prepress
13         /printer
For quality, flate is identical to the original, prepress differs
slightly, but it is only evident at high magnification. Printer is
ever so slightly worse than prepress, but still acceptable as it is
only visible at high magnification. /ebook and /screen looked
dreadful though. The original was (I believe) generated with
/printer. I guess it isn't surprising that
original -> printer -> ebook looks a lot worse than original -> ebook.
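
For anyone who wants to try the two-pass approach described above, here
is a rough dry-run sketch. It is not the contents of unup4.sh. It
assumes the page size in points is known, that the slides sit on a
regular grid with no margins, and that the page device parameter
PageOffset together with -dFIXEDMEDIA is an acceptable way to crop. The
names unup_plan, part_*.pdf and unup.pdf are invented for the example.
The function only prints the gs command lines, so they can be checked
before running anything.

```shell
#!/bin/sh
# Dry-run sketch: print the gs commands for splitting an N-up PDF back
# into one slide per page. Assumptions (not from the thread): regular
# cols x rows grid, no margins, row 0 is the top row of slides.
PASS1_FLAGS="-dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode"
GS="gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite"

unup_plan() {
    src=$1; npages=$2; page_w=$3; page_h=$4; cols=$5; rows=$6
    cell_w=$((page_w / cols))
    cell_h=$((page_h / rows))
    parts=""
    p=1
    while [ "$p" -le "$npages" ]; do
        row=0
        while [ "$row" -lt "$rows" ]; do
            col=0
            while [ "$col" -lt "$cols" ]; do
                # PDF origin is bottom-left, so flip the row index and
                # shift the wanted cell down/left onto the origin.
                off_x=$(( -col * cell_w ))
                off_y=$(( -(rows - 1 - row) * cell_h ))
                part=$(printf 'part_%04d_%d_%d.pdf' "$p" "$row" "$col")
                # Pass 1: one cell of one page, stored with Flate.
                echo "$GS $PASS1_FLAGS -dFirstPage=$p -dLastPage=$p" \
                     "-dDEVICEWIDTHPOINTS=$cell_w -dDEVICEHEIGHTPOINTS=$cell_h -dFIXEDMEDIA" \
                     "-sOutputFile=$part" \
                     "-c '<</PageOffset [$off_x $off_y]>> setpagedevice' -f $src"
                parts="$parts $part"
                col=$((col + 1))
            done
            row=$((row + 1))
        done
        p=$((p + 1))
    done
    # Pass 2: merge in slide order; size versus quality is chosen here.
    echo "$GS -dPDFSETTINGS=/printer -sOutputFile=unup.pdf$parts"
}
```

Calling "unup_plan NSlidesPerPage.pdf 48 612 792 2 2" would print the
commands for a 48-page, 4-up US Letter file. It is worth checking the
offsets on a single page first, since PageOffset behaviour may differ
between Ghostscript versions.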
All of this suggests to me that another switch would be helpful, that
is:
-dColorImageFilter=/copy
Naively, since the PDF is just a bag of objects, one would assume that
it should be possible to just copy one through on both of the gs
operations used here, and so not affect the image quality or the file
size. (Well, the latter would change slightly due to a different
number of pages.) If the page shape changes possibly the copied image
object might have to reside within another object which scales it.
Possible?
The final (I hope) script is again here:
http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/unup4.sh
I used it to break out the test file, went through it very carefully,
and didn't notice any new problems.
Thanks,
David Mathog
> I posted a response yesterday but it seems to have gone into the bit
> bucket somewhere. Apologies if this ends up as a double post.
Apparently Google groups is having problems with usenet. That interface
is accepting usenet posts but not showing any usenet posts made in the
last few days. Hopefully this is just a glitch related to their new
interface. I set up an account on another news server and am happy to
once again be able to both write and read posts.
Regards,
David Mathog
> ...
> Apparently Google groups is having problems with usenet. That interface
> is accepting usenet posts but not showing any usenet posts made in the
> last few days. ... I set up an account on another news server and am happy to
> once again be able to both write and read posts.
And don't you find the Seamonkey newsreader interface to be far more satisfactory
than the GoogleGroups "digest" or "blog" interface? (I sure would :-) .)
Cheers, -- tlvp
--
Avant de repondre, jeter la poubelle, SVP