For many documents,
gs -q -dNOPAUSE -dBATCH -r600x600 -sDEVICE=bbox filename
would work. However, for many documents originating from scanned
sources, this produces a bounding box that is too large (AFAIK, these
documents contain an embedded image with huge white margins).
Does anyone know a better way to handle this problem? I could convert
to a bitmap, but what is the quickest way to find the bounding box of
the ink in a bitmap?
Thanks,
Ilya
P.S. I see that pdfcrop uses a technique similar to mine, so it
would fail likewise...
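One way to get the bounding box of the ink from a bitmap is to render each page to a plain (ASCII) PGM file with Ghostscript's pgm device and let awk scan it for pixels darker than a threshold. The threshold plays the role of ImageMagick's -fuzz: anything lighter, including an off-white 0xEE background, is treated as paper. The function name ink_bbox, its default threshold of 200 and its output format are hypothetical choices for this sketch; it is a swapped-in technique, not what the bbox device does internally.

```shell
#!/bin/sh
# ink_bbox FILE DPI [THRESHOLD]   (hypothetical helper)
# Print a %%BoundingBox line (in points) for the "ink" in a plain (P2)
# PGM file rendered at DPI.  Pixels whose grey level, scaled to 0..255,
# is at or above THRESHOLD (default 200) count as paper, so an off-white
# background such as 0xEE (238) is ignored.  Exit status 2 = blank page.
ink_bbox() {
  awk -v res="$2" -v thr="${3:-200}" '
    { sub("#.*", "") }                          # drop PGM comments
    {
      for (i = 1; i <= NF; i++) {
        n++
        if (n == 1) {                           # magic number
          if ($i != "P2") { bad = 1; exit }
          continue
        }
        if (n == 2) { w = $i + 0; continue }    # width in pixels
        if (n == 3) { h = $i + 0; continue }    # height in pixels
        if (n == 4) { maxv = $i + 0; continue } # maximum grey value
        if ($i * 255 / maxv >= thr) continue    # light pixel: paper
        p = n - 5                               # 0-based pixel index
        x = p % w; y = int(p / w)
        if (!found) { x0 = x1 = x; y0 = y1 = y; found = 1; continue }
        if (x < x0) x0 = x
        if (x > x1) x1 = x
        if (y < y0) y0 = y
        if (y > y1) y1 = y
      }
    }
    function ceil(v) { return (v == int(v)) ? v : int(v) + 1 }
    END {
      if (bad) { print "ink_bbox: not a plain P2 PGM file" > "/dev/stderr"; exit 1 }
      if (!found) { print "ink_bbox: no ink found" > "/dev/stderr"; exit 2 }
      s = 72 / res                              # pixels -> points
      # PGM rows run top-down while PostScript y runs bottom-up, so flip y.
      print "%%BoundingBox:", int(x0 * s), int((h - 1 - y1) * s),
            ceil((x1 + 1) * s), ceil((h - y0) * s)
    }
  ' "$1"
}
```

For example (resolution and threshold are guesses to be tuned; awk is slow on big bitmaps, so a low resolution such as 100 dpi is probably enough for a box accurate to about a point):
 gs -q -dNOPAUSE -dBATCH -r100 -sDEVICE=pgm -sOutputFile=page%03d.pgm filename
 for f in page*.pgm; do ink_bbox "$f" 100 220; done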
>I maintain a script which mangles a PS/PDF/DVI document for "max
>scale" 2-up printing. To do this, I need to find the bounding box of
>the pages.
>
>For many documents,
> gs -q -dNOPAUSE -dBATCH -r600x600 -sDEVICE=bbox filename
>
>would work. However, for many documents originating from scanned
>sources, this produces a bounding box that is too large (AFAIK, these
>documents contain an embedded image with huge white margins).
White margins are the normal thing. I suspect the problem is that you
have off-white margins from scanning, so the true bounding box
includes all of the near-white paper.
----------------------------------------
Aandi Inston
Please support usenet! Post replies and follow-ups, don't e-mail them.
Well, they are not "my" scans, but those produced by the publisher.
However, "their" scans may still have a background not normalized to
0xFFFFFF...
To detect this, I would need to render with Ghostscript to PPM (or
similar) and inspect the output, right? Hmm, ImageMagick should
probably be able to convert PDF to TXT too.
Thanks,
Ilya
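A rough way to check the background, sketched below as the hypothetical function pgm_background, is to count the grey levels of a rendered page and report the most frequent one, which on a scanned page is usually the paper. It assumes a well-formed plain PGM file (for instance from Ghostscript's pgm device) and does no validation; a result below 255 would mean the background is not pure white.

```shell
#!/bin/sh
# pgm_background FILE   (hypothetical helper)
# Print the most frequent grey level of a plain (P2) PGM file, scaled to
# 0..255, and the percentage of pixels that have it.  On a scanned page
# this is normally the paper color.  Assumes a well-formed P2 file.
pgm_background() {
  awk '
    { sub("#.*", "") }                       # drop PGM comments
    {
      for (i = 1; i <= NF; i++) {
        n++
        if (n == 4) maxv = $i + 0            # tokens 1-3: magic, width, height
        if (n <= 4) continue
        v = int($i * 255 / maxv + 0.5)       # rescale to 0..255
        count[v]++
        total++
      }
    }
    END {
      if (total == 0) exit 1
      for (v in count)
        if (!have || count[v] > bestc) { best = v; bestc = count[v]; have = 1 }
      printf "%d %d%%\n", best, int(100 * bestc / total + 0.5)
    }
  ' "$1"
}
```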
http://netpbm.sourceforge.net/doc/pnmcrop.html
You might need to posterise the rendered image
to get the pnmcrop to work the way you want.
BugBear
BTW, is there a way to ask Ghostscript to make the document 10% brighter,
so that 0xEEEEEE would be blown out to 0xFFFFFF?
Thanks,
Ilya
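One way to approximate this is to brighten after rendering. The hypothetical filter pgm_brighten below multiplies every grey level of a plain PGM by a factor and clamps at white; with a factor of 1.1, 0xEE (238) gives 261.8 and is clamped to 255, while 0xE0 (224) only reaches 246, so the factor has to suit the actual paper color. The result can then be fed to pnmcrop. This is a sketch, not a Ghostscript feature.

```shell
#!/bin/sh
# pgm_brighten FACTOR   (hypothetical helper)
# Read a plain (P2) PGM on stdin, multiply every grey level by FACTOR,
# clamp at the maximum value (white) and write plain PGM to stdout.
pgm_brighten() {
  awk -v f="$1" '
    { sub("#.*", "") }                           # drop PGM comments
    {
      for (i = 1; i <= NF; i++) {
        n++
        if (n == 1) { print $i; continue }       # magic "P2"
        if (n == 2) { w = $i; continue }
        if (n == 3) { print w, $i; continue }    # width height
        if (n == 4) { maxv = $i + 0; print maxv; continue }
        v = int($i * f + 0.5)                    # brighten and round
        print (v > maxv ? maxv : v)              # clamp at white
      }
    }
  '
}
```

Inside Ghostscript itself, a clamped transfer function should have a similar effect on the rendered bitmap, though this is untested and the PDF interpreter might reset it:
 gs -q -dNOPAUSE -dBATCH -r100 -sDEVICE=pgm -sOutputFile=page%03d.pgm -c "{ 1.1 mul dup 1 gt { pop 1 } if } settransfer" -f filename
Going by the bbox experiments later in this thread, it would not change what the bbox device reports for unclipped images.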
I do not think that applies. bbox returns
%%BoundingBox: 0 0 448 681
(or some such) for all pages. Doing
convert "xx1.pdf[2]" ~/tmp/xx1.txt
reports the size as 448x680, but the first non-white pixel is at 66,45
(0-based). Same with -depth 16.
So the margin IS white, but bbox can't find it...
gs is 8.54...
Puzzled,
Ilya
As I understand it, bbox records the PAINTED
area.
The border may (well) be painted white.
BugBear
> As I understand it bbox records the PAINTED area.
This was my initial conjecture too (given the results), but Aandi says
otherwise, and AFAIK the documentation explicitly says that what is
painted white should not be included.
Maybe this relates to what is painted by "pure" PS commands, and not
by "embedded graphics"?
Bug or feature?
Thanks,
Ilya
>As I understand it bbox records the PAINTED
>area.
>
>The border may (well) be painted white.
That would be incorrect, inasmuch as that isn't what the bounding
box is. However, I'm quite happy to accept it if you say that is what
this particular piece of software does (it IS much easier to measure).
> http://netpbm.sourceforge.net/doc/pnmcrop.html
> You might need to posterise the rendered image
> to get the pnmcrop to work the way you want.
Thanks for the pointer. I remember seeing something like this
somewhere, but forgot where.
However, it is not what I need. I do not need the actual crop; what I
need is the bbox. BBoxes for the first/odd/even pages are combined in my
tool to decide on the best cropping strategy...
Thanks anyway,
Ilya
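Combining per-page boxes amounts to a union: the minimum of the lower-left corners and the maximum of the upper-right corners. A minimal sketch, as the hypothetical function bbox_union, reads the bbox device's messages and prints the union for all pages, the first page, odd pages and even pages. It assumes exactly one %%BoundingBox line per page, in page order.

```shell
#!/bin/sh
# bbox_union   (hypothetical helper)
# Read Ghostscript bbox-device output on stdin and print the union of the
# per-page %%BoundingBox lines as "NAME llx lly urx ury" for all pages,
# the first page, odd pages and even pages.  Pages are numbered in the
# order in which their %%BoundingBox lines appear.
bbox_union() {
  awk '
    function grow(k) {                       # enlarge box k by this page
      if (!(k in llx)) {
        llx[k] = $2 + 0; lly[k] = $3 + 0; urx[k] = $4 + 0; ury[k] = $5 + 0
        return
      }
      if ($2 + 0 < llx[k]) llx[k] = $2 + 0
      if ($3 + 0 < lly[k]) lly[k] = $3 + 0
      if ($4 + 0 > urx[k]) urx[k] = $4 + 0
      if ($5 + 0 > ury[k]) ury[k] = $5 + 0
    }
    /^%%BoundingBox:/ {                      # %%HiResBoundingBox is skipped
      page++
      grow("all")
      if (page == 1) grow("first")
      grow(page % 2 ? "odd" : "even")
    }
    END {
      n = split("all first odd even", keys, " ")
      for (j = 1; j <= n; j++)
        if (keys[j] in llx)
          print keys[j], llx[keys[j]], lly[keys[j]], urx[keys[j]], ury[keys[j]]
    }
  '
}
```

Since the bbox device writes its results to standard error, a pipeline would look like:
 gs -q -dNOPAUSE -dBATCH -sDEVICE=bbox filename 2>&1 | bbox_union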
A simple experiment shows that white raster pixels are not treated as
white paint: unlike white filled paths, they do count toward the
bounding box. I ran the following PS snippet through GS 8.54 (OS X)
with gs -sDEVICE=bbox. On each page, a 100x100 white square is drawn
with a different method and a smaller 60x60 black square is drawn on top:
%!
(white square as filled path:) = flush
0 0 moveto 0 100 lineto 100 100 lineto 100 0 lineto
closepath 1 setgray fill
0 setgray 20 20 60 60 rectfill
showpage
(white square as rectfill:) = flush
1 setgray 0 0 100 100 rectfill
0 setgray 20 20 60 60 rectfill
showpage
(white square as an image:) = flush
gsave
100 100 scale
100 100 8 [100 0 0 -100 0 100] <FF>
image
grestore
0 setgray 20 20 60 60 rectfill
showpage
(white square as an imagemask:) = flush
1 setgray
gsave
100 100 scale
100 100 true [100 0 0 -100 0 100] <FF>
imagemask
grestore
0 setgray 20 20 60 60 rectfill
showpage
%%EOF
The results are:
AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
white square as filled path:
%%BoundingBox: 19 19 81 81
%%HiResBoundingBox: 19.997999 19.997999 80.009998 80.009998
>>showpage, press <return> to continue<<
white square as rectfill:
%%BoundingBox: 19 19 81 81
%%HiResBoundingBox: 19.997999 19.997999 80.009998 80.009998
>>showpage, press <return> to continue<<
white square as an image:
%%BoundingBox: 0 0 100 100
%%HiResBoundingBox: 0.000000 0.000000 99.999981 99.999981
>>showpage, press <return> to continue<<
white square as an imagemask:
%%BoundingBox: 0 0 100 100
%%HiResBoundingBox: 0.000000 0.000000 99.999981 99.999981
>>showpage, press <return> to continue<<
________________________________________________________
François Robert
A further page, where the image is clipped to a smaller rectangle:
(white square as an image, clipped:) = flush
gsave
10 10 80 80 rectclip
100 100 scale
100 100 8 [100 0 0 -100 0 100] <FF>
image
grestore
0 setgray 20 20 60 60 rectfill
showpage
The result is:
white square as an image, clipped:
%%BoundingBox: 9 9 90 90
%%HiResBoundingBox: 9.990000 9.990000 89.999997 89.999997
(It appears I'm wrong, Aandi; see the posts by François Robert.)
I dread to think what happens if the background is not white,
e.g. with an inverted transfer curve.
BugBear
It's true that pnmcrop is going to do more work than you want,
but it *can* tell you the bounding box, or at least equivalent
information, if you use the -verbose option:
pnmcrop: Background color is blue
pnmcrop: cropping 17 rows off the top
pnmcrop: cropping 6 rows off the bottom
pnmcrop: cropping 3 cols off the left
Just send pnmcrop's actual output to /dev/null and parse the
messages above, and you have what you want. As others have
pointed out, given that your images are scanned, they are
unlikely to have pure white backgrounds. Try either rendering
the PostScript to pbmraw format (which forces every pixel
to black or white) or some other technique such as
posterising, which someone else has suggested. You will need to
do some experimentation to find out what works best
for you.
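The parsing step can be sketched as the hypothetical function pnmcrop_bbox below. It assumes the message wording shown above (other Netpbm versions may word them differently) and takes the image size and rendering resolution as arguments, since pnmcrop only reports how much it removes. Rows cut off the bottom give the lower-left y, because PostScript's y axis points up.

```shell
#!/bin/sh
# pnmcrop_bbox WIDTH HEIGHT DPI   (hypothetical helper)
# Read the messages of "pnmcrop -verbose" on stdin and print a
# %%BoundingBox line (in points) for an image of WIDTH x HEIGHT pixels
# rendered at DPI.  Sides pnmcrop does not mention count as 0.
pnmcrop_bbox() {
  awk -v w="$1" -v h="$2" -v res="$3" '
    /cropping [0-9]+ (rows|cols) off the / {
      for (i = 1; i < NF; i++)
        if ($i == "cropping") n = $(i + 1) + 0
      cut[$NF] = n                   # last word: top, bottom, left or right
    }
    function ceil(v) { return (v == int(v)) ? v : int(v) + 1 }
    END {
      s = 72 / res                   # pixels -> points
      # PostScript y grows upwards, so rows cut off the bottom give lly.
      print "%%BoundingBox:", int(cut["left"] * s), int(cut["bottom"] * s),
            ceil((w - cut["right"]) * s), ceil((h - cut["top"]) * s)
    }
  '
}
```

Usage might look like this, where 850 x 1100 is a hypothetical page size in pixels (pnmfile prints the real one); the redirections send pnmcrop's messages into the pipe and its image to /dev/null:
 pnmcrop -verbose page001.pgm 2>&1 >/dev/null | pnmcrop_bbox 850 1100 100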
They do. Have you ever seen magazines e-published by scanning?
Thanks for the pnmcrop tips...
Ilya
> I dread to think what happens if the background is not white,
> e.g. an inverted transfer curve.
FYI, I ran my tests again with your suggestion. I set up a simple linear
transfer function in the graphics state before the first test:
%!
{ 0.5 mul 0.25 add } settransfer
(white square as filled path:) = flush
etc...
The results:
AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
white square as filled path:
%%BoundingBox: 0 0 101 101
%%HiResBoundingBox: 0.000000 0.000000 100.007997 100.007997
>>showpage, press <return> to continue<<
white square as rectfill:
%%BoundingBox: 0 0 101 101
%%HiResBoundingBox: 0.000000 0.000000 100.007997 100.007997
>>showpage, press <return> to continue<<
white square as an image:
%%BoundingBox: 0 0 101 101
%%HiResBoundingBox: 0.000000 0.000000 100.007997 100.007997
>>showpage, press <return> to continue<<
white square as an image, clipped:
%%BoundingBox: 9 9 90 90
%%HiResBoundingBox: 9.990000 9.990000 89.999997 89.999997
>>showpage, press <return> to continue<<
white square as an imagemask:
%%BoundingBox: 0 0 101 101
%%HiResBoundingBox: 0.000000 0.000000 100.007997 100.007997
________________________________________________________
François Robert
Could you please sum up the results in plain language, for those of us
not fluent in PS? Something like:
with a transfer function set, bbox acts on the color AFTER applying the
transfer function (with white not contributing to the bbox), with the
exception of unclipped images, which contribute their ENTIRE area to the bbox?
Thanks,
Ilya
(Conjecturally,) I found such a way. I suspect that
convert filename.pdf -fuzz 10% -trim info:
will print what I want (or maybe I could even substitute some
-print FORMAT
to get EXACTLY the info I want in a reliable format).
======================
Unfortunately, the computers around here have only an older version of
ImageMagick, which does not support info:, -identify, or -print. And I
cannot even convince it to write a particular format without
specifying an extension (e.g., to write to STDOUT)... The best I managed
to do was
convert filename.pdf -fuzz 10% -trim -depth 1 tmp.miff
identify tmp.miff
tmp.miff[0] MIFF 555x770 570x792+2+16 PseudoClass 2c 1.2mb 0.120u 0:01
tmp.miff[1] MIFF 547x780 570x792+16+6 PseudoClass 2c 1.2mb 0.090u 0:01
tmp.miff[2] MIFF 556x783 570x792+7+3 PseudoClass 2c 1.2mb 0.050u 0:01
Is anyone able to force it at least into a pipeline, as with
convert filename.pdf -fuzz 10% -trim -depth 1 -format MIFF - | identify -
(which does not work, since -format is not understood the way I want)?
Thanks,
Ilya
P.S. Versions around here are 6.0.7 and 6.1.1...
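From identify output like the above, the trimmed box can be recovered from the size (WxH) and page geometry (PWxPH+X+Y) fields. The hypothetical function trim_bbox below converts them to a PostScript-style "llx lly urx ury", flipping y because ImageMagick measures offsets from the top. It assumes non-negative offsets and one pixel per point (ImageMagick's default 72 dpi for PDF; with -density the numbers need scaling).

```shell
#!/bin/sh
# trim_bbox   (hypothetical helper)
# Read "identify" output for trimmed frames on stdin and print, per frame,
# "NAME llx lly urx ury" on the original page.  Each line must contain a
# size WxH immediately followed by a page geometry PWxPH+X+Y.
trim_bbox() {
  awk '
    {
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^[0-9]+x[0-9]+\+[0-9]+\+[0-9]+$/) continue
        if ($(i - 1) !~ /^[0-9]+x[0-9]+$/) continue
        split($(i - 1), s, "x")          # trimmed width and height
        split($i, g, "[x+]")             # page width, height, x and y offsets
        # ImageMagick offsets are from the top-left corner; PostScript
        # y runs upwards from the bottom-left corner.
        print $1, g[3], g[2] - g[4] - s[2], g[3] + s[1], g[2] - g[4]
        next
      }
    }
  '
}
```

For tmp.miff[0] above this gives 2 6 557 776. On the pipeline question, ImageMagick's explicit format prefix (miff:- for MIFF on standard output) should avoid the temporary file, though this is untested with 6.0.7 or 6.1.1:
 convert filename.pdf -fuzz 10% -trim -depth 1 miff:- | identify - | trim_bbox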