Axel Berger <Axel_...@b.maus.de> wrote:
> JohnF wrote on Tue, 15-06-30 08:40:
>>Can you suggest what might be done to fix the files vis-a-vis my
>>complaint?
>
> As I dislike those ready-made single-keypress solutions, I tend to do it
> step by step myself (and then write my own single-click script for the
> process).
Yeah, that's what I did to use pdf180 (manually tweaked a procedure
and then automated it), but with a ~25-line C program for a script
that just uses system() to execute constructed commands. It just
mv's odd-numbered pages to an originals/ subdirectory, runs pdf180
against each scan file in originals/ with --outfile ../, and finally
mv/renames the output to remove the -rotated180 suffix. I'd also
tried running pdf180 with --papersize and some other switches,
which resulted in some differences in the output but didn't affect the behavior
I was complaining about.
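
For anyone who wants to try the same thing, here is a rough Python sketch
of that procedure (not the actual C program). It only builds the command
strings. The file-naming pattern (scanNNN.pdf, page number last in the
name), the mkdir step, and the function names are assumptions made for
illustration; the pdf180 arguments are just the ones described above.

```python
import re
import shlex
from pathlib import PurePosixPath

SUFFIX = "-rotated180"  # suffix that pdf180 appends, per the description above


def is_odd_page(filename):
    """True if the last number in the file name is odd (naming is assumed)."""
    numbers = re.findall(r"\d+", PurePosixPath(filename).stem)
    return bool(numbers) and int(numbers[-1]) % 2 == 1


def plan_commands(filenames, originals="originals"):
    """Return the shell commands for the three steps, without running them."""
    q = shlex.quote
    odd = sorted(f for f in filenames if f.endswith(".pdf") and is_odd_page(f))
    if not odd:
        return []
    # Assumed extra step: make sure the originals/ directory exists.
    commands = [f"mkdir -p {q(originals)}"]
    for name in odd:
        stem = PurePosixPath(name).stem
        # 1. park the untouched scan in originals/
        commands.append(f"mv {q(name)} {q(originals)}/")
        # 2. rotate it, writing the result one directory up
        commands.append(f"cd {q(originals)} && pdf180 --outfile ../ {q(name)}")
        # 3. drop the suffix so the rotated file takes the original name
        commands.append(f"mv {q(stem + SUFFIX + '.pdf')} {q(name)}")
    return commands


if __name__ == "__main__":
    for cmd in plan_commands(["scan001.pdf", "scan002.pdf", "scan003.pdf"]):
        print(cmd)
```

Passing each returned string to os.system() would mimic what the C version
does with system(); printing them first is a cheap dry run.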
> So my first step when confronted with your file would be to use
> pdfimages from the XPDF bundle to extract the hidden images. As you
> don't want them changed (unless to and from lossless compressions that
> are fully reversible) don't forget the "-j" option.
That -j switch saves output as jpegs according to the man page,
and I believe that's not lossless. gif would be lossless, but not jpg.
Anyway, I'd originally experimented before deciding to "save as pdf"
(in quotes as per your original wrapper remarks). The printer has
a bunch of options, jpg being one, and I tried scanning a few pages
in each available format and viewing the output. I can't say why,
but pdf was obviously and noticeably the clearest and most readable.
So pdfimages would just take the already-saved pdf and convert
it back to one of the formats I could have saved "natively".
I'd actually tried that using ImageMagick convert during my original
experimentation (just to see what would happen, because I was clueless
about how the scanner actually saves stuff), and it indeed looks even worse
than the native jpg. But I still have all those test files, and will
try pdfimages (again, just to see what happens).
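
A note on -j, as far as I understand the pdfimages man page: the switch
only affects images that are already stored in the PDF as DCT (JPEG) data,
and those are saved as JPEG files, which I take to mean the existing data
is copied rather than recompressed. Everything else comes out as
PBM/PGM/PPM. A quick way to see what actually came out is to look at the
first bytes of each extracted file. Below is a minimal Python sketch; the
function names and the signature table are my own choices, and the table
is far from exhaustive.

```python
# Well-known file signatures (a small, non-exhaustive selection).
MAGIC = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"II*\x00", "tiff"),
    (b"MM\x00*", "tiff"),
    (b"%PDF-", "pdf"),
    (b"P4", "pbm"),
    (b"P5", "pgm"),
    (b"P6", "ppm"),
]


def pdfimages_command(pdf_path, out_root):
    """Argument list for: pdfimages -j <pdf> <output-root>."""
    return ["pdfimages", "-j", pdf_path, out_root]


def sniff(head):
    """Guess a format name from the first bytes of a file."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(pdfimages_command("page001.pdf", "page001"))
    print(sniff(b"\xff\xd8\xff\xe0"))  # start of a typical JFIF file
    print(sniff(b"P4\n2550 3300\n"))   # raw PBM header
```

For a real file, something like sniff(open(path, "rb").read(16)) is enough.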
Aside: regarding your original remarks about a native scanner raster
format, each scan leaves a separate file in /tmp/ with names like
574794 Jun 30 03:50 brscan_jpeg_PAGE1_0hFq9t
469321 Jun 30 04:09 brscan_jpeg_PAGE1_0vPFsF
466243 Jun 30 04:36 brscan_jpeg_PAGE1_1V6wiY
459293 Jun 30 04:14 brscan_jpeg_PAGE1_1axovp
They're larger than the pdfs, which are typically ~250-325K.
I can't make sense of the hash suffix, and despite that "jpeg" in the
name, that's not what they are. Looks more like what you said, some kind
of raster format. Here are the first few "lines" from one. No typical
header/signature/magic-number/etc. at the beginning that I can see...
    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
00 42 07 00 01 00 84 00 00 00 00 46 00 81 ff da ff  B.........F..???
10 00 fe ee 00 00 04 f9 00 00 01 f8 00 00 20 f9 00  .??...?...?.. ?.
20 04 0c 00 1f f0 e0 fd 00 01 01 80 f9 00 03 70 00  ....???....?..p.
30 00 03 fc 00 00 0f f9 00 01 01 c0 f0 00 00 02 ef  ..?...?...??...?
40 00 00 78 e3 00 00 01 fa ff 01 ef 80 fc 00 00 0f  ..x?...??.?.?...
50 fe ff 42 07 00 01 00 84 00 00 00 00 45 00 81 ff  ??B.........E..?
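
One thing that does stand out in those bytes: the opening ten-byte sequence
turns up again a little further on, which might mean the file is a series
of small records, each with its own header. That is only a guess on my
part. The Python sketch below parses the dump (hex columns only) and lists
every offset where the first ten bytes recur; the helper names are made up
for this example.

```python
# Hex columns of the dump above: offset first, then 16 bytes per row.
DUMP = """\
00 42 07 00 01 00 84 00 00 00 00 46 00 81 ff da ff
10 00 fe ee 00 00 04 f9 00 00 01 f8 00 00 20 f9 00
20 04 0c 00 1f f0 e0 fd 00 01 01 80 f9 00 03 70 00
30 00 03 fc 00 00 0f f9 00 01 01 c0 f0 00 00 02 ef
40 00 00 78 e3 00 00 01 fa ff 01 ef 80 fc 00 00 0f
50 fe ff 42 07 00 01 00 84 00 00 00 00 45 00 81 ff
"""


def dump_to_bytes(text):
    """Turn 'offset b0 b1 ... b15' rows back into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        data.extend(int(h, 16) for h in fields[1:17])  # skip the offset column
    return bytes(data)


def find_all(data, pattern):
    """Return every offset at which pattern occurs in data."""
    hits = []
    start = data.find(pattern)
    while start != -1:
        hits.append(start)
        start = data.find(pattern, start + 1)
    return hits


data = dump_to_bytes(DUMP)
offsets = find_all(data, data[:10])
print([hex(o) for o in offsets])  # prints ['0x0', '0x52']
```

The byte right after each match differs (0x46 at the first, 0x45 at the
second), so whatever follows the common prefix is not constant. Running the
same search over a whole file would show whether the pattern keeps
repeating.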
> From there on you might cut off black borders (if present), enhance
> greyscale to go from black to white instead of from a darker middle
> grey to a slightly lighter middle grey (I often endure presentations
> where this has not been done; enhancing colour is trickier and can
> lead to unexpected results), convert to a sensible compressed format,
> often reducing the colour space as a first step (the change to 256
> colours is often invisible) and then bundle into a PDF. For a
> consistent trimming of margins a disciplined and precise positioning
> while scanning is a must; if omitted, you have the choice of pages all
> over the place or hand-trimming all pages individually (don't even
> think about it, it's cruel and unusual).
> Axel
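
The greyscale step described above amounts to a linear levels stretch. A
minimal Python sketch, assuming 8-bit grey values held in a plain list; the
function name and the integer rounding are my own choices, and real tools
may well pick the end points from percentiles or apply a gamma curve
instead.

```python
def stretch_levels(pixels, low=None, high=None):
    """Linearly map grey values in [low, high] onto [0, 255], clipping the rest.

    If low/high are not given, the darkest and lightest values present are used.
    """
    if not pixels:
        return []
    if low is None:
        low = min(pixels)
    if high is None:
        high = max(pixels)
    span = high - low
    if span <= 0:
        return list(pixels)  # flat image: nothing to stretch
    stretched = []
    for p in pixels:
        # Integer arithmetic with round-half-up, then clamp to 0..255.
        value = ((p - low) * 255 + span // 2) // span
        stretched.append(min(255, max(0, value)))
    return stretched


if __name__ == "__main__":
    # Mid-grey to slightly lighter mid-grey becomes black to white.
    print(stretch_levels([100, 125, 150]))  # prints [0, 128, 255]
```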
Thanks for the additional suggestions, Axel. Actually, the scans are
in b&w. It's just monochrome ink on paper, so there's no information
in the color. And I'd tried various Xsane color, etc. settings, but b&w
at 300dpi was actually best (even "crisper"-looking than b&w at 600dpi
for some reason). As for borders, the notebook pages are 11.75x9.375"
requiring a tabloid-size scanner. And Xsane lets you adjust the scan
region, so I got a precise fit, leaving about 0.25" margins all around to
accommodate my "disciplined and precise positioning" (quoting you above).
I used to spend lots of time at library xerox machines copying journal
articles (now almost everything's on the net), so I've become quite good
at reproducibly precise positioning, if I do say so myself.