> You CAN'T scan to PDF. A scanner is a raster machine and what it
> sees is and always will be a raster format.
Some devices are more sophisticated than others. You're talking about
the simplest of scanners.
> You can throw a PDF mantle around it or, seen the other way,
> include a raster image inside a PDF, but it will stay a raster
> format.
In principle, there's nothing to stop a scanner from looking for
lines instead of pixels, and composing a vector image. While I'm not
sure if any such thing exists, certainly there are scanners that
derive a raw bitmap just as a first step, but then run it through an
OCR algorithm that results in a collection of text and graphics.
> What's more important and worse, all those "intelligent" machines
> (if you defer to their superior intelligence, what does that say
> about you?) outputting a PDF in a single operation generally allow
> you NO control whatever about the internal raster format
> used.
Actually there are scanners that give copious control over many
variables.
More importantly, secondary utilities like the OCR tool can give
feedback to the light rod, such as adjusting pixel density to suit
that particular OCR algorithm. Making everything a multi-stage
operation forces the human to do the job of a machine, and manually
repeat steps because the feedback loop is severed. If the user sets
the OCR language to Thaana, for example, maybe a different pixel
density works better. While the user should be able to override the
setting, the default should be the most sensible.
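
A rough sketch of that default-plus-override idea. The language table and
the DPI numbers are invented for illustration and do not come from any
real scanner or OCR engine; choose_dpi is a hypothetical helper.

```python
# Hypothetical per-language defaults; the numbers are illustrative only.
DEFAULT_DPI = 300
LANGUAGE_DPI = {
    "english": 300,
    "thaana": 400,  # assumption: a script with small vowel marks may want more pixels
}

def choose_dpi(ocr_language, user_override=None):
    """Pick a scan resolution: the user's choice wins, else a per-language default."""
    if user_override is not None:
        return user_override
    return LANGUAGE_DPI.get(ocr_language.lower(), DEFAULT_DPI)

print(choose_dpi("Thaana"))       # language-specific default
print(choose_dpi("Thaana", 600))  # explicit user override
```

The point is only the ordering: an explicit user setting first, then what
the OCR stage asks for, then a generic fallback.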
Information should go the other direction as well. PDF metadata can
store details about the scan that would otherwise be lost.
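
For instance, the scan settings could travel as extra entries in the PDF's
document information dictionary. A minimal sketch, assuming ASCII-only
values; the key names (ScanDPI and so on) are made up here and are not
standard PDF keys, and a real tool would more likely go through a PDF
library than build the dictionary by hand.

```python
def pdf_literal(text):
    """Escape text as a PDF literal string (backslash and parentheses)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"

def info_dictionary(scan_details):
    """Render scan settings as entries of a PDF information dictionary."""
    entries = " ".join(
        f"/{key} {pdf_literal(str(value))}" for key, value in scan_details.items()
    )
    return f"<< {entries} >>"

# Hypothetical keys describing how the page was captured.
print(info_dictionary({"ScanDPI": 400, "ScanBitDepth": 1, "OCRLanguage": "Thaana"}))
```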
> Usually that will be a Jpeg with 16 million different colours
> at an atrociously low resolution and a high compression setting
> losing content and inserting highly visible artefacts.
Versatile scanners give you the choice. Users should have control
over the quality. It's obviously unreasonable for the scanner to
always produce the highest quality image saved in the raw internal
bitmap for every page, when a user may be scanning a 250 page text
document that then needs to go over the LAN. By the time they do the
final processing, it's too late: the network congestion has already
happened.
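
To put rough numbers on that, here is a back-of-the-envelope calculation
of uncompressed page sizes. The page size (A4) and the two scan modes are
my own assumptions for the example, not anything a particular scanner
does.

```python
def page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

A4 = (8.27, 11.69)  # page size in inches
PAGES = 250

rgb_total = page_bytes(*A4, dpi=600, bits_per_pixel=24) * PAGES
bilevel_total = page_bytes(*A4, dpi=300, bits_per_pixel=1) * PAGES

print(f"600 dpi, 24-bit RGB: {rgb_total / 1e9:.1f} GB")
print(f"300 dpi, 1-bit:      {bilevel_total / 1e6:.0f} MB")
```

That works out to roughly 26 GB against about 272 MB for the same 250
pages, before any compression, which is why the choice has to be made at
the scanner and not afterwards.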
> That said, taking the PDF, unpacking all the images, rotating as
> needed, and repacking again, is one of the simpler tasks batch files
> were invented for.
Batch files? Are you talking DOS batch files? How do you unpack the
text that has already been linked to images in the PDF container? I'm
aware of pdftotext, but that loses the linkage, no?