On 3/14/2022 9:40 PM, Dale wrote:
>
> Thank You !!!!
>
> I didn't want to buy Photoshop.
>
> I might install GIMP again ...
>
This is an analysis of the PDF file, using the MuPDF package for Windows.
https://www.mupdf.com/releases/index.html
mupdf-1.19.0-windows.zip # The Tesseract version includes OCR.
The first command removes more compression than necessary, and does not
expose the internal structure of the PDF very well. But it still comes in
handy. Even this command does not remove all the compression inside some
documents, but it's a start.
C:\TEMP> mutool convert -F pdf -O decompress,clean -o output.pdf DaleRKelly122717.pdf
This second command converts some of the structure of the PDF into text that
Notepad can read. We can use this one for a look at the structure of the PDF.
C:\TEMP> mutool convert -F pdf -O clean -o output.pdf DaleRKelly122717.pdf
Next, add .txt to the end of the file name, either with the File Explorer
rename capability or from the command prompt as shown here. It helps if the
OS and its File Explorer are set to display the file extension (PDF in this
case), so that in the GUI you can see what is going on.
ren output.pdf output.pdf.txt
notepad output.pdf.txt
You can tack extensions on like that, and it's a way of keeping
track of which transformations you have performed.
******
Object number 3 is a raw picture.
Object number 7 is a raw picture.
The pictures need a transform matrix to position
them on the page properly.
When you ask a tool to extract the pictures, it calls
them image-0005 and image-0009, because at that
point they have been instantiated on the paper surface.
And what we came here for was to verify that it uses CCITTFaxDecode as
the compressor. With K -1 in the DecodeParms, that is the CCITT Group 4
fax compression of long ago. All I wanted to see was "CCITT" to verify
a quality compression.
3 0 obj <=== this object stores a picture, byte length 102826
<</DecodeParms<</Columns 5088/Rows 6688/K -1/EndOfBlock true>>/Type/XObject/Subtype/Image/Filter/
CCITTFaxDecode/BitsPerComponent 1/Width 5088/Height 6688/ColorSpace/DeviceGray/Length 102826>>
stream ...
5 0 obj <=== image-0005
<</Length 61>>
stream
1 0 0 -1 0 802.56 cm <=== transform matrix ("cm" operator), this one flips the Y axis
-610.56 0 0 802.56 610.56 0 cm
/Img3 Do
endstream
endobj
7 0 obj <=== this object stores a picture, byte length 62500
<</DecodeParms<</Columns 5090/Rows 6672/K -1/EndOfBlock true>>/Type/XObject/Subtype/Image/Filter/
CCITTFaxDecode/BitsPerComponent 1/Width 5090/Height 6672/ColorSpace/DeviceGray/Length 62500>>
stream ...
9 0 obj <=== image-0009
<</Length 59>>
stream
1 0 0 -1 0 800.64 cm <=== transform matrix ("cm" operator), this one flips the Y axis
-610.8 0 0 800.64 610.8 0 cm
/Img7 Do
endstream
endobj
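As a rough cross-check of the two "cm" lines in objects 5 and 9, here is a
small Python sketch (my own, not something mutool provides). It multiplies
the matrices together the way the PDF "cm" operator does, then works out how
big each picture is on the page and what scan resolution that implies. It
assumes the page starts from an identity matrix and that user space is the
default 72 units per inch. The function names are made up for illustration.

```python
import math

def concat(m, ctm):
    """Return m x ctm for PDF matrices written as [a b c d e f].

    This is what the "cm" operator does: the new matrix is
    multiplied in front of the current transformation matrix.
    """
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )

def placed_size_and_dpi(cm_ops, width_px, height_px):
    """Apply the cm operators in order and return
    (width_pt, height_pt, dpi_x, dpi_y) for an image drawn with Do.

    Assumption: the starting matrix is identity and 72 units = 1 inch.
    """
    ctm = (1, 0, 0, 1, 0, 0)
    for m in cm_ops:
        ctm = concat(m, ctm)
    a, b, c, d, _, _ = ctm
    # An image fills the unit square, so the lengths of the two
    # basis vectors of the matrix are its size on the page.
    width_pt = math.hypot(a, b)
    height_pt = math.hypot(c, d)
    return width_pt, height_pt, width_px * 72 / width_pt, height_px * 72 / height_pt

# Values copied from objects 3/5 and 7/9 in the listing.
img3 = placed_size_and_dpi(
    [(1, 0, 0, -1, 0, 802.56), (-610.56, 0, 0, 802.56, 610.56, 0)], 5088, 6688)
img7 = placed_size_and_dpi(
    [(1, 0, 0, -1, 0, 800.64), (-610.8, 0, 0, 800.64, 610.8, 0)], 5090, 6672)

for name, (w, h, dx, dy) in (("Img3", img3), ("Img7", img7)):
    print(f"{name}: {w:.2f} x {h:.2f} pt, about {dx:.1f} x {dy:.1f} dpi")
```

Both pictures come out at roughly 8.5 x 11.1 inches and about 600 dpi in
each direction, which is consistent with a 600 dpi scan of a letter-size page.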
Raw PDF is usually a "binary-looking" format. mutool converts portions of
it to readable text, so you can see the commands. PDF files are hard for a
human to edit manually because, like a banker, the format records the length
of every structure in a ledger: each stream carries a /Length entry, and the
cross-reference table records the byte offset of each object. You can't
change the length of an object without upsetting some ledger entry.
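To make the "ledger" point concrete, here is a minimal Python sketch that
rebuilds object 5 from the listing and compares its /Length entry with the
number of bytes actually sitting between "stream" and "endstream". It assumes
each of the three lines in the stream ends with a single linefeed, which is
consistent with the recorded length of 61. The helper name is hypothetical,
and the regex approach only suits simple uncompressed objects like this one,
not PDFs in general.

```python
import re

def check_stream_length(obj_bytes):
    """Return (declared, actual) byte counts for one simple stream object.

    Assumes /Length is a direct integer and the stream keyword is
    followed by a single linefeed.
    """
    declared = int(re.search(rb"/Length\s+(\d+)", obj_bytes).group(1))
    start = obj_bytes.index(b"stream\n") + len(b"stream\n")
    end = obj_bytes.rindex(b"endstream")
    return declared, end - start

# Stream content of object 5, copied from the listing (LF line endings assumed).
content = (
    b"1 0 0 -1 0 802.56 cm\n"
    b"-610.56 0 0 802.56 610.56 0 cm\n"
    b"/Img3 Do\n"
)

def wrap(body):
    """Wrap stream bytes in the object 5 envelope, leaving /Length at 61."""
    return b"5 0 obj\n<</Length 61>>\nstream\n" + body + b"endstream\nendobj\n"

original = check_stream_length(wrap(content))
# A hand edit that adds one character without updating /Length.
edited = check_stream_length(wrap(content.replace(b"/Img3", b"/Img33")))

print("original:", original)  # declared and actual agree
print("edited:  ", edited)    # actual is now one byte longer than declared
```

The same edit would also shift every object stored after this one, so the
byte offsets in the cross-reference table would be off as well. That is why a
repair pass with a tool is usually needed after hand-editing.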
Paul