"John C." <
r9j...@yahoo.com> wrote:
> Sometimes, it's possible to download two identically-sized .pdf files,
> which if you do a page-by-page comparison, are also identical in that
> regard.
>
> However, if you do a hash check to see if they're two copies of the same
> file, it turns out that they are not.
>
> If the files have a lot of pages, then doing a page-by-page compare of
> the two can be pretty difficult to accomplish.
>
> Winmerge (https://winmerge.org/) will compare .pdf files and point out
> metadata differences, but not actual content differences.
>
> Does anybody know of a freeware program that can compare two .pdf files
> and list any differences in content?
>
> TIA.

There is more information in a PDF than just the content of the
document. There's where you were last in the document, so the viewer
can reposition you to the same spot when you later revisit it. There's
the creation datestamp (inside the file, not in the file system for the
OS). There is metadata that contains the author's name (different people
can generate the same PDF), keywords, copyright info, whether a
password, certificate, or security policy was applied to the PDF, and
the initial view (opening page number, zoom level, and whether
bookmarks, thumbnails, toolbar, and menu are displayed). Custom
properties can be added. Images embedded in the document could be at
different resolutions, or could even be slightly different images in
ways your eye cannot perceive. And there's more.

The author of a PDF can decide to include no fonts in the PDF, relying on
the recipient's host to have identical or nearly identical fallback
fonts to render the document, or they can include just those fonts used
in the PDF, or they can include the complete font(s) used in the PDF.
That a document looks the same doesn't mean it is identical to another
document that looks the same.

Did you view the document properties to see if those were identical?
There is a lot more defined within a PDF than just the bare document.
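
To show why the hash check fails even when every page looks the same,
here is a minimal Python sketch. The two byte strings are made-up
stand-ins for PDF files (they are not real, complete PDFs), and the
helper name sha256_hex is just for illustration. They differ only in an
embedded creation datestamp.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for two PDF files: same visible text, but a
# different creation datestamp stored inside the file.
visible_text = b"(Hello, world) Tj"
copy_a = b"/CreationDate (D:20240101120000)\n" + visible_text
copy_b = b"/CreationDate (D:20240102093000)\n" + visible_text

print(len(copy_a) == len(copy_b))                # True: identical size
print(sha256_hex(copy_a) == sha256_hex(copy_b))  # False: hashes differ
```

Same size, same visible text, different hash. A single differing byte
anywhere in the file is enough to change the hash.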

You don't want to do a binary compare on the .pdf files. You want to do
a text compare on the documents within. If you're checking for
plagiarism, you only want to compare the documents' text, not the
photos. (Photos are copyrightable too, but comparing images is a
separate problem from comparing text.)

Adobe has their own PDF compare tool, but it is trialware, not freeware
(https://www.adobe.com/acrobat/how-to/compare-two-pdf-files.html).

You could print the PDFs to files, and compare the document content dumped into
the output files. Many PDF readers let you "print" or save to doc
formats other than PDF, like save to .docx or .txt. You could even
print/save to .pdf format, but that could carry along all the metadata
which could differ between the PDF files. Word files (.doc[x]) have
metadata, so you'd have to check if saving a .pdf in whatever PDF viewer
you are using will migrate PDF metadata to Word metadata. Text files
have no metadata. They can have Alternate Data Streams (ADS), but you
won't be viewing those when doing a compare. Of course, text files
cannot contain images, and sometimes a PDF is nothing but an image or a
series of images, or will contain an image. Saving (converting) to .txt
format will strip images from both the files you want to compare, but
that also means you won't be doing a true compare, since the images
could be different (even if they don't look different).
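
Once both PDFs are saved as .txt, the compare itself can be done with
Python's standard difflib module. This is only a rough sketch: the
function names and the file names a.txt and b.txt are made up for
illustration, and collapsing whitespace is my assumption that spacing
and blank lines in the export are not content you care about.

```python
import difflib
from pathlib import Path


def normalize(text: str) -> list[str]:
    """Split into lines, collapse runs of whitespace, drop blank lines."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def diff_texts(text_a: str, text_b: str,
               name_a: str = "a.txt", name_b: str = "b.txt") -> list[str]:
    """Return unified-diff lines; an empty list means no difference found."""
    return list(difflib.unified_diff(
        normalize(text_a), normalize(text_b),
        fromfile=name_a, tofile=name_b, lineterm="", n=0,
    ))


def diff_files(path_a: str, path_b: str) -> list[str]:
    """Compare two text files that were saved from the PDFs."""
    text_a = Path(path_a).read_text(encoding="utf-8", errors="replace")
    text_b = Path(path_b).read_text(encoding="utf-8", errors="replace")
    return diff_texts(text_a, text_b, path_a, path_b)


if __name__ == "__main__":
    # Made-up sample text standing in for two exported documents.
    old = "Chapter 1\nThe quick brown fox.\nThe end.\n"
    new = "Chapter 1\nThe  quick brown cat.\n\nThe end.\n"
    for line in diff_texts(old, new):
        print(line)
```

For real files you would call diff_files("a.txt", "b.txt") and print
whatever comes back. Lines starting with "-" are only in the first file,
and lines starting with "+" are only in the second.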

There's PDFforge online (https://www.pdfforge.org/online/en/compare-pdf)
but I've not used that one; however, I've rarely had to compare PDF
files against each other. It's been a very long time since I used the
PDFforge tools. They have their PDFCreator that lets you convert from
PDF to other doc formats, so you could convert to a doc format that
doesn't have or carry along any metadata. Their free edition is limited
(see https://www.pdfforge.org/pdfcreator/editions), so I don't know if
it supports the conversion feature; however, it looks like the free
edition is adware per their own comparison table. While the conversion
they advertise is about printing a document using their printer emulator
(you create PDFs from other docs), I would think it would have its own
print or save function that lets you choose the doc format for output.

PDF24 also has their online tools; see:
https://tools.pdf24.org/en/compare-pdf
https://tools.pdf24.org/en/all-tools

There's https://www.diffchecker.com with their online PDF comparer. You
can get their client tool, but it is trialware (30 days); their pricing
page shows it is actually subscriptionware. However, you didn't say if
this is a one-time need, or if you need something for repeated use
later.

Another online tool:
https://app.copyleaks.com/text-compare/compare-pdf-files
You didn't mention if online tools are taboo for you. This one looks
to require that you create an account with them.

We don't know what other document software you have, or even which PDF
viewer you are using. If you have MS Word (after whatever version added
PDF support), you can open a .pdf in Word, and then save as a .doc[x]
file. Be aware that Word files also have metadata. See:
https://support.microsoft.com/en-us/office/remove-hidden-data-and-personal-information-by-inspecting-documents-presentations-or-workbooks-356b7b5d-77af-44fe-a07f-9aa4d085966f

I suspect the local client products are mostly payware because they
have to pay Adobe for a license to use some tool they embed in their
product. You can try the trialware versions, but that's not a solution
if you intend to keep doing PDF compares well past the trial period.
Their online versions are free because you never get the software
itself; the compare runs in the background on their end.