Re: Question about ElementWriter.e_replacement


Support

Dec 6, 2012, 5:29:26 PM
to pdfne...@googlegroups.com
 

The most likely problem is that the file is really badly constructed. For example, it is possible that identical resources (e.g., images, fonts, etc.) are being created over and over again instead of being referenced only once, but it is hard to guess without looking at a sample input document.

 

ElementWriter.e_replacement replaces/discards the old Resources dictionary, so it is possible that the original file contains references to redundant/unnecessary data in the Resources dictionary.

 
You can use PDFTron CosEdit (http://www.pdftron.com/pdfcosedit/downloads.html) to inspect the original file and track down the problem.
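
For illustration, here is a minimal sketch (assuming the PDFNet Java binding; the exact begin() overloads may differ between SDK versions) of how the write mode passed to ElementWriter.begin() determines what happens to a page's existing content stream and Resources dictionary:

import com.pdftron.common.PDFNetException;
import com.pdftron.pdf.ElementWriter;
import com.pdftron.pdf.Page;

public class WriteModeSketch {
    // e_replacement: the page's old content stream and Resources dictionary are
    // discarded; only the elements explicitly written back are kept.
    static void beginReplace(ElementWriter writer, Page page) throws PDFNetException {
        writer.begin(page, ElementWriter.e_replacement, false);
    }

    // e_overlay / e_underlay: new content is appended on top of (or underneath)
    // the existing content stream, so the original Resources dictionary is kept
    // intact, along with anything redundant it references.
    static void beginOverlay(ElementWriter writer, Page page) throws PDFNetException {
        writer.begin(page, ElementWriter.e_overlay);
    }
}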

On Wednesday, December 5, 2012 9:27:58 AM UTC-8, Bruce Petersen wrote:
Greetings!

We have a customer who is plagued by having to store a huge (1.8 GB) PDF statement file, and they asked us if we could do something to reduce the size of this file (they receive such files almost daily).  This statement file is produced by a credit card processing company, First Data / FDR, apparently from AFP print stream origins.

The awesome news:  Using PDFTron's SDK, I will be able to solve our customer's problem!  The attached code (well, code using the same concepts) will save our customer much pain.  Given both small (3.5 MB) and huge (1.8 GB) files, we consistently reduce the PDF file size by 88%.

The potentially worrying news:  The customer is likely to ask me to describe (in great detail) how we magically performed this feat ... could you kindly offer your thoughts / opinions / advice, or point me in the right direction for further research?

I started by experimenting with a smaller (3.5 MB) specimen using PDF Converter Professional (from Nuance).  Converter's typical optimizations did not reduce the file size.  However, the file size was dramatically reduced when using Converter's flatten feature with the stamp and layer options selected.  I hope this report serves as background for the narrative below.

Using PDFTron's optimization tools yielded the same result as PDF Converter: no reduction in file size.  However, using PDFTron's ElementEditTest.java example (almost verbatim), the file size was dramatically reduced, nearly matching the PDF Converter flatten result noted above.  Does it appear that there are sizable, invisible layers being omitted?  The ElementWriter write mode appears to make all the difference:  e_replacement yields the significantly smaller file, whereas e_underlay or e_overlay produce a file of the same huge size as the original.  BTW, doc.hasOC() reports false for the original document.  I've attached code that demonstrates using ElementWriter and ElementReader to produce an 88%-smaller 'copy' of the original PDFs.
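
For reference, a condensed sketch of the approach (modelled on the ElementEditTest.java pattern; the file names here are placeholders, and the exact begin()/save() signatures may vary between PDFNet versions):

import com.pdftron.common.PDFNetException;
import com.pdftron.pdf.*;
import com.pdftron.sdf.SDFDoc;

public class ShrinkCopy {
    public static void main(String[] args) throws PDFNetException {
        PDFNet.initialize();
        PDFDoc doc = new PDFDoc("statement_in.pdf");   // placeholder input path
        doc.initSecurityHandler();

        ElementReader reader = new ElementReader();
        ElementWriter writer = new ElementWriter();

        // Re-write every page in e_replacement mode: the old content stream and
        // Resources dictionary are discarded, and only the elements copied back
        // through writeElement() survive.
        for (PageIterator itr = doc.getPageIterator(); itr.hasNext();) {
            Page page = (Page) itr.next();
            reader.begin(page);
            writer.begin(page, ElementWriter.e_replacement, false);

            Element element;
            while ((element = reader.next()) != null) {
                writer.writeElement(element);
            }

            writer.end();
            reader.end();
        }

        // e_remove_unused performs a full save that drops objects no longer
        // referenced, e.g. duplicated images/fonts left behind by the old
        // Resources dictionaries.
        doc.save("statement_out.pdf", SDFDoc.e_remove_unused, null);
        doc.close();
        PDFNet.terminate();
    }
}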

I can honestly report to the customer that when using PDF Converter to compare the original (large) and new (reduced) files, the documents are identical (with a few typical pixel-based exceptions).  I guess it's just hard to resort to "proprietary magic" as an answer :)

I am most grateful for your consideration! - Best Regards,

Bruce
