Image to XOD conversion performance


Jeffrey Stanford

Jun 20, 2014, 10:20:13 AM
to pdfne...@googlegroups.com
Hello,

We've started performance testing the PDF-oriented workflow outlined in "Best practice for converting multiple TIFF/JPG files to a single XOD", and we're seeing some performance problems.

For smaller image collections (about 100 KB or less), we reliably get sub-second conversion times, which is what we want. However, once documents grow beyond this threshold, conversion times climb into the tens or hundreds of seconds, and we would like to find a faster way.

Our environments tend to have spare processor capacity for jobs like this; is there any way we can parallelize the conversion process, or parts of it? Alternatively, do you have any performance numbers for other conversion methods, such as combining single-page TIFF files into a multi-page TIFF instead of a PDF?

Thanks!

Aaron

Jun 20, 2014, 3:23:13 PM
to pdfne...@googlegroups.com
Hello Jeffrey,

We don't publish specific performance metrics for image-to-XOD conversion, as they vary with image type, dimensions, and so on.


> once documents are larger than this threshold, we start to see times measured
> in tens or hundreds of seconds, and would like to find a faster way.

Could you elaborate on your requirements? If this is a throughput issue rather than just a latency issue, you should be able to handle it by running multiple conversion processes at once.
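For the throughput case, here is a minimal C# sketch of running several conversions at once with a cap on concurrency. It assumes a hypothetical `convertOne` delegate that converts one document and returns the output path (for example, by launching a separate converter process, in line with the multiple-processes suggestion); it is not a PDFNet API, and the sketch does not address whether PDFNet conversions can safely share one process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BatchConverter
{
    // Runs convertOne for every input with at most maxConcurrency conversions in flight.
    // convertOne is a hypothetical placeholder (e.g. a wrapper that starts one converter
    // process per document); it is not part of PDFNet.
    public static async Task<IReadOnlyList<string>> ConvertAllAsync(
        IEnumerable<string> inputs,
        Func<string, Task<string>> convertOne,
        int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = inputs.Select(async input =>
        {
            await gate.WaitAsync();           // wait for a free slot
            try { return await convertOne(input); }
            finally { gate.Release(); }       // free the slot even if conversion fails
        }).ToList();
        // Task.WhenAll returns results in input order, regardless of completion order.
        return await Task.WhenAll(tasks);
    }
}
```

With `maxConcurrency` set to roughly the number of spare cores, extra documents queue on the semaphore instead of oversubscribing the machine. This improves throughput only; each individual document still takes as long to convert.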

Support

Jun 20, 2014, 4:17:18 PM
to pdfne...@googlegroups.com

Performance depends in large part on the input files (i.e. the type of TIFF/JPEG) you are dealing with. For example, processing a single TIFF frame at 200 DPI could theoretically be 4x slower than processing the same frame at 100 DPI, since the pixel count grows with the square of the resolution (doubling the DPI doubles both the width and the height in pixels). Conversion speed may also vary with the compression method (e.g. CCITTFax, JBIG, Flate, LZW) and other factors.
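The 4x figure follows from simple arithmetic: pixel count is width times height, and both scale linearly with DPI. A small C# sketch, assuming a US Letter page (8.5 x 11 in, chosen only for illustration) and assuming the work scales roughly with pixel count:

```csharp
using System;

public static class PixelMath
{
    // Number of pixels in a page of the given size (inches) scanned at the given DPI.
    public static long PixelCount(double widthInches, double heightInches, int dpi) =>
        (long)Math.Round(widthInches * dpi) * (long)Math.Round(heightInches * dpi);

    // How many times more pixels the same page has at dpiB than at dpiA.
    public static double PixelRatio(double widthInches, double heightInches, int dpiA, int dpiB) =>
        (double)PixelCount(widthInches, heightInches, dpiB) / PixelCount(widthInches, heightInches, dpiA);
}
```

`PixelCount(8.5, 11, 100)` gives 850 x 1100 = 935,000 pixels and `PixelCount(8.5, 11, 200)` gives 1700 x 2200 = 3,740,000, a ratio of 4. At 300 DPI the same page is 2550 x 3300 = 8,415,000 pixels, 9x the 100 DPI count.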

 

> is there any way we can parallelize the conversion process, or parts of it

 

Technically this is possible. You could parallelize the conversion of multiple images to PDF (and then merge the resulting files), or convert them to XOD in parallel and then merge the pages. However, the question is whether this is really necessary.
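If you do decide to parallelize within one document, the usual shape is fan-out/fan-in: convert each image independently, then merge the parts in page order. A minimal C# sketch; `convertImage` and `merge` are hypothetical placeholders (for instance, image to single-page PDF, then appending the pages into one document), not PDFNet calls, and the sketch assumes `convertImage` is safe to call from several threads at once.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SplitMerge
{
    // Converts each image in parallel, then merges the parts in the original page order.
    // convertImage and merge are hypothetical placeholders, not PDFNet APIs.
    public static TResult ConvertThenMerge<TPart, TResult>(
        IReadOnlyList<string> imagePaths,
        Func<string, TPart> convertImage,
        Func<IReadOnlyList<TPart>, TResult> merge,
        int maxDegreeOfParallelism)
    {
        var parts = new TPart[imagePaths.Count];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        // Each iteration writes only to its own slot, so no locking is needed
        // and page order is preserved regardless of which image finishes first.
        Parallel.For(0, imagePaths.Count, options, i =>
        {
            parts[i] = convertImage(imagePaths[i]);
        });

        // The merge step runs sequentially on the ordered parts.
        return merge(parts);
    }
}
```

Writing each result into its own array slot keeps page order deterministic without locks. Because the merge remains sequential, the achievable speed-up is bounded by how much of the total time is spent in per-image conversion.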

 

Once a document is converted to XOD, you should probably cache it to avoid re-converting it every time someone wants to view the same document. You could also convert to XOD as soon as the file is added to the document library, instead of at the moment a user requests the file for viewing. Finally, rather than waiting for the file to be fully converted, you can convert on the fly and stream the partially converted file to WebViewer (for example, see the WebViewerStreaming sample that ships with the PDFNet SDK). This is probably preferable to parallelizing the conversion, because a user may close the document soon after opening it. Keep in mind that for the best viewing experience (instant jumping to any page regardless of document size) we recommend accessing pre-converted XOD files via HTTPPartRetriever, which supports HTTP byte ranges.
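A minimal C# sketch of the caching idea: convert each document at most once, and let concurrent viewers of the same document share the in-flight conversion. The `convert` delegate is hypothetical (for example, a wrapper around `Convert.ToXod` that returns the path of the stored .xod file); storage, eviction, and invalidation are left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Caches the conversion result per document id. Concurrent requests for the
// same id share one conversion. convert is a hypothetical delegate, not a PDFNet API.
public sealed class XodCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();
    private readonly Func<string, Task<string>> _convert;

    public XodCache(Func<string, Task<string>> convert) => _convert = convert;

    // GetOrAdd may build more than one Lazy under contention, but only the stored
    // one is ever evaluated, and Lazy runs its factory at most once.
    public Task<string> GetOrConvertAsync(string documentId) =>
        _entries.GetOrAdd(documentId,
            id => new Lazy<Task<string>>(() => _convert(id))).Value;

    // Starts a conversion ahead of any request (e.g. when a file is added).
    public void Prefetch(string documentId) => _ = GetOrConvertAsync(documentId);
}
```

Calling `Prefetch` when a file is added to the library corresponds to converting ahead of time. One design caveat: a failed conversion stays cached as a faulted task, so a real implementation would probably remove failed entries so the next request retries.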


Jeffrey Stanford

Jun 20, 2014, 4:50:05 PM
to pdfne...@googlegroups.com
To answer both replies:


> If this time is a throughput issue, not simply a latency issue, you should be able to deal with it by running multiple conversion processes at once.
It's a latency issue.  A user issues a request to view a document, and must wait until we finish the conversion and return it.

> Once XOD is converted, you should probably cache it to avoid having to convert every time someone want to view the same document.
We are, and we also cache ahead; however, caching too far ahead could turn this into a throughput problem as well, and would drastically increase space requirements.

> You could also convert to XOD as soon as the file is added to doc library instead of at the moment user requests the file for viewing.
Possible, but infeasible: it would drastically increase both our storage requirements and the time needed to migrate from our older architecture, since our system often manages anywhere from tens of millions to several billion documents per environment.

> Finally rather than waiting for the file to completely convert you can also on-the-fly convert and stream partially converted file to WebViewer (for example see WebViewerStreaming sample that comes as part of PDFNet SDK).
We'll investigate this; while we will probably stick with full conversion for smaller documents, larger documents may warrant this kind of option.

Keith Kaminski

Jul 28, 2014, 11:42:55 AM
to pdfne...@googlegroups.com

I'd like to share some metrics with you based on conversion times:

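Method A: Loop through the TIF images (CCITT Group 4 Fax), convert each one into a single PDF held in memory, and then save that PDF as XOD (to disk). The code is very similar to the following:

```csharp
using pdftron.PDF;
using pdftron.SDF;

// Append every TIF in inputDirectory to one in-memory PDFDoc,
// then write the combined document out as a single XOD file.
var files = System.IO.Directory.GetFiles(inputDirectory);
using (var pdf = new PDFDoc())
{
    foreach (string file in files)
    {
        // Converts the image and appends its page(s) to pdf.
        pdftron.PDF.Convert.ToPdf(pdf, file);
        pdf.Save(SDFDoc.SaveOptions.e_incremental);
    }
    pdftron.PDF.Convert.ToXod(pdf, outputFile);
}
```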

Results:



Method B: Convert individual TIF files to a multipage TIF (to disk) using FreeImage then save multipage TIF as XOD (to disk). I won't bother showing the code because we basically just call the Convert.ToXod method on the multipage TIF and I'm not sure you're interested in the FreeImage code.

Results:

In both cases, the source images are CCITT Group 4 Fax TIF files. I don't know whether they all had the same DPI or dimensions. We often get images that are 3300x2550 at 300 DPI, but that isn't always the case.

If you have ANY recommendations to improve the speed, it would be greatly appreciated. Having to wait an average of 2.5 seconds for 8 images to convert to XOD is far, far longer than I had hoped for.
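To compare Method A and Method B on identical input (and to produce numbers that support can reproduce), a small C# timing helper may be useful. It is a sketch, not part of PDFNet: `convert` would wrap whichever method is under test, and the untimed warm-up call assumes one-time start-up cost should be excluded from the average.

```csharp
using System;
using System.Diagnostics;

public static class ConversionTimer
{
    // Calls convert once untimed (warm-up), then `runs` more times,
    // and returns the mean wall-clock time of the timed runs.
    public static TimeSpan AverageTime(Action convert, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));
        convert(); // warm-up: excludes one-time initialization from the average
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) convert();
        sw.Stop();
        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }
}
```

For example, `ConversionTimer.AverageTime(() => RunMethodA(inputDirectory, outputFile), 10)`, where `RunMethodA` is a hypothetical wrapper around the Method A loop; running the same call for Method B on the same folder gives directly comparable averages.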



Keith Kaminski

Jul 28, 2014, 12:41:32 PM
to pdfne...@googlegroups.com
As a follow-up (and the reply I'm following up to hasn't yet been posted), is there a way to convert from a memory stream to XOD? The input stream would be a multipage TIF file. We have the ability to generate multipage TIF files in memory, and we'd prefer to convert the in-memory stream as opposed to writing the stream to disk and having PDFTron read the file from disk.

Aaron

Jul 31, 2014, 9:57:17 PM
to pdfne...@googlegroups.com
Hello Keith,

If you could forward the images you're converting, as well as the exact code you're using to convert, to support@pdftron we can take a look at the performance.

If you're trying to avoid touching disk while converting to XOD, you can convert a memory stream to a PDFDoc (http://www.pdftron.com/pdfnet/samplecode.html#PDFDocMemory), then convert the PDFDoc to XOD (http://www.pdftron.com/pdfnet/PDFNet/html/M_pdftron_PDF_Convert_ToXod.htm), all in-memory.