WebViewer range request optimization

Александр Кирин

Oct 1, 2014, 3:41:10 AM
to pdfnet-w...@googlegroups.com
Hi!

I'm evaluating the PDFTron WebViewer as a new solution for presenting PDF documents to users on the web.

The XOD file I get from a PDF is about 20 MB, and WebViewer tries to load it using so-called range requests.
Using Chrome Developer Tools I can see that WebViewer generates several range requests even when I don't try to change the view to another page, as some kind of background loading. My first thought was that it would request only the visible page of a document, but that appears to be wrong.

Even when a page is larger than the size specified in a range request, the viewer makes several requests to load it in full.

The problem is that the server-side code serving range requests is heavyweight, because it retrieves the PDF document from a remote WebDAV server. Getting the content for a single range request may take a long time, and things get worse when there are several requests.

So my questions are:
1. Is it possible to tell WebViewer how many bytes to request per range request? I want to increase it to minimize the number of requests.
2. Is it possible to make the viewer load several files, each representing a page of the PDF document, instead of one whole XOD file, so that it loads only the file for the page visible in the viewer?
3. Does the viewer offer some other approach to this problem?

Alexander.

Matt Parizeau

Oct 1, 2014, 4:17:47 PM
to pdfnet-w...@googlegroups.com
Hi Alexander,

Unfortunately the answer to both of your first two questions is no, but there may be some other solutions that could work.

You mentioned that you request a PDF from a remote WebDAV server. Does that mean that you convert it to XOD on every request? One potential solution is to create a XOD file cache on your server so that you can serve the files statically and take advantage of range requests. This post mentions it briefly, but basically you would convert once and then store the converted XOD on your server for future requests: https://groups.google.com/forum/#!msg/pdfnet-webviewer/-twjop95iQ8/Q8C_bMoe7gMJ
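
A minimal sketch of the "convert once, then reuse" idea, assuming an in-memory lookup table. The names createXodCache and convertToXod are hypothetical and not part of WebViewer or PDFNet; a real cache would more likely store the XOD files on disk and keep only their paths.

```javascript
// Wraps a conversion callback so that each document is converted at most once.
// convertToXod is a hypothetical function that returns the location of the
// converted XOD file for a given document id.
function createXodCache(convertToXod) {
  const xodByDocumentId = new Map();

  return function getXod(documentId) {
    if (!xodByDocumentId.has(documentId)) {
      // First request for this document: convert and remember the result.
      xodByDocumentId.set(documentId, convertToXod(documentId));
    }
    // Later requests reuse the stored result without converting again.
    return xodByDocumentId.get(documentId);
  };
}
```

The stored XOD files can then be served as static files, which is what lets the web server answer range requests directly.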

The other option, which I wouldn't recommend for files of this size, would be to set streaming: true when creating your new PDFTron.WebViewer. This is meant for streaming XOD file conversions (which might be what you're doing) however it only works well when these files are small. 20MB is probably too large because what will happen is that the entire file must be downloaded and parsed and kept in memory all at the same time by the browser. When the parts are expanded and parsed it will take up much more than 20MB and the browser will hang while it's loading. So you could try this way and if it loads quickly enough and doesn't crash your browser then that's great, but if you can get range requests working, for example with a XOD cache then I would highly recommend that.

Matt Parizeau
Software Developer
PDFTron Systems Inc.

Александр Кирин

Oct 3, 2014, 7:48:23 AM
to pdfnet-w...@googlegroups.com
Hi!

We store the PDFs already converted to XOD files, so there is no conversion on request. Since the PDFs do not change over time, caching the XOD files on our server seems to make no sense: we have millions of PDFs, so we would eventually end up duplicating the content of the WebDAV server.

Is it possible for you to extend WebViewer to meet our needs for an additional fee?

On Thursday, October 2, 2014 at 0:17:47 UTC+4, Matt Parizeau wrote:

Matt Parizeau

Oct 3, 2014, 5:43:54 PM
to pdfnet-w...@googlegroups.com
Hi Alexander,

Since you already have xod files on your webdav server (if I understand correctly) then could you basically just forward the range requests onto the webdav server? The amount of code required to handle range requests shouldn't be too much (if it isn't already supported on your webdav server) so this most likely seems like the best way to go. Any solution that just requests bigger chunks at a time is really just hiding the problem, but I suppose might improve things over what is happening now. I would still recommend handling range requests correctly on your servers as the best way to support WebViewer.
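
A minimal sketch of the forwarding idea, assuming a Node-style proxy sitting in front of the WebDAV server. The function names are hypothetical and not part of WebViewer. The point is only that the Range request header is passed upstream unchanged, and the partial-content status and headers are passed back unchanged.

```javascript
// Build the headers for the upstream (WebDAV) request.
// Node lower-cases incoming header names, so the key to look for is "range".
function buildUpstreamHeaders(incomingHeaders) {
  const headers = {};
  if (incomingHeaders.range) {
    headers.Range = incomingHeaders.range;
  }
  return headers;
}

// Response headers the browser needs in order to interpret a partial response.
const RELAYED_HEADERS = ["content-range", "content-length", "content-type", "accept-ranges"];

// Relay the upstream status (for example 206 Partial Content) together with
// the headers listed above; everything else from upstream is dropped.
function buildClientResponse(upstreamStatus, upstreamHeaders) {
  const headers = {};
  for (const name of RELAYED_HEADERS) {
    if (upstreamHeaders[name] !== undefined) {
      headers[name] = upstreamHeaders[name];
    }
  }
  return { status: upstreamStatus, headers: headers };
}
```

With this shape the proxy never needs to read the whole XOD file; it only moves the requested bytes from the WebDAV server to the browser.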

If you would still like to have WebViewer extended through a custom project then you can submit a custom engineering request at this page: https://www.pdftron.com/support/professionalservices.html

Matt Parizeau
Software Developer
PDFTron Systems Inc.

Александр Кирин

Oct 22, 2014, 5:28:40 AM
to pdfnet-w...@googlegroups.com
Hi!

I submitted a request through https://www.pdftron.com/support/professionalservices.html but have had no reply for several days. Is anyone there?

On Saturday, October 4, 2014 at 1:43:54 UTC+4, Matt Parizeau wrote:

Support

Oct 22, 2014, 4:25:07 PM
to pdfnet-w...@googlegroups.com

Sorry for the delay... our support engineer will get in touch with you shortly.

Support

Oct 28, 2014, 12:56:09 AM
to pdfnet-w...@googlegroups.com

For optimal operation we assume that the HTTP server supports byte ranges (this is true for most servers out there, including S3, Azure storage, etc.).
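
For a server that has to handle byte ranges in its own code, the core step is turning the Range header into a byte span. The following is a minimal sketch under stated assumptions: it handles only a single range per request, the function names are hypothetical, and it returns null for anything it cannot serve so that the caller can decide between sending the whole file and a 416 response.

```javascript
// Parse "bytes=start-end", "bytes=start-" or "bytes=-suffixLength" against a
// known file size. Returns { start, end } with an inclusive end, or null.
function parseByteRange(rangeHeader, fileSize) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader || "");
  if (!match || (match[1] === "" && match[2] === "")) {
    return null; // missing, malformed, or multi-range header
  }
  let start;
  let end;
  if (match[1] === "") {
    // Suffix form: the last N bytes of the file.
    const suffixLength = parseInt(match[2], 10);
    if (suffixLength === 0) {
      return null;
    }
    start = Math.max(fileSize - suffixLength, 0);
    end = fileSize - 1;
  } else {
    start = parseInt(match[1], 10);
    // An open-ended or oversized end is clamped to the last byte of the file.
    end = match[2] === "" ? fileSize - 1 : Math.min(parseInt(match[2], 10), fileSize - 1);
  }
  if (start > end || start >= fileSize) {
    return null; // not satisfiable for this file
  }
  return { start: start, end: end };
}

// Value of the Content-Range header that accompanies a 206 response.
function contentRangeHeader(range, fileSize) {
  return "bytes " + range.start + "-" + range.end + "/" + fileSize;
}
```

The response body is then the bytes from start to end inclusive, so its Content-Length is end - start + 1.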

If support for byte ranges is completely out of the question, WebViewer also supports loading XOD document parts stored in a folder (XOD is just XPS, so it is just a ZIP archive).

In this case use the --external_parts option in DocPub (https://www.pdftron.com/docpub/downloads.html) or SetExternalParts(bool generate) in XODOutputOptions (if you are using PDFNet).

More info about this option:

  --external_parts                   For conversions to .xod only. Output XOD
                                     as a collection of loose files rather than
                                     a zip archive. This option should be used
                                     when using the external part retriever in
                                     Webviewer.

Once you've converted the PDF to an unzipped XOD file then you'll have to make one small modification to WebViewer.js to be able to load it.

In WebViewer.js find the _getHTML5OptionsURL function and after the if statement if (typeof options.initialDoc...) add this code:

if (options.externalPath) {
    var path = this._correctRelativePath(options.externalPath);
    path = encodeURIComponent(path);
    url += "&p=" + path;
}
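
To see what this patch adds to the viewer URL, here is a standalone sketch of the same query-string step. The function name appendExternalPath and the base URL in the usage note are assumptions made for illustration, and the internal _correctRelativePath call is left out because it only exists inside WebViewer.js.

```javascript
// Append the external parts folder to a viewer URL as the "p" parameter.
// encodeURIComponent escapes the slashes so the path survives as one value.
function appendExternalPath(url, externalPath) {
  if (externalPath) {
    url += "&p=" + encodeURIComponent(externalPath);
  }
  return url;
}
```

For example, calling appendExternalPath with a hypothetical base of "ReaderControl.html#d=external" and the folder "/path/to/your/xod_folder" yields a URL ending in "&p=%2Fpath%2Fto%2Fyour%2Fxod_folder".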

 

Then when you create your WebViewer instance you would do it like this:

var myWebViewer = new PDFTron.WebViewer({
    initialDoc: "external",
    externalPath: "/path/to/your/xod_folder",
    ...
    ...
}, viewerElement);

dimitar...@gmail.com

Jun 18, 2015, 3:27:37 PM
to pdfnet-w...@googlegroups.com
Hello,

I tried the externalPath solution provided here but could not load the XOD folder in the viewer, although loading the zipped XOD itself worked.

It would be better if I could use the unpacked version, because then we could use the files just as we have them on the server, as folders.

The error is:

Uncaught TypeError: Cannot read property 'length' of null      CoreControls.js:516

Can you please help with this?

Thanks.
Dimitar

Anatoly Kudrevatukh

Jun 18, 2015, 6:40:49 PM
to pdfnet-w...@googlegroups.com
Hello,

Thank you for reporting this issue.

What version of WebViewer are you using?

With the latest WebViewer you can pass a dummy initialDoc option to work around an issue with the externalPath option. Future releases of WebViewer will have this fixed.
Also note that the option is externalPath with a capital P.

dimitar...@gmail.com

Jun 19, 2015, 12:14:59 PM
to pdfnet-w...@googlegroups.com
Hello,

Yes I am using the latest WebViewer.

The options I tried are these:

$(function () {
    var myWebViewer = new PDFTron.WebViewer({
        path: "Scripts/lib",
        type: "html5",
        initialDoc: "external",
        externalPath: "/000000000030179899",
        config: "Scripts/viewerCustomizations.js",
        streaming: true
    }, viewerElement);
});

$(function () {
    var myWebViewer = new PDFTron.WebViewer({
        path: "Scripts/lib",
        type: "html5",
        initialDoc: "external.xod",
        externalPath: "/000000000030179899",
        config: "Scripts/viewerCustomizations.js",
        streaming: true
    }, viewerElement);
});

Just in case, in the second snippet I changed initialDoc from external to external.xod and retried. I still get the same error in both cases.
What would you advise?

Thanks.
Dimitar

Matt Parizeau

Jun 19, 2015, 8:36:47 PM
to pdfnet-w...@googlegroups.com
Hi Dimitar,

How did you convert your external XOD file? Did you use DocPub with --external_parts or did you just unzip the XOD file manually? To work with externalPath you'll have to convert the XOD file using DocPub or PDFNet, it won't work if it's just unzipped.

Matt Parizeau
Software Developer
PDFTron Systems Inc.

dimitar...@gmail.com

Jun 22, 2015, 1:17:31 PM
to pdfnet-w...@googlegroups.com
Hi Matt,

We convert the PDF to XOD in memory using PDFNet and then create an XPS file, which is later unpacked. Since it is effectively a ZIP file, unpacking it produces the folder I used for externalPath.
Here is the code that does what I described above:

var file = pdftron.PDF.Convert.ToXod(pdfDoc, options);

var fr = new FilterReader(file);
var buffer = new byte[64*1024]; //64 KB chunks

var memoryStream = new MemoryStream();

int len;
while ((len = fr.Read(buffer)) > 0)
{
    memoryStream.Write(buffer, 0, len);
}

fr.Flush();
fr.Dispose();

memoryStream.Seek(0, SeekOrigin.Begin);

xpsFile = ZipFile.Read(memoryStream);

ZipFile comes from Ionic.Zip.dll.
Is there something different that we need to do to create the externalPath folder?

Thanks.
Dimitar

Matt Parizeau

Jun 22, 2015, 5:14:04 PM
to pdfnet-w...@googlegroups.com
Hi Dimitar,

Unfortunately it won't work if you convert to a normal XOD file and then unpack it yourself. You'll need to convert with the external parts option. There are some crucial differences in the XOD when converted with that option which is why it doesn't work to just unzip the XOD.

Matt Parizeau
Software Developer
PDFTron Systems Inc.

dimitar...@gmail.com

Jun 23, 2015, 12:26:37 PM
to pdfnet-w...@googlegroups.com
Hi Matt,

Here is the code where I tried to add the SetExternalParts option, but that resulted in a file size of 0 after the XOD conversion line: var file = pdftron.PDF.Convert.ToXod(pdfDoc, options);
I am not sure why that is. Can you please provide example code that works for converting a PDF to XOD with external parts, or point me to the fix needed for the code below?

var options = new Convert.XODOutputOptions();
options.SetFlattenContent(pdftron.PDF.Convert.FlattenFlag.e_off);
options.GenerateURLLinks(true);
options.SetAnnotationOutput(pdftron.PDF.Convert.XODOutputOptions.AnnotationOutputFlag.e_flatten);
options.SetOutputThumbnails(false);
options.SetSilverlightTextWorkaround(true);
options.SetExternalParts(true);

var file = pdftron.PDF.Convert.ToXod(pdfDoc, options);

int fileSize = file.Size(); //Size of the file is 0.

var fr = new FilterReader(file);
var buffer = new byte[64*1024]; //64 KB chunks

var memoryStream = new MemoryStream();

int len;
while ((len = fr.Read(buffer)) > 0)
{
    memoryStream.Write(buffer, 0, len);
}

fr.Flush();
fr.Dispose();

memoryStream.Seek(0, SeekOrigin.Begin);

xpsFile = ZipFile.Read(memoryStream);

Thanks.
Dimitar

Matt Parizeau

Jun 23, 2015, 7:24:32 PM
to pdfnet-w...@googlegroups.com
Hi Dimitar,

Is there a reason that you need to use the filter to convert? Unfortunately converting to a filter doesn't work with the external parts option.

You could instead use the ToXod call that takes a pdfDoc, output path and options. For example:
pdftron.PDF.Convert.ToXod(pdfDoc, "my_output_folder", options);

Matt Parizeau
Software Developer
PDFTron Systems Inc.

dimitar...@gmail.com

Jun 24, 2015, 12:34:38 PM
to pdfnet-w...@googlegroups.com
Hi Matt,

Great, that works, and now I have the external path folder. How can I serve this folder to WebViewer through an HTTP endpoint, in the same way a web service would deliver PDF files?

Thanks.
Dimitar

Matt Parizeau

Jun 24, 2015, 6:27:38 PM
to pdfnet-w...@googlegroups.com
Hi Dimitar,

For the external path folder WebViewer will make HTTP requests for the individual parts that are inside the folder. So you just need to make sure that all those parts are accessible through HTTP. For example WebViewer will request yourserver.com/my_external_path/Pages/1.xaml.
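
If the folder is served by custom code rather than a static file server, the main step is mapping each requested URL to a file inside the folder. Below is a minimal sketch assuming Node.js; resolvePartPath, the mount point, and the directory names in the usage note are hypothetical. It also rejects requests that try to leave the folder with "..".

```javascript
const path = require("path");

// Map a request path such as "/my_external_path/Pages/1.xaml" to a file under
// rootDir, the folder produced by the external parts conversion.
// Returns null when the request is outside the mount point, is malformed,
// or would resolve to a location outside rootDir.
function resolvePartPath(rootDir, mountPoint, requestPath) {
  if (!requestPath.startsWith(mountPoint + "/")) {
    return null;
  }
  let relative;
  try {
    relative = decodeURIComponent(requestPath.slice(mountPoint.length + 1));
  } catch (err) {
    return null; // malformed percent-encoding
  }
  const root = path.resolve(rootDir);
  const resolved = path.resolve(root, relative);
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    return null; // attempted to escape the folder
  }
  return resolved;
}
```

For instance, with rootDir "/srv/xod/doc1" and mountPoint "/my_external_path", a request for "/my_external_path/Pages/1.xaml" resolves to the Pages/1.xaml file inside that folder, which the handler can then stream back in the response.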

Matt Parizeau
Software Developer
PDFTron Systems Inc.