Where can I find the byte range description of a XOD file?

fraga

Jun 6, 2014, 7:49:28 PM
to pdfnet-w...@googlegroups.com
Is there a map of the bytes, so I can read them myself with JavaScript?
Or is there documentation I could use to implement my own XOD reader?

Support

Jun 6, 2014, 8:54:39 PM
to pdfnet-w...@googlegroups.com
The byte ranges are determined by the central directory table of the ZIP file and by the special organization of the parts within it.

Having said this, why would you want to reinvent the wheel? Even if you want to write a completely new WebViewer from scratch (e.g. a new DocView), you could still use the low-level Document and PartRetrievers. So it is hard to imagine why anyone would need to do that.
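
As a rough illustration of the ZIP side of this (not of the XOD-specific organization of parts, which this thread does not document), the sketch below finds the ZIP end-of-central-directory record in a byte buffer and reads where the central directory starts. The function name `findEndOfCentralDirectory` and the hand-built sample bytes are made up for the example; the signature and field offsets come from the ZIP format itself. ZIP64 archives are not handled.

```javascript
// Sketch: locate the ZIP end-of-central-directory (EOCD) record in a buffer.
// Assumes `bytes` is a Uint8Array holding the tail (or all) of the file.
function findEndOfCentralDirectory(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06", stored little-endian
  const EOCD_MIN_SIZE = 22;          // fixed part; an optional comment may follow
  // Scan backwards because a trailing comment moves the record away from the end.
  for (let pos = bytes.byteLength - EOCD_MIN_SIZE; pos >= 0; pos--) {
    if (view.getUint32(pos, true) === EOCD_SIGNATURE) {
      return {
        entryCount: view.getUint16(pos + 10, true),
        centralDirectorySize: view.getUint32(pos + 12, true),
        centralDirectoryOffset: view.getUint32(pos + 16, true),
      };
    }
  }
  return null; // not a ZIP, or the EOCD lies outside the supplied bytes
}

// Hypothetical example: a hand-built 22-byte EOCD with no comment.
const eocd = new Uint8Array(22);
const eocdView = new DataView(eocd.buffer);
eocdView.setUint32(0, 0x06054b50, true);
eocdView.setUint16(10, 3, true);     // 3 entries in total
eocdView.setUint32(12, 150, true);   // central directory is 150 bytes long
eocdView.setUint32(16, 1000, true);  // and starts at byte 1000
console.log(findEndOfCentralDirectory(eocd));
```

With the offset and size in hand, a second ranged read of just the central directory would give the list of parts without downloading the whole file.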

fraga

Jun 7, 2014, 6:44:09 PM
to pdfnet-w...@googlegroups.com
I want to do this:

The user uploads a PDF and the server converts it to XOD.
The user can now open the document on my site and see it in the WebViewer.
The user needs to split this document into multiple documents.
On the server, I read the whole structure and get the byte ranges of all the elements needed to draw a page on a canvas (text.xml, page.xml, thumb.jpg, images, annotations, etc.).
I save this information in a database so the user can split files logically.
When a user opens one of the newly created documents, I get the pages that compose it from the database and render them, still fetching the content from the same XOD.
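
A minimal sketch of the "logical split" bookkeeping described in these steps. It stores page references into the original XOD rather than byte ranges, which is a simplification and not what the post proposes to store. Every name here (`splitDocument`, `resolvePage`, the `/docs/big.xod` path) is hypothetical.

```javascript
// Sketch: a logical "split" document is an ordered list of page references
// into the original XOD. All names here are hypothetical.
function splitDocument(sourceUrl, pageGroups) {
  return pageGroups.map(function (pageIndexes, i) {
    return {
      id: 'split-' + (i + 1),
      pages: pageIndexes.map(function (pageIndex) {
        return { sourceUrl: sourceUrl, pageIndex: pageIndex };
      }),
    };
  });
}

// Resolve page n of a logical document to the source file and page to render.
function resolvePage(logicalDoc, n) {
  const ref = logicalDoc.pages[n];
  if (!ref) throw new RangeError('No page ' + n + ' in ' + logicalDoc.id);
  return ref;
}

const parts = splitDocument('/docs/big.xod', [[0, 1], [2, 3, 4]]);
console.log(resolvePage(parts[1], 0)); // page 0 of the second split is source page 2
```

Each record is small enough to store as a database row or a JSON column, and the original XOD is never copied.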

Voilà! I am currently doing just that, but only with images: I take a screenshot of each PDF page and save it, so my users can split one big document into smaller documents.

I am using the same approach you are: a canvas on which I draw the image and the annotations. But I want to draw the vectors, so that the text quality is better and the files take up less space on my server.

fraga

Jun 7, 2014, 6:51:17 PM
to pdfnet-w...@googlegroups.com
I have been analyzing CoreControl.js and now know how to get all the names in the ZIP and the start position of their bytes!
But understanding the code is really slow going. Do you have an open-source version, or a white paper with the byte specification?
Something like:

- The 22 bytes at the end of the XOD file hold the information in this format (Start-End), so you can get all the names, etc.

Or maybe a PartRetriever that can get parts from other documents, so I can draw those parts on my canvas?
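
For the "names and start positions" part, here is a hedged sketch of reading entries from a standard ZIP central directory. The field offsets are those of the ZIP format. The helper names and the two sample part names are placeholders borrowed from the post above, not actual XOD part paths.

```javascript
// Sketch: list part names and local-header offsets from a ZIP central directory.
// Assumes `bytes` is a Uint8Array containing exactly the central directory.
function listCentralDirectoryEntries(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const ENTRY_SIGNATURE = 0x02014b50; // "PK\x01\x02", stored little-endian
  const FIXED_SIZE = 46;              // bytes before the variable-length name
  const entries = [];
  let pos = 0;
  while (pos + FIXED_SIZE <= bytes.byteLength &&
         view.getUint32(pos, true) === ENTRY_SIGNATURE) {
    const compressedSize = view.getUint32(pos + 20, true);
    const nameLength = view.getUint16(pos + 28, true);
    const extraLength = view.getUint16(pos + 30, true);
    const commentLength = view.getUint16(pos + 32, true);
    const localHeaderOffset = view.getUint32(pos + 42, true);
    const nameBytes = bytes.subarray(pos + FIXED_SIZE, pos + FIXED_SIZE + nameLength);
    // Names are decoded as ASCII here; real archives may use UTF-8 or CP437.
    const name = Array.from(nameBytes, function (c) {
      return String.fromCharCode(c);
    }).join('');
    entries.push({ name: name, localHeaderOffset: localHeaderOffset, compressedSize: compressedSize });
    pos += FIXED_SIZE + nameLength + extraLength + commentLength;
  }
  return entries;
}

// Hypothetical helper that builds one central directory entry for the demo.
function makeEntry(name, localHeaderOffset, compressedSize) {
  const entry = new Uint8Array(46 + name.length);
  const v = new DataView(entry.buffer);
  v.setUint32(0, 0x02014b50, true);
  v.setUint32(20, compressedSize, true);
  v.setUint16(28, name.length, true);
  v.setUint32(42, localHeaderOffset, true);
  for (let i = 0; i < name.length; i++) entry[46 + i] = name.charCodeAt(i);
  return entry;
}

const firstEntry = makeEntry('page.xml', 0, 120);
const secondEntry = makeEntry('thumb.jpg', 200, 4096);
const directory = new Uint8Array(firstEntry.length + secondEntry.length);
directory.set(firstEntry, 0);
directory.set(secondEntry, firstEntry.length);
console.log(listCentralDirectoryEntries(directory));
```

Note that `localHeaderOffset` points at the part's local file header, not at its data. The data begins after that header (30 fixed bytes plus its own name and extra-field lengths), so one more small read is needed before the exact byte range is known.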

Matt Parizeau

Jun 10, 2014, 2:13:34 PM
to pdfnet-w...@googlegroups.com
We don't release the unobfuscated source code for the core WebViewer code, but we do make it available for the viewer components. A XOD file is a valid XPS file, so if you really want to, you could go digging in the XPS spec...

With that said, there may be a solution using the lower-level Document object in WebViewer. If you take a look at the provided deck.js sample in the WebViewer download, specifically viewer.js, you'll see an example of using Document.LoadCanvasAsync. This will give you a canvas for a single page from a specific document. What you could do is have one Document object for each of the documents that have pages in your combined document. Then, as a page becomes visible in the combined document, you would check which document it belongs to and get the canvas from the appropriate WebViewer Document object.

One very important thing to note about this! Multiple Document objects loading resources can interfere with each other if they are on the same web page. To prevent this, you'll need to make sure that no two separate Document objects are rendering pages at the same time. Also, once a document has finished loading a page, you must call the UnloadResources function on it to make sure it is cleaned up and ready for another Document to load a page.
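
One way to enforce the "no two documents rendering at once" rule is a small serial queue. This is a generic sketch, not part of WebViewer: `createSerialQueue` and the `done` callback convention are assumptions made for the example, and in practice each job would wrap a page load and call `done` after UnloadResources.

```javascript
// Sketch: run render jobs strictly one after another.
// Each job is a function that receives a `done` callback (hypothetical convention).
function createSerialQueue() {
  const pending = [];
  let running = false;

  function runNext() {
    if (running || pending.length === 0) return;
    running = true;
    const job = pending.shift();
    job(function done() {
      running = false;
      runNext(); // start the next job only after the previous one has finished
    });
  }

  return {
    push: function (job) {
      pending.push(job);
      runNext();
    },
  };
}

// Demo with stand-in jobs: B cannot start until A calls done().
const queue = createSerialQueue();
const log = [];
let finishA;
queue.push(function (done) { log.push('start A'); finishA = done; });
queue.push(function (done) { log.push('start B'); done(); });
finishA();
console.log(log);
```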

One other thing to note is that this means you'll have to lay out the canvases on your page yourself and add your own controls for zooming.

Here is some very quick and dirty sample code for you to get a general idea:

function loadPage(docLocation, pageIndex, callback) {
    var partRetriever = new window.CoreControls.PartRetrievers.HttpPartRetriever(docLocation, window.CoreControls.PartRetrievers.CacheHinting.CACHE);
    var doc = new window.CoreControls.Document();
    doc.LoadAsync(partRetriever, function() {
        doc.LoadCanvasAsync(pageIndex, 1, 0, function(canvas) {
            $('body').append(canvas);
            doc.UnloadResources();
            callback();
        }, function() {}, 1);
    });
}

loadPage(my_doc_location_1, 0, function() {
    loadPage(my_doc_location_2, 0, function() {
        loadPage(my_doc_location_1, 1, function() {

        });
    });
});

I was doing some testing with this and, unfortunately, it seems like you can't reuse the Document objects because of some issues with fonts, but you should be able to reuse the part retriever objects if you like. LoadAsync won't load too much data, so hopefully this shouldn't affect performance too much.
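
If you do want to reuse part retrievers, a sketch of one way to keep a single retriever per document location follows. `createRetrieverCache` is a hypothetical helper, and the factory is injected so the example runs without WebViewer. In the browser the factory would presumably wrap the HttpPartRetriever constructor from the sample code.

```javascript
// Sketch: hand out one part retriever per document location.
// `createRetriever` is injected so the cache itself has no WebViewer dependency.
function createRetrieverCache(createRetriever) {
  const byLocation = new Map();
  return function getRetriever(docLocation) {
    if (!byLocation.has(docLocation)) {
      byLocation.set(docLocation, createRetriever(docLocation));
    }
    return byLocation.get(docLocation);
  };
}

// In the browser the factory would presumably look like this (assumption):
// function (loc) {
//   return new window.CoreControls.PartRetrievers.HttpPartRetriever(
//     loc, window.CoreControls.PartRetrievers.CacheHinting.CACHE);
// }
// A stand-in factory keeps this example runnable outside WebViewer.
let created = 0;
const getRetriever = createRetrieverCache(function (loc) {
  created += 1;
  return { location: loc };
});
getRetriever('/docs/a.xod');
getRetriever('/docs/b.xod');
getRetriever('/docs/a.xod'); // served from the cache, no new retriever
console.log(created);
```

A fresh Document would still be created for every page load. Only the retriever is shared.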

Matt Parizeau
Software Developer
PDFTron Systems Inc.