On 11/28/2012 09:13 PM, Julian Viereck wrote:
> I don't think there is an API to extract images :/
>
> The way it goes right now is like this:
>
> - a single page gets extracted from the PDF and all required objects for
> the page are built up (e.g. font data, images, etc.)
> - the images are sent to the main thread using the messageHandler. In
> case of images, you want to look at this line:
>
>
> https://github.com/mozilla/pdf.js/blob/master/src/api.js#L598
>
> If you want to have a good solution, you would have to implement
> something like an imageExtractor in the PartialEvaluator that looks for
> images, parses only them (to get good performance) and then sends them
> back to the main thread in a way that lets you catch them more easily.
I understand that the API is really laid out for rendering and not for
accessing or even editing parts of the document.
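
To check that I read the suggestion correctly: the extractor would walk
the operations of a page and keep only the image ones, skipping all the
text and path work that rendering needs. Below is a rough sketch of that
idea only. Every name in it (Operation, ExtractedImage, extractImages)
is made up for illustration and is not part of pdf.js, and it leaves out
the step of sending the images back to the main thread.

```typescript
// Hypothetical shapes for illustration; none of these names come from pdf.js.
type Operation =
  | { kind: "text"; value: string }
  | { kind: "path"; points: number[] }
  | { kind: "image"; name: string; width: number; height: number; data: Uint8Array };

type ExtractedImage = {
  name: string;
  width: number;
  height: number;
  data: Uint8Array;
};

// Walk one page's operations and keep only the images, ignoring the
// text and path operations that a renderer would have to process.
function extractImages(operations: Operation[]): ExtractedImage[] {
  const images: ExtractedImage[] = [];
  for (const op of operations) {
    if (op.kind === "image") {
      images.push({ name: op.name, width: op.width, height: op.height, data: op.data });
    }
  }
  return images;
}

// Usage with made-up page content.
const page: Operation[] = [
  { kind: "text", value: "Hello" },
  { kind: "image", name: "img_p0_1", width: 2, height: 1, data: new Uint8Array([255, 0, 0, 0, 255, 0]) },
  { kind: "path", points: [0, 0, 10, 10] },
];
const found = extractImages(page);
console.log(found.map((img) => `${img.name} ${img.width}x${img.height}`));
```

In a real implementation the filtering would have to happen while the
page is being parsed, so that non-image objects are never built at all.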
> If you want to implement such an image extractor, I'm happy to give you
> guidance to get going.
For the moment, I will stick with my Java-based server solution that
uses PDFBox, since adapting pdf.js would take more time than I
currently have available for this.
Thank you for the explanation and the offer of guidance,
Jos