Convert PDF to HTML 5

Support

Oct 9, 2012, 2:15:18 PM
to pdfnet-w...@googlegroups.com
Q: Is there any component available with the SDK that will allow me to save a converted .pdf directly to an .html file, rather than using the WebViewer to display it (http://www.pdftron.com/pdfnet/webviewer/demo.html)? Even saving to .xod first and then converting to HTML5 would be fine.
------------
A:

There are no built-in functions to convert PDF to HTML; however, it is possible in .NET as shown in the sample below.

(http://www.pdftron.com/pdfnet/samplecode/Pdf2Html.cs)

"The only intent of this sample is to show how to use the core PDFNet API to implement a very basic PDF to HTML converter. It was not designed to be bulletproof or to be used in production. The main limitation is related to font substitution. In PDF, fonts are typically embedded, which guarantees accurate text reproduction. In the Pdf2Html sample, text locations are correct; however, in some cases (where a font match is not found) the substituted font has larger advance widths, so words can grow and start overlapping each other. You could extract embedded fonts (pdftron.PDF.Font.GetGlyphPath) and normalize them to WOFF (a format compatible with most browsers), then use these 'web fonts' instead of default fonts."
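
To illustrate the general idea of placing text at its original location, here is a minimal sketch in Python. It is not the code from Pdf2Html.cs and it does not call PDFNet. The `Word` record, the `page_to_html` function, and the use of the bounding-box height as the font size are all assumptions made for illustration; in practice the word positions would come from whatever text extraction step you use.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Word:
    """Hypothetical record for one extracted word (not a PDFNet type)."""
    text: str
    x: float       # left edge, in PDF points
    y: float       # bottom edge, in PDF points, measured from the page bottom
    height: float  # bounding-box height, in PDF points


def page_to_html(words, page_width, page_height):
    """Emit one absolutely positioned <span> per word inside a page-sized <div>."""
    spans = []
    for w in words:
        # PDF puts the origin at the bottom-left; CSS puts it at the top-left,
        # so the vertical coordinate has to be flipped.
        top = page_height - (w.y + w.height)
        spans.append(
            f'<span style="position:absolute;left:{w.x:.1f}pt;top:{top:.1f}pt;'
            # Assumption: the bounding-box height approximates the font size.
            f'font-size:{w.height:.1f}pt;white-space:nowrap">'
            f'{escape(w.text)}</span>'
        )
    body = "\n".join(spans)
    return (
        f'<div style="position:relative;width:{page_width:.1f}pt;'
        f'height:{page_height:.1f}pt">\n{body}\n</div>'
    )


if __name__ == "__main__":
    sample = [Word("Hello", 72, 700, 12), Word("world", 105, 700, 12)]
    print(page_to_html(sample, 612, 792))
```

Each span starts at the right place, but its rendered width depends on the font the browser picks. That is the overlap problem the quoted note describes, and this sketch does nothing to address it.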

If your PDF files are fairly simple, you could try this approach and see whether the HTML output is acceptable.
Overall, we recommend using our WebViewer technology instead, as it gives much better results.
