Q:
We are using PdfNet for converting pdf to html (http://www.pdftron.com/pdfnet/samplecode/Pdf2Html.cs)
and then back from html to pdf (http://www.pdftron.com/pdfnet/samplecode.html#Html2Pdf).
While converting from PDF to HTML, some words overlap (please check the
attached file). Moreover, hyperlinks in the PDF are not included in the
generated HTML. Will these issues persist in your licensed version?
Also, we want to
purchase a licensed version of PDFNet. Do we also need to purchase one of the extra
add-ons (http://www.pdftron.com/pdfnet/features.html#Convert)
to license PDF to HTML conversion?
-------------
A:
The currently recommended approach for PDF viewing / annotation / collaboration
in a web browser is to use the WebViewer SDK (http://www.pdftron.com/pdfnet/webviewer/demo.html).
For a quick test, use http://s84786.gridserver.com/website/demo/bookstore/details.php
or Cloud API (http://www.pdftron.com/pdfnet/cloud/started.html).
The cool thing about WebViewer is that it works on all platforms, browsers, and
various devices (including iPad/iPhone, Android, Windows 8 Surface, old
non-HTML5 browsers, etc.). Unlike other HTML solutions (e.g. Google Docs) the
content is not rasterized and the system does not rely on server-side
rasterization or proprietary systems/APIs.
The reason for recommending WebViewer instead of straight PDF -> HTML ->
PDF conversion is that the latter is impossible to implement without
significant loss of information (i.e. HTML without Canvas support doesn't
support _most_ PDF features that are required for accurate document
reproduction).
Regarding the Pdf2Html sample, its only intent is to show how to use the
core PDFNet API to implement a very basic PDF to HTML converter (e.g. for PDFNet
users that want to implement a custom import filter for their apps). For the
reasons outlined above, the sample was not designed to be a bullet-proof
solution, nor is it meant to be used in production. Any behavior you see in
trial mode is what you'll see after licensing (except of course no trial mode
watermarking). For example, one of the limitations is due to font substitution. In
PDF, fonts are typically embedded, which guarantees accurate text reproduction.
In the Pdf2Html sample the text locations are correct; however, in some cases
(where a font match is not found) the substituted font has larger advance widths,
so words can grow and start overlapping each other. You could verify this by
adjusting the font size in the converter (e.g. scaling it down 30% or more).
You could extract the embedded fonts and normalize them to WOFF (a format
compatible with most browsers; for more info please see https://groups.google.com/d/topic/pdfnet-sdk/weHNRhmlvn4/discussion)
then use these 'web fonts' instead of the default fonts. But there are many other
issues with plain PDF to HTML conversion that simply can't be worked around,
unless you are ok with a totally rasterized page. The goal of the WebViewer
Development Platform is to solve this problem.
Having said this, you could extend the Pdf2Html sample with extra features. For
example, if you would like to preserve PDF links in HTML you would use the PDFNet
annotation API (as shown in the Annotation sample - http://www.pdftron.com/pdfnet/samplecode.html#Annotation)
to extract the link regions (annot.GetRect() -> Rect) and add the href URL to
an HTML DIV floating on top of the content underneath:
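if (annot.GetType() == Annot.Type.e_Link) {
    // Wrap the generic annotation as a link annotation (pdftron.PDF.Annots.Link,
    // as done in the Annotation sample) to reach its action.
    Link lk = new Link(annot);
    Action action = lk.GetAction();
    if (action.GetType() == Action.Type.e_GoTo) {
        // Internal link: resolve the target page within this document.
        Destination dest = action.GetDest();
        if (dest.IsValid()) {
            int page_num = dest.GetPage().GetIndex();
            System.Console.WriteLine(" Links to: page number {0:d} in this document", page_num);
        }
    }
    else if (action.GetType() == Action.Type.e_URI) {
        // External link: this URI becomes the href of the HTML overlay.
        string uri = action.GetSDFObj().Get("URI").Value().GetAsPDFText();
    }
}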
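Once you have the link rectangle and the URI, positioning the overlay is mostly a coordinate flip, since PDF user space has its origin at the bottom-left and HTML at the top-left. The following is only a rough sketch of that step, not part of the Pdf2Html sample: the LinkOverlay class, the ToHtml helper and its parameters are hypothetical names, and it assumes an unrotated page with no crop box offset, rendered inside a position:relative container at `scale` CSS pixels per point. It emits an absolutely positioned anchor rather than a DIV, which serves the same purpose.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper (not a PDFNet API): turns a link rectangle into HTML.
static class LinkOverlay
{
    // x1, y1, x2, y2: link rectangle in PDF user space (points, origin bottom-left).
    // pageHeight: page height in points. scale: CSS pixels per point.
    // Assumes an unrotated page whose crop box starts at (0, 0).
    public static string ToHtml(double x1, double y1, double x2, double y2,
                                double pageHeight, double scale, string uri)
    {
        double left = Math.Min(x1, x2) * scale;
        // Flip the y axis: the top edge in HTML is measured from the top of the page.
        double top = (pageHeight - Math.Max(y1, y2)) * scale;
        double width = Math.Abs(x2 - x1) * scale;
        double height = Math.Abs(y2 - y1) * scale;

        // Invariant culture keeps '.' as the decimal separator in the CSS values;
        // the URI is HTML-encoded so characters such as '&' and '"' stay valid.
        return string.Format(CultureInfo.InvariantCulture,
            "<a href=\"{0}\" style=\"position:absolute;left:{1:0.##}px;top:{2:0.##}px;" +
            "width:{3:0.##}px;height:{4:0.##}px\"></a>",
            WebUtility.HtmlEncode(uri), left, top, width, height);
    }
}
```

The rectangle coordinates would come from annot.GetRect() and the URI from the e_URI branch above. For e_GoTo links you could pass an in-page anchor such as "#page5" instead, provided your generated HTML defines matching ids.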
----
> Do we also need to purchase one of extra add-ons
+ In case you are happy with the Pdf2Html sample and you are not using anything from the 'pdftron.PDF.Convert' (http://www.pdftron.com/pdfnet/html/classpdftron_1_1PDF_1_1Convert.html) namespace, the Core PDFNet API will suffice (i.e. you do not need to purchase any extra add-ons).
+ If you are planning to use the WebViewer API in PDFNet (i.e. pdftron.PDF.Convert.ToXod()) you would need to obtain a WebViewer Publisher Add-on license. Alternatively, as a potentially more cost-effective option, you could use the Cloud API (http://www.pdftron.com/pdfnet/cloud/started.html) instead of using/hosting PDFNet on your own servers.