PDF Structure recognition with PDFNet

Support

Mar 3, 2014, 9:14:03 PM

to pdfne...@googlegroups.com

We are trying to build an ontology-based search engine. We have run into a problem: we need to identify sections such as table headers, headings, footers, etc. in a PDF document. Is there a way to accomplish this using PDFTron?

--------

PDFNet could be used to implement structure recognition (or to extract existing structure, if available).

For some background on structure recognition please see:

http://blog.pdftron.com/2014/03/02/table-extraction-and-pdf-to-xml-with-pdfgenie/

The PDFGenie technology described in the article is also available as a PDFNet SDK add-on (through 'pdftron.PDF.Convert.ToHtml()' when the SetReflow() option is set).
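
Once the document has been converted to reflowed HTML, the recognized structure still has to be picked out of the markup for indexing. The following is only a rough sketch of that post-processing step, using Python's standard html.parser module rather than any PDFNet API. It assumes the reflowed output marks headings and table headers with the standard <h1>..<h6> and <th> tags, which this thread does not confirm, so check the tag names against your actual output. The names STRUCTURAL_TAGS, StructureCollector and collect_structure are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: the reflowed HTML uses standard heading and table-header tags.
# Adjust this set after inspecting the real converter output.
STRUCTURAL_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "th"}


class StructureCollector(HTMLParser):
    """Collects (tag, text) pairs for every structural element seen."""

    def __init__(self):
        super().__init__()
        self.found = []        # results, in document order
        self._open_tag = None  # structural tag currently being read
        self._buffer = []      # text fragments inside that tag

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a structural element opens.
        if tag in STRUCTURAL_TAGS and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Close the element and store its whitespace-normalized text.
        if tag == self._open_tag:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.found.append((tag, text))
            self._open_tag = None


def collect_structure(html_text):
    """Return a list of (tag, text) for headings and table headers."""
    parser = StructureCollector()
    parser.feed(html_text)
    parser.close()
    return parser.found
```

The resulting (tag, text) pairs could then be fed to the search index as section labels. Footers are not covered here, since it is unclear how (or whether) the converter marks them.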
