Text search / extraction in the converted XPS document


Support

Feb 26, 2010, 1:30:18 PM
to PDF2XPS
Q: In the output document (XPS format):
1- When trying to select text in a paragraph, the selection behaves
strangely: it selects columns of characters rather than continuous text.
2- When searching for a string in the resulting XPS, the order of
the search results is completely wrong.

------------
A: It is possible that these issues are related to the XPS viewer
(consumer) rather than to PDF2XPS (http://www.pdftron.com/pdfnet). In
both PDF and XPS, text may be written to the page in any order, and that
order need not match the reading order. For example, there is no
requirement that text be laid down top to bottom, left to right. It is
possible that the XPS viewer you are using is not on par with current
PDF viewers in terms of text search. A proper text extractor for XPS
would need to sort the text in Y/X order and reconstruct words and
lines from their spatial information.
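
As a rough illustration of the Y/X sorting idea above, here is a minimal
Python sketch. It is not PDF2XPS or PDFNet code. The TextRun type, the
reading_order function, and the line_tol and gap thresholds are all
hypothetical names chosen for this example, and it assumes each text run
has already been read from the XPS page with a left edge, a baseline, and
a width, with Y growing downward.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical record for one run of text taken from a page."""
    text: str
    x: float      # left edge of the run
    y: float      # baseline; assumed to grow downward on the page
    width: float  # advance width of the run


def reading_order(runs, line_tol=2.0, gap=1.0):
    """Return the page text as lines, top to bottom, left to right.

    line_tol: maximum baseline difference for runs to share a line.
    gap: horizontal distance above which a space is inserted.
    Both thresholds are illustrative and would need tuning per document.
    """
    # Step 1: sort by Y and group runs with nearly equal baselines.
    lines = []
    for run in sorted(runs, key=lambda r: r.y):
        if lines and abs(run.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(run)
        else:
            lines.append([run])

    # Step 2: sort each line by X and rebuild words from the gaps.
    result = []
    for line in lines:
        line.sort(key=lambda r: r.x)
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            if cur.x - (prev.x + prev.width) > gap:
                parts.append(" ")
            parts.append(cur.text)
        result.append("".join(parts))
    return result


# Runs stored column by column, as in the selection problem described.
runs = [
    TextRun("Hel", 0, 10, 30),
    TextRun("wor", 0, 30, 30),
    TextRun("lo", 30, 10, 20),
    TextRun("ld", 30, 30, 20),
    TextRun("there", 55, 10.5, 50),
]
print(reading_order(runs))  # ['Hello there', 'world']
```

A real extractor would also have to deal with rotated text, multiple
columns, and right-to-left scripts, none of which this sketch attempts.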
