Q:
We found a difference between the coordinates reported by element.GetTextMatrix() and by TextExtractor. In theory, I would expect the coordinate values to be identical.
Output generated using element.GetTextMatrix():
X: 133.731
Output generated using TextExtractor:
x1: 134.044824
A:
The bounding box returned by the text extractor is as tight a bounding
box as PDFNet can calculate. The first character is drawn slightly after
the X coordinate of the text matrix, so the bounding box from the
text extractor begins there. That is why its X coordinate is slightly
larger than the X coordinate from the text matrix.
GetTextMatrix() returns the so-called text matrix (as documented in the PDF Specification, under text space details:
http://xodo.com/view/#/c0c11968-ee14-478e-9b09-6dc5635c0915).
The TextExtractor bbox (or element.GetBBox()) is computed from the
concatenation of the text matrix (element.GetTextMatrix()), the current
transformation matrix (element.GetCTM()), and a number of other text
state parameters in the graphics state.
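
To illustrate why the two X values differ, here is a minimal Python sketch of the text rendering matrix as the PDF Specification defines it (text state parameters x text matrix x CTM). It does not call PDFNet. The font size, the Y translation, the identity CTM, and the glyph's left side bearing (LSB) are hypothetical values chosen only for illustration; only the X value 133.731 comes from the question.

```python
# Matrices use the PDF row-vector convention: [x y 1] x M,
# with M = [[a, b, 0], [c, d, 0], [e, f, 1]].

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transform(point, m):
    """Apply matrix m to a 2D point using the row-vector convention."""
    x, y = point
    return (x * m[0][0] + y * m[1][0] + m[2][0],
            x * m[0][1] + y * m[1][1] + m[2][1])

def text_rendering_matrix(font_size, h_scale, rise, tm, ctm):
    """Text state parameters x text matrix x CTM, per the PDF Specification."""
    params = [[font_size * h_scale, 0.0, 0.0],
              [0.0, font_size, 0.0],
              [0.0, rise, 1.0]]
    return matmul(matmul(params, tm), ctm)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Text matrix whose X translation matches the question; Y is hypothetical.
tm = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [133.731, 700.0, 1.0]]

# Hypothetical text state: 12 pt font, no horizontal scaling, no rise.
trm = text_rendering_matrix(12.0, 1.0, 0.0, tm, IDENTITY)

# The glyph origin maps to the text matrix translation.
origin_x, origin_y = transform((0.0, 0.0), trm)

# Hypothetical left side bearing: the glyph's ink starts 0.026 text space
# units (26/1000 of an em) to the right of the glyph origin.
LSB = 0.026
bbox_x1, _ = transform((LSB, 0.0), trm)

print(f"text matrix X: {origin_x:.3f}")
print(f"tight bbox x1: {bbox_x1:.3f}")
```

With these assumed values, the tight bounding box starts a fraction of a point to the right of the text matrix origin, which is the same kind of gap as the one reported in the question. The actual offset depends on the font, the font size, and the first glyph on the line.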