I am doing some conversion work and comparing two ways of retrieving
page text: looking at the e_text Elements, and using
TextExtractor.Words. What I find is that the two sets of bounding
boxes differ vertically: the Text (e_text) Element bounding boxes sit
just a bit higher on the page. The X1 (left) sides of the bounding
boxes, however, agree very closely.
When I find a condition that is most easily detected via Text
Elements, I want to capture the line, and the words within that line,
that the text (e_text) element sits on. That is, I create a clipping
Rect from Y1 and Y2 of the Text Element bounding box, use it on the
TextExtractor, and then get the line and words from the
TextExtractor. But because of the vertical shift it is not really
working. If I could understand why the two differ, I might be able to
calculate a better clipping Rect for the TextExtractor from the
e_text Element(s).
Can you please explain?
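
To make the goal concrete, here is a minimal sketch of the kind of
adjustment described above, in plain Python with no PDFNet calls. It
only assumes that both kinds of box are available as (x1, y1, x2, y2)
numbers in the same page coordinates with y increasing upward. The
names Box, padded_band and words_in_band, the pad value, and all of
the coordinates are hypothetical and chosen for illustration. It does
not explain the shift; it only tolerates it by widening the band.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical stand-in for a bounding box: (x1, y1, x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def padded_band(elem: Box, pad: float) -> Tuple[float, float]:
    """Vertical extent of the element box, widened by pad on both sides."""
    lo, hi = sorted((elem.y1, elem.y2))
    return (lo - pad, hi + pad)


def words_in_band(words: List[Box], band: Tuple[float, float]) -> List[Box]:
    """Keep the words whose vertical midpoint falls inside the band."""
    lo, hi = band
    return [w for w in words if lo <= (w.y1 + w.y2) / 2 <= hi]


# Made-up numbers: the element box sits 3 units higher than the word boxes.
elem = Box(72.0, 702.0, 300.0, 714.0)
words = [
    Box(72.0, 699.0, 100.0, 711.0),   # same line as the element
    Box(105.0, 699.0, 140.0, 711.0),  # same line as the element
    Box(72.0, 685.0, 110.0, 697.0),   # next line down
]

band = padded_band(elem, pad=3.0)
same_line = words_in_band(words, band)
```

Testing the word midpoint rather than requiring the whole word box to
fit inside the band is what lets a small vertical offset pass. The
pad has to stay smaller than the line spacing, or words from the
neighbouring line will be picked up as well.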