Hi there,
If I have a text rectangle and/or a text object, obtained through FPDFText_GetRect or FPDFPage_GetObject respectively, is there a reliable way to find its start and end character indexes? I mean the kind of index one can pass to FPDFText_GetText, FPDFText_GetFillColor and similar methods.
I could iterate through all the rectangles/objects on the page and keep a running character count (index += len(rectText);), but I'm not sure it would account for all the corner cases (e.g. ligatures, modifier symbols, Chinese/Japanese characters, or some niche PDF features I've never heard of). I'm concerned that I'd end up with a solution that works 97% of the time and fails badly on less common documents.
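To make the idea concrete, here is a minimal sketch of the running-count approach. It deliberately contains no PDFium calls: the lengths array stands in for the per-rectangle character counts I would collect from the page, and CharRange and assign_ranges are names I made up for illustration. It assumes every character on the page belongs to exactly one rectangle, which is precisely the assumption I'm unsure about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the character range covered by one rectangle. */
typedef struct {
    int start; /* index of the first character, inclusive */
    int end;   /* index one past the last character, exclusive */
} CharRange;

/*
 * Assign each rectangle a [start, end) range by accumulating text lengths.
 * lengths[i] is the character count of rectangle i, in page order.
 * Returns the total number of characters counted.
 */
static int assign_ranges(const int *lengths, size_t count, CharRange *out)
{
    assert(count == 0 || (lengths != NULL && out != NULL));
    int index = 0;
    for (size_t i = 0; i < count; ++i) {
        assert(lengths[i] >= 0);
        out[i].start = index;  /* range begins where the previous one ended */
        index += lengths[i];
        out[i].end = index;    /* zero-length rectangles yield an empty range */
    }
    return index;
}
```

I suppose the returned total could be compared against FPDFText_CountChars as a sanity check; a mismatch would at least tell me that the counting misses something on that page.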
A follow-up question: are FPDFText_GetRect rectangles always the same as the text objects obtained through FPDFPage_GetObject? That seems to be the case in my testing (after discarding 0-length text objects), but I want to make sure I'm not making wrong assumptions.
Kind regards,
Nikita