When processing a page as elements, we can discover clipping paths
applied to a group, which we know should clip the text elements
within it.
My question is: when using the text extractor, can we count on it
reflecting any clipping that may be in effect, perhaps in
TextExtractor.Line.GetBBox? If not, is there a way I can find out
the clipping that should be applied to a given TextExtractor.Line or
TextExtractor.Word? If GetBBox reflects the effect of clipping, I
can use the text extractor to get what I need. Otherwise, I am forced
to process elements.
In our client's PDF, the text elements within a clipped group are
wider than the allotted presentation space, so apparently the PDF
author (a third party) is using group clipping to keep adjacent
columns of text from overwriting each other. Thus all the text is in
text elements, but we have to truncate it on the right when we render
with our proprietary render engine. We know how to clip in our render
engine, but we are not sure how to get the clipping from the PDF
page, especially when using the text extractor.
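If the extractor turns out not to reflect clipping, one possible fallback is to take the clip found while processing elements and apply it to the extractor's boxes ourselves. Below is a minimal sketch of that idea. It assumes the clip reduces to an axis-aligned rectangle in the same page coordinates as the word and character boxes (real clip paths can be arbitrary shapes). `Rect`, `clip_bbox`, and `truncate_right` are hypothetical helpers for illustration, not part of the PDFNet API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned box in page coordinates (x1 < x2, y1 < y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def clip_bbox(bbox: Rect, clip: Rect) -> Optional[Rect]:
    """Intersect a line/word bbox with the clip rectangle.

    Returns None when the box lies entirely outside the clip.
    """
    x1 = max(bbox.x1, clip.x1)
    y1 = max(bbox.y1, clip.y1)
    x2 = min(bbox.x2, clip.x2)
    y2 = min(bbox.y2, clip.y2)
    if x1 >= x2 or y1 >= y2:
        return None
    return Rect(x1, y1, x2, y2)


def truncate_right(chars: List[Tuple[str, Rect]], clip: Rect) -> str:
    """Keep only characters whose right edge falls within the clip's right edge.

    This mimics truncating an over-wide column on the right.
    """
    return "".join(ch for ch, box in chars if box.x2 <= clip.x2)


if __name__ == "__main__":
    column_clip = Rect(0, 0, 100, 50)
    print(clip_bbox(Rect(80, 10, 130, 20), column_clip))
    print(truncate_right([("a", Rect(90, 10, 95, 20)),
                          ("b", Rect(95, 10, 101, 20))], column_clip))
```

Whether a character that is only partly inside the clip should be kept or dropped is a rendering policy choice. The sketch drops it, which matches a hard right-hand cut.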
--