I'm working on a project where we're trying to automate extracting the text that has been marked up (highlighted via annotations) in a PDF document. I had a read of the PDF spec, and it seems that the text is on one layer and the annotations (annots) are on another. I can extract the text, and I can extract the annots along with any comment text attached to them, but the only thing that looks like it could tie the two together is the coordinates of the highlight. However, I'm unsure how I would go about mapping the X,Y coordinates of a highlight back to the text underneath it.
Has anybody ever tried to do this before? If so, do you have any pointers or code samples you could share on how to do this?
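
To make the question concrete, here is a minimal sketch of the kind of mapping I have in mind, not a tested solution. It assumes the text extractor can return each word with a bounding box `(x0, y0, x1, y1)` in the same coordinate system as the annotation, and that each highlight exposes its `/QuadPoints` array (8 numbers per highlighted quadrilateral). The names `quads_to_boxes`, `overlap_fraction`, `highlighted_text` and the `min_overlap` threshold are all made up for illustration, and the sample coordinates are invented.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 <= x1, y0 <= y1


def quads_to_boxes(quad_points: Sequence[float]) -> List[Box]:
    """Turn a flat /QuadPoints array into axis-aligned boxes, one per quad.

    Taking min/max of the corners avoids relying on the corner order,
    which is an assumption that seems safer than trusting any one ordering.
    """
    boxes = []
    for i in range(0, len(quad_points) - 7, 8):
        xs = quad_points[i:i + 8:2]      # the four x values of this quad
        ys = quad_points[i + 1:i + 8:2]  # the four y values of this quad
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def overlap_fraction(word: Box, region: Box) -> float:
    """Fraction of the word's box area that lies inside the region."""
    wx0, wy0, wx1, wy1 = word
    rx0, ry0, rx1, ry1 = region
    width = min(wx1, rx1) - max(wx0, rx0)
    height = min(wy1, ry1) - max(wy0, ry0)
    if width <= 0 or height <= 0:
        return 0.0
    area = (wx1 - wx0) * (wy1 - wy0)
    return (width * height) / area if area > 0 else 0.0


def highlighted_text(words: Sequence[Tuple[str, Box]],
                     quad_points: Sequence[float],
                     min_overlap: float = 0.5) -> str:
    """Join the words (in input order) that are mostly covered by a quad."""
    regions = quads_to_boxes(quad_points)
    hits = [text for text, box in words
            if any(overlap_fraction(box, r) >= min_overlap for r in regions)]
    return " ".join(hits)


# Invented sample data: four words on one line, highlight over the middle two.
words = [
    ("The", (72, 700, 90, 712)),
    ("quick", (94, 700, 122, 712)),
    ("brown", (126, 700, 156, 712)),
    ("fox", (160, 700, 176, 712)),
]
quad = [93, 713, 157, 713, 93, 699, 157, 699]
print(highlighted_text(words, quad))  # prints "quick brown"
```

The part I'm least sure about is whether the word boxes and the highlight coordinates really share one coordinate system. Some extractors appear to flip the Y axis or apply page rotation, so a conversion step may be needed before the overlap test means anything.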