Hi Dan,
It looks like the superscript characters are rendered as regular
characters, just smaller and in an offset position. The PDF author
could have used Unicode superscript characters instead, but sadly it
seems they haven't.
The standard text extraction in pdf-reader attempts to lay out the
characters as plain text, which unfortunately has no way to represent
differences in size or vertical offset, so I don't think you'll be
able to detect these superscript characters with it.
It would be possible to build an alternative text extraction algorithm
that examines the size and position of each character to identify
superscripts. As a starting point, I'd suggest creating a custom
version of this class: lib/pdf/reader/page_text_receiver.rb
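
To illustrate the size-and-position check, here is a minimal sketch in
plain Ruby. It is not pdf-reader code: the Glyph struct, the
superscripts method and both thresholds are hypothetical names and
values chosen for the example, and it assumes a custom receiver has
already collected the characters of a single line along with their
position and font size. It also assumes PDF-style coordinates, where y
increases up the page.

```ruby
# Hypothetical record for one positioned character. This is NOT a
# pdf-reader class; a custom receiver would need to collect these values.
Glyph = Struct.new(:text, :x, :y, :font_size)

# Returns the glyphs on one line that look like superscripts: clearly
# smaller than the line's most common font size AND sitting on a baseline
# raised above the body text. Both thresholds are guesses to tune per PDF.
def superscripts(glyphs, size_ratio: 0.8, min_raise: 0.2)
  return [] if glyphs.empty?

  # Treat the most common font size on the line as the body size.
  body_size = glyphs.group_by(&:font_size).max_by { |_, group| group.size }.first

  # The body baseline is the lowest y among body-sized glyphs.
  body_baseline = glyphs.select { |g| g.font_size == body_size }.map(&:y).min

  glyphs.select do |g|
    smaller = g.font_size < body_size * size_ratio
    raised  = (g.y - body_baseline) > body_size * min_raise
    smaller && raised
  end
end

# Usage: "mc" set at 12pt on baseline y=100, followed by a 7pt "2"
# whose baseline is 4pt higher.
line = [
  Glyph.new("m", 10.0, 100.0, 12.0),
  Glyph.new("c", 20.0, 100.0, 12.0),
  Glyph.new("2", 27.0, 104.0, 7.0)
]
p superscripts(line).map(&:text) # prints ["2"]
```

Requiring both conditions avoids flagging text that is merely smaller
(small caps, say) or merely on a different line. Real documents will
probably need the thresholds adjusted, and the lines grouped first.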
James