Is the sentence, as it appears in the HTML, the same textual entity as
when it appears in the PDF?
Larry's answer (at that time) was no - there are two textual entities.
That's fine, but it does mean that there has to be a third thing:
whatever the two have in common.
That discussion got me thinking about what the common thing is. In
particular consider italicization and other marks of emphasis. Are
they to be considered an element of the common thing? One argument
would say no - the common thing is just the ordering of words.
Countering that argument, other marks, such as "!", seem to have
the same purpose - they carry aspects of
spoken intonation over into the text.
What are your thoughts?
Suppose we agree that emphasis such as this should be considered an
element of text - can someone suggest a resource,
article, or textbook that would list such elements, ideally with an
international perspective so that we can capture the generalization
appropriately?
-Alan