Hi David,
Not sure if you are a member of the IIIF Slack (http://bit.ly/iiif-slack), but there was a discussion on this topic recently in the #cookbook and #accessibility channels:
To try to summarise, the question was whether an HTML `alt` attribute on an image is the appropriate place to put the text that appears in the image - where that image is a digitised page of a printed book with OCR text, or a manuscript page with a transcription, rather than (for example) a photograph of a street scene that happens to contain some text in signage. IIIF's most common use case appears to fall into a bit of a gap when you try to follow the usual guidance to describe the image for accessibility: the significant text is not a description of the image but (usually) the text in the image, and a description of the image itself ("text of page 137") is not, in this case, the main attraction.
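To make that gap concrete, here's a minimal sketch (the image URL and the wording are invented purely for illustration) of the two options the `alt` attribute seems to offer:

```html
<!-- Option A: alt describes the image, so the page's actual text is
     invisible to a screen reader user -->
<img src="https://example.org/iiif/book1/page137/full/max/0/default.jpg"
     alt="Text of page 137" />

<!-- Option B: alt carries the OCR text itself - but alt is designed for a
     short description, and a whole page of text in an attribute gives the
     user no structure and no easy way to navigate or re-read it -->
<img src="https://example.org/iiif/book1/page137/full/max/0/default.jpg"
     alt="[the full OCR text of page 137]" />
```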
I think there are two recipes here. The first is one of IIIF modelling - how are transcriptions conveyed? How are captions for AV conveyed? Both are works in progress as recipes, though the first - transcriptions - is well-established IIIF practice.
The second recipe is the HTML representation of these IIIF models - how should that OCR text be represented in HTML to make it most accessible to screen readers and other assistive technologies, bearing in mind that those technologies cannot be expected to know anything about IIIF?
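For illustration only, and very much not a recommendation, here is one sketch of the kind of pattern a viewer or a static page might try: keep the `alt` short and expose the OCR text as ordinary text in the document, associated with the image (the URL and ids are invented):

```html
<figure>
  <!-- Short alt identifies the image; the transcription lives in the page as
       real text, so assistive technologies can navigate, search and re-read it -->
  <img src="https://example.org/iiif/book1/page137/full/max/0/default.jpg"
       alt="Page 137"
       aria-describedby="page-137-text" />
  <div id="page-137-text">
    <p>[OCR text of page 137 as ordinary, selectable HTML text]</p>
  </div>
</figure>
```

Whether aria-describedby is even appropriate for text of that length, or whether the transcription should simply sit alongside or behind the image as normal content, is exactly the sort of thing a recipe could pin down.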
This second question does not have a clear answer, and it needs one! All the pieces are in place to make billions of words of digitised text accessible - text that is already modelled as IIIF resources - so what should happen at the last step, where the IIIF model is transformed into HTML? Both for simple HTML representations and for those created by the UV, Mirador and other viewers: what's the best thing they can offer up to assistive technologies for the text of all those books and manuscripts?
Tom