Great!
> In order to standardize the output from Cuneiform, I want to follow
> the standard as close as possible.
> Ocropus refers to this page for the standard:
> http://docs.google.com/View?docid=dfxcv4vc_67g844kf
>
> I have not been able to find any other spec so I suppose this is still
> the official standard (last update 2007).
Yes, that's the official document.
> Who would be the owner of the hocr spec?
I maintain it.
> Are any changes foreseen/planned?
No; most of the hard parts of OCR output formats (styles, fonts,
script-dependent issues) are taken care of by the HTML spec. hOCR
just describes how to denote OCR-specific information like bounding
boxes.
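
To make the bounding-box part concrete, here is a rough Python sketch of reading boxes back out of hOCR output. It assumes the usual convention of a "bbox x0 y0 x1 y1" property in an element's title attribute, with properties separated by semicolons. The names BBoxCollector and extract_bboxes are made up for this illustration and are not part of any tool.

```python
from html.parser import HTMLParser


class BBoxCollector(HTMLParser):
    """Collects (class names, bbox) pairs from elements carrying a bbox property."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title") or ""
        # Assumption: properties in the title are separated by semicolons,
        # and a bounding box looks like "bbox x0 y0 x1 y1".
        for prop in title.split(";"):
            fields = prop.split()
            if len(fields) == 5 and fields[0] == "bbox":
                self.boxes.append((classes, tuple(int(v) for v in fields[1:])))


def extract_bboxes(html):
    """Return a list of (class names, (x0, y0, x1, y1)) found in the markup."""
    collector = BBoxCollector()
    collector.feed(html)
    collector.close()
    return collector.boxes


if __name__ == "__main__":
    sample = (
        '<div class="ocr_page" title="bbox 0 0 800 600">'
        '<span class="ocr_line" title="bbox 10 20 300 50">Hello</span>'
        "</div>"
    )
    for classes, box in extract_bboxes(sample):
        print(classes, box)
```

Everything else in the markup (fonts, styles, text direction) stays ordinary HTML, so a parser like this only has to care about the OCR-specific attributes.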
If there is something completely different you need (e.g.,
bibliographic markup, etc.), just use and/or define a separate
microformat to represent it.
If there is something engine-specific you need, pick an ocrx_... tag
that doesn't conflict with an existing one.
ocr_... tags are intended to represent engine-independent information,
so before picking a new one, it's probably a good idea to discuss it
first.
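
As a small sketch of that naming rule in Python: the helper names below are hypothetical, and the set of existing names is only an illustrative sample, not a complete list from the spec.

```python
def classify_hocr_name(name):
    """Classify a name by its prefix, following the ocr_/ocrx_ convention."""
    if name.startswith("ocrx_"):
        return "engine-specific"
    if name.startswith("ocr_"):
        return "engine-independent"
    return "other"


def is_safe_engine_name(name, existing):
    """True if name is an ocrx_ name that does not clash with a known one."""
    return classify_hocr_name(name) == "engine-specific" and name not in existing


if __name__ == "__main__":
    # Illustrative sample of names already in use (assumption, not exhaustive).
    existing = {"ocrx_block", "ocrx_line", "ocrx_word"}
    # "ocrx_cuneiform_zone" is a made-up candidate name.
    for candidate in ("ocrx_cuneiform_zone", "ocrx_word", "ocr_line"):
        print(candidate, classify_hocr_name(candidate),
              is_safe_engine_name(candidate, existing))
```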
Tom