Unfortunately, the tool tries to import PyXML, which is a dead
project [1]. This is filed as issue #2 in the hocr-tools bug tracker
[2]. The second fork you mention is actually a fork of my fork, in
which I ported both hocr-check and hocr-combine to lxml, a more
widely used Python XML library. As far as I know, nobody has ported
the remaining modules to a maintained library (but I have not looked
at all the forks).
It should be a manageable amount of work to port everything, one module
at a time, to lxml or another library. There are a number of such
libraries listed at [3], but it would be nice to standardize on one and
have the entire code base use it for consistency. So I think the next
steps are:
1. Collect the code from the various forks, pulling all existing
improvements into a common repository;
2. Create a wiki page where people vote on which tools they would like
to see ported;
3. Encourage volunteers to fork the repository and port any of the
existing tools to lxml (or whatever library we standardize on);
4. Any time somebody is generous enough to do such volunteer work, pull
it into the common repository as quickly as possible.
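To give potential volunteers a feel for what a port involves, here is a
rough sketch of reading hOCR with the ElementTree-style API. It is not
code from hocr-tools or from any of the forks: the sample markup, the
names parse_bbox and list_lines, and the choice to fall back to the
standard library's xml.etree.ElementTree when lxml is not installed are
all my own assumptions for illustration. The sample also omits the XHTML
namespace that real hOCR files usually declare.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard
# library's ElementTree, which shares the small API subset used here.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical, simplified hOCR fragment (no XML namespace, for brevity).
SAMPLE_HOCR = """<html>
<body>
<div class="ocr_page" title="bbox 0 0 800 600">
<span class="ocr_line" title="bbox 10 20 300 50">Hello world</span>
<span class="ocr_line" title="bbox 10 60 280 90">Second line</span>
</div>
</body>
</html>"""


def parse_bbox(title):
    """Return the bbox from an hOCR title attribute as four ints, or None."""
    # The title attribute holds semicolon-separated properties,
    # for example "bbox 10 20 300 50; x_wconf 90".
    for prop in title.split(";"):
        fields = prop.split()
        if len(fields) == 5 and fields[0] == "bbox":
            return tuple(int(value) for value in fields[1:])
    return None


def list_lines(hocr_text):
    """Return a (bbox, text) pair for every ocr_line element."""
    root = etree.fromstring(hocr_text)
    lines = []
    # This limited XPath form is understood by both lxml and ElementTree.
    for elem in root.findall(".//*[@class='ocr_line']"):
        text = "".join(elem.itertext()).strip()
        lines.append((parse_bbox(elem.get("title", "")), text))
    return lines


if __name__ == "__main__":
    for bbox, text in list_lines(SAMPLE_HOCR):
        print(bbox, text)
```

Sticking to this common subset (fromstring, findall, get, itertext)
would keep a ported module usable with either library until we settle
on one.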
What do people think?
Cheers,
Jim
[1]
http://georgik.sinusgear.com/2011/01/10/dead-project-warning-pyxml-does-not-work-with-python2-6/
http://mail.python.org/pipermail/xml-sig/2010-November/012245.html
[2]
https://code.google.com/p/hocr-tools/issues/detail?id=2
[3]
http://wiki.python.org/moin/PythonXml