Hi,
I'm using the code from the Mercurial repository and getting good results!
One big problem is that words run together in the recognized text. (I'm
combining the image and text with hocrtopdf.)
"are in an informal setting in a conference room, we must"
is recognized as:
"areinaninformalset6nginaconferenceroom,wemust g"
See
http://hero.com/ken/trainme.pdf
1) Can I fix this with training?
2) How do I generate the different file types in
http://ocropus.googlegroups.com/web/lines.tgz to start training?
3) Is my problem related to the fact that the ground truth files, like
lines/0001/0080.gt.txt,
don't contain spaces either? E.g.,
"SNL,andF.HarveyDove,PNL,fortheirsuggestions"
is the ground truth for image text that reads:
"SNL, and F. Harvey Dove, PNL, for their suggestions"
Thanks,
Ken
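
P.S. In case it helps with question 3, here is a rough sketch of how one might list the ground truth files that have no spaces at all. It assumes the tarball unpacks to a lines/ directory laid out as <page>/<line>.gt.txt, like the path above. The function name and the 15-character cutoff are my own choices for illustration, not anything from OCRopus.

```python
from pathlib import Path


def gt_files_without_spaces(root, min_length=15):
    """Return the .gt.txt files under root whose text contains no space.

    Assumes a <root>/<page>/<line>.gt.txt layout (an assumption based on
    the example path lines/0001/0080.gt.txt). min_length is an arbitrary
    cutoff so that short single-word lines are not flagged.
    """
    suspicious = []
    for path in sorted(Path(root).glob("*/*.gt.txt")):
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        # A long line with no space at all is very likely missing its spaces.
        if len(text) >= min_length and " " not in text:
            suspicious.append(path)
    return suspicious
```

Calling gt_files_without_spaces("lines") should then show whether the missing spaces affect every ground truth file or only some of them.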