Training

kenberland

Jun 26, 2009, 11:57:56 AM
to ocropus
Hi,

I'm using the code from the Mercurial repository and getting good
results! One big problem is that words run together in the output.
(I'm combining the image and the text with hocrtopdf.)

"are in an informal setting in a conference room, we must"

is recognized as:

"areinaninformalset6nginaconferenceroom,wemust g"

See http://hero.com/ken/trainme.pdf

1) Can I fix this with training?

2) How do I generate the different file types in

http://ocropus.googlegroups.com/web/lines.tgz to start training?

3) Is my problem related to the fact that ground truth files, like

lines/0001/0080.gt.txt

don't contain spaces either? E.g.,

"SNL,andF.HarveyDove,PNL,fortheirsuggestions"

is the ground truth for image text that reads:

"SNL, and F. Harvey Dove, PNL, for their suggestions"
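
A quick way to see how many ground truth files are affected might be a small script like the one below. It is only a sketch: it assumes the archive unpacks to lines/NNNN/NNNN.gt.txt as in the path above, and the function name and the 10-character threshold are made up for illustration.

```python
from pathlib import Path


def gt_files_without_spaces(root):
    """Return the .gt.txt files under root whose text contains no space."""
    missing = []
    # Assumed layout: <root>/<page>/<line>.gt.txt, e.g. lines/0001/0080.gt.txt
    for path in sorted(Path(root).glob("*/*.gt.txt")):
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        # Skip very short lines, which may legitimately be a single word.
        if len(text) > 10 and " " not in text:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in gt_files_without_spaces("lines"):
        print(path)
```

If most files show up in the output, the missing spaces are a property of the whole training set rather than of a few lines.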

Thanks,
Ken