> What is the recommended procedure for manually correcting cseg.gt.png
> files? Is there a utility that I am overlooking?
There isn't one yet; we've been working on it.
> When generating text for training images, should this include spaces?
Yes. However, space handling in OCRopus is currently inconsistent, so
the spaces end up being ignored.
> My overall procedure: I have spent some time training OCRopus on a
> custom font, using images from JPGs. I am using the following method:
>
> 1) Generate a variety of single-line training images programmatically
> 2) Manually type the text contained in each training image
If you generate it, why not save the text?
> 3) Place these in a directory such as training/0000 or training/0001
> 4) run ocropus lines2fsts training
> 5) replace the generated txt files with my txt files and run ocropus
> align training to generate cseg.png
> 6) run ocropus trainseg on training to generate a new model file
> 7) goto 1 using the new training model
If you can write a script that takes a text file and font and
generates a book directory full of binary line images, corresponding
csegs, and corresponding Unicode strings, that would be useful.
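Such a script might look roughly like the minimal Python sketch below. It is an illustration under several assumptions, not OCRopus code. The book layout (book/0001/010001.png with .cseg.png and .txt siblings), numbering only the non-space characters in the cseg (since spaces are currently ignored anyway), and packing each label into a 24-bit RGB pixel are all assumptions to check against what `ocropus align` and `trainseg` actually read. `toy_glyph` is a hypothetical placeholder that draws every character as a box; a real version would pass in a glyph function backed by a font rasterizer (FreeType, Pillow, etc.) for the custom font. PNGs are written with the standard library only, so the sketch runs without extra packages.

```python
import os
import struct
import zlib

# Sketch only: directory layout, file names, and cseg encoding are assumptions.
GLYPH_HEIGHT = 5

def toy_glyph(ch):
    """Hypothetical stand-in for a real font rasterizer: rows of 0/1 ink bits.
    Non-space characters become a 3-pixel-wide box; a space is blank."""
    ink = 0 if ch.isspace() else 1
    return [[ink] * 3 for _ in range(GLYPH_HEIGHT)]

def render_line(text, glyph_fn=toy_glyph, gap=1, pad=2):
    """Return (binary, labels) for one line of text.
    binary: grayscale pixels, 0 = ink, 255 = background.
    labels: 0 = background, k = k-th non-space character (1-based)."""
    glyphs = [glyph_fn(ch) for ch in text]
    width = 2 * pad + sum(len(g[0]) for g in glyphs) + gap * max(len(glyphs) - 1, 0)
    height = GLYPH_HEIGHT + 2 * pad
    binary = [[255] * width for _ in range(height)]
    labels = [[0] * width for _ in range(height)]
    x, label = pad, 0
    for ch, g in zip(text, glyphs):
        if not ch.isspace():
            label += 1  # spaces carry no ink, so they get no label
        for dy, row in enumerate(g):
            for dx, bit in enumerate(row):
                if bit:
                    binary[pad + dy][x + dx] = 0
                    labels[pad + dy][x + dx] = label
        x += len(g[0]) + gap
    return binary, labels

def write_png(path, rows, channels):
    """Write an 8-bit PNG (channels=1 grayscale, channels=3 RGB) with the stdlib."""
    height, width = len(rows), len(rows[0]) // channels

    def chunk(tag, data):
        return (struct.pack(">I", len(data)) + tag + data
                + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

    color_type = 0 if channels == 1 else 2
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes(r) for r in rows)  # filter byte 0 per scanline
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
                + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))

def make_book(text_path, book_dir, glyph_fn=toy_glyph):
    """Render each non-empty line of text_path into a hypothetical layout:
    book_dir/0001/01NNNN.png (binary line), .cseg.png (labels), .txt (text)."""
    page_dir = os.path.join(book_dir, "0001")
    os.makedirs(page_dir, exist_ok=True)
    with open(text_path, encoding="utf-8") as f:
        lines = [ln.rstrip("\n") for ln in f if ln.strip()]
    written = []
    for i, text in enumerate(lines, start=1):
        base = os.path.join(page_dir, "01%04d" % i)
        binary, labels = render_line(text, glyph_fn)
        write_png(base + ".png", binary, 1)
        # Pack each integer label into one RGB pixel (assumed cseg encoding).
        rgb = [[c for v in row for c in ((v >> 16) & 255, (v >> 8) & 255, v & 255)]
               for row in labels]
        write_png(base + ".cseg.png", rgb, 3)
        with open(base + ".txt", "w", encoding="utf-8") as f:
            f.write(text)
        written.append(base)
    return written
```

Because the line image, the cseg, and the text all come from the same render, nothing has to be retyped by hand or aligned after the fact.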
Tom