On Monday, November 12, 2012 1:35:58 AM UTC+1, MattJ wrote:
The line basically says that if there is a space in the transcription, there shouldn't be a corresponding set of pixels in the segmentation. I'm not sure why this is happening, but if it happens rarely, it's probably safe to skip such lines. All you care about with alignment is getting a large amount of training data.
ocropus-align implements Viterbi alignment. In the long term, we'll probably move to forward-backward training, which tends to be better behaved.
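To make "Viterbi alignment" a bit more concrete, here is a minimal generic sketch. It is not the ocropus-align code; the function name `viterbi_align` and the cost-matrix layout are assumptions made up for illustration. It takes `costs[t][k]`, the cost of assigning segment `t` to transcript character `k`, and finds the cheapest left-to-right assignment in which each step either stays on the same character or advances to the next one.

```python
def viterbi_align(costs):
    """Minimum-cost monotonic alignment of segments to transcript characters.

    costs[t][k] is a hypothetical cost of assigning segment t to character k.
    Returns (path, total_cost), where path[t] is the character index for
    segment t, or (None, inf) if no valid alignment exists.
    """
    T, K = len(costs), len(costs[0])
    INF = float("inf")
    best = [[INF] * K for _ in range(T)]   # best[t][k]: cheapest cost ending at (t, k)
    back = [[0] * K for _ in range(T)]     # back[t][k]: character index used at t - 1

    best[0][0] = costs[0][0]               # the path must start on the first character
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else INF
            if advance < stay:
                best[t][k] = advance + costs[t][k]
                back[t][k] = k - 1
            else:
                best[t][k] = stay + costs[t][k]
                back[t][k] = k

    # The path must end on the last character; otherwise alignment failed.
    if best[T - 1][K - 1] == INF:
        return None, INF

    # Trace the best path backwards from the final cell.
    path = [K - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[T - 1][K - 1]


if __name__ == "__main__":
    example = [[0, 5, 5], [1, 0, 5], [5, 0, 5], [5, 5, 0]]
    print(viterbi_align(example))
```

A line for which this returns `None` (for example, fewer segments than characters) is one you would simply skip, as suggested above. Forward-backward training would sum over all such paths instead of committing to the single best one.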
OCRopus 0.7 will contain a new recognizer based on recurrent neural networks; training it is much simpler, and it may be a better match for your needs.
Tom