Hi,
Could someone help me understand why I am getting the following error when using tesstrain with the START_MODEL option?
Failed to continue from: data/micr_proto/micr.lstm
data
├── micr-ground-truth
│   ├── micr-1.gt.txt
│   ├── micr-1.tif
│   ├── micr-2.gt.txt
│   └── micr-2.tif
└── micr_proto-ground-truth
    ├── micr.gt.txt
    └── micr.tif
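Because tesstrain pairs each image with a ground-truth file of the same base name (micr-1.tif with micr-1.gt.txt in the tree above), a quick sanity check is to look for images or transcriptions that lack a partner. This is a minimal Python sketch that assumes only .tif images are used; the script and function names are hypothetical:

```python
import sys
from pathlib import Path

GT_SUFFIX = ".gt.txt"
IMG_SUFFIX = ".tif"  # assumption: only .tif images, as in the tree above


def find_unpaired(filenames):
    """Split file names into (images lacking a .gt.txt, .gt.txt files lacking an image)."""
    images, texts = set(), set()
    for name in filenames:
        if name.endswith(GT_SUFFIX):
            texts.add(name[: -len(GT_SUFFIX)])
        elif name.endswith(IMG_SUFFIX):
            images.add(name[: -len(IMG_SUFFIX)])
    return sorted(images - texts), sorted(texts - images)


def check_dir(gt_dir):
    """Print and return the unpaired base names found in one ground-truth directory."""
    names = [p.name for p in Path(gt_dir).iterdir() if p.is_file()]
    missing_gt, missing_img = find_unpaired(names)
    print(f"{gt_dir}: {len(missing_gt)} image(s) without {GT_SUFFIX}, "
          f"{len(missing_img)} {GT_SUFFIX} file(s) without an image")
    return missing_gt, missing_img


if __name__ == "__main__":
    # e.g. python check_gt.py data/micr-ground-truth data/micr_proto-ground-truth
    for d in sys.argv[1:]:
        check_dir(d)
```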
I am using what is in 'micr_proto-ground-truth' to build my proto model, which I then use as a START_MODEL for training the micr model from 'micr-ground-truth'.
More specifically, I issued the following commands from my tesstrain repo:
gmake tesseract-langdata
gmake proto-model MODEL_NAME=micr_proto
mkdir -p usr/share/tessdata
cp data/micr_proto/micr_proto.traineddata usr/share/tessdata
gmake training MODEL_NAME=micr START_MODEL=micr_proto
The final command fails with the following error:
Failed to continue from: data/micr_proto/micr.lstm
gmake: *** [Makefile:327: data/micr/checkpoints/micr_checkpoint] Error 1
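The error message suggests that lstmtraining looks for data/<START_MODEL>/<MODEL_NAME>.lstm (here data/micr_proto/micr.lstm), while the proto model itself was built under the name micr_proto. This is a minimal sketch for checking whether that file, and the traineddata copied into usr/share/tessdata, actually exist and are non-empty. The path pattern is inferred from the error message, not taken from the tesstrain Makefile, and the helper names are hypothetical:

```python
from pathlib import Path


def continue_from_path(model_name, start_model, data_dir="data"):
    """Path pattern inferred from the error message: <data_dir>/<START_MODEL>/<MODEL_NAME>.lstm."""
    return Path(data_dir) / start_model / f"{model_name}.lstm"


def describe(path):
    """Report whether a file exists and, if it does, its size (flagging empty files)."""
    p = Path(path)
    if not p.is_file():
        return f"{p}: missing"
    size = p.stat().st_size
    return f"{p}: {size} bytes" + (" (empty!)" if size == 0 else "")


if __name__ == "__main__":
    # Values taken from the failing command: MODEL_NAME=micr START_MODEL=micr_proto
    print(describe(continue_from_path("micr", "micr_proto")))
    # The proto model copied into the local tessdata directory in the commands above
    print(describe(Path("usr/share/tessdata") / "micr_proto.traineddata"))
```

If both files are present, Tesseract's `combine_tessdata -d` run on the start model's traineddata lists which components it actually contains.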
Can anyone tell me what I am doing wrong?
Background Info
My ultimate goal is to train Tesseract to OCR the MICR line at the bottom of check images with 99+% accuracy.
For my test/training set, I have more than 20K TIFF check images, which I have cropped and cleaned with OpenCV so that only the bottom portion containing the MICR line remains. I also have a .gt.txt file for each cropped image.
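With more than 20K transcriptions, a malformed .gt.txt file (empty, or spanning several lines) or a stray character is easy to miss. This is a minimal sketch that flags such files and prints a histogram of every character used. It assumes one text line per image and UTF-8 files; how the MICR special symbols are encoded in the ground truth is not stated here, so the histogram simply shows whatever characters appear. The function names are hypothetical:

```python
import sys
from collections import Counter
from pathlib import Path


def check_gt_text(text):
    """Return a list of problems with one .gt.txt payload (assumes one line per image)."""
    problems = []
    lines = text.rstrip("\n").split("\n")
    if len(lines) != 1:
        problems.append(f"{len(lines)} lines")
    if not lines[0].strip():
        problems.append("empty")
    return problems


def char_histogram(texts):
    """Count every character across all ground-truth strings ('\\n' excluded)."""
    counts = Counter()
    for t in texts:
        counts.update(t.replace("\n", ""))
    return counts


if __name__ == "__main__":
    # e.g. python check_text.py data/micr-ground-truth
    gt_files = sorted(Path(sys.argv[1]).glob("*.gt.txt")) if len(sys.argv) > 1 else []
    texts = []
    for f in gt_files:
        t = f.read_text(encoding="utf-8")
        texts.append(t)
        for problem in check_gt_text(t):
            print(f"{f.name}: {problem}")
    # Unexpected characters (e.g. a stray '\r') stand out in this listing
    for ch, n in sorted(char_histogram(texts).items()):
        print(repr(ch), n)
```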
I also tried using tesstrain directly as follows with my entire training set in the data directory:
gmake training MODEL_NAME=micr
but the resulting micr.traineddata yielded even worse results.
So now I am trying to build a proto model from a single reference image, as described above, and then use it as the START_MODEL for my training, but I am hitting the error shown above.
Is my approach incorrect? If so, could you point me in the right direction? I am not finding the documentation very clear, so I may well be doing something silly.
Thanks much for the help,
Keith
BTW, I am attaching the data.zip (contents of my data directory) in case someone wants to reproduce this.