ocropus trainseg my.model out
which took all day but finally produced a model. From the log:
[info] updateModel 236200 samples, 6600 features, 127 classes
[info] updateModel memory status 1755 Mbytes, 1558 Mvalues
[info] training content classifier
[info] [mapped 123 to 53 classes]
[info] mlp training n 47020 nc 53
[info] mlp round 0 err 0.0198 nhidden 80
...
[info] mlp round 7 err 0.0112 nhidden 159
[info] training junk classifier
[info] mlp training n 231200 nc 2
[info] mlp round 0 err 0.0042 nhidden 50
...
[info] mlp round 7 err 0.001 nhidden 23
[info] trained 53140 characters, 2430 lines
[warn] 35120 old csegs
[info] saving my.model
The log also contained many messages of the form "transcript doesn't
agree with cseg (transcript 4, cseg 25)".
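In case it helps with diagnosis, here is a rough Python sketch for tallying those mismatch messages from a saved copy of the trainseg log. It assumes the message text looks exactly like the one quoted above; the function names and the idea of saving the log to a file are my own, not part of ocropus, and I am not claiming to know what the two numbers count.

```python
import re
import sys
from collections import Counter

# Pattern for the mismatch message as quoted above. The exact log layout is
# an assumption, so whitespace (including a wrapped line) is matched loosely.
MISMATCH = re.compile(
    r"transcript doesn't agree with cseg\s*"
    r"\(transcript\s+(\d+),\s*cseg\s+(\d+)\)"
)


def find_mismatches(log_text):
    """Return (transcript_value, cseg_value) pairs in order of appearance."""
    return [(int(t), int(c)) for t, c in MISMATCH.findall(log_text)]


def summarize(log_text):
    """Count how often each (transcript_value, cseg_value) pair occurs."""
    return Counter(find_mismatches(log_text))


if __name__ == "__main__":
    # Hypothetical usage: python tally_mismatches.py trainseg.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            counts = summarize(f.read())
        for (t, c), n in counts.most_common():
            print(f"transcript {t}, cseg {c}: {n} message(s)")
        print(f"total mismatch messages: {sum(counts.values())}")
```

Comparing the total against the "2430 lines" figure in the log should at least show what fraction of the training lines triggered the message.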
Since training did produce a model, I assumed things were OK. Then I ran:
debug=info,transcript cmodel=my.model ocropus lines2fsts out
but every single line in the log read like:
[warn] skipping out/train/0001/0001 (CHECK ocr-line/glclass.cc:1743
Training incomplete for all classes)
I looked up that source location, and it is in the LatinClassifier
class!
Three questions:
1. What do those error messages from trainseg mean? How can I get
training to complete?
2. Is lines2fsts correct in using LatinClassifier? I expected
MlpClassifier.
3. Am I doing this right?
Thank you.