Training - Output traineddata always has the same size as input

Luan Fernandes

Apr 7, 2020, 8:26:33 AM

to tesseract-ocr

Good morning everyone,

First of all, I found a similar problem in this post, but the solutions there didn't help me:

https://groups.google.com/forum/#!msg/tesseract-ocr/O8EEFSSj7_I/aRCIzGbvAgAJ

So the question is: after many iterations over hundreds of pages, shouldn't the size of the output traineddata be different from the input? Mine is always the same. I'm training with my own set of images. Here's what I'm doing:

1 - Create box files

2 - Create LSTM models

3 - Start LSTM training using:

    lstmtraining \
        --model_output output/por \
        --continue_from por.lstm \
        --traineddata tesseract/tessdata/por.traineddata \
        --max_iterations 400 \
        --train_listfile train/por.training_files.txt

4 - After training is complete:

    lstmtraining \
        --stop_training \
        --continue_from output/por_checkpoint \
        --traineddata tesseract/tessdata/por.traineddata \
        --model_output por_NEW.traineddata

Am I doing something wrong? Or should the trained files (input and result) really have EXACTLY the same size?
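
In case it helps, here is a minimal sketch of how the two files could be compared byte for byte rather than by size alone, since two files of equal size can still differ in content. `compare_models` is a hypothetical helper name, not part of Tesseract; it relies only on the standard `wc` and `cmp` utilities.

```shell
# Hypothetical helper (not a Tesseract tool): report the size of two files
# and whether their contents are byte-identical.
compare_models() {
    old=$1
    new=$2
    # wc -c on stdin prints only the byte count; tr strips any padding spaces
    old_size=$(wc -c < "$old" | tr -d ' ')
    new_size=$(wc -c < "$new" | tr -d ' ')
    echo "old: $old_size bytes"
    echo "new: $new_size bytes"
    # cmp -s prints nothing and exits 0 only when the files are identical
    if cmp -s "$old" "$new"; then
        echo "content: identical"
    else
        echo "content: different"
    fi
}
```

Calling it with the input traineddata path from step 3 and the output path from step 4 would show whether the output actually changed even when the sizes match.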

Thanks in advance
