Caching in TrainLineRecognizer?

Jens Weibler

Mar 5, 2017, 1:32:36 AM

to tesseract-ocr

Hi,

I'm new to Tesseract and wondered why the LSTM dataset creation for the training process has to write the file again and again in TrainLineRecognizer. I've seen 200 MB/s of disk I/O while creating the training data set.

As far as I can see, for the training case it would be sufficient to load the file once and write it at the end. The same applies to the box and tif files, but these are only read and not written.
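Roughly what I have in mind, as a small Python sketch. The class and function names are made up for illustration, and the one-record-per-line file format is a toy stand-in; none of this is Tesseract's actual code or its real training data format.

```python
import os
import tempfile


class CachedDataset:
    """Hypothetical in-memory cache for one training data file."""

    def __init__(self, path):
        self.path = path
        self.pages = []
        self.writes = 0  # counts how often the file is written to disk
        # Load the existing file once, if there is one.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.pages = f.read().splitlines()

    def add_page(self, page):
        # Only touch memory here; no disk I/O per added page.
        self.pages.append(page)

    def flush(self):
        # Write the whole dataset in a single pass.
        with open(self.path, "wb") as f:
            for page in self.pages:
                f.write(page + b"\n")
        self.writes += 1


def build_dataset(path, lines):
    """Load once, add every line in memory, write once at the end."""
    dataset = CachedDataset(path)
    for line in lines:
        dataset.add_page(line.encode("utf-8"))
    dataset.flush()
    return dataset
```

With this pattern the file is written once per run, regardless of how many lines are added, at the cost of holding the whole data set in memory.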

Thanks,

Jens Weibler
