You do not have permission to delete messages in this group
Copy link
Report message
Show original message
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to tesseract-ocr
Hi,
I'm new to tesseract and wondered why the lstm dataset creation for the training process has to write the file again and again in TrainLineRecognizer. I've seen 200MB/s IO on the disk while creating the training data set.
As far I can see for the training case it would be sufficient to just load it once and write it at the end. The same applies to the box and tif file - but these are only read and not written...