Small script to generate all boxes for ocrd-train


Lorenzo Bolzani

Sep 18, 2019, 6:38:21 AM

to tesser...@googlegroups.com

Hi,

I wrote this small script to speed up the startup of ocrd-train training.

It generates the boxes for all the images provided on the command line (it works only for single line images).

It is a simple conversion of generate_line_box.py from ocrd-train. I have used it once, and it seems to work fine.
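For anyone who has not looked at the box format for single-line images, here is a rough sketch of what such a generator produces. It is not the attached script and not a copy of generate_line_box.py; it assumes the convention that every character is given the bounding box of the whole line, followed by a tab entry marking the end of the line. The function name line_boxes is made up for this example.

```python
import unicodedata


def line_boxes(text, width, height):
    """Return box-file lines for one text line on a width x height image.

    Hypothetical helper: assumes each glyph gets the full-line box.
    """
    # Group each base character with the combining marks that follow it,
    # so an accent is not written out as a separate box entry.
    glyphs = []
    for char in text:
        if glyphs and unicodedata.combining(char):
            glyphs[-1] += char
        else:
            glyphs.append(char)

    # Box-file columns: <glyph> <left> <bottom> <right> <top> <page>
    rows = ["%s 0 0 %d %d 0" % (glyph, width, height) for glyph in glyphs]
    # Assumed end-of-line marker: a tab with a box just outside the image.
    rows.append("\t %d %d %d %d 0" % (width, height, width + 1, height + 1))
    return rows


if __name__ == "__main__":
    # Example: a two-character line on a 100x20 pixel image.
    print("\n".join(line_boxes("ab", 100, 20)))
```

The image size would normally come from the image file itself (for example via Pillow); it is passed in as plain numbers here so the sketch has no dependencies.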

Currently, box and lstmf generation with ocrd-train is very slow because it starts a new process for each image.

I execute this script before calling the makefile.

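I do the "shell expansion" of the file pattern in Python rather than in the shell, so that the script can handle a very long list of files without running into the command-line length limit.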

So you need to call it in this way:

python generate_all_line_boxes.py -i 'data/train/*.tif'

with single quotes to prevent shell expansion.
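To illustrate the idea of expanding the pattern inside Python, here is a minimal sketch. It is not the attached script; the helper names (expand_pattern, box_path, ground_truth_path) and the 'x.tif' / 'x.gt.txt' / 'x.box' naming are assumptions made for the example. Because the pattern arrives as a single quoted argument, the shell never has to build a huge argument list, and all images are handled in one Python process.

```python
import argparse
import glob
import os


def expand_pattern(pattern):
    """Expand a glob pattern inside Python and return the sorted matches."""
    return sorted(glob.glob(pattern))


def ground_truth_path(image_path):
    """Assumed naming: 'x.tif' is paired with 'x.gt.txt'."""
    return os.path.splitext(image_path)[0] + ".gt.txt"


def box_path(image_path):
    """Assumed naming: the box file for 'x.tif' is 'x.box'."""
    return os.path.splitext(image_path)[0] + ".box"


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="List the box files that would be written for a glob pattern")
    parser.add_argument("-i", "--images", default="data/train/*.tif",
                        help="quoted glob pattern, e.g. 'data/train/*.tif'")
    args = parser.parse_args(argv)

    images = expand_pattern(args.images)
    for image in images:
        # A real script would read the image size and the ground truth here
        # and write the box file; this sketch only prints the planned paths.
        print("%s + %s -> %s" % (image, ground_truth_path(image), box_path(image)))
    return len(images)


if __name__ == "__main__":
    main()
```

Without the single quotes the shell would expand the pattern itself, and argparse would only see the first file name as the value of -i.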

BTW, it would be nice to have the same thing for the lstmf files.

Bye

Lorenzo

generate_all_line_boxes.py

Shree Devi Kumar

Sep 18, 2019, 7:07:48 AM

to tesseract-ocr

Please submit as a PR to https://github.com/tesseract-ocr/tesstrain


