Small script to generate all boxes for ocrd-train

Lorenzo Bolzani

Sep 18, 2019, 6:38:21 AM
to tesser...@googlegroups.com

Hi,
I wrote this small script to speed up the startup of ocrd-train training.

It generates the box files for all the images given on the command line (it only works for single-line images).

It is a simple adaptation of generate_line_box.py from ocrd-train. I have used it once and it seems to work fine.
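For readers new to box files, here is a minimal sketch of what "generating the boxes" means for a single-line image. It assumes a per-character layout similar to what ocrd-train's generate_line_box.py produced: every character gets a box covering the whole image, followed by a tab box marking the end of the line. The exact output format has changed between versions, so check the script in your own checkout. The function names line_boxes and write_box_file are hypothetical, Pillow is assumed to be installed, and this is not the attached script.

```python
import unicodedata


def line_boxes(text, width, height):
    """Return box-file lines for one single-line image of width x height pixels.

    Box line format: <symbol> <left> <bottom> <right> <top> <page>.
    Each character (with any following combining marks attached) gets a box
    covering the whole image, and a final tab box marks the end of the line.
    """
    text = unicodedata.normalize("NFC", text.strip())
    if "\n" in text:
        raise ValueError("ground truth must be a single line")
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # keep combining marks with their base character
        else:
            clusters.append(ch)
    boxes = ["%s 0 0 %d %d 0" % (c, width, height) for c in clusters]
    boxes.append("\t 0 0 %d %d 0" % (width, height))
    return boxes


def write_box_file(image_path, gt_path, box_path):
    """Hypothetical helper: write box_path for image_path using the text in gt_path."""
    from PIL import Image  # Pillow; imported here so line_boxes works without it

    with Image.open(image_path) as im:
        width, height = im.size
    with open(gt_path, encoding="utf-8") as f:
        text = f.read()
    with open(box_path, "w", encoding="utf-8") as f:
        f.write("\n".join(line_boxes(text, width, height)) + "\n")
```

Calling write_box_file once per image inside a single Python process avoids starting a new interpreter for every line image, which is where the speed-up comes from.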

Currently, box and .lstmf generation in ocrd-train is very slow because it starts a new process for each image.

I execute this script before calling the makefile.

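I do the "shell expansion" (globbing) in Python so that the script can handle a very long list of files; passing thousands of file names through the shell can exceed the system's command-line length limit.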

So you need to call it like this:

python generate_all_line_boxes.py -i 'data/train/*.tif'

with the pattern in single quotes so that the shell does not expand it.
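To illustrate the "shell expansion in Python" point, here is a minimal sketch of the driver side. It assumes the ocrd-train naming convention: foo.tif sits next to foo.gt.txt and the output is foo.box. The -i option mirrors the command above, but job_for, iter_jobs and the rest of the interface are hypothetical and not taken from the attached script.

```python
import argparse
import glob
import os


def job_for(image):
    """Map foo.tif to (foo.tif, foo.gt.txt, foo.box), following ocrd-train naming."""
    base, _ = os.path.splitext(image)
    return image, base + ".gt.txt", base + ".box"


def iter_jobs(patterns):
    """Expand glob patterns inside Python and yield one job per unique image."""
    seen = set()
    for pattern in patterns:
        # glob.glob does what the shell would have done, so the file list never
        # has to fit into the operating system's command-line length limit.
        for image in sorted(glob.glob(pattern)):
            if image not in seen:
                seen.add(image)
                yield job_for(image)


def main():
    parser = argparse.ArgumentParser(description="Plan box-file generation for line images")
    parser.add_argument("-i", "--images", nargs="+", required=True,
                        help="quoted glob pattern(s), e.g. 'data/train/*.tif'")
    args = parser.parse_args()
    for image, gt, box in iter_jobs(args.images):
        if not os.path.exists(gt):
            print("skipping %s: missing %s" % (image, gt))
            continue
        # Box generation for (image, gt) -> box would happen here, in this same process.
        print("%s -> %s" % (image, box))


if __name__ == "__main__":
    main()
```

Because the pattern arrives quoted, Python receives the literal 'data/train/*.tif' and does the expansion itself, so the number of matching files does not matter.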


BTW, it would be nice to have the same thing for the lstmf files.



Bye

Lorenzo

Attachment: generate_all_line_boxes.py

Shree Devi Kumar

Sep 18, 2019, 7:07:48 AM
to tesseract-ocr
Please submit it as a PR to https://github.com/tesseract-ocr/tesstrain
