Training Tesseract LSTM from real images

Stefan Weil

Aug 29, 2019, 8:39:41 AM
to tesseract-dev
Training from real images is supported by `ocrd-train` (https://github.com/OCR-D/ocrd-train), for example. I'd prefer to have that code in the `tesseract-ocr` code base. See also this discussion: https://github.com/OCR-D/ocrd-train/issues/48.

What do you think? If there is consensus on that, what would be a good place for that code? Should it be part of the `tesseract` repository? Or would a new repository be better, maybe named `tesstrain`?

Zdenko Podobny

Aug 29, 2019, 2:16:37 PM
to tesser...@googlegroups.com
IMO, the tesseract-ocr project should contain only parts/repositories/code that are supported by the tesseract team. We have had this experience with OpenCL (users are interested in using it, but there is no developer for it)... So if the code will be supported/maintained by an active tesseract contributor, I have no problem with including it in the project.

An advantage of a separate repository for the training scripts (bash and python) could be that they can be developed and released without waiting for tesseract C++ development/release. So it depends on the developers' future plans...

Zdenko


--
You received this message because you are subscribed to the Google Groups "tesseract-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-dev/c3c7046d-e8b7-4888-a17c-83794ddad6c7%40googlegroups.com.

ShreeDevi Kumar

Aug 31, 2019, 5:56:00 AM
to tesseract-dev
I think it is a good idea to make the ocrd-train script part of the tesseract-ocr project.

A separate repo under the project with all training scripts will be a welcome addition.


Stefan Weil

Oct 3, 2019, 3:24:17 PM
to tesseract-dev
In the meantime, ocrd-train has been moved to https://github.com/tesseract-ocr/tesstrain/.