make training does nothing when run

Keith M

Jan 8, 2021, 2:12:42 AM
to tesseract-ocr
I'm sure I'm making a beginner mistake here, but I'm struggling quite a bit.

I've built straight from source, both versions 4.1.1 and 5.0.0, on Ubuntu 18.04 and Ubuntu 20.04 (fresh install, never used, but properly updated). All exhibit the same behavior. I installed all the dependencies following the build/installation guides. No errors during the build that I can see.

"make training" and "make training-install" both succeed when run initially. Clearly it's building and finishing without error.

At this point, all I'm trying to do is train using the example here:

https://github.com/tesseract-ocr/tesstrain

using groundtruth files.

After placing the groundtruth files in a folder called data/foo-ground-truth inside the main tesseract repo folder, I unzip the .TIFs and .gt.txt's.

When "make training MODEL_NAME=foo" is run, nothing happens. It just returns almost instantly and does nothing. 4.1.1 goes through the directories and then says there's nothing that needs to be done. 5.0.0 reports "make: Nothing to be done for 'training'."

I also tried incanting it as such: "make training MODEL_NAME=<MODEL_NAME> START_MODEL=eng PSM=7 TESSDATA=/usr/local/share/tessdata"

Same result.

I'm clearly doing something wrong here. I must not have the files in the right directory. I've tried putting data/foo-ground-truth in the root, I tried putting it in tessdata inside the root folder, I tried putting it in /usr/local/share/tessdata.

eng.traineddata has been copied to the tessdata folder.

There's something obvious I'm doing wrong, but heck if I can find it.....

Help!@#

Keith


Shree Devi Kumar

Jan 8, 2021, 3:12:36 AM
to tesseract-ocr
>After placing the groundtruth files in a folder called data/foo-ground-truth inside the main tesseract repo folder, 

data/foo-ground-truth needs to be under the tesstrain folder, not the tesseract folder.

You can keep the ground truth in a different location; in that case you have to pass its location when calling make.
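
For anyone landing here with the same symptom, the layout described above can be sketched as a few shell functions. This is a minimal sketch under assumptions, not an official recipe: the model name foo and the sample archive ocrd-testset.zip come from this thread, the function names are hypothetical, and GROUND_TRUTH_DIR is assumed to be the Makefile variable tesstrain uses for an external ground-truth directory (confirm it against the Makefile in your checkout).

```shell
#!/bin/sh
# Sketch only: names and paths follow the thread; adjust to your setup.

# Default place tesstrain looks for ground truth, relative to its checkout
# (assumption: data/<MODEL_NAME>-ground-truth).
default_gt_dir() {
    # $1 = path to the tesstrain checkout, $2 = MODEL_NAME
    printf '%s/data/%s-ground-truth\n' "$1" "$2"
}

# Clone tesstrain (a separate repo from tesseract), unpack the sample
# data into the default location, and train. Runs in a subshell so the
# caller's working directory is left unchanged.
train_sample() (
    model=$1
    git clone https://github.com/tesseract-ocr/tesstrain.git &&
    cd tesstrain &&
    gt=$(default_gt_dir . "$model") &&
    mkdir -p "$gt" &&
    unzip ocrd-testset.zip -d "$gt" &&
    make training MODEL_NAME="$model"
)

# Ground truth kept elsewhere: pass its location to make explicitly.
# GROUND_TRUTH_DIR is an assumed variable name; use an absolute path.
train_external() {
    # $1 = tesstrain checkout, $2 = MODEL_NAME, $3 = ground-truth directory
    make -C "$1" training MODEL_NAME="$2" GROUND_TRUTH_DIR="$3"
}
```

Running `train_sample foo` from an empty working directory would reproduce the setup discussed in this thread. The key point is that make is invoked inside the tesstrain checkout, not inside the tesseract source tree.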

Keith

Jan 8, 2021, 10:35:13 AM
to tesser...@googlegroups.com
Shree,

Thank you for your reply. I should have gone to bed (it was like 2 AM my time on a work night) instead of continuing to bang my head.

When I saw your message this morning, I was thinking, "What tesstrain folder? There's no tesstrain folder in the repo." That was exactly when it occurred to me that tesstrain is a separate repo and needs to be checked out separately.

All is well. It's working.

The phrase "tesstrain" doesn't show up on any of the (4) Compiling and Installation pages. There's lots of mention about installing the dependencies to support training, but no mention about actually installing it.

Do you think that's worthy of filing an issue?

I'm probably not the only bonehead out there.

Thanks,
Keith

Shree Devi Kumar

Jan 8, 2021, 11:25:53 AM
to tesseract-ocr
The original training script in the tesseract repo is `tesstrain.sh`, and all training tutorials refer to that.

The Make-based `tesstrain` repo is a later addition, and the tesseract documentation has not been updated for it.

You can contribute by creating a PR against the `tessdoc` repo to add the missing training info.

Max Richey

Jan 8, 2021, 1:34:00 PM
to tesser...@googlegroups.com, kmo...@gmail.com
Keith,

Thank you so much for this. You are not alone.  You have just clued me in.  I am about ready to start my first training run.  Then I saw this in my email box.

You may be a life saver for doing this.  How are we supposed to know these things if the docs are not updated?  After looking inside my own tesseract folder (I have tesseract 5.0.0 on Ubuntu 18.04), I don't even see the training subfolder that I expected to see.

When you cloned the tesstrain repo, where did you place the tesstrain folder?  Is it a subfolder inside of the ~/tesseract folder itself, or does it stand alone outside of the tesseract folder structure?

Thanks again for doing this.  I would have been going bonkers within a few days without the clue-in.

Max Richey


Max Richey

Jan 8, 2021, 1:59:10 PM
to Keith M, tesser...@googlegroups.com
That is perfect.  I also see that you are taking advantage of OCR-D.  I firmly believe that true intelligence is revealed in the formation of the right question more than in the one who reserves the answer.

Thank you for sharing.  You have made my life much easier by suffering the pain for yourself.  Not only are you clearly intelligent, you are also willing to spare others the same measure of pain.  That's top notch in my book.  Integrity is valued far above specific knowledge.  Keep the faith.

Much respect,

Max

On Fri, Jan 8, 2021 at 11:47 AM Keith M <kmo...@gmail.com> wrote:
Max,

Glad to hear I have company in bone-heads-anonymous. :)

I just placed tesstrain directly in the tesseract folder.

root@ubuntu:/home/<redacted>/tesseract/tesstrain/data/foo-ground-truth# ls | head

(contents from ocrd-testset.zip <https://github.com/tesseract-ocr/tesstrain/blob/master/ocrd-testset.zip> extracted below)

alexis_ruhe01_1852_0018_022.box
alexis_ruhe01_1852_0018_022.gt.txt
alexis_ruhe01_1852_0018_022.lstmf
alexis_ruhe01_1852_0018_022.tif
alexis_ruhe01_1852_0035_019.box
alexis_ruhe01_1852_0035_019.gt.txt
alexis_ruhe01_1852_0035_019.lstmf
alexis_ruhe01_1852_0035_019.tif
alexis_ruhe01_1852_0087_027.box
alexis_ruhe01_1852_0087_027.gt.txt

Hope that helps

Keith
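
A quick sanity check on a ground-truth folder like the one listed above is to confirm that every line image has a transcription next to it. The sketch below assumes the pairing seen in the listing (name.tif with name.gt.txt) and uses a hypothetical function name. It only looks at .tif files, and the .box and .lstmf files in the listing appear to be outputs of the training run rather than inputs.

```shell
#!/bin/sh
# Print every .tif in a ground-truth directory that has no matching .gt.txt.
missing_transcriptions() {
    # $1 = ground-truth directory
    for img in "$1"/*.tif; do
        [ -e "$img" ] || continue      # the glob matched nothing
        base=${img%.tif}               # strip the image extension
        [ -f "$base.gt.txt" ] || printf '%s\n' "${img##*/}"
    done
}
```

Calling `missing_transcriptions data/foo-ground-truth` from the tesstrain checkout prints nothing when every image is paired. Any file name it prints is an image without a transcription.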