my training fails


Shavkat Sultanov

Jan 6, 2026, 3:15:00 PM
to tesseract-ocr
Hi there,


Thanks in advance.

This is what happens when I try running the training script:

running_train_script.png
It is apparently failing to read the traineddata file.

I downloaded it from here:


My exact command to run it is:

sudo make training RATIO_TRAIN=1.0 MODEL_NAME=gg_custom_1 DATA_DIR=./data GROUND_TRUTH_DIR=./data/gg_custom_1-ground-truth START_MODEL=eng TESSDATA=/usr/local/share/tessdata MAX_ITERATIONS=500


Screenshot 2026-01-06 185705.png
As you can see from this image, I am using Tesseract version 5.5.2.

My computer is running Ubuntu 24.04.3. I tried Windows before, and it failed as well, though a little further along in the process.
Screenshot 2026-01-06 190626.png

I have no clue why this is happening. I would really like this to work, though, because my problem is a very uniform one (reading numbers off the screen, always in the same font), yet the original eng.traineddata model does not solve it.

Please help! I can provide additional info if needed.

Thanks again in advance!


Kind regards,
Shavkat Sultanov

Shavkat Sultanov

Jan 7, 2026, 2:17:40 AM
to tesseract-ocr
Hi again, 


I don't know whether this helps, but here is my dataset (small, just to test functionality):


I did everything exactly as described in your manual for Tesseract 5.x.x, this one:

Please help. 


Kind regards,
Shavkat Sultanov

Zdenko Podobny

Jan 7, 2026, 2:19:40 AM
to tesser...@googlegroups.com
Hello,

What is the output of

ls -l /usr/local/share/tessdata/eng.traineddata


Zdenko


On Tue, Jan 6, 2026 at 21:14, Shavkat Sultanov <waldapo...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/0756c4e6-e0d1-4250-8764-80cbf9b94fefn%40googlegroups.com.

Shavkat Sultanov

Jan 7, 2026, 2:22:05 AM
to tesseract-ocr
*except downgrading to an earlier version from GitHub. I am not entirely sure how to do that yet; I might need to try that next, but I have literally been on this training task for days now without getting it to run on any machine. I did try a later version on Windows before, but unfortunately that failed as well. I could also try my Mac, but I fear I might not be technically experienced enough for that one either.

Shavkat Sultanov

Jan 7, 2026, 2:24:25 AM
to tesseract-ocr
Screenshot 2026-01-07 082333.png

no such file or directory ...

Shavkat Sultanov

Jan 7, 2026, 2:26:17 AM
to tesseract-ocr
Screenshot 2026-01-07 082509.png
If this helps ...

It may be an issue with my machine then? I don't know Linux that well yet, sorry.

Zdenko Podobny

Jan 7, 2026, 2:36:24 AM
to tesser...@googlegroups.com
You mentioned:

But the GitHub file is 14.7 MB, while yours is 195584 bytes... What did you download?


Zdenko
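
A quick way to catch this kind of mismatch before training is to compare the size on disk against a rough minimum. The sketch below is only illustrative: the helper name `check_min_size` and the 1 MB threshold in the usage comment are assumptions, not part of tesstrain. The threshold is chosen simply because the two sizes mentioned here (14.7 MB and 195584 bytes) fall on either side of it.

```shell
# Hypothetical helper (not part of tesstrain): succeed only if FILE
# exists and is at least MIN_BYTES bytes long.
check_min_size() {
    file=$1
    min_bytes=$2
    [ -f "$file" ] || { echo "missing: $file" >&2; return 2; }
    size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
    if [ "$size" -lt "$min_bytes" ]; then
        echo "too small: $file is $size bytes" >&2
        return 1
    fi
    echo "ok: $file is $size bytes"
}

# Example usage (path taken from the command earlier in the thread):
# check_min_size /usr/local/share/tessdata/eng.traineddata 1000000
```

A non-zero exit status means the file is missing (2) or smaller than expected (1), so the check can be chained in front of the `make training` call.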



Shavkat Sultanov

Jan 7, 2026, 2:44:08 AM
to tesseract-ocr
I used wget to download it from here:


I did so again into another folder. I will write another message about that one.

Thanks for the help so far. 

Shavkat Sultanov

Jan 7, 2026, 2:47:12 AM
to tesseract-ocr
Screenshot 2026-01-07 084606.png
Here I tried to put it somewhere else.

If I should use a different traineddata file, I can of course do that as well.


Kind regards,
Shavkat Sultanov

Shavkat Sultanov

Jan 7, 2026, 2:53:36 AM
to tesseract-ocr
Hi again,


I tried this one as well:

(sudo wget ...)

Same output.
Thanks so far, though. I really hope to resolve this issue; I need this to work. I think EasyOCR fine-tuning is very complicated.


Shavkat Sultanov

Shavkat Sultanov

Jan 7, 2026, 2:55:35 AM
to tesseract-ocr
When I put it somewhere else:


shavkat95@ubuntu:/mnt/tesseract/tessdata$ ls -l /mnt/tesseract/tessdata/eng.traineddata
-rw-r--r-- 1 root root 195484 Jan  7 07:35 /mnt/tesseract/tessdata/eng.traineddata
shavkat95@ubuntu:/mnt/tesseract/tessdata$ cd ..
shavkat95@ubuntu:/mnt/tesseract$ cd ..
shavkat95@ubuntu:/mnt$ ls
no_limit  tesseract  tesstrain  this_our_first_fr_tho
shavkat95@ubuntu:/mnt$ cd tesstrain/
shavkat95@ubuntu:/mnt/tesstrain$ sudo make training RATIO_TRAIN=1.0 MODEL_NAME=gg_custom_1 DATA_DIR=./data GROUND_TRUTH_DIR=./data/gg_custom_1-ground-truth START_MODEL=eng TESSDATA=/mnt/tesseract/tessdata MAX_ITERATIONS=500
You are using make version: 4.3
combine_tessdata -u /mnt/tesseract/tessdata/eng.traineddata data/eng/gg_custom_1
Failed to read /mnt/tesseract/tessdata/eng.traineddata
make: *** [Makefile:207: data/eng/gg_custom_1.lstm-unicharset] Error 1



(Forgot to post this before, sorry.)
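
The log above shows `combine_tessdata` being handed `<TESSDATA>/<START_MODEL>.traineddata`, so that path can be checked by hand before starting a run. The following is a minimal sketch; both helper names are hypothetical and not part of tesstrain. It only rules out path and permission mistakes; it would not by itself have flagged a file that exists but has the wrong content.

```shell
# Build the path in the form the log shows being passed to
# combine_tessdata: <TESSDATA>/<START_MODEL>.traineddata
start_model_path() {
    printf '%s/%s.traineddata\n' "${1%/}" "$2"
}

# Hypothetical pre-flight check: is the start model a readable,
# non-empty file at that path?
preflight_start_model() {
    path=$(start_model_path "$1" "$2")
    if [ -r "$path" ] && [ -s "$path" ]; then
        echo "found: $path"
    else
        echo "not usable: $path" >&2
        return 1
    fi
}

# Example usage with the values from the command above:
# preflight_start_model /mnt/tesseract/tessdata eng
```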

Shavkat Sultanov

Jan 7, 2026, 3:29:12 AM
to tesseract-ocr
Hi again Zdenko,


You are right about the file size, though. Why does wget get me a smaller file, and what do I do? If I download it on Windows, I get the 14.6 MB file. Can I somehow get it from GitHub onto my Linux machine, or do I have to download it on Windows and transfer it over manually?

Please help!


Kind regards,
Shavkat Sultanov
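
On the question of why wget produced a much smaller file: the thread does not say. One common cause when downloading from GitHub is that the URL points to the repository's HTML file-viewer page rather than to the raw file, so an HTML document gets saved under the .traineddata name. That is an assumption here, not something confirmed above. A minimal sketch to test for it, where `looks_like_html` is a hypothetical helper name:

```shell
# Succeed if the start of the file looks like an HTML document.
looks_like_html() {
    # Inspect only the first 1024 bytes; LC_ALL=C makes grep treat the
    # input as plain bytes so binary data does not confuse it.
    head -c 1024 "$1" | LC_ALL=C grep -qi -e '<!doctype html' -e '<html'
}

# Example usage with the path from the earlier session:
# if looks_like_html /mnt/tesseract/tessdata/eng.traineddata; then
#     echo "this is a web page, not a model file"
# fi
```

If the file does turn out to be HTML, downloading from the repository's raw-file link instead, or copying the file over from another machine, should yield the full-size model.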

Shavkat Sultanov

Jan 7, 2026, 5:06:37 AM
to tesseract-ocr
Hi again,


I have now copied over the actual eng.traineddata file (14.6 MB). It no longer has a problem opening it. Training gets further but then fails at the Python step, because it requires Pillow. My next problem is that I normally use Python inside a virtual environment, partly for this kind of reason, so I don't know how to add libraries to the existing system Python installation. I use Ubuntu. I guess I will have to read up on that next.

Thanks so far though, copying the file manually worked. 


Kind regards,
Shavkat Sultanov
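
For the Pillow step, a virtual environment can still be used. The sketch below assumes that the training scripts invoke whatever `python3` comes first on `PATH` (not verified against the tesstrain Makefile here) and that `make` is run without `sudo`, since `sudo` typically resets `PATH` and would bypass an activated environment. The venv location `~/tesstrain-venv` and both helper names are hypothetical.

```shell
# Succeed if the given Python interpreter can import the given module.
python_can_import() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# Hypothetical setup routine; defined here but not run automatically.
setup_training_venv() {
    venv_dir=${1:-"$HOME/tesstrain-venv"}
    python3 -m venv "$venv_dir" || return 1
    # Activate for the current shell so python3 resolves to the venv.
    . "$venv_dir/bin/activate"
    python3 -m pip install Pillow || return 1
    python_can_import python3 PIL   # Pillow is imported under the name "PIL"
}

# Usage:
# setup_training_venv
# make training ...   # same variables as before, but without sudo
```

On Ubuntu, `python3 -m venv` may first need the `python3-venv` package. If earlier `sudo` runs left root-owned files in the data directory, their ownership may have to be changed before a non-root run can write there. Installing the distribution's `python3-pil` package into the system Python is another option.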