
Tesseract Training: Error 'Integer (fast) model' When Using Apex.lstm


Mitya

Mar 21, 2025, 1:55:59 PM
to tesseract-ocr

I’ve been following the YouTube tutorial "Guide to Tesseract Training" (https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s) and its corresponding GitHub repository, astutejoe/tesseract_tutorial (https://github.com/astutejoe/tesseract_tutorial).

The tutorial walks through the process of training a custom Tesseract model, but I've run into an issue when trying to continue training the model.

What we tried:

Setup: I followed the steps in the tutorial to set up the environment, downloaded the necessary files, and began the training process using the base eng.traineddata model.

Training Command: After preparing the training data and ground truth, I ran the following command to initiate the training:

make training MODEL_NAME=Apex START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=100

Model Generation: This command successfully generated the Apex.lstm model file. However, I encountered an issue when trying to use the Apex.lstm file for further training.

Error: When attempting to continue training the model, I received the following error:

Error, data/eng/Apex.lstm is an integer (fast) model, cannot continue training

What we faced: I have verified that the eng.traineddata file is located correctly in /usr/share/tesseract-ocr/5/tessdata/ (path may differ depending on installation). Despite following the tutorial and using the correct paths for eng.traineddata, I’m getting an error saying the model is an "integer model" and I am unable to continue training. I tried downloading the latest eng.traineddata from GitHub, but the error persists.

Questions:
What does the "integer (fast) model" error mean, and how can I resolve it?
Is there something I missed in the training process that would allow me to continue training Apex.lstm?

Any advice or insights would be greatly appreciated.

Environment:
Tesseract version: 5.3.0
OS: Ubuntu 20.04 (MacBook Pro)
Tesseract Data Path: /usr/share/tesseract-ocr/5/tessdata/
Base Model: eng.traineddata
Makefile: https://github.com/tesseract-ocr/tesstrain/blob/43ff10012af31914bb5b72304d9c21c8fdf4f464/Makefile

Thank you in advance for your help!


Zdenko Podobny

Mar 22, 2025, 2:59:37 PM
to tesser...@googlegroups.com
Hello,

I notice there may be some gaps in your understanding of Tesseract and its training requirements. Training Tesseract effectively requires careful adherence to its documentation and established processes. Proceeding without this foundation risks wasting both your time and ours. Anyway, I have put some notes inline below.

Kind regards,

Zdenko

On Fri, Mar 21, 2025 at 18:56, Mitya <mityah...@gmail.com> wrote:

> I’ve been following this tutorial from YouTube: Guide to Tesseract Training https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s and its corresponding GitHub repository: astutejoe/tesseract_tutorial. https://github.com/astutejoe/tesseract_tutorial
>
> The tutorial walks through the process of training a custom Tesseract model, but I've run into an issue when trying to continue training the model

If the tutorial doesn't produce working results, you should contact its author.

> What we tried: Setup: I followed the steps in the tutorial to set up the environment, downloaded the necessary files, and began the training process using the base eng.traineddata model.
>
> Training Command: After preparing the training data and ground truth, I ran the following command to initiate the training:
>
> make training MODEL_NAME=Apex START_MODEL=eng TESSDATA=../tesseract/tessdata MAX_ITERATIONS=100
>
> Model Generation: This command successfully generated the Apex.lstm model file. However, I encountered an issue when trying to use the Apex.lstm file for further training.

What does the statement 'Model Generation: This command successfully ...' mean? Which command did you run? What is the Apex.lstm model file? Tesseract uses traineddata files for models, correct?

> Error: When attempting to continue training the model,

Could you describe how you attempted to continue training the model? Also, can you specify which part of the Tesseract documentation (https://tesseract-ocr.github.io/tessdoc/tess5/TrainingTesseract-5.html) or which tesstrain step (https://github.com/tesseract-ocr/tesstrain) you were referring to?

> I received the following error: Error, data/eng/Apex.lstm is an integer (fast) model, cannot continue training
>
> What we faced: I have verified that the eng.traineddata file is located correctly in /usr/share/tesseract-ocr/5/tessdata/ (path may differ depending on installation). Despite following the tutorial and using the correct paths for the eng.traineddata,

I am not sure what you are trying to communicate with this, as you use `../tesseract/tessdata` for training, which seems to be a different location than `/usr/share/tesseract-ocr/5/tessdata/`.
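
A rough sketch of how the two locations could be compared, in case it helps. The `same_model` helper is hypothetical (it is not part of Tesseract or tesstrain), and the two paths are simply the ones quoted in this thread; adjust them to your setup. It only checks whether the two files are byte-identical, not which kind of model they contain.

```shell
#!/usr/bin/env bash
# Sketch: check whether the eng.traineddata passed to tesstrain via TESSDATA
# is the same file content as the system-wide one.
set -eu

# same_model A B -> prints "same", "different", or "missing: <path>"
# (hypothetical helper, named here for illustration only)
same_model() {
  for f in "$1" "$2"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      return 0
    fi
  done
  # cmp -s is silent and exits 0 only when the files are byte-identical
  if cmp -s "$1" "$2"; then
    echo "same"
  else
    echo "different"
  fi
}

# Paths as quoted in the thread (assumed; they may differ on your machine)
same_model ../tesseract/tessdata/eng.traineddata \
           /usr/share/tesseract-ocr/5/tessdata/eng.traineddata
```

If this prints "different" or "missing", the model that training starts from is not the one that was verified in the system directory.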




ZeroCool Zero

May 7, 2025, 5:23:15 PM
to tesseract-ocr
You should use the eng.traineddata file from the Tesseract "best" repository (tessdata_best) for training.

That error suggests you may be using the wrong eng.traineddata file.
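
The suggestion above could look roughly like the sketch below. Several things here are assumptions rather than facts from this thread: the raw download URL for tessdata_best, the `run` dry-run helper (hypothetical, it only prints commands unless DRY_RUN=0), and the idea that data/eng/Apex.lstm is extracted from the start model by tesstrain and may need to be removed so that it is regenerated. The make command is the one from the original post, and it is assumed to be run from the tesstrain directory.

```shell
#!/usr/bin/env bash
# Sketch: replace the start model with the float "best" model and rerun
# the training command from the original post.
set -eu

# Assumed locations; TESSDATA_DIR matches the TESSDATA value used in the thread.
TESSDATA_DIR="${TESSDATA_DIR:-../tesseract/tessdata}"
BEST_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata"

# Hypothetical helper: prints the command by default, executes it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run mkdir -p "$TESSDATA_DIR"
# Download the float model that fine-tuning needs.
run curl -L -o "$TESSDATA_DIR/eng.traineddata" "$BEST_URL"
# Assumption: a stale file extracted from the old (integer) model may remain.
run rm -f data/eng/Apex.lstm
run make training MODEL_NAME=Apex START_MODEL=eng TESSDATA="$TESSDATA_DIR" MAX_ITERATIONS=100
```

By default the script only prints the commands so they can be reviewed first; setting DRY_RUN=0 executes them.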
