Tesseract Custom Model Not Recognized after Training


demian kim

Sep 17, 2023, 11:41:45 AM
to tesseract-ocr

Hello Tesseract Community,

I am facing a challenge with my custom-trained Tesseract model, and I'm hoping for some guidance on resolving this issue.

Background:

  1. I've successfully trained a custom model (ocrtensor.traineddata).
  2. The training finished without any error and I've copied the generated .traineddata file to /usr/share/tesseract-ocr/4.00/tessdata/.
  3. I'm trying to use this model in a Jupyter Notebook container with the pytesseract Python package.

Problem:

Even though the model was working fine previously, I am now encountering an error when trying to use the model. The error suggests that Tesseract can't initialize with the custom model:

TesseractError: (1, "Error: LSTM requested, but not present!! Loading tesseract. Error: Tesseract (legacy) engine requested, but components are not present in /usr/share/tesseract-ocr/4.00/tessdata/ocrtensor.traineddata!! Failed loading language 'ocrtensor' Tesseract couldn't load any languages! Could not initialize tesseract.")

Steps Tried:

  1. Ensured the Tesseract version compatibility (using version 4).
  2. Checked file permissions (even tried with chmod 777).
  3. Restarted Jupyter Notebook container multiple times.
  4. Tried executing Tesseract from the terminal directly.
  5. Made sure the TESSDATA_PREFIX environment variable is set correctly.
  6. Tried Tesseract with logging enabled for additional error details.

I'm unsure why the model suddenly isn't recognized when it was working just a while ago. If anyone has insights or suggestions on what might be going wrong, I would greatly appreciate it.

Thank you for your assistance.
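
For anyone reproducing step 4 above, here is a minimal sketch of a terminal-equivalent call from Python with the language, engine mode, and tessdata directory spelled out, so that the exact command can be posted together with the error. The helper names and the image name `sample.png` are placeholders, not something from the original post; `-l`, `--oem`, and `--tessdata-dir` are standard tesseract command-line options, and `--oem 1` asks for the LSTM engine only.

```python
import subprocess


def build_tesseract_command(image_path, lang, tessdata_dir, oem=1):
    """Return the tesseract CLI invocation as an argument list.

    oem=1 requests the LSTM engine only; oem=0 would request the legacy engine.
    """
    return [
        "tesseract", image_path, "stdout",
        "--tessdata-dir", tessdata_dir,
        "-l", lang,
        "--oem", str(oem),
    ]


def run_tesseract(image_path, lang, tessdata_dir, oem=1):
    """Run tesseract and return the command, exit code, stdout and stderr."""
    cmd = build_tesseract_command(image_path, lang, tessdata_dir, oem)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return cmd, result.returncode, result.stdout, result.stderr


# Hypothetical usage (requires tesseract on PATH and a real image file):
# cmd, code, out, err = run_tesseract(
#     "sample.png", "ocrtensor", "/usr/share/tesseract-ocr/4.00/tessdata")
# print(" ".join(cmd), code, err)
```

Running the same command with `--oem 0` and comparing the two error outputs shows which engine the file fails to load for.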

Ali hussain

Sep 17, 2023, 1:11:36 PM
to tesseract-ocr
You can try loading the traineddata file in VietOCR to check that it is not corrupted. If it works in VietOCR, the problem is probably in your code.
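
A complementary check that does not need any OCR front end is to compare the installed copy against the file the training run produced. The sketch below assumes the original training output is still available; both function names are made up for illustration. A mismatch in size or hash would point to a truncated or overwritten copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(original, installed):
    """True if both files have the same size and the same SHA-256 digest."""
    a, b = Path(original), Path(installed)
    if a.stat().st_size != b.stat().st_size:
        return False
    return sha256_of(a) == sha256_of(b)


# Hypothetical usage; the first path is a placeholder for the training output:
# same_file_contents("path/to/training/output/ocrtensor.traineddata",
#                    "/usr/share/tesseract-ocr/4.00/tessdata/ocrtensor.traineddata")
```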

Des Bw

Sep 17, 2023, 2:54:00 PM
to tesseract-ocr
One possibility is that you used the fast model as the starter model. You need to continue from, or start with, the best model.

Zdenko Podobny

Sep 18, 2023, 8:19:06 AM
to tesser...@googlegroups.com
Unfortunately you left out all the important information (e.g. how did you run training? how did you run tesseract, including the tesseract options and the exact command or code?), so here are just some hints:
Error: LSTM requested, but not present!!
This implies that the requested traineddata file does not contain the needed LSTM components.

Loading tesseract. Error: Tesseract (legacy) engine requested, but components are not present in /usr/share/tesseract-ocr/4.00/tessdata/ocrtensor.traineddata!!
This implies that the requested traineddata file does not contain the needed legacy components.

I have never seen these two messages together. Typically people either follow an old, outdated tutorial and train the tesseract legacy components, or they train for the LSTM engine (without legacy components) but then ask tesseract to use the legacy engine.
Based on this, I guess your ocrtensor.traineddata is not a valid tesseract file.
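
One way to check which components a traineddata file actually contains is `combine_tessdata -d`, which prints one line per component. The sketch below parses that listing; it assumes lines of the form `17:lstm:size=401636, offset=192` as printed by Tesseract 4/5, and it treats `inttemp`, `pffmtable`, and `normproto` as the marker for legacy components, which is a simplification. The function names are illustrative only.

```python
import re
import subprocess

# Matches listing lines such as "17:lstm:size=401636, offset=192".
ENTRY = re.compile(r"^\d+:([\w-]+):size=(\d+)", re.MULTILINE)

# Assumption: these three entries are taken as the sign of a legacy model.
LEGACY_MARKERS = ("inttemp", "pffmtable", "normproto")


def parse_components(listing):
    """Map component name to size in bytes from a combine_tessdata -d listing."""
    return {name: int(size) for name, size in ENTRY.findall(listing)}


def list_components(traineddata_path):
    """Run combine_tessdata -d and parse whatever it prints."""
    result = subprocess.run(
        ["combine_tessdata", "-d", traineddata_path],
        capture_output=True, text=True,
    )
    # The listing may go to stderr depending on the build, so read both.
    return parse_components(result.stdout + result.stderr)


def engines_supported(components):
    """Report which engines the listed components would allow."""
    return {
        "lstm": "lstm" in components,
        "legacy": all(name in components for name in LEGACY_MARKERS),
    }


# Hypothetical usage (requires combine_tessdata on PATH):
# print(engines_supported(list_components(
#     "/usr/share/tesseract-ocr/4.00/tessdata/ocrtensor.traineddata")))
```

If both values come back `False`, or the listing is empty, that matches the two error messages appearing together and suggests the file is not a usable traineddata file.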

Zdenko

