unicharset is not returning anything


Dicol Oliviu

Jun 10, 2023, 1:13:57 PM
to tesseract-ocr
I want to train Tesseract 4.1 on a new font. When I reached the step that creates the unicharset file, it simply did not return anything.

I am not sure whether this is a problem, but I noticed that my "Box Coordinates" and "Box Data" differ in jTessBoxEditor. I tried rewriting everything manually, but then I got this error:
PS D:\Bots\AccountReseter\TesseractTraining> tesseract train.my.exp0.tif train.my.exp0 box.train
Tesseract Open Source OCR Engine v4.1.0-elag2019 with Leptonica
Page 1
Bad box coordinates in boxfile string! . 14 27 7 7 0
Bad box coordinates in boxfile string! 0 56 6 21 29 0
Bad box coordinates in boxfile string! 2 157 6 20 28 0
Bad box coordinates in boxfile string! 3 2i0 6 20 29 0
Bad box coordinates in boxfile string! 4 264 6 22 28 0
Bad box coordinates in boxfile string! 5 319 6 21 29 0
Screenshot_1.png
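The "Bad box coordinates" messages above can be reproduced offline with a small check. This is a minimal sketch, assuming the usual legacy box-file layout of `symbol left bottom right top page` with integer coordinates and `right >= left`, `top >= bottom`. The function name `check_box_line` is hypothetical and not part of Tesseract or jTessBoxEditor. In every rejected line of the log, the third number is smaller than the first (or a coordinate is not an integer, as in `2i0`), which would be consistent with widths and heights having been typed where right and top edges are expected, though that is only a guess from the log.

```python
def check_box_line(line):
    """Return a list of problems found in one box-file line (empty if none).

    Assumes the layout: symbol left bottom right top page.
    """
    fields = line.split()
    if len(fields) != 6:
        return [f"expected 6 fields, got {len(fields)}"]

    symbol, *numbers = fields
    try:
        left, bottom, right, top, page = (int(n) for n in numbers)
    except ValueError:
        # A typo such as "2i0" ends up here.
        return ["non-integer coordinate"]

    problems = []
    if right < left:
        problems.append("right edge is smaller than left edge")
    if top < bottom:
        problems.append("top edge is smaller than bottom edge")
    return problems


if __name__ == "__main__":
    # Lines copied from the error output in the post.
    for text in [". 14 27 7 7 0", "0 56 6 21 29 0", "3 2i0 6 20 29 0"]:
        print(repr(text), "->", check_box_line(text) or "ok")
```

Running the same check over every line of the box file before calling `tesseract ... box.train` should point at the lines that need fixing.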

Zdenko Podobny

Jun 11, 2023, 10:05:08 AM
to tesser...@googlegroups.com
Hello,
  1. Version 4.x is old, outdated, and unsupported. Use the current Tesseract version (5.3.1).
  2. Which official training procedure are you following?
  3. Are you intentionally trying to train the legacy engine (I assume so, based on your box file)? BTW: legacy training was broken, and it is fixed in 5.3.1.
Zdenko


On Sat, Jun 10, 2023 at 19:14, Dicol Oliviu <lif...@gmail.com> wrote: