I would like to improve recognition of Thai numbers.

Tang Adorable

May 23, 2023, 1:00:03 AM

to tesseract-ocr

I followed this tutorial: https://youtu.be/KE4xEzFGSU8.

I want to train Thai numbers and improve the accuracy of reading Thai numbers from images without modifying the existing good quality Thai text characters in `tha.traineddata` provided by tesseract-ocr/tessdata_best.

Which data do I need to modify or which part of the script should I change to train specifically for Thai numbers?
------------------------------------------------------------------
Here are the steps I have tried:

I prepared the data in a folder named "xx-ground-truth".
Each sample in the folder consists of 4 files, for example:
tha_0.box
tha_0.gt.txt
tha_0.tif
tha_0.lstmf
For the next training steps, what should I do? Could you please provide some guidance?
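
Before moving on to training, it can help to confirm that every line image in the folder has a matching transcription. The sketch below is a minimal check, assuming the layout described above (`name.tif` paired with `name.gt.txt`); the function name `check_ground_truth` is hypothetical and not part of Tesseract or tesstrain. The `.box` and `.lstmf` files are ignored here because tesstrain generates them from the image and transcription pairs.

```python
from pathlib import Path


def check_ground_truth(folder):
    """Compare .tif images and .gt.txt transcriptions by base name.

    Returns three sorted lists: names that have both files,
    names with an image but no transcription, and names with
    a transcription but no image.
    """
    folder = Path(folder)
    # Strip the full suffix by hand because ".gt.txt" is a double extension.
    images = {p.name[: -len(".tif")] for p in folder.glob("*.tif")}
    texts = {p.name[: -len(".gt.txt")] for p in folder.glob("*.gt.txt")}
    paired = sorted(images & texts)
    missing_text = sorted(images - texts)
    missing_image = sorted(texts - images)
    return paired, missing_text, missing_image


if __name__ == "__main__":
    # "xx-ground-truth" is the folder name used in this post.
    paired, missing_text, missing_image = check_ground_truth("xx-ground-truth")
    print(f"{len(paired)} complete pairs")
    print("images without transcription:", missing_text)
    print("transcriptions without image:", missing_image)
```

Any name reported in the second or third list would need its missing file added, or the stray file removed, before training.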
[Attached image: 1684808748333.jpg]

Thank you so much.

Zdenko Podobny

May 23, 2023, 2:01:11 AM

to tesser...@googlegroups.com

Please follow official training:

https://github.com/tesseract-ocr/tesstrain
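
As a rough illustration of what a fine-tuning run with tesstrain might look like, the sketch below assembles a `make training` command from the Makefile variables `MODEL_NAME`, `START_MODEL`, `TESSDATA`, and `MAX_ITERATIONS`. The model name `thanum`, the tessdata path, and the iteration count are placeholder assumptions, and the helper `tesstrain_command` is hypothetical. With `START_MODEL=tha`, training continues from the existing Thai model rather than starting from scratch; whether accuracy on other Thai characters is preserved depends on the training data and is not guaranteed by this command.

```python
import shlex


def tesstrain_command(model_name, start_model, tessdata_dir, max_iterations):
    """Build the argument list for a tesstrain fine-tuning run.

    By default tesstrain looks for ground truth in
    data/<model_name>-ground-truth inside the tesstrain checkout.
    """
    return [
        "make",
        "training",
        f"MODEL_NAME={model_name}",        # name of the new model (assumed)
        f"START_MODEL={start_model}",      # existing model to fine-tune from
        f"TESSDATA={tessdata_dir}",        # folder holding tha.traineddata
        f"MAX_ITERATIONS={max_iterations}",
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    cmd = tesstrain_command("thanum", "tha", "/path/to/tessdata_best", 1000)
    # Print the command so it can be reviewed and run from the tesstrain folder.
    print(shlex.join(cmd))
```

The command is only printed, not executed, so it can be reviewed and adjusted before running it from the tesstrain checkout.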

Zdenko

On Tue, May 23, 2023 at 6:59, Tang Adorable <tangc...@gmail.com> wrote:


Tang Adorable

May 24, 2023, 10:51:25 PM

to tesseract-ocr

I want to merge the training for font1 and font2 into a single model. How does xxx.traineddata work?

On Tuesday, May 23, 2023 at 13:01:11 UTC+7, zdenop wrote:
