The simplest way to train Tesseract 4.0

Trong

Mar 29, 2019, 5:41:55 AM
to tesseract-ocr
Hi friends,
I'm using Tesseract 4.0 to OCR a limited set of forms (ID cards, passports).
Currently the results are about 80% correct and I need to improve them. (There are constant words in the images, e.g. Name, Date Of Birth, but they are not recognized correctly.)
(I spent a lot of time trying on Windows before I learned that the Tesseract 4 training tools do not support Windows :( )
I visited https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 to learn how to train Tesseract, but I did not succeed.

If you have had the same problem, please help me by sharing the simplest way to train Tesseract 4.

Env: Ubuntu 18, Tesseract 4.0

Thank you

Kristóf Horváth

Mar 29, 2019, 8:32:34 AM
to tesseract-ocr

Shanshan Wang

Mar 29, 2019, 11:13:55 AM
to tesseract-ocr
Damn! I wish I had seen this a week ago! Thank you very much for sharing this amazing tutorial!

Nitesh kc

Apr 13, 2019, 5:02:15 AM
to tesseract-ocr
How are you planning to classify the contents of the ID card or passport?

Trong

Apr 14, 2019, 5:59:09 AM
to tesseract-ocr
My input also includes a parameter indicating whether it is an ID card or a passport. I just need to improve my results.
(The language on the ID card and passport is vie. The existing vie.traineddata does not cover some fonts, e.g. OCR-B.)

On Saturday, April 13, 2019 at 16:02:15 UTC+7, Nitesh kc wrote:

MANUSHWETA RAO

Jan 13, 2020, 3:06:28 AM
to tesseract-ocr
Hey, can I ask you for the commands to train Tesseract for passport OCR?