The simplest way to train Tesseract 4.0

Trong

Mar 29, 2019, 5:41:55 AM

to tesseract-ocr

Hi friends,

I'm using Tesseract 4.0 to OCR a limited set of forms (ID cards, passports).

Currently the results are 80% correct and I need to improve them. (Some words appear on every image, e.g. Name, Date Of Birth, but even these are not recognized correctly.)
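A stopgap that does not involve training at all (and is not something proposed in this thread): because labels such as Name and Date Of Birth are known in advance, OCR output for those labels can be snapped to the nearest known label with fuzzy string matching. This is a minimal sketch using Python's standard `difflib`; the label list, the `snap_label` helper and the 0.6 similarity cutoff are hypothetical choices for illustration, not part of Tesseract.

```python
import difflib

# Hypothetical list of constant labels printed on the form
# (the two examples from the question; extend it per document type).
KNOWN_LABELS = ("Name", "Date Of Birth")


def snap_label(ocr_text: str, labels=KNOWN_LABELS, cutoff: float = 0.6) -> str:
    """Return the closest known label if it is similar enough, else the input unchanged."""
    # Compare case-insensitively so "NAME" and "name" both match "Name".
    lowered = {label.lower(): label for label in labels}
    matches = difflib.get_close_matches(
        ocr_text.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else ocr_text


if __name__ == "__main__":
    for raw in ["Narne", "Date 0f Birth", "Passport No"]:
        print(f"{raw!r} -> {snap_label(raw)!r}")
    # 'Narne' -> 'Name'
    # 'Date 0f Birth' -> 'Date Of Birth'
    # 'Passport No' -> 'Passport No'
```

A lower cutoff catches more OCR errors but risks mapping unrelated text onto a label. This only helps with the fixed labels; the variable field values still depend on recognition quality, which is what training addresses.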

(It took a lot of my time trying this on Windows before I learned that the Tesseract 4 training tools do not support Windows :( )

I read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 to learn how to train Tesseract, but I did not succeed.

If you have had the same problem, please help me by sharing the simplest way to train Tesseract 4.

Env: Ubuntu 18, Tesseract 4.0

Thank you

Kristóf Horváth

Mar 29, 2019, 8:32:34 AM

to tesseract-ocr

I recommend this guide https://docs.google.com/document/d/1qDqbnlptcCPVIvMOHwfNws-CQat-llZLOTHC6S94Vec/edit

Shanshan Wang

Mar 29, 2019, 11:13:55 AM

to tesseract-ocr

Damn! I wish I had seen this a week ago! Thank you very much for sharing this amazing tutorial!

Nitesh kc

Apr 13, 2019, 5:02:15 AM

to tesseract-ocr

How are you planning to classify the input documents (ID card vs. passport)?

Trong

Apr 14, 2019, 5:59:09 AM

to tesseract-ocr

My input also includes a parameter that indicates whether it is an ID card or a passport, so I just need to improve the OCR results.

(The language on the ID cards and passports is Vietnamese (vie). The existing vie.traineddata does not cover some of the fonts used, e.g. OCR-B.)

On Saturday, April 13, 2019 at 16:02:15 UTC+7, Nitesh kc wrote:

MANUSHWETA RAO

Jan 13, 2020, 3:06:28 AM

to tesseract-ocr

Hey, could you share the commands you used to train Tesseract for passport OCR?
