Best trained data for non-MRZ data


Tintu Jacob

Aug 20, 2019, 11:49:17 AM
to tesseract-ocr
Hi

We are trying to read a national ID card using Tesseract. We are able to read the MRZ side of the card image, but we are unable to read the non-MRZ side using the same trained data. Could you please tell us which trained data is best for reading non-MRZ data from ID cards?


Regards
Tintu

ElGato ElMago

Aug 20, 2019, 8:54:55 PM
to tesseract-ocr
Then the text isn't in OCR-B. Pick the traineddata for your local language from the best traineddata set (tessdata_best) and try that first.

On Wednesday, August 21, 2019 at 0:49:17 UTC+9, Tintu Jacob wrote:

Tintu Jacob

Aug 22, 2019, 2:46:33 AM
to tesseract-ocr
Hi,
We tried the English trained data linked below, and the OCR accuracy is low. Is there any better trained data for reading English? Please suggest.

https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata



Shree Devi Kumar

Aug 22, 2019, 7:54:28 AM
to tesseract-ocr
Share a sample image.

If the rest of the ID uses a similar font, try fine-tuning on it so that all characters are covered.


Tintu Jacob

Aug 22, 2019, 3:42:44 PM
to tesseract-ocr
Please find the sample file here:
https://drive.google.com/file/d/0B93Vnm9ZxkpyUnd4V0VxdUg1QmV6NGVNMWVwcFpuWGxLVjE0/view?usp=drivesdk

We are only working with the Kuwait civil ID card, and the OCR accuracy on its non-MRZ page (the front page) is low.

ElGato ElMago

Aug 22, 2019, 8:25:36 PM
to tesseract-ocr
What did you do and what was your result?

On Friday, August 23, 2019 at 4:42:44 UTC+9, Tintu Jacob wrote:

Tintu Jacob

Aug 26, 2019, 1:21:14 AM
to tesseract-ocr
Please find the result obtained from Tesseract below.

Check the nationality, date of birth, and expiry date.

<<Civil FRONT OCR Result :- >>

STATE oF KUWAIT evi no = 5%

CDN 285031504457 wat hon

Pallas La; daa ~~
Name MOHAMMAD RAHAT
ABDUL KHALIQ
ON +L yaonaity IND SA a
< b Sex " Aa so i

Burth Date 1503/1985 SA fol
EpayDate {8/0412020 LST pL


Please find below the code we use to run OCR with Tesseract:

// tesseractPath and image are defined elsewhere.
Tesseract instance = new Tesseract();
// Set the tessdata path.
instance.setDatapath(tesseractPath);
// Use the LSTM engine only, the English model, and automatic page segmentation.
instance.setOcrEngineMode(TessOcrEngineMode.OEM_LSTM_ONLY);
instance.setLanguage("eng");
instance.setPageSegMode(TessPageSegMode.PSM_AUTO);

// Dictionary settings.
instance.setTessVariable("load_freq_dawg", "true");
instance.setTessVariable("load_system_dawg", "true");
// Restrict recognition to letters, digits, '/' and '<'.
instance.setTessVariable("tessedit_char_whitelist","AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz0123456789/<");

// Run OCR on the card image.
data = instance.doOCR(image);
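
The hand-typed whitelist string above is easy to get wrong when a character needs to be added or removed. A minimal sketch of building the same string programmatically follows; the class and method names (WhitelistBuilder, build) are hypothetical and not part of Tesseract or any wrapper library.

```java
// Hypothetical helper, not part of any OCR library: builds the whitelist
// string passed to tessedit_char_whitelist in the post above.
final class WhitelistBuilder {

    /** Returns "AaBb...Zz", then "0123456789", then the given extra symbols. */
    static String build(String extraSymbols) {
        StringBuilder sb = new StringBuilder();
        // Interleave upper- and lower-case letters, matching the posted string.
        for (char c = 'A'; c <= 'Z'; c++) {
            sb.append(c).append(Character.toLowerCase(c));
        }
        // Append the ten decimal digits.
        for (char d = '0'; d <= '9'; d++) {
            sb.append(d);
        }
        // Append any extra symbols, such as the date separator '/' and '<'.
        return sb.append(extraSymbols).toString();
    }

    public static void main(String[] args) {
        System.out.println(build("/<"));
    }
}
```

The result of build("/<") could then be passed as the value of tessedit_char_whitelist in place of the literal.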

ElGato ElMago

Aug 26, 2019, 8:18:07 PM
to tesseract-ocr
It's just an idea, but tessdata_best seems to be a better fit than tessdata if you specify OEM_LSTM_ONLY. Did you try that?
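
When switching between model sets it is easy to leave the data path pointing at the old directory. Below is a minimal sketch of a pre-flight check, assuming the data path is the directory that directly contains the .traineddata files; the class name TraineddataCheck and the /opt/tessdata_best location are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that a language model file is present in a
// tessdata directory before that directory is handed to the OCR engine.
final class TraineddataCheck {

    /** Expected model location, e.g. <datapath>/eng.traineddata. */
    static Path modelPath(String datapath, String language) {
        return Paths.get(datapath, language + ".traineddata");
    }

    /** True if the model file exists as a regular file in the directory. */
    static boolean isInstalled(String datapath, String language) {
        return Files.isRegularFile(modelPath(datapath, language));
    }

    public static void main(String[] args) {
        // Assumed location of a tessdata_best download; adjust to your setup.
        String datapath = args.length > 0 ? args[0] : "/opt/tessdata_best";
        System.out.println(modelPath(datapath, "eng")
                + " present: " + isInstalled(datapath, "eng"));
    }
}
```

This only confirms that a file with the expected name is present; it does not tell you which model set (tessdata, tessdata_fast or tessdata_best) the file came from.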

On Monday, August 26, 2019 at 14:21:14 UTC+9, Tintu Jacob wrote: