MRZ/MRP (Machine-readable zone/passport) dataset for tesseract v4


Mamadou

May 27, 2019, 1:38:11 AM
to tesseract-ocr
Hello,

We have open-sourced (BSD license) an MRZ/MRP (machine-readable zone/passport) dataset and models for Tesseract v4.
The dataset contains more than 7,000 images (.tif) with ground truth (.gt.txt), collected from Google Images and augmented with a small amount of synthetic data.
It's ready to be used for training with Tesseract v4.
If you'd rather not train the models yourself, try the ones under the tessdata_best (float model) or tessdata_fast (int model) folders.
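
For anyone who wants to check the data before training, here is a minimal sketch that pairs each .tif image with its .gt.txt ground truth. It assumes the two files sit side by side and share a base name (e.g. sample.tif and sample.gt.txt); the function name pair_samples and that layout are assumptions for illustration, not something confirmed by the dataset itself.

```python
from pathlib import Path


def pair_samples(dataset_dir):
    """Return (image_path, ground_truth_text) for every .tif with a sibling .gt.txt.

    Assumption: ground truth for "name.tif" is stored in "name.gt.txt"
    in the same folder. Images without a ground-truth file are skipped.
    """
    pairs = []
    for image in sorted(Path(dataset_dir).rglob("*.tif")):
        gt_file = image.with_name(image.stem + ".gt.txt")
        if gt_file.is_file():
            text = gt_file.read_text(encoding="utf-8").strip()
            pairs.append((image, text))
    return pairs
```

Comparing len(pair_samples(folder)) with the number of .tif files is a quick way to spot images that are missing ground truth.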

Accuracy: 99.7%

Regards,

Lorenzo Bolzani

May 29, 2019, 4:08:53 AM
to tesser...@googlegroups.com
Hi Mamadou,
this sounds very interesting. How did you do the training and accuracy measurements? What parameters did you use for the model?


Thanks, bye

Lorenzo

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a92ec47e-5055-4ffe-a174-f437d3c7ccf2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mamadou

May 29, 2019, 5:12:57 PM
to tesseract-ocr
Hello Lorenzo,
We're fine-tuning eng.traineddata without modifications, with the charset restricted to [A-Z0-9]. We're using the default parameters, and the model converges very fast.
We have 1,376 images from Google Images that we use to test the accuracy. The reported accuracy is min(detector, recognizer). These 1,376 images can't be used directly with Tesseract; they require a detector and a preprocessor.
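
To make the two ideas above concrete, here is a small sketch of a charset check against [A-Z0-9] and of taking the minimum of two stage accuracies. The helper names and the exact-match scoring are assumptions for illustration only; the actual evaluation may score each stage differently.

```python
import re

# Charset named in the post: uppercase letters and digits only.
ALLOWED = re.compile(r"[A-Z0-9]*")


def in_charset(text):
    """True if every character of text is within [A-Z0-9]."""
    return ALLOWED.fullmatch(text) is not None


def exact_match_rate(predictions, truths):
    """Fraction of predictions identical to their ground truth (assumed metric)."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == t for p, t in zip(predictions, truths))
    return hits / len(truths)


def reported_accuracy(detector_acc, recognizer_acc):
    """Overall figure as described in the post: min(detector, recognizer)."""
    return min(detector_acc, recognizer_acc)
```

Taking the minimum means the weaker of the two stages sets the reported figure.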



Shree Devi Kumar

May 30, 2019, 12:22:15 PM
to tesser...@googlegroups.com

--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com