Hi Peter,
Sorry for the lack of response; I think we regulars here are all
quite busy at the moment.
Have you searched the archives of this mailing list? I seem to
recall someone previously deciding to go with a different project
which focused just on MRZ recognition.
Tesseract will do a reasonable job, as you have found, but perhaps
a dedicated program could do even better (and for less effort on
your part).
As far as improving your Tesseract results, though, I'd recommend
looking into user_patterns. It isn't well documented, but if the
format you're expecting is predictable it should help. Also, have you
set up a unicharambigs file? That may help a little too (not much,
but it's probably worth adding for the common cases of 5 -> S, 8 ->
B, etc.).
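
To make the unicharambigs suggestion concrete, here is a minimal
sketch in Python that writes out entries for the two confusions
mentioned above. It assumes the "v1" tab-separated layout described in
the Tesseract training documentation (source length, source
characters, target length, target characters, type indicator), so
please check that against the version you are using. The helper names
are hypothetical, and listing each pair in both directions is my own
assumption, on the grounds that an MRZ has both letter and digit
fields.

```python
# Sketch only: builds the text of a unicharambigs file.
# Assumes the "v1" format: tab-separated fields, one ambiguity per line.
# The helper names below are hypothetical, not part of Tesseract.

# Confusions mentioned in this thread: 5 <-> S and 8 <-> B.
AMBIGS = [("5", "S"), ("8", "B")]


def ambig_line(wrong, right, mandatory=False):
    """Format one ambiguity entry.

    Characters within a field are space-separated. The final field is
    the type indicator: 1 for a mandatory replacement, 0 for optional
    (as I understand the training documentation).
    """
    fields = [
        str(len(wrong)), " ".join(wrong),
        str(len(right)), " ".join(right),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def build_unicharambigs(pairs):
    """Return the file contents, listing each pair in both directions."""
    lines = ["v1"]
    for a, b in pairs:
        lines.append(ambig_line(a, b))
        lines.append(ambig_line(b, a))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_unicharambigs(AMBIGS), end="")
```

The output would be saved as the unicharambigs file for your language
before the traineddata is combined. Optional entries seem the safer
choice here, since a mandatory 5 -> S replacement would also rewrite
genuine digits.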
> One more unrelated question. How to read data from image with non-standard
> orientation
> (upside down, rotated left/right by 90 degrees)? How to use OSD feature?
I confess I don't actually know. I think Tesseract might try to
guess this entirely by itself. Does anyone else here know any
better?
Once you're happy your MRZ training is as good as it will get, would
you be happy to have it added to the main Tesseract repository? If
so (and it'd be great if you were), open an issue on the bug tracker
with the training file, and add some comments to the top of
mrz.config about how it was created and where the source files for
it are (see my grc.traineddata for an example).
Thanks Peter, and sorry again for not getting back to you sooner,
Nick
P.S. One other thing I just thought of: is the DPI you're feeding
into Tesseract the same as the DPI you trained with (300)? Ideally
it should be. Also you're right to preprocess using thresholding;
Tesseract isn't particularly good at that step and it's much better
if you can do it first.
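
In case it helps anyone else reading, here is a rough sketch of what
thresholding before OCR can look like. It uses Otsu's method, which is
just one common choice and an assumption on my part (I don't know
which method you are using), and it works on a flat list of 8-bit
grayscale values so it needs no imaging library. The function names
are hypothetical.

```python
# Sketch only: global thresholding with Otsu's method on 8-bit
# grayscale pixel values (a flat list of ints in the range 0..255).
# Otsu is an assumed choice here; other thresholding methods may suit
# a given scan better.


def otsu_threshold(pixels):
    """Return the threshold that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue          # nothing in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # everything is in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels at or below the threshold to 0 (ink), the rest to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    # Toy input: dark text pixels around 10, light background around 200.
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print(t, binarize(sample, t)[:3], binarize(sample, t)[-3:])
```

With a real image you would read the pixels with whatever imaging
library you already use, binarize them, and save the result at the
same DPI as your training images before passing it to Tesseract.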
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to
> tesser...@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-oc...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en