sara waheed
Jan 27, 2024, 5:19:42 AM
to tesseract-ocr
I am trying to read the passport MRZ string from an image. I am using Tesseract for OCR and OpenCV for image processing. I have tried three different approaches, and none of them worked.
**Attempt 1**
When I run OCR on this image, Tesseract reads it as:
IDAUT10000999<6<<<<<<<<<<<<<<<
7109094F1112315AUT<<<<<<xcc<<6
MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
This is incorrect: Tesseract reads some of the `<` filler characters as x, c, or k. When I pass the string to the `mrz-java` library to extract the details, it gives the following error:
[error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1 IDAUT10000999<6<<<<<<<<<<<<<<<
[error] 7109094F1112315AUT<<<<<<xcc<<6
[error] MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
[error] at 24-25,1: Invalid character in MRZ record: x
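For reference, the check behind this error can be reproduced without the parser. The following is a minimal sketch, assuming the MRZ alphabet is limited to upper-case A-Z, digits 0-9, and the `<` filler as in ICAO 9303. `MrzCharset` and `invalidPositions` are hypothetical names for illustration and are not part of `mrz-java`.

```scala
object MrzCharset {
  // MRZ alphabet: upper-case A-Z, digits 0-9 and the filler '<'.
  def isValid(ch: Char): Boolean =
    (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9') || ch == '<'

  // Returns (row, column, character) for every character outside the alphabet,
  // using zero-based row and column indices.
  def invalidPositions(rows: Seq[String]): Seq[(Int, Int, Char)] =
    for {
      (row, r) <- rows.zipWithIndex
      (ch, c)  <- row.zipWithIndex
      if !isValid(ch)
    } yield (r, c, ch)
}
```

Running `MrzCharset.invalidPositions` on the OCR rows before handing them to the parser lists every offending character at once, rather than only the first one the parser stops on.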
**Attempt 2**
Then I converted the image to grayscale and binarized it using `OpenCV`. Here is the code:
val roiImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected.jpeg"
val grayScaleROI = new Mat()
val roiImage = Imgcodecs.imread(roiImagePath)
Imgproc.cvtColor(roiImage, grayScaleROI, Imgproc.COLOR_BGR2GRAY)
val roiGrayImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-gray.jpeg"
Imgcodecs.imwrite(roiGrayImagePath, grayScaleROI)
val binary = new Mat()
Imgproc.adaptiveThreshold(grayScaleROI, binary, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY , 15, 25)
val roiBinaryImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary.jpeg"
Imgcodecs.imwrite(roiBinaryImagePath, binary)
val tesseract = new Tesseract()
tesseract.setDatapath("/usr/share/tesseract-ocr/4.00/tessdata")
tesseract.setVariable("user_defined_dpi", "600")
val result = tesseract.doOCR(new File(roiBinaryImagePath))
val mrzStr = result.replace(" ", "")
println(s"two page passport mrz string is: $mrzStr")
It created the binary image attached as two-page-passport-mrz-detected-binary.jpeg. Tesseract reads the MRZ string from the binary image as:
IDAUT1DODD999<E<KK<KKKKEKEKEK
7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
MUSTERFRAUSKISOLDEKKKKKKKKKKK
and `mrz-java` fails on this string with the following error:
[error] Error parsing MRZ string: Failed to parse MRZ null IDAUT1DODD999<E<KK<KKKKEKEKEK
[error] 7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
[error] MUSTERFRAUSKISOLDEKKKKKKKKKKK
[error] at 0-0,0: Different row lengths: 0: 29 and 1: 30
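The row-length failure can be caught before parsing. This is a rough sketch, assuming the three ICAO 9303 layouts (TD1: 3 rows of 30 characters, TD2: 2 rows of 36, TD3: 2 rows of 44). `MrzShape`, `rows`, and `detect` are hypothetical helper names, not `mrz-java` API. Unlike `result.replace(" ", "")`, the `rows` helper also drops blank lines and other whitespace from the OCR output.

```scala
object MrzShape {
  // Row count and row length for each layout.
  val layouts: Map[String, (Int, Int)] = Map(
    "TD1" -> (3, 30),
    "TD2" -> (2, 36),
    "TD3" -> (2, 44)
  )

  // Splits raw OCR output into non-empty rows with all whitespace removed.
  def rows(ocrText: String): Vector[String] =
    ocrText.split("\\R").map(_.replaceAll("\\s", "")).filter(_.nonEmpty).toVector

  // Name of the layout the rows match, or None if the shape is wrong.
  def detect(rows: Seq[String]): Option[String] =
    layouts.collectFirst {
      case (name, (count, length))
          if rows.size == count && rows.forall(_.length == length) => name
    }
}
```

A `None` from `MrzShape.detect(MrzShape.rows(result))` would indicate that Tesseract dropped or added characters, which is what the 29-versus-30 mismatch above suggests.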
**Attempt 3**
Then I resized the binary image:
val width = 1000 // Target width in pixels (adjust based on your needs)
val height = (width * binary.rows()) / binary.cols() // Maintain aspect ratio
val resizedRoiImage = new Mat()
Imgproc.resize(binary, resizedRoiImage, new Size(width, height), 0.0, 0.0, Imgproc.INTER_NEAREST)
val resizedImageROIPath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary-resized_image.jpg"
Imgcodecs.imwrite(resizedImageROIPath, resizedRoiImage)
The MRZ string read by Tesseract is:
TOAUTIOOOOIISKhcceccccddddddce
FIOPOSAFIFESSISAUTReececeececs
MUSTERFRAUCCKISOLDECKccccdcddd
and the error is:
[info] 15:54:04.200 633 [main] MrzParser INFO - Check digit verification failed for document number: expected 0 but got h
[error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1 TOAUTIOOOOIISKhcceccccddddddce
[error] FIOPOSAFIFESSISAUTReececeececs
[error] MUSTERFRAUCCKISOLDECKccccdcddd
[error] at 15-16,0: Invalid character in MRZ record: c
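The check digit mentioned in the INFO line can be computed by hand to judge how far off an OCR result is. Below is a minimal sketch of the ICAO 9303 check digit rule as I understand it: digits keep their value, A-Z map to 10-35, `<` counts as 0, the weights 7, 3, 1 repeat across the field, and the result is the weighted sum modulo 10. `MrzCheckDigit` is a hypothetical name and does not reflect how `mrz-java` implements it internally.

```scala
object MrzCheckDigit {
  private val weights = Array(7, 3, 1)

  // Numeric value of an MRZ character: digits as-is, A-Z as 10-35, '<' as 0.
  private def value(ch: Char): Int = ch match {
    case d if d >= '0' && d <= '9' => d - '0'
    case l if l >= 'A' && l <= 'Z' => l - 'A' + 10
    case '<'                       => 0
    case other =>
      throw new IllegalArgumentException(s"Not an MRZ character: $other")
  }

  // Weighted sum of the field's characters, modulo 10.
  def compute(field: String): Int =
    field.zipWithIndex.map { case (ch, i) => value(ch) * weights(i % 3) }.sum % 10
}
```

For the document number field `10000999<` from the first attempt, `MrzCheckDigit.compute("10000999<")` gives 6, which matches the digit Tesseract read right after that field. So the left part of the first row was recognized correctly in Attempt 1.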
Can anyone please help me read the text properly? I also tried a regex to convert c or k back to `<`, but it did not work either. If anyone can suggest a workaround or an improvement to the code, please help me with that. Thanks.
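To make the regex idea concrete, here is a small sketch of that kind of post-correction. It assumes that every lower-case letter in the OCR output is a misread `<`, which seems reasonable because the MRZ alphabet has no lower-case letters. It cannot repair upper-case misreads such as the `K` runs from the second attempt, since `K` is also a valid MRZ letter. `MrzCleanup.normalize` is a hypothetical helper name.

```scala
object MrzCleanup {
  // Assumption: a lower-case letter is always a misread '<' filler.
  // Whitespace is dropped first so that row lengths are not skewed.
  def normalize(row: String): String =
    row.filterNot(_.isWhitespace).map(ch => if (ch.isLower) '<' else ch)
}
```

Applied to the Attempt 1 rows, this turns `xcc` and `cc` back into fillers while leaving the upper-case letters and digits untouched.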
two-page-passport-mrz-detected-binary-resized_image.jpg
two-page-passport-mrz-detected-gray.jpeg
two-page-passport-mrz-detected.jpeg
two-page-passport-mrz-detected-binary.jpeg