Tesseract 4 not reading Arabic numbers accurately using custom trained data file


Mobeen Ali

Sep 25, 2019, 9:44:42 AM
to tesseract-ocr
Current Behavior:
I've followed the instructions in the wiki (Training Tesseract - 4.00). There were no errors when creating the traineddata file.

I wanted to create my own ara_custom.traineddata file specifically to read dates in Arabic, so its character set contains only "٠١٢٣٤٥٦٧٨٩" (the digits 0-9 in Arabic) plus the "/" forward slash.

The format for an Arabic date is:
٢٠١٩/٠٩/٢٥
yyyy/mm/dd
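The digits above are the Arabic-Indic digits U+0660 to U+0669, so converting between them and ASCII digits is a one-to-one mapping. A minimal Python sketch (the helper names `to_arabic_date` and `parse_arabic_date` are hypothetical, not part of Tesseract) that formats a date as yyyy/mm/dd and checks whether a string is a valid date in that format:

```python
import datetime
import re
from typing import Optional

# Arabic-Indic digits U+0660..U+0669, in order 0-9
ARABIC_DIGITS = "٠١٢٣٤٥٦٧٨٩"
TO_ARABIC = str.maketrans("0123456789", ARABIC_DIGITS)
TO_ASCII = str.maketrans(ARABIC_DIGITS, "0123456789")

# yyyy/mm/dd written only with Arabic-Indic digits and "/"
DATE_RE = re.compile(r"[٠-٩]{4}/[٠-٩]{2}/[٠-٩]{2}")


def to_arabic_date(d: datetime.date) -> str:
    """Format a date as yyyy/mm/dd using Arabic-Indic digits."""
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}".translate(TO_ARABIC)


def parse_arabic_date(text: str) -> Optional[datetime.date]:
    """Return the date if text is a valid yyyy/mm/dd Arabic date, otherwise None."""
    text = text.strip()
    if not DATE_RE.fullmatch(text):
        return None
    year, month, day = (int(part) for part in text.translate(TO_ASCII).split("/"))
    try:
        return datetime.date(year, month, day)
    except ValueError:  # impossible dates such as month 13 or 31 February
        return None
```

Note that a format check alone cannot catch every misread: "٢٤٠٩/١١/١٢" (year 2409) is still a valid calendar date, so a plausible year range would have to be checked separately.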

My ara.training_text file is attached as ara.training_text.txt (the .txt extension is only for uploading; I use the file without it).

My ara.wordlist file is attached as ara.wordlist.txt (again, the .txt extension is only for uploading).
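With only eleven characters to learn, the model mostly needs many varied samples. If you want to generate the training text programmatically rather than by hand, here is a sketch that writes random valid dates, several per line, in yyyy/mm/dd form with Arabic-Indic digits. The year range, line count, dates per line and the function names are arbitrary assumptions for illustration, not values taken from the wiki.

```python
import datetime
import random
from typing import List

ARABIC_DIGITS = "٠١٢٣٤٥٦٧٨٩"  # Arabic-Indic digits 0-9
TO_ARABIC = str.maketrans("0123456789", ARABIC_DIGITS)


def random_arabic_date(rng: random.Random, first_year: int = 1900, last_year: int = 2030) -> str:
    """Pick a uniformly random calendar date and format it as yyyy/mm/dd in Arabic-Indic digits."""
    start = datetime.date(first_year, 1, 1).toordinal()
    end = datetime.date(last_year, 12, 31).toordinal()
    d = datetime.date.fromordinal(rng.randint(start, end))
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}".translate(TO_ARABIC)


def make_training_lines(n_lines: int, dates_per_line: int = 4, seed: int = 0) -> List[str]:
    """Build n_lines lines of space-separated random dates (seeded, so output is reproducible)."""
    rng = random.Random(seed)
    return [
        " ".join(random_arabic_date(rng) for _ in range(dates_per_line))
        for _ in range(n_lines)
    ]


def write_training_text(path: str, n_lines: int = 2000) -> None:
    """Write the generated lines as UTF-8 text, one sample line per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(make_training_lines(n_lines)) + "\n")
```

For example, `write_training_text("ara.training_text")` would overwrite the training text with 2000 generated lines. Drawing dates through ordinals guarantees every sample is a real calendar date while still mixing digit combinations far more widely than a short hand-written list.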

Text in image: ٢٠٠٩/١١/١٢ (32.jpg)
Tesseract reads: ٢٤٠٩/١١/١٢ (32.txt)

Text in image: ١٩٧٩/٠١/٢٨ (24.jpg)
Tesseract reads: ١٦٩٧٦    //٠١//٧٢٨ (24.txt)

Text in image: ٢٠١٥/١١/٢٢ (12.jpg)
Tesseract reads: ٢٠١٥/١١/٧٢ (12.txt)
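To tell whether a retraining run actually helps, it can be useful to score each read against the ground truth rather than comparing by eye. A minimal sketch using plain Levenshtein (edit) distance to compute a character error rate; this is a generic metric chosen for illustration, not a Tesseract tool. For the first and third examples above, one digit is wrong, so the distance is 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalised by the length of the ground truth."""
    return levenshtein(truth, ocr) / max(len(truth), 1)


samples = [
    ("٢٠٠٩/١١/١٢", "٢٤٠٩/١١/١٢"),  # 32.jpg
    ("٢٠١٥/١١/٢٢", "٢٠١٥/١١/٧٢"),  # 12.jpg
]
for truth, ocr in samples:
    print(truth, ocr, levenshtein(truth, ocr), round(char_error_rate(truth, ocr), 3))
```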

From what I've observed, the problem is in my training_text file. I've attached the file above. Please guide me on this error, as I have failed to find a solution myself.

P.S. I've also studied the hallucination effect described in the wiki and tried to apply it as I understood it, but with no luck.
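If the suspicion is that the training text itself is at fault, one quick check is whether it contains anything outside the intended character set: stray Western digits, invisible direction marks such as U+200F RIGHT-TO-LEFT MARK, a byte-order mark, or tabs. Such characters would likely end up in the unicharset too. A small Python sketch that reports each unexpected character with its line number and Unicode name; the allowed set is an assumption based on the character set described in the question (digits, "/" and the space separating samples).

```python
import unicodedata
from typing import Iterable, List, Tuple

# Assumed target character set: Arabic-Indic digits, "/" and the space between samples.
ALLOWED = set("٠١٢٣٤٥٦٧٨٩/ ")


def find_unexpected_chars(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, repr(char), unicode_name) for every character outside ALLOWED."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        for ch in line.rstrip("\r\n"):
            if ch not in ALLOWED:
                # Control characters have no Unicode name, so fall back to the code point.
                name = unicodedata.name(ch, "<unnamed U+%04X>" % ord(ch))
                problems.append((lineno, repr(ch), name))
    return problems
```

Usage: open the file with `open("ara.training_text", encoding="utf-8")` and pass the file object directly; an empty result means the file uses only the intended characters.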

Béchir Gmati

Sep 27, 2019, 3:29:11 AM
to tesser...@googlegroups.com
Hi, I get this error when I run the combine-lang-model command. How can I fix it?
Capture.JPG
Capture1.JPG
 --
   GMATI Béchir
   Engineering Student, Business Intelligence & Big Data
  


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6878322a-c591-481b-b0d8-0befd76cbd22%40googlegroups.com.

Shree Devi Kumar

Sep 27, 2019, 4:01:11 AM
to tesseract-ocr



--

____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

nourhan magdy

May 15, 2020, 5:27:36 AM
to tesseract-ocr
How can I use this text file? I downloaded the ara folder and copied it to my tessdata folder, but it didn't work.


Piyush Chandra

May 15, 2020, 8:20:46 AM
to tesseract-ocr
You need to put the radical stroke file in your script_dir folder.
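In my understanding, the file meant here is radical-stroke.txt from the langdata repository: combine_lang_model reads it from the directory passed as --script_dir (normally the langdata root) and stops with an error if it cannot find it. A sketch in Python that checks for the file before building the command. The flags are the ones tesstrain.sh passes for a right-to-left language (verify them against your Tesseract version); the function name, paths and `ara_custom` are placeholders.

```python
import os
from typing import List, Optional


def build_combine_lang_model_cmd(
    unicharset: str,
    script_dir: str,
    output_dir: str,
    lang: str,
    words: Optional[str] = None,
    lang_is_rtl: bool = True,
) -> List[str]:
    """Build a combine_lang_model argument list, failing early if radical-stroke.txt is missing."""
    radical = os.path.join(script_dir, "radical-stroke.txt")
    if not os.path.isfile(radical):
        # combine_lang_model looks for this file inside --script_dir.
        raise FileNotFoundError(
            f"{radical} not found; copy radical-stroke.txt from langdata into script_dir"
        )
    cmd = [
        "combine_lang_model",
        "--input_unicharset", unicharset,
        "--script_dir", script_dir,
        "--output_dir", output_dir,
        "--lang", lang,
    ]
    if words is not None:
        cmd += ["--words", words]
    if lang_is_rtl:
        cmd.append("--lang_is_rtl")  # Arabic is written right to left
    return cmd
```

The function only builds the argument list; once the paths point at real files, run it with `subprocess.run(cmd, check=True)`. Failing in Python first gives a clearer message than digging through the tool's output.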

nourhan magdy

May 15, 2020, 10:48:43 PM
to tesser...@googlegroups.com

It's a text file, not a trained model. I put it in the scripts folder but it didn't work; it returns empty output.
The same happens with the ara characters: it returns empty or incorrect output.



write2...@gmail.com

Oct 27, 2020, 6:02:06 AM
to tesseract-ocr
Does anyone still have the Arabic date traineddata?
Please share the Tesseract weight (traineddata) files if anyone is able to extract the text from the attached example images.

date format:
٢٠١٩/٠٩/٢٥


dob.jpg
doi.jpg