Training Tesseract on Arabic/Hindi Digits Using JTessBoxEditor in Windows 10

Marwa M. Khan

Jul 19, 2018, 5:52:24 AM
to tesseract-ocr
Hello, 

I am trying to train Tesseract 4.0 with LSTM on Arabic/Hindi digits on Windows. I found that I need to create a box file, so I'm using JTessBoxEditor 2.0 to create the TIFF and box files. However, it fails when I use JTessBoxEditor 2.0 to generate the .traineddata file. Note that I chose combine_tessdata.exe as the Tesseract executable, ara.arial.exp0.box as the training data, and "training with existing box" as the training mode.


The output is as follows:

Tesseract Open Source OCR Engine v4.0.0-beta.1-108-gf291 with Leptonica
Page 1
Bad box coordinates in boxfile string! ١ ٤٥٤ ٣١٦٣ ٤٦٣ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٢ ٤١٣ ٣١٦٣ ٤٢٨ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٣ ٣٧٣ ٣١٦٣ ٣٩٣ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٤ ٣٣٨ ٣١٦٣ ٣٥٠ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٥ ٢٩٨ ٣١٦٨ ٣١٤ ٣١٨٥ ٠

Bad box coordinates in boxfile string! ٦ ٢٥٨ ٣١٦٣ ٢٧٣ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٧ ٢١٩ ٣١٦٣ ٢٣٨ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٨ ١٨٠ ٣١٦٣ ٢٠٠ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٩ ١٤٥ ٣١٦٣ ١٥٩ ٣١٩٠ ٠

Bad box coordinates in boxfile string! ٠ ١٠٩ ٣١٦٧ ١١٧ ٣١٧٨ ٠

Bad box coordinates in boxfile string! ١ ٤٥٤ ٣٠١٥ ٤٦٣ ٣٠٤٢ ٠

Bad box coordinates in boxfile string! ٢ ٤١٣ ٣٠١٥ ٤٢٨ ٣٠٤٢ ٠

Bad box coordinates in boxfile string! ٣ ٣٧٣ ٣٠١٥ ٣٩٣ ٣٠٤٢ ٠

Bad box coordinates in boxfile string! ٤ ٣٣٨ ٣٠١٥ ٣٥٠ ٣٠٤٢ ٠

Bad box coordinates in boxfile string! ٥ ٢٩٨ ٣٠٢٠ ٣١٤ ٣٠٣٧ ٠

Bad box coordinates in boxfile string! ٦ ٢٥٨ ٣٠١٥ ٢٧٣ ٣٠٤٢ ٠
 

Could you please tell me what I did wrong or how to fix this error?


Best Regards, 
Marwa M. Khan  
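
Editor's note: in the log above, the coordinate fields of every rejected box line are written in Arabic-Indic digits (U+0660 to U+0669), not ASCII digits. One plausible reading, which the thread does not confirm, is that Tesseract's box-file parser only accepts ASCII numbers and rejects these lines for that reason. The sketch below, under that assumption, rewrites the numeric fields of a box file in ASCII and leaves the first field (the glyph being labelled) untouched. The function names and file paths are hypothetical, and the "glyph left bottom right top page" line layout is taken from the log.

```python
from pathlib import Path

# Map Arabic-Indic (U+0660-U+0669) and Extended Arabic-Indic (U+06F0-U+06F9)
# digits to ASCII '0'-'9'.
DIGIT_MAP = {base + i: str(i) for base in (0x0660, 0x06F0) for i in range(10)}


def normalize_box_line(line: str) -> str:
    """Return one box-file line with its numeric fields in ASCII digits."""
    fields = line.split()
    if len(fields) < 6:
        # Not a "glyph left bottom right top page" line; leave it unchanged.
        return line.rstrip("\n")
    glyph, numbers = fields[0], fields[1:]
    # Only the coordinate/page fields are converted; the glyph keeps its script.
    return " ".join([glyph] + [n.translate(DIGIT_MAP) for n in numbers])


def normalize_box_file(src: str, dst: str) -> None:
    """Rewrite a UTF-8 box file (hypothetical paths) with ASCII coordinates."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    fixed = [normalize_box_line(line) for line in lines]
    Path(dst).write_text("\n".join(fixed) + "\n", encoding="utf-8")
```

For example, `normalize_box_file("ara.arial.exp0.box", "ara.arial.exp0.fixed.box")` would write a copy of the box file with ASCII coordinates; the output file name is only an illustration. Whether this alone makes training succeed is not established in the thread.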

chandra churh chatterjee

Jul 19, 2018, 5:54:46 AM
to tesser...@googlegroups.com
The bad box error might be caused by the images you are using for training in JTessBoxEditor.
Check the resolution of the images.

Chandra Churh Chatterjee

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1fa30bc6-17ab-4eda-b124-e400057afe3f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

m....@t2.sa

Nov 25, 2018, 3:09:33 AM
to tesseract-ocr
Hi Marwa M. Khan,

Have you generated any tessdata for Arabic-Indic numerals?

I'm trying to generate one, but JTessBoxEditor does not accept Arabic-Indic numerals. How can I fix this?

Soufiane Sabiri

Feb 25, 2019, 10:57:05 AM
to tesseract-ocr
Have you had any luck training Tesseract for Arabic letters or numbers?

Shree Devi Kumar

Feb 25, 2019, 11:10:52 AM
to tesser...@googlegroups.com

You can try the new traineddata from there along with the PR https://github.com/tesseract-ocr/tesseract/pull/2266


--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com