What would be the best resolution (dpi) and recommended size of each character to train in Tesseract


Naga raja

Mar 12, 2019, 12:47:00 AM
to tesseract-ocr
Hi All,

We are working with some ancient scripts, and even after training on a lot of data the accuracy is still very low.

We would like to know the recommended box size for each character and the best resolution for training in Tesseract.

Thanks in Advance.

Shree Devi Kumar

Mar 12, 2019, 1:00:49 AM
to tesser...@googlegroups.com
1. Which ancient scripts are you trying to train?

2. Are you trying to train base tesseract (3.0x) or LSTM tesseract?

3. Are you using synthetic traineddata using text2image or scanned images?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ad7658a8-fabf-4632-a290-5b7a7d97680b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Naga raja

Mar 12, 2019, 9:54:44 PM
to tesseract-ocr
Thanks for the response. Our answers to your questions are below.
Basically, we were unsure about sizes when training a new script sample with LSTM.

On Tuesday, March 12, 2019 at 2:00:49 PM UTC+9, shree wrote:
1. Which ancient scripts are you trying to train?

Tamil scripts found in inscriptions.

2. Are you trying to train base tesseract (3.0x) or LSTM tesseract?

LSTM Tesseract.

3. Are you using synthetic traineddata using text2image or scanned images?

Scanned Images.

Shree Devi Kumar

Mar 13, 2019, 12:32:53 AM
to tesser...@googlegroups.com
Training on inscriptions will be similar to training for handwriting.

I have not tried any LSTM training with scanned images. 


You can try experimenting with different network specs based on the type of input you have. 






irtmem intellect

Mar 13, 2019, 7:44:58 AM
to tesseract-ocr
Hi All,

Along the same lines, we have a document with 12 px characters. Which DPI is recommended for Tesseract OCR in this case?
The language is English and there are no tables etc. in the image; it is basically text plus a signature.

Thanks in advance
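
For the 12 px case, one rough way to reason about it is to work out how much the image would need to be enlarged before OCR. The sketch below is only arithmetic under stated assumptions: the 30 px target character height is an assumed value taken from the commonly cited rule of thumb of roughly 20 to 30 px capital-letter height, and the 96 DPI source resolution is a made-up example, not something known about this document. The function names are hypothetical helpers, not part of Tesseract.

```python
def upscale_factor(char_height_px: float, target_height_px: float = 30.0) -> float:
    """Factor by which to enlarge an image so characters reach the target height.

    target_height_px=30.0 is an assumed rule-of-thumb value, not a Tesseract setting.
    """
    if char_height_px <= 0 or target_height_px <= 0:
        raise ValueError("heights must be positive")
    # Never suggest shrinking: characters already at or above the target stay as-is.
    return max(1.0, target_height_px / char_height_px)


def effective_dpi(source_dpi: float, factor: float) -> float:
    """Resolution the enlarged image corresponds to, for the same physical page size."""
    return source_dpi * factor


factor = upscale_factor(12)          # 12 px characters, as in the question
print(factor)                        # 2.5
print(effective_dpi(96, factor))     # 240.0, assuming a hypothetical 96 DPI source
```

Rescanning at a higher resolution is generally preferable to enlarging an existing image, since resampling cannot add detail that was never captured; the right target height is best confirmed by testing on a few sample pages.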