Training a non-Unicode font with Tesseract


gopal bhalala

Apr 4, 2018, 2:55:50 PM
to tesseract-ocr
Hi, I am new to tesseract-ocr. I want to train Tesseract on a non-Unicode font. I tried training it with jTessBoxEditor but did not have any success.

LMG-ARUN.TTF

ShreeDevi Kumar

Apr 4, 2018, 9:55:08 PM
to tesser...@googlegroups.com
Training Tesseract is supported only with Unicode fonts.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/dc1825db-ef94-4bfd-bb3e-9e98d11faf07%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

gopal bhalala

Apr 5, 2018, 2:34:03 PM
to tesser...@googlegroups.com
Hi Shree,

Thanks for the quick response. Is there any way to train on a PDF and an image that use a non-Unicode font?
I have a PDF file and an image in a non-Unicode font that I want to OCR. Should I box them and assign the Unicode characters? Is that the right way to OCR a non-Unicode PDF or image?

ShreeDevi Kumar

Apr 5, 2018, 3:51:15 PM
to tesser...@googlegroups.com
Are you trying to recognize the text from a pdf or image with non unicode font?

That is possible to do.

If you want to train using non-unicode font, that is not possible.

gopal bhalala

Apr 6, 2018, 2:35:01 AM
to tesser...@googlegroups.com
Yes, Shree. I am trying to recognize text from a PDF or image with a non-Unicode font. I tried making box files to do that but did not have any success. Can you please give me some guidance on how to do it?

Best Regards & Thanking you,
Gopal Dhanjibhai Bhalala

ShreeDevi Kumar

Apr 6, 2018, 3:30:18 AM
to tesser...@googlegroups.com

For Indian languages, use tesseract-4.0.0beta.1 
with the traineddata files from

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
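
As a rough illustration of the advice in this thread: recognition only needs an image and a Unicode traineddata file for the script, whatever font the page was typeset in. The sketch below wraps the standard `tesseract <image> <outputbase> -l <lang>` command line. It assumes the `tesseract` executable is on PATH and that a traineddata file for the chosen language is installed. The thread does not name the language, so the default `guj` (Gujarati) is only a guess, and the helper names `build_tesseract_command` and `run_ocr` are hypothetical. Tesseract does not read PDF input directly, so a PDF would first have to be rendered to images with another tool.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="guj"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>.

    lang="guj" is an assumption; the thread only says "Indian languages".
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="guj"):
    """Run Tesseract on one image and return the recognized Unicode text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error,
    # for example when the traineddata file for `lang` is missing.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract writes plain-text output to <output_base>.txt
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("page.png", "page")` (both file names are placeholders) would write `page.txt` next to the script and return its contents. The output is Unicode text regardless of the font in the scanned page.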
