Question about recognizing Chinese characters


易鑫

Apr 7, 2019, 10:42:44 PM
to tesseract-ocr
Hello, everyone:

      Good day! I have trained a chi_sim model to recognize Chinese characters. You can find a sample image in the attached file.

The two Chinese characters in the image are slightly connected, although the image itself is very clear. Tesseract treats them as a single Chinese character, so it gives the wrong result.

I think Tesseract 4.0.0 is based on LSTM + CTC, so I expected it to split the characters well.
How can I solve this problem?

Thanks in advance. Sorry for my poor English.


67.png
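
For readers unfamiliar with the LSTM + CTC idea mentioned above, here is a toy sketch of greedy CTC collapsing: the network emits one label per time step, repeated labels are merged, and blanks are dropped, so no explicit character segmentation is needed. This is an illustration only, not Tesseract's actual decoder; the blank symbol, the function name, and the frame labels are assumptions made up for the example.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol for this toy example


def ctc_greedy_collapse(frame_labels):
    """Merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# Made-up per-frame labels for a two-character line.
frames = ["-", "焊", "焊", "-", "接", "接", "-"]
print(ctc_greedy_collapse(frames))
```

If the network labels every frame of two touching glyphs as the same character, the collapse step returns a single character, which would match the symptom described above.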

易鑫

Apr 9, 2019, 2:54:11 AM
to tesseract-ocr
Does anyone know the reason? Thanks.

易鑫 <yixinl...@gmail.com> wrote on Mon, Apr 8, 2019 at 10:42 AM:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6fdd08ad-20f8-4bf9-8ca9-0ef9829b7cde%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aaron Shieh

Apr 10, 2019, 12:53:43 AM
to tesseract-ocr
I get '焊接' with the following:
tesseract 67.png o -l chi_tra --oem 0 --psm 7

I'm using the Tesseract 4.1.0 64-bit build on Windows 10, with traineddata from https://github.com/tesseract-ocr/tessdata

Shree Devi Kumar

Apr 10, 2019, 2:40:55 AM
to tesser...@googlegroups.com
I think you will get better results with --oem 1.

The legacy models are better only in limited cases. For complex scripts the LSTM engine and models are better, as far as I can tell.
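
To compare the two engines on the same image, something like the following sketch could be used. It wraps the command line quoted earlier in the thread, replacing the output base with `stdout` so the text is printed instead of written to a file. The helper names `build_tesseract_cmd` and `run_ocr` are hypothetical, and it assumes `tesseract` is on the PATH and that the installed traineddata contains both the legacy and LSTM models.

```python
import subprocess


def build_tesseract_cmd(image, lang, oem, psm=7):
    """Build a tesseract command line that prints recognized text to stdout."""
    return ["tesseract", image, "stdout",
            "-l", lang, "--oem", str(oem), "--psm", str(psm)]


def run_ocr(image, lang, oem, psm=7):
    """Run tesseract and return the recognized text (assumes UTF-8 output)."""
    result = subprocess.run(
        build_tesseract_cmd(image, lang, oem, psm),
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout.strip()
```

Calling `run_ocr("67.png", "chi_tra", oem)` once with `oem=0` (legacy) and once with `oem=1` (LSTM) gives a side-by-side view of what each engine reads from the sample image.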


Aaron Shieh

Apr 10, 2019, 3:17:54 AM
to tesseract-ocr
I tried using --oem 1, but the results were really bad, which is why I resorted to legacy mode. Have you had any luck with the LSTM models?