New JPN_VERT traineddata (for 4.0)

Seokbong Choi

Oct 15, 2018, 5:27:32 PM

to tesseract-ocr

Hello all,

Over the past two weeks, I trained the JPN_VERT model a little further.

I included heart symbols, which are commonly used in Japanese comic books.

Whenever I ran OCR on text containing those symbols, the entire sentence came out garbled, so I worked around the issue by training on those symbols.

I also trained on more casual conversation. The sentences in the existing training set were too formal.

I hope it is useful for Japanese comic book fans.

I cannot provide evaluation data, but I am confident that it works better on the Japanese comic books I read.

https://github.com/zodiac3539/pythontesseract/

Shree Devi Kumar

Oct 15, 2018, 6:14:33 PM

to tesser...@googlegroups.com

Thank you for sharing.

It would be helpful if you also added this information to the README file in your GitHub repo.

Please share the training options that you used, the number of fonts, iterations, etc. It will be useful as a reference.
