New fonts/terminologies in Chinese

Bo Tang

Feb 12, 2020, 4:59:06 AM

to tesseract-ocr

Hello everyone, I am using OCR to read scanned supplier certificates. The images contain Simplified Chinese, Traditional Chinese, and English text. With the standard OCR API the accuracy is low, because the images have a lot of noise, red/blue seals and circles, and special terminology. Please help me, experts.

For example, we need to extract the company name, address, and validity date.

Q1: How should I preprocess the images?

Q2: How can I extract only the text we need?

Q3: If I use the Tesseract API, do I need to prepare a terminology list to add to the language data?

Thank you
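
To make Q1 concrete, here is a minimal sketch of the kind of seal suppression being asked about, assuming the seals are saturated red or blue ink and the text is dark. The function names (`is_seal_ink`, `suppress_seals`) and the `margin` threshold are hypothetical and chosen only for illustration; a real pipeline would load the pixels with an image library and tune the threshold on actual scans.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def is_seal_ink(pixel: Pixel, margin: int = 60) -> bool:
    """Return True if the pixel looks like red or blue stamp ink.

    Assumption: seal ink is strongly dominated by one channel, while
    dark text and light paper have roughly balanced channels.
    """
    r, g, b = pixel
    red_dominant = r - max(g, b) >= margin
    blue_dominant = b - max(r, g) >= margin
    return red_dominant or blue_dominant


def suppress_seals(image: List[List[Pixel]], margin: int = 60) -> List[List[Pixel]]:
    """Replace seal-colored pixels with white, leaving other pixels unchanged."""
    white: Pixel = (255, 255, 255)
    return [
        [white if is_seal_ink(p, margin) else p for p in row]
        for row in image
    ]


# Tiny 2x2 example: dark text, red seal, blue seal, paper background.
sample = [
    [(20, 20, 20), (220, 30, 40)],
    [(30, 40, 210), (250, 250, 250)],
]
cleaned = suppress_seals(sample)
```

A simple channel-difference rule like this will also erase text wherever a seal overlaps it heavily, so it is only a starting point. The cleaned image could then be passed to Tesseract with the languages combined, for example `chi_sim+chi_tra+eng`.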
