New fonts/terminologies in Chinese


Bo Tang

Feb 12, 2020, 4:59:06 AM
to tesseract-ocr
Hello everyone, I am using OCR to read scanned supplier certificates. The images contain Simplified Chinese, Traditional Chinese, and English text. With the standard OCR API the accuracy is low, because the images have a lot of noise, red/blue seals and circles, and special terminology. Experts, please help.
For example, we need to extract the company name, address, and validity date.
Q1: How should I preprocess the images?
Q2: How can I extract only the text we need?
Q3: If I use the Tesseract API, do I need to prepare a list of terminology and add it to the language data?
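
To make Q1 concrete, here is a minimal sketch of one possible preprocessing step: suppressing colored ink (such as red or blue seals) before OCR. It is only an assumption about what might help, not a tested solution for these certificates. The function name `suppress_colored_ink` and both thresholds are hypothetical, and the image is represented as plain rows of RGB tuples so the sketch has no dependencies; in practice the pixels would come from an image library.

```python
def suppress_colored_ink(pixels, color_spread=60, dark_level=128):
    """Binarize an RGB image while dropping strongly colored pixels.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    Returns rows of 0 (kept as text) or 255 (treated as background).
    Both thresholds are illustrative guesses and would need tuning.
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Printed black text is close to gray: its channels are similar.
            # Seal ink is saturated: one channel dominates the others.
            spread = max(r, g, b) - min(r, g, b)
            luminance = 0.299 * r + 0.587 * g + 0.114 * b
            if spread > color_spread:
                out_row.append(255)   # colored ink: erase to background
            elif luminance < dark_level:
                out_row.append(0)     # dark and colorless: keep as text
            else:
                out_row.append(255)   # light paper: background
        result.append(out_row)
    return result


# One row: dark text, red seal, white paper, blue stamp.
sample = [[(20, 20, 20), (220, 30, 30), (250, 250, 250), (30, 40, 200)]]
cleaned = suppress_colored_ink(sample)
```

Only the dark, colorless pixel survives as text here. This approach would also erase any text printed in color and can leave gaps where a seal overlaps black characters, so it is a starting point at best.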

Thank you 

01.jpg


