Hello everyone, I am using OCR to read scanned supplier certificates. The images contain Simplified Chinese, Traditional Chinese, and English text. With a standard OCR API the accuracy is low, because the images have a lot of noise, red and blue seals and circles, and specialized terminology. Please help me, experts.
For example, we need to extract the company name, address, and validity date.
Q1: How should I preprocess the images?
Q2: How do I extract only the text fields we need?
Q3: If I use the Tesseract API, do I need to prepare a list of terminology and add it to the language data?
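To make Q2 more concrete, here is a rough sketch of the kind of post-processing I have in mind. It assumes the OCR step has already produced plain text and that each field sits on its own line after a label. The label strings, the FIELD_LABELS table, and the extract_fields function are all made up for illustration; the real certificates may be laid out differently.

```python
import re

# Hypothetical label variants (English, Simplified, Traditional).
# Real certificates may use different wording.
FIELD_LABELS = {
    "company_name": ["Company Name", "公司名称", "公司名稱"],
    "address": ["Address", "地址"],
    "valid_date": ["Valid Until", "有效期至", "有效期"],
}


def extract_fields(ocr_text):
    """Return {field: value} for each label found at the start of a line."""
    result = {}
    for field, labels in FIELD_LABELS.items():
        # Try longer labels first so "有效期至" wins over "有效期".
        ordered = sorted(labels, key=len, reverse=True)
        alternatives = "|".join(re.escape(label) for label in ordered)
        # Label, optional ASCII or full-width colon, then the rest of the line.
        pattern = re.compile(
            r"^\s*(?:%s)\s*[:：]?\s*(.+?)\s*$" % alternatives,
            re.MULTILINE | re.IGNORECASE,
        )
        match = pattern.search(ocr_text)
        if match:
            result[field] = match.group(1)
    return result


if __name__ == "__main__":
    sample = "公司名称：某某有限公司\nAddress: 123 Example Road\n有效期至: 2025-12-31\n"
    print(extract_fields(sample))
```

This only works when the OCR output keeps the label and the value on the same line, which is probably not guaranteed on noisy scans, so I would like to know if there is a more robust approach.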
Thank you
