For Chinese text, I found that Tesseract's binarization produces really bad results.
I used -c tessedit_write_images=1 to dump the image that Tesseract produces after its own binarization.
The attachments are:
original -> the original image (original.png)
tess_bin -> Tesseract's binarization of original.png
my_bin -> my own preprocessing of original.png
tess_my_bin -> Tesseract's binarization of my_bin.png
You can see that some characters disappear after Tesseract's binarization.
Before I pass the images to Tesseract, I want to run my own preprocessing function on them first.
But Tesseract's binarization then makes the result worse.
I want to handle the image preprocessing part myself.
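To illustrate the kind of step I mean by "my own preprocessing", here is a minimal sketch of a custom binarization in plain Python. It assumes a global Otsu threshold on a grayscale image stored as a list of rows of 0-255 integers. The function names `otsu_threshold` and `binarize` and the choice of Otsu are illustrative assumptions only; this is not the exact code that produced my_bin.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold (0-255) for a flat list of grayscale values."""
    # Build the 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels with value <= t
    sum_bg = 0    # sum of gray levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map pixels <= threshold to 0 (black) and the rest to 255 (white)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 image: dark strokes on a bright background.
    sample = [[10, 12, 200],
              [11, 210, 205]]
    print(binarize(sample))
```

The image produced by a step like this is already black and white, so ideally Tesseract would use it as-is instead of thresholding it again.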
How can I disable Tesseract's image preprocessing? Or is modifying the source code the only way to do this?
Thanks!!