Hi shree,
1 - The results are the same with --oem 0 or --oem 1
2 - No, it's very similar. I noticed this, and that is why I decided to ask whether it's necessary to train the same language with other fonts. Or do I need to do something with the files in lang data, such as copying them to my installation?
3 - I use sample.jpg (attached) and convert the image with this command: convert -density 300 sample.jpg -background white -compress none -colorspace Gray test.tif
Then I run: tesseract --oem 3 test.tif output -l por
The output (attached) is the text tesseract extracted; as you can see, my name, Maicon, doesn't appear. How can I provide the ground-truth data? As a txt file?
4 - I attached the files
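On the ground-truth question in item 3, one possible approach (a rough sketch, not something confirmed in this thread) is to type the correct text of the image into a plain UTF-8 text file and compare it with the output.txt that the tesseract command above writes. The helper names words and missing_words and the file name truth.txt are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# words FILE: print each whitespace-separated word of FILE once, sorted.
# Punctuation stays attached to words; this is only a rough comparison.
words() {
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | LC_ALL=C sort -u
}

# missing_words TRUTH OCR: print the words that occur in the ground-truth
# file TRUTH but nowhere in the OCR output file OCR.
missing_words() {
    truth_words=$(mktemp)
    ocr_words=$(mktemp)
    words "$1" > "$truth_words"
    words "$2" > "$ocr_words"
    # comm -23 keeps lines that appear only in the first (sorted) file.
    LC_ALL=C comm -23 "$truth_words" "$ocr_words"
    rm -f "$truth_words" "$ocr_words"
}

# Example usage (truth.txt is a hypothetical hand-typed ground-truth file):
#   tesseract --oem 1 test.tif output -l por
#   missing_words truth.txt output.txt
```

The list it prints is a quick way to point out which words were dropped or misread when sharing a sample image together with its ground truth.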
On Wednesday, May 17, 2017 at 6:46:08 AM UTC-3, shree wrote:
1. Which --oem are you using with tesseract 4, legacy engine or lstm?
--oem 0 or --oem 1
3. Provide a sample image with its ground truth and point out the errors in it. Is the image at 300 dpi?
4. Please share the box/tiff pair to test for training.
Hello!
Guys, I have tesseract 4 on Ubuntu 16.04.
Running tesseract with -l por (Brazilian Portuguese), I don't get good results. The image uses a different font from the trained data (I think).
My question is: is it necessary to train tesseract again? I created the tif and box files with jtesseditor, but I don't know what I need to do with these files or how to write good training data. I saw
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 but I didn't find any case similar to mine.
Thanks in advance!