I have a weird niche project here: essentially I have about 4,000 images, each with 2 numbers between 0 and 127.

I've tweaked the images in a million different ways and I can't get tesseract to recognize individual numbers. With the exception of 2, none of the 1-digit numbers are recognized.

Also, for some reason, if I use tesseract directly I get way worse results, whereas if I convert to PDF first and use ocrmypdf (which apparently uses tesseract) I get WAY better results, which I don't understand.

The font is very straightforward, I think, so I'm not sure if training would be helpful, but I'm open to the idea if needed.

Here are the sample images I'm using for testing, before and after I modified them:
Before: https://imgur.com/a/PhjWXXK
After: https://imgur.com/a/sCRE67S

Okay, some of them failed to upload, but that's the gist.

Thanks,
Jack
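For anyone hitting the same single-digit problem: one likely factor (my assumption, not confirmed anywhere in this thread) is that tesseract's default page segmentation mode expects a whole page of text, not one short number. Below is a minimal Python sketch that calls the tesseract CLI with a single-line mode (`--psm 7`, falling back to 8 and 10, where 10 means "single character"), an explicit `--dpi`, and a digits-only `tessedit_char_whitelist`, then keeps only values in 0..127. The function names, the 300 DPI value and the psm fallback order are my own choices; note that the whitelist is only honored by the LSTM engine from tesseract 4.1 on. Passing a resolution is also one plausible reason ocrmypdf does better: it hands tesseract page images with a known DPI, while a bare image without resolution metadata makes tesseract guess (it warns and falls back to 70 dpi).

```python
import re
import subprocess
from typing import List, Optional

# Digits only: the numbers in these images are all between 0 and 127.
DIGIT_WHITELIST = "tessedit_char_whitelist=0123456789"


def build_tesseract_cmd(image_path: str, psm: int = 7, dpi: int = 300) -> List[str]:
    """Build a tesseract CLI call for an image that holds one number.

    --psm 7 treats the image as a single text line; 8 (single word) and
    10 (single character) are the other modes worth trying for 1-digit values.
    --dpi tells tesseract the resolution instead of letting it guess.
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),
        "--dpi", str(dpi),
        "-c", DIGIT_WHITELIST,
    ]


def parse_number(text: str, lo: int = 0, hi: int = 127) -> Optional[int]:
    """Return the first integer in [lo, hi] found in OCR output, else None."""
    for token in re.findall(r"\d+", text):
        value = int(token)
        if lo <= value <= hi:
            return value
    return None


def ocr_number(image_path: str, psms=(7, 8, 10)) -> Optional[int]:
    """Try several segmentation modes until one yields a value in 0..127.

    Requires the tesseract binary on PATH.
    """
    for psm in psms:
        result = subprocess.run(
            build_tesseract_cmd(image_path, psm),
            capture_output=True, text=True, check=True,
        )
        value = parse_number(result.stdout)
        if value is not None:
            return value
    return None
```

Usage would be something like `ocr_number("sample_001.png")` (hypothetical filename), which returns an int or None; running it over all ~4,000 images and logging the None results gives a quick list of the images that still need attention.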
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7be5ed42-df44-4530-b7a2-0d0fa340918e%40googlegroups.com.
Ah, now I see it has something to do with the way you doctored the images: I get the same output as you did when I ran your pic through. So what's the secret?
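Jack doesn't say in this thread exactly what he did to the images, so this is not his method. It's a generic sketch of the preprocessing the Tesseract "improving the quality of the output" docs recommend: binarize (here with Otsu's threshold), make sure the text is dark on a light background, upscale small glyphs, and pad with a white border. It works on a plain 2-D list of 0..255 grayscale values so it needs only the standard library. The scale factor, the 10-pixel border and the "most pixels are background" inversion heuristic are assumptions, and `write_pgm` is a hypothetical helper that saves the result as a binary PGM, which tesseract can read through Leptonica.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def otsu_threshold(img: Image) -> int:
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Pixels <= t become black (0), everything else white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in img]


def ensure_dark_on_light(img: Image) -> Image:
    """Heuristic: if most pixels are black, assume light text on dark and invert."""
    dark = sum(row.count(0) for row in img)
    total = sum(len(row) for row in img)
    if dark * 2 > total:
        return [[255 - v for v in row] for row in img]
    return img


def upscale(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling by an integer factor."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def add_border(img: Image, width: int = 10, value: int = 255) -> Image:
    """Pad the image with a uniform (white by default) border."""
    w = len(img[0]) + 2 * width
    body = [[value] * width + row + [value] * width for row in img]
    top = [[value] * w for _ in range(width)]
    bottom = [[value] * w for _ in range(width)]
    return top + body + bottom


def preprocess(img: Image, factor: int = 3, border: int = 10) -> Image:
    t = otsu_threshold(img)
    bw = ensure_dark_on_light(binarize(img, t))
    return add_border(upscale(bw, factor), border)


def write_pgm(img: Image, path: str) -> None:
    """Save as binary PGM (P5) so it can be passed to tesseract."""
    h, w = len(img), len(img[0])
    with open(path, "wb") as f:
        f.write(f"P5\n{w} {h}\n255\n".encode("ascii"))
        f.write(bytes(v for row in img for v in row))
```

Binarizing before upscaling keeps the edges crisp with nearest-neighbour scaling; with a real imaging library, upscaling with interpolation first and thresholding afterwards is an equally reasonable order.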
On Saturday, August 31, 2019 at 11:24:23 AM UTC-5, Jack wrote: