Hi Tom,
Thanks for having a look at this. The challenge is that I don't know which of those languages the title is using.
Let me remove pytesseract from the picture.
If I run tesseract title.jpg stdout --psm 7 --oem 1 -l eng+fra+spa+deu+ita+por+jpn+kor+rus+chi_sim+chi_tra it takes 0.9 seconds and returns the right title ("Advance Scout").
The title is in English.
If I run tesseract title.jpg stdout --psm 7 --oem 1 -l eng+fra+spa+deu it's faster (0.3 s) and the title is still correct.
If I run tesseract title.jpg stdout --psm 7 --oem 1 -l eng+fra+spa+deu it's even faster (0.25 s) but the title is wrong ("AVEO Segue").
If I run tesseract title.jpg stdout --psm 7 --oem 1 -l eng it's crazy fast (0.09 s) but the title is wrong again ("clyzinee Segue").
If I use just "deu" it's super fast and correct.
I can't batch the pictures as the client is waiting for the reply before sending the next one.
So I was thinking about running each language separately, in parallel. That way I get a reply in 300 ms! That's 3 times faster than the 0.9 s combined run, and it gives me this:
clyzinee Segue
ANVanee Scout
AVEO EU
Advance Scout:
YAVanicc Sco
Advance So ui
eV2pe22)らの016
여00200606 20600ㄷ
Ао\алее Эсодиь
二司多5
和NOU2COCOUUE
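For reference, the parallel run is roughly the sketch below: one tesseract process per language, same flags as the commands above. The helper names (build_cmd, run_tesseract, ocr_all_languages) are made up for illustration, and the runner argument exists only so the logic can be exercised without tesseract installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# The same eleven languages as in the combined command above.
LANGS = ["eng", "fra", "spa", "deu", "ita", "por",
         "jpn", "kor", "rus", "chi_sim", "chi_tra"]


def build_cmd(image, lang):
    """Same flags as the commands above, but with a single language."""
    return ["tesseract", image, "stdout",
            "--psm", "7", "--oem", "1", "-l", lang]


def run_tesseract(cmd):
    """Run one tesseract process and return its stripped stdout."""
    done = subprocess.run(cmd, capture_output=True, text=True,
                          encoding="utf-8", check=True)
    return done.stdout.strip()


def ocr_all_languages(image, langs=LANGS, runner=run_tesseract):
    """Start one process per language at once and return {lang: text}."""
    with ThreadPoolExecutor(max_workers=len(langs)) as pool:
        texts = pool.map(lambda lang: runner(build_cmd(image, lang)), langs)
        return dict(zip(langs, texts))
```

Calling ocr_all_languages("title.jpg") gives one candidate string per language, which is where the list above comes from.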
But then I don't know which of those results to pick. I can see that the one from deu is the correct one, but I have no way to confirm that in the script.
So I have a couple of questions:
- Can tesseract work like a shell? I send a picture, I get the text. I send another picture, I get the text, without ever closing tesseract?
- Can I get the "confidence" level for each of those predictions? That might help to figure out which one is the most probable.
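On the confidence question, here is the kind of thing I have in mind. As far as I can tell, appending the tsv config to the command (for example tesseract title.jpg stdout --psm 7 --oem 1 -l deu tsv) prints one row per word with a conf column, and -1 for rows that are not words. Assuming that is right, a minimal sketch for scoring each language's output and keeping the best one could look like this. The function names are hypothetical, and I am not sure confidences from different language models are directly comparable, so treat the ranking as a heuristic.

```python
import csv
import io


def mean_confidence(tsv_text):
    """Return (text, mean word confidence) for one tesseract TSV output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words, confs = [], []
    for row in reader:
        if row.get("conf") is None:
            continue  # short or malformed row
        conf = float(row["conf"])
        word = (row.get("text") or "").strip()
        # Rows for page/block/line levels carry conf == -1 and no text.
        if conf >= 0 and word:
            words.append(word)
            confs.append(conf)
    mean = sum(confs) / len(confs) if confs else 0.0
    return " ".join(words), mean


def pick_best(tsv_by_lang):
    """Return (lang, text, confidence) for the highest mean confidence."""
    scored = {lang: mean_confidence(tsv) for lang, tsv in tsv_by_lang.items()}
    best = max(scored, key=lambda lang: scored[lang][1])
    text, conf = scored[best]
    return best, text, conf
```

The idea would be to feed pick_best a dict of {language: TSV output} from the parallel runs and return only the winning text to the client.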
Thanks,
JMS