Best settings to OCR an image of some cyphered text (base64)


Tom Vercauteren

Jul 10, 2025, 8:39:00 AM
to tesseract-ocr
Hi,
I was trying to OCR the text printed on a Uniqlo T-shirt:
https://www.uniqlo.com/uk/en/products/E480814-000/00

The source image I used is attached. It looks quite clean to me, but I am still having trouble transcribing it properly.

Here is the command line I used:
tesseract akamai.png - -c load_system_dawg=false -c load_freq_dawg=false -c tessedit_char_whitelist=' 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#!/"$(-+<='

I am missing the single quote character (') in my whitelist but wasn't sure how to provide this. In any case, there are still some obvious mistakes (for example, "PEACE FOR ALL" becomes "BEACENFORNALD").

Are there better settings to use for such a use case?
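One setting that may be worth trying is an explicit page segmentation mode. The sketch below is a hedged starting point, not a tested recipe: it assumes the shirt text can be treated as one uniform block (`--psm 6`), reuses the file name `akamai.png` from this post, and keeps the two dictionary options from the original command, since Tesseract's word lists are built for natural-language words rather than base64 strings.

```shell
#!/bin/sh
# Sketch: OCR one uniform block of non-dictionary text.
# Assumption: the input is akamai.png (overridable as the first argument).
img=${1:-akamai.png}

# --psm 6 asks Tesseract to treat the image as a single uniform block of text.
psm=6

if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
  # Dictionaries stay disabled, as in the original command.
  tesseract "$img" - --psm "$psm" \
    -c load_system_dawg=false -c load_freq_dawg=false
else
  echo "tesseract or $img not found; skipping OCR" >&2
fi
```

`tesseract --help-psm` lists the other modes; comparing a few of them on the same image is cheap.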

Best wishes,
Tom
akamai.png

Tom Morris

Jul 13, 2025, 6:46:34 PM
to tesseract-ocr
Since no one else has replied, I'll offer a couple of suggestions.

On Thursday, July 10, 2025 at 8:39:00 AM UTC-4 tom.ver...@gmail.com wrote:
> I was trying to OCR the text printed on a Uniqlo T-shirt:
> https://www.uniqlo.com/uk/en/products/E480814-000/00

Why? Would it be more cost effective to just have it double/triple-keyed and compare the transcriptions? 
 
> The source image I used is attached. It looks quite clean to me, but I am still having trouble transcribing it properly.

You don't say what pre-processing you did. Did you remove all the orange? Anything else?
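To make the question about the orange concrete, here is a minimal pre-processing sketch using ImageMagick 7's `magick` command. It assumes the text is dark on a light background and that the unwanted elements are close to ImageMagick's named color `orange` (#FFA500); the 25% fuzz, 200% upscale, 50% threshold, and the output name `akamai-clean.png` are illustrative guesses to tune by eye, not values from this thread.

```shell
#!/bin/sh
# Sketch: clean up the image before OCR with ImageMagick 7 ("magick").
# Assumptions: dark text on a light background; unwanted elements near orange.
in=${1:-akamai.png}
out=${2:-akamai-clean.png}   # hypothetical output file name
fuzz=25%                     # color tolerance for matching "orange"; tune by inspection

if command -v magick >/dev/null 2>&1 && [ -f "$in" ]; then
  # 1) paint pixels within $fuzz of orange white, 2) drop color,
  # 3) upscale so characters are larger in pixels, 4) binarize to black/white.
  magick "$in" -fuzz "$fuzz" -fill white -opaque orange \
    -colorspace Gray -resize 200% -threshold 50% "$out"
else
  echo "magick or $in not found; skipping pre-processing" >&2
fi
```

The cleaned file can then be passed to the same `tesseract` command in place of `akamai.png`.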
 
> I am missing the single quote character (') in my whitelist but wasn't sure how to provide this.

That's a basic shell quoting issue that the documentation for your shell should cover.
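For reference, a minimal POSIX-shell sketch of that quoting fix: a single-quoted string cannot contain `'`, so the string is closed, an escaped `\'` is appended, and the shell joins the two pieces into one argument. The whitelist is the one from the original command plus the quote; Tesseract only runs if it is installed and `akamai.png` is present.

```shell
#!/bin/sh
# Original whitelist in single quotes (so $ and " stay literal), then \'
# appended outside the quotes; the shell concatenates both into one word.
WHITELIST=' 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#!/"$(-+<='\'

printf '%s\n' "$WHITELIST"   # the last character printed is '

if command -v tesseract >/dev/null 2>&1 && [ -f akamai.png ]; then
  tesseract akamai.png - -c load_system_dawg=false -c load_freq_dawg=false \
    -c tessedit_char_whitelist="$WHITELIST"
fi
```

Passing the variable in double quotes keeps the leading space and the other special characters intact as a single argument.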
 
> Are there better settings to use for such a use case?

Maybe? But there may also be much more efficient ways to crack this particular nut than using OCR.

Tom

Tom Morris

Jul 13, 2025, 6:51:07 PM
to tesseract-ocr
> But there may also be much more efficient ways to crack this particular nut than using OCR.

Such as a little searching on the web:


Tom
 

Graham Toal

Jul 14, 2025, 2:39:57 AM
to tesser...@googlegroups.com
Indeed. It took me 30 seconds to find it (and that was just from typing in the first line and deciding whether it was a 1 or an l...).



Tom Vercauteren

Jul 14, 2025, 3:09:29 AM
to tesseract-ocr
> Why? Would it be more cost effective to just have it double/triple-keyed and compare the transcriptions? 

> Indeed. It took me 30 seconds to find it (and that was just from typing in the first line and deciding whether it was a 1 or an l...).

> But there may also be much more efficient ways to crack this particular nut than using OCR.

> Such as a little searching on the web:


 
Thanks for the reply. Yes, I did find these existing transcriptions. I was more interested in understanding how to get this to work with an OCR pipeline.
 

> You don't say what pre-processing you did. Did you remove all the orange? Anything else?

I provided the command line I used and the image needed to reproduce the issue. As you can see there, I haven't done any pre-processing yet.


> That's a basic shell quoting issue that the documentation for your shell should cover.

Thanks. I thought so; I was just flagging that I knew this wasn't yet handled in the command line I provided.


Are there better settings to use for such a use case?

> Maybe? But there may also be much more efficient ways to crack this particular nut than using OCR.

That's what I am interested in. If not OCR or human transcription, what would you suggest?


Best wishes,
Tom


 

Tom Morris

Jul 16, 2025, 5:33:53 PM
to tesseract-ocr
On Monday, July 14, 2025 at 3:09:29 AM UTC-4 tom.ver...@gmail.com wrote:

> Are there better settings to use for such a use case?

>> Maybe? But there may also be much more efficient ways to crack this particular nut than using OCR.

> That's what I am interested in. If not OCR or human transcription, what would you suggest?

As you discovered, an internet search is the most efficient way to handle this use case.
If you're trying to extend this to a different use case, please describe it.

Best,
Tom 

Fly Night Society

Jul 17, 2025, 2:35:06 PM
to tesseract-ocr
I already have, and yes to all.