tesseract via gosseract returns empty text for one image, but CLI detects correctly ("NO SMOKING")


Harshit Goel

Oct 31, 2025, 1:45:08 PM
to tesseract-ocr

Hi team,

I’m facing an issue where Tesseract OCR works correctly from the CLI, but returns an empty string when called programmatically using Go (via gosseract).

For this particular image: https://pmi-api.ubconnex.ca/files/icons/2025-03/11c6051eec503f52c43f0de382980d31.png, the OCR always returns an empty string when run programmatically. Yet when I run the exact same image manually through the Tesseract CLI from a terminal:

    tesseract /tmp/ocr-3678469497.png stdout

it correctly detects and returns "NO SMOKING".

Environment

I've tried the following approaches, but none had any effect:
  • Different PSM modes (SPARSE_TEXT, SINGLE_BLOCK, etc.)

  • Preprocessing (grayscale, contrast enhancement, flattening transparency).

  • Verified that the image file is saved correctly and readable by Tesseract.

  • Tried increasing image size and contrast.

Is there any known discrepancy between the CLI binary and the gosseract API in how page segmentation modes or image preprocessing are handled internally?

Any insight into why Tesseract detects text via the CLI while the gosseract binding returns empty output would be very helpful.

Best Regards,

Harshit Goel

Ger Hobbelt

Nov 1, 2025, 6:39:28 PM
to tesser...@googlegroups.com
I expect you're in for a debug session.

I do not use Go, so here are just a few general tidbits:

- you tested with the tesseract CLI. Excellent! So that proves things can go well at the core; one major problem area less to worry about.
- next is the gosseract library/layer itself: how does it talk to tesseract, what does it pass (and what doesn't it), etc. From a very swift glance at the code, there's nothing blatantly wrong in their bindings.cpp, AFAICT. I haven't looked any further than that.
- my own usage of tesseract as a library has shown me that getting the parameters right can be a bit of a hassle sometimes; one potential failure mode is not noticing that tesseract, when used as a library, does not receive the same baseline config setup as when it runs via the CLI: this is where debugging is mandatory.

My first guess would be to make very sure your tesseract config files are loaded the same way. While that can be a bit tough when you're not comfortable running this stuff in a debugger, here's a preparation step I would definitely look at if I were you. Consider that:
1. tesseract via your Go code doesn't produce *anything*, while
2. tesseract CLI does deliver text ("No smoking").
This MAY be due to tesseract not finding any word bounding boxes when run via the Go code route.

I see they (gosseract) provide a GetBoundingBoxes API, so I would first run that to see whether I get any boxes at all and, if so, where they are in the image. Do I get (a) no boxes, (b) only gibberish boxes, or (c) at least the ones covering "NO" and "SMOKING", or something else? Then try the same with the CLI (IIRC vanilla tesseract has an option to cough up bboxes only; I haven't used that in a while and I'm running a customized tesseract here, so check the code and documentation, don't take me at my word!)

To see what I was looking at:

If the bounding boxes don't show up in your Go run, then it smells like a config/setup bit isn't making it into the tesseract engine, so the job is really to debug the gosseract bindings.cpp interlayer and see what happens. Are the CLI and the Go code really, really pointing at the same config search paths, for example?
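One hedged way to probe the search-path question: run a small program in the same environment (user, container, service manager) as the gosseract code, print `TESSDATA_PREFIX`, and ask the tesseract CLI where it looks for tessdata via `tesseract --list-langs`. This sketch assumes a tesseract 5.x build whose `--list-langs` header includes the directory in quotes; older builds may print the header without a path, in which case the raw listing is shown instead. `tessdataDirFromListLangs` is a hypothetical helper name. It only tells you what the CLI resolves in that environment; the libtesseract linked into gosseract still has to be checked separately.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
)

// listLangsDir matches the header that (assumed) tesseract 5.x builds print for
// --list-langs, e.g. `List of available languages in "/some/path/" (2):`.
var listLangsDir = regexp.MustCompile(`List of available languages in "([^"]*)"`)

// tessdataDirFromListLangs extracts the quoted tessdata directory, if present.
func tessdataDirFromListLangs(out string) (string, bool) {
	m := listLangsDir.FindStringSubmatch(out)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Run this from the same environment as the Go program that calls gosseract.
	fmt.Printf("TESSDATA_PREFIX=%q\n", os.Getenv("TESSDATA_PREFIX"))

	out, err := exec.Command("tesseract", "--list-langs").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "running tesseract --list-langs:", err)
		os.Exit(1)
	}
	if dir, ok := tessdataDirFromListLangs(string(out)); ok {
		fmt.Println("CLI tessdata directory:", dir)
	} else {
		fmt.Print(string(out)) // no path in the header; show the raw listing instead
	}
}
```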
If the bounding boxes show up and match the set in the CLI, we have a serious conundrum.

Either way, that's the road I'd travel if walking in your shoes.
(If you can debug-step the tesseract CLI the same way, you can perhaps more easily compare the two, as the CLI uses the same APIs gosseract uses, with some differences, but my current bet is those are not relevant.)

Also monitor the gosseract/tesseract run for error and warning messages from tesseract. If it is silent, maybe force it once to barf a hairball, just so you know the error/warning/info outputs are working. Whatever you do, my bet is you have some debugging on the road ahead.

Note: I don't do Go, so I haven't used gosseract. This would be my general tactic anyway, though.


Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

