tesseract via gosseract returns empty text for one image, but CLI detects correctly ("NO SMOKING")

Harshit Goel

Oct 31, 2025, 1:45:08 PM

to tesseract-ocr

Hi team

I’m facing an issue where Tesseract OCR works correctly from the CLI, but returns an empty string when called programmatically using Go (via gosseract).

For this particular image (https://pmi-api.ubconnex.ca/files/icons/2025-03/11c6051eec503f52c43f0de382980d31.png), OCR always returns an empty string when run programmatically. Yet when I run Tesseract manually on the exact same image from the terminal with the command tesseract /tmp/ocr-3678469497.png stdout, it correctly detects and returns NO SMOKING.

Environment

OS: Linux (Server)
Tesseract version: tesseract 5.x (CLI works fine)
Go binding: github.com/otiai10/gosseract/v2
Go version: go1.23.x

I've tried the following approaches, with no effect:

Different PSM modes (SPARSE_TEXT, SINGLE_BLOCK, etc.)
Preprocessing (grayscale, contrast enhancement, flattening transparency).
Verified that the image file is saved correctly and readable by Tesseract.
Tried increasing image size and contrast.

Is there any known discrepancy between the CLI binary and the gosseract API in how page segmentation modes or image preprocessing are handled internally?

Any insight on why Tesseract detects text in CLI but gosseract binding returns empty output would be very helpful.

Best Regards,

Harshit Goel

Ger Hobbelt

Nov 1, 2025, 6:39:28 PM

to tesser...@googlegroups.com

I expect you're in for a debug session.

I do not use Go, so here's just a few general tidbits:

- you tested with the tesseract CLI. Excellent! So that proves things can go well at the core; one major problem area less to worry about.

- next is the gosseract library/layer itself: how does it talk to tesseract, what does it pass (and what doesn't it), etc.: from a very swift glance at the code, there's nothing blatantly wrong in their tessbridge.cpp, AFAICT. Haven't looked any further than that.

- my own usage of tesseract as a library has shown me that getting the parameters right can be a bit of a hassle sometimes; one of the potential failure modes is not noting that tesseract does not receive the same config baseline setup as when it ran via CLI: this is where debugging is mandatory.

My first guess would be to make very sure your tesseract config files are loaded the same way. While that can be a bit harsh to do when you're not comfortable with running this stuff in a debugger, here's a preparation step I would definitely look at if I were you:

1. tesseract via your Go code doesn't produce *anything*, while

2. tesseract CLI does deliver text ("No smoking")

which MAY be due to tesseract not finding any text word bounding-boxes when run via the Go-code route.

I see they (gosseract) present a GetBoundingBoxes API, so I would first run that one to see if I get any boxes at all and, if so, where they are in the image, i.e. do I get (a) no boxes, (b) only gibberish boxes, (c) at least the ones covering "NO" and "SMOKING", or what? Then try the same for the CLI. (IIRC vanilla tesseract has an option to cough up bboxes only; I haven't used that in a while and I'm running a customized tesseract here, so check code and documentation, don't take me at my word!)

To see what I was looking at:

https://github.com/otiai10/gosseract/blob/main/tessbridge.cpp#L108

If the bounding boxes don't show up in your Go run, then it smells like a config/setup bit not making it into the tesseract engine, so it comes down to debugging the gosseract tessbridge.cpp interlayer to see what really happens. Are CLI and Go code really, really pointing at the same config search paths, for example?

If the bounding boxes show up and match the set in the CLI, we have a serious conundrum.

Either way, that's the road I'd travel if walking in your shoes.

(If you can debug-step the tesseract CLI the same way, you can perhaps compare both more easily, as the CLI uses the same APIs gosseract does (with some differences, but my current bet is those are not relevant).)

Also monitor the gosseract/tesseract run for error and warning messages from tesseract. If it is silent, maybe force it once to barf a hairball, just so you know the error/warning/info outputs are working. Whatever you do, my bet is you have some debugging on the road ahead.

Note: I don't do Go, so haven't used gosseract. This would be my general tactic though, anyway.

Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------


Harshit Goel

Nov 3, 2025, 8:47:58 AM

to tesser...@googlegroups.com

Hi Ger,

Thanks a lot for the detailed guidance — it was really helpful.

I ran deeper diagnostics and confirmed a few things:

Running Tesseract CLI directly works perfectly and extracts: NO SMOKING
However, when using gosseract from Go, I still get empty text output and a single empty bounding box like:
Text: [ ], Box: (1476397136,32579)-(1476956064,32579)
The image being processed is a valid 8-bit/16-bit PNG (confirmed via file command).
Setting TESSDATA_PREFIX or SetTessdataPrefix("/usr/share/tessdata") works correctly — no language load errors.
Even after forcing engine mode with tessedit_ocr_engine_mode = 1 (LSTM only) and using PSM_SPARSE_TEXT, gosseract still returns empty text.
This makes me think gosseract is initializing Tesseract differently (maybe not loading the same configs or missing something in the setup phase), because the CLI and Go layer are using the same image and tessdata.

Do you have any suggestions for checking whether gosseract is properly initializing TessBaseAPI with the same defaults as CLI?

Thanks again for your help — your earlier hint about checking bounding boxes and configuration alignment was spot on.

Best regards,
Harshit


Ger Hobbelt

Nov 5, 2025, 3:02:03 AM

to tesseract-ocr

Sorry, can't help further. Like I said before: this reads as having to run this in a debugger and see what happens.

What DOES jump out are those very odd (HUGE) bounding-box coordinates: you would expect them to be x/y pixel coordinates in the original image, and /nobody/ has images with over a billion pixels along the horizontal axis! All 4 numbers are suspect, which leads me to suspect the binary API interface between Go and C++ is possibly broken. No certainty, but this smells pretty bad.

For reference and to aid your debugging efforts, go and see what the tesseract CLI outputs for x/y coordinates in its hocr or tav output modes. The bbox numbers should fall in the same price range, so to speak. ;-)

Met vriendelijke groeten / Best regards,

Ger Hobbelt



Ger Hobbelt

Nov 5, 2025, 3:04:42 AM

to tesseract-ocr

"tav output modes": typo! I meant to say "TSV output mode". Sorry.
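
To get those reference numbers from the CLI side, something like the following Go sketch could be used. It shells out to `tesseract <image> stdout tsv` and prints the word-level boxes. The wordBox type and parseTSV function are illustrative names, and the parser assumes the usual 12-column TSV layout (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), so check it against your own tesseract build.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// wordBox is one word-level row of the TSV output.
type wordBox struct {
	Left, Top, Width, Height int
	Text                     string
}

// parseTSV extracts word-level rows (level 5) from TSV text, skipping the
// header row, other levels, and malformed lines.
func parseTSV(tsv string) []wordBox {
	var words []wordBox
	sc := bufio.NewScanner(strings.NewReader(tsv))
	for sc.Scan() {
		f := strings.SplitN(sc.Text(), "\t", 12)
		if len(f) < 12 || f[0] != "5" {
			continue
		}
		var n [4]int // left, top, width, height
		ok := true
		for i := range n {
			v, err := strconv.Atoi(f[6+i])
			if err != nil {
				ok = false
				break
			}
			n[i] = v
		}
		if !ok {
			continue
		}
		words = append(words, wordBox{Left: n[0], Top: n[1], Width: n[2], Height: n[3], Text: f[11]})
	}
	return words
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: tsvboxes <image>")
		return
	}
	cmd := exec.Command("tesseract", os.Args[1], "stdout", "tsv")
	cmd.Stderr = os.Stderr // let tesseract warnings and errors through
	out, err := cmd.Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "tesseract failed:", err)
		os.Exit(1)
	}
	for _, w := range parseTSV(string(out)) {
		fmt.Printf("%q at (%d,%d) size %dx%d\n", w.Text, w.Left, w.Top, w.Width, w.Height)
	}
}
```

The left/top/width/height values printed here are in image pixels, which gives a concrete range to compare against whatever the Go binding reports for the same file.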

Met vriendelijke groeten / Best regards,

Ger Hobbelt

