Here is my code:

    #include <tesseract/baseapi.h>
    #include <leptonica/allheaders.h>

    api_ = new tesseract::TessBaseAPI();
    api_->Init(tessdata_path.c_str(), "eng", tesseract::OEM_LSTM_ONLY);
    Pix* image = pixRead(m_img_filename.c_str());
    api_->SetImage(image);
    api_->Recognize(nullptr);

    auto i = 0;  // running word count
    tesseract::ResultIterator* ri = api_->GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
    if (ri != nullptr) {
        do {
            i++;
            const char* word = ri->GetUTF8Text(level);
            float conf = ri->Confidence(level);
            int x1, y1, x2, y2;
            ri->BoundingBox(level, &x1, &y1, &x2, &y2);
            ...
            /* do something with the word and coordinates */
            ...
            delete[] word;  // GetUTF8Text returns a buffer the caller must free
        } while (ri->Next(level));
        delete ri;  // the iterator returned by GetIterator is owned by the caller
    }
    pixDestroy(&image);
    api_->End();

I am working with floor plans that contain text, lines, and other objects. Here is what happens: if I cut a smaller piece from a large image, save it to a file (e.g. image_crop.jpeg), and run the above code, all of the text blocks are detected and OCRed. Here is an example:

Note the three "HDU2" blocks (the border color represents the confidence).
If I run exactly the same code against the whole image, which is 5400x3600 pixels, here is what I get (this is not the whole image, just the problematic part):

Only one of the three "HDU2" blocks was detected.
Most of the text on the large image is recognized correctly (it has the same size and the same font), but there are problematic parts like this.
The DPI is the same (150) for both the whole drawing and the crop.
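
One way to repeat the crop experiment in code, without saving intermediate files, would be to run recognition over the full image one region at a time. Below is a minimal sketch of how such regions could be computed. `Tile` and `makeTiles` are hypothetical helpers, not part of Tesseract, and the tile size and overlap are arbitrary assumptions; the overlap is only there so that a word cut by one tile boundary has a chance to appear whole in a neighboring tile.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type (not a Tesseract type): one rectangular region
// of the full image, in full-image pixel coordinates.
struct Tile {
    int left;
    int top;
    int width;
    int height;
};

// Splits an imgW x imgH image into tiles of at most tileW x tileH pixels.
// Neighboring tiles share `overlap` pixels; tiles on the right and bottom
// edges are clipped to the image bounds.
std::vector<Tile> makeTiles(int imgW, int imgH, int tileW, int tileH, int overlap) {
    assert(imgW > 0 && imgH > 0);
    assert(overlap >= 0 && overlap < tileW && overlap < tileH);

    std::vector<Tile> tiles;
    const int stepX = tileW - overlap;
    const int stepY = tileH - overlap;

    for (int top = 0; top < imgH; top += stepY) {
        const int h = std::min(tileH, imgH - top);
        for (int left = 0; left < imgW; left += stepX) {
            const int w = std::min(tileW, imgW - left);
            tiles.push_back({left, top, w, h});
            if (left + w >= imgW) break;  // this tile already reaches the right edge
        }
        if (top + h >= imgH) break;  // this row already reaches the bottom edge
    }
    return tiles;
}
```

Each tile could then be passed to `api_->SetRectangle(tile.left, tile.top, tile.width, tile.height)` before calling `Recognize`, which restricts recognition to that sub-rectangle of the image already set with `SetImage`. Words that fall inside an overlap may be reported twice, so the results would need de-duplicating.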
I've tried all the engine modes, all the page segmentation modes, and a few random variables from the "tesseract --print-parameters" list.
There must be a trick to make it work. Tesseract can obviously detect this text, and yet for some reason it doesn't.
Any suggestions would be much appreciated.