Checking HasNext on the Tesseract API to avoid getting an error when there is no item in the iterator.



Kurniawan Kurniawan


Sep 21, 2018, 12:42:52 AM

to tesseract-ocr

* **Tesseract Version**: 3.5.0 and 4.0

* **Platform**: mac & ubuntu 18.04

I am iterating over the results of the Recognize API, as suggested in the "Result iterator example" here:

https://github.com/tesseract-ocr/tesseract/wiki/APIExample

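```cpp
#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
  Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  api->Init(NULL, "eng");
  api->SetImage(image);
  api->Recognize(0);
  tesseract::ResultIterator* ri = api->GetIterator();
  tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
  if (ri != 0) {
    // do/while: handle the current word first, then advance with Next()
    do {
      const char* word = ri->GetUTF8Text(level);
      float conf = ri->Confidence(level);
      int x1, y1, x2, y2;
      // BoundingBox() returns false when there is no word here (e.g. a blank page)
      if (word != NULL && ri->BoundingBox(level, &x1, &y1, &x2, &y2)) {
        printf("word: '%s';  \tconf: %.2f; BoundingBox: %d,%d,%d,%d;\n",
               word, conf, x1, y1, x2, y2);
      }
      delete[] word;
    } while (ri->Next(level));
    delete ri;  // the iterator returned by GetIterator() is owned by the caller
  }
  api->End();
  delete api;
  pixDestroy(&image);
  return 0;
}
```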

It seems we have to read the current element before calling Next(). If I call Next() first, it skips over the first detection.
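Python has no `do`/`while`, so the loop above maps onto `while True:` with the `Next()` check at the bottom. The minimal sketch below assumes a hypothetical list-backed `FakeWordIterator` (not tesserocr's real class) that imitates only `BoundingBox()` and `Next()`, and shows why calling `Next()` first drops the first word. The string `"word"` is a placeholder for `RIL.WORD`.

```python
class FakeWordIterator:
    """Hypothetical stand-in for a tesserocr result iterator (not the real API)."""

    def __init__(self, boxes):
        self._boxes = list(boxes)
        self._pos = 0  # starts on the first element, like a fresh iterator

    def BoundingBox(self, level):
        # A (left, top, right, bottom) tuple, or None if nothing is here.
        if self._pos < len(self._boxes):
            return self._boxes[self._pos]
        return None

    def Next(self, level):
        # Advance; True only if a new element is now current.
        self._pos += 1
        return self._pos < len(self._boxes)


def collect_do_while(it, level):
    boxes = []
    while True:                               # emulated do/while
        boxes.append(it.BoundingBox(level))   # read the current element first
        if not it.Next(level):                # then advance
            break
    return boxes


def collect_next_first(it, level):
    boxes = []
    while it.Next(level):                     # advances before the first read
        boxes.append(it.BoundingBox(level))
    return boxes


words = [(10, 10, 50, 30), (60, 10, 120, 30), (130, 10, 170, 30)]
print(collect_do_while(FakeWordIterator(words), "word"))    # all three boxes
print(collect_next_first(FakeWordIterator(words), "word"))  # first box missing
```

Note that on an empty page the do/while version appends `None`, because it reads before checking anything.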

Unfortunately, when there is no detection, I get this error:

**'NoneType' object is not iterable**

Here is the code using python-tesserocr:

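```python
if iterator is not None:
    has_next = True
    while has_next:
        # Raises TypeError ('NoneType' object is not iterable) when there is no detection
        x, y, right, bottom = iterator.BoundingBox(level)
        has_next = iterator.Next(level)
```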

It throws an error at `iterator.BoundingBox`, saying that the iterator is empty.

If I call `iterator.Next(level)` first, it avoids that error, but it also skips over the first detection.
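A minimal sketch of a guarded loop, assuming (worth checking against the installed version) that python-tesserocr's `BoundingBox()` returns a `(left, top, right, bottom)` tuple, or `None` when there is no object at the current position. Under that assumption it is the tuple unpacking that raises the `TypeError` on a blank page. To run without Tesseract, the sketch uses a hypothetical list-backed `FakeWordIterator` in place of the real iterator; `word_boxes` is an illustrative name. It checks the return value before unpacking and keeps the do/while order, so the first word is not skipped.

```python
class FakeWordIterator:
    """Hypothetical stand-in for a tesserocr result iterator (not the real API)."""

    def __init__(self, boxes):
        self._boxes = list(boxes)
        self._pos = 0

    def BoundingBox(self, level):
        # Tuple, or None when nothing is at the current position (blank page).
        return self._boxes[self._pos] if self._pos < len(self._boxes) else None

    def Next(self, level):
        self._pos += 1
        return self._pos < len(self._boxes)


def word_boxes(iterator, level):
    """Collect all boxes at `level`; returns [] for a blank page."""
    boxes = []
    if iterator is None:
        return boxes
    while True:
        box = iterator.BoundingBox(level)
        if box is not None:              # unpack only when something is here
            x, y, right, bottom = box
            boxes.append((x, y, right, bottom))
        if not iterator.Next(level):     # advance after reading (do/while order)
            break
    return boxes


print(word_boxes(FakeWordIterator([]), "word"))              # []
print(word_boxes(FakeWordIterator([(1, 2, 3, 4)]), "word"))  # [(1, 2, 3, 4)]
```

Skipping a `None` position instead of breaking out is a cautious choice: it stops cleanly on a blank page without assuming that an empty position can only occur at the end.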

Here is the image; it is just a blank white image:

![white-space](https://user-images.githubusercontent.com/698547/45849852-8c796280-bce8-11e8-9b06-196d8909d297.png)

Is there an API to check hasNext()?

Thanks

Zdenko Podobny


Sep 21, 2018, 9:13:22 AM

to tesser...@googlegroups.com

No, there is no such function. There are IsAtBeginningOf (TessPageIteratorIsAtBeginningOf in the C API) and IsAtFinalElement (TessPageIteratorIsAtFinalElement).
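Since there is no `hasNext()`, one Python-side workaround is to hide the do/while loop inside a generator, so callers write a plain `for` loop and a blank page simply yields nothing. tesserocr's README shows an `iterate_level(ri, level)` helper in the same spirit. The sketch below is a self-contained stand-in, not that helper. It uses a hypothetical list-backed `FakeWordIterator` rather than the real iterator, and `iter_boxes` is an illustrative name.

```python
class FakeWordIterator:
    """Hypothetical stand-in for a tesserocr result iterator (not the real API)."""

    def __init__(self, boxes):
        self._boxes = list(boxes)
        self._pos = 0

    def BoundingBox(self, level):
        # None means "no object at the current position", e.g. a blank page.
        return self._boxes[self._pos] if self._pos < len(self._boxes) else None

    def Next(self, level):
        self._pos += 1
        return self._pos < len(self._boxes)


def iter_boxes(iterator, level):
    """Yield every bounding box at `level`; yields nothing for a blank page."""
    if iterator is None:
        return
    while True:
        box = iterator.BoundingBox(level)
        if box is not None:
            yield box
        if not iterator.Next(level):
            return


for box in iter_boxes(FakeWordIterator([(10, 10, 50, 30)]), "word"):
    print(box)  # (10, 10, 50, 30)
print(list(iter_boxes(FakeWordIterator([]), "word")))  # []
```

With the real library, the same function would take the iterator from `api.GetIterator()` and a level such as `RIL.WORD`.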

If you are interested in bounding boxes, you can check "Getting the bounding box of the recognized words using python-tesseract".

If you are interested in handling the Tesseract iterator in Python, you can find some inspiration in the pyocr or nidaba code.

Zdenko

On Fri, 21 Sep 2018 at 6:42, Kurniawan Kurniawan <kku...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/359b2d1f-359e-4d26-8224-06747e62ac5e%40googlegroups.com.
