Checking hasNext() in the Tesseract API to avoid an error when the iterator has no items


Kurniawan Kurniawan

Sep 21, 2018, 12:42:52 AM
to tesseract-ocr
* **Tesseract Version**: 3.5.0 and 4.0
* **Platform**: macOS and Ubuntu 18.04


I am iterating over the results of the Recognize API, as suggested in the "Result iterator example":

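#include <cstdio>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

int main() {
  Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  if (api->Init(NULL, "eng")) {  // non-zero means initialization failed
    fprintf(stderr, "Could not initialize tesseract.\n");
    return 1;
  }
  api->SetImage(image);
  api->Recognize(0);
  tesseract::ResultIterator* ri = api->GetIterator();
  tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
  if (ri != 0) {
    do {  // process the current element first, then advance
      const char* word = ri->GetUTF8Text(level);
      int x1, y1, x2, y2;
      // On a blank page there is no word: the text is NULL and BoundingBox returns false.
      if (word != NULL && ri->BoundingBox(level, &x1, &y1, &x2, &y2)) {
        float conf = ri->Confidence(level);
        printf("word: '%s';  \tconf: %.2f; BoundingBox: %d,%d,%d,%d;\n",
               word, conf, x1, y1, x2, y2);
      }
      delete[] word;
    } while (ri->Next(level));
    delete ri;  // the iterator returned by GetIterator() is owned by the caller
  }
  api->End();
  delete api;
  pixDestroy(&image);
  return 0;
}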

It seems we have to process the current element before calling Next(). If I call Next() first, it skips over the first detection.

Unfortunately, when there is no detection, I get this error:
**'NoneType' object is not iterable**

Here is my code using python-tesserocr:
if iterator is not None:
    has_next = True
    while has_next:
        x, y, right, bottom = iterator.BoundingBox(level)
        has_next = iterator.Next(level)

It throws an error on iterator.BoundingBox, saying that the iterator is empty.

If I call "next = iterator.Next(level)" first, it avoids that error, but it also skips over the first detection.
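Judging from the error message, the failure is probably in the tuple unpacking rather than in the loop itself: tesserocr's `BoundingBox(level)` returns `None` when there is no element at the current position (as on a blank page), and `x, y, right, bottom = None` raises the 'NoneType' error. A minimal sketch of the same do/while shape with a guard, so the first detection is still processed and an empty page simply yields nothing. `FakeIterator` and `collect_boxes` are hypothetical names; the fake only mimics the iterator contract assumed here (`BoundingBox` returns a tuple or `None`, `Next` returns `False` at the end) so the sketch runs without Tesseract.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]


class FakeIterator:
    """Hypothetical stand-in for tesserocr's result iterator (assumption).

    BoundingBox() returns None when there is no element at the current
    position, and Next() returns False once the last element is passed.
    """

    def __init__(self, boxes: List[Box]) -> None:
        self._boxes = boxes
        self._pos = 0

    def BoundingBox(self, level: int) -> Optional[Box]:
        if self._pos < len(self._boxes):
            return self._boxes[self._pos]
        return None

    def Next(self, level: int) -> bool:
        self._pos += 1
        return self._pos < len(self._boxes)


def collect_boxes(iterator, level: int) -> List[Box]:
    """Emulate C++ `do { ... } while (ri->Next(level));` with a None guard."""
    boxes: List[Box] = []
    if iterator is None:
        return boxes
    while True:
        box = iterator.BoundingBox(level)
        # On a blank page there is nothing at the first position, so skip the
        # unpacking that would otherwise raise the 'NoneType' TypeError.
        if box is not None:
            x, y, right, bottom = box
            boxes.append((x, y, right, bottom))
        # Process first, then advance: calling Next() first would skip element 0.
        if not iterator.Next(level):
            break
    return boxes
```

With a real tesserocr API object, the call would presumably be `collect_boxes(api.GetIterator(), RIL.WORD)`, with `RIL` imported from `tesserocr`.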


Here is the image: it is just a blank, white image.


Is there an API equivalent to hasNext()?

Thanks

Zdenko Podobny

Sep 21, 2018, 9:13:22 AM
to tesser...@googlegroups.com
No, there is no such function. There are IsAtBeginningOf (TessPageIteratorIsAtBeginningOf in the C API) and IsAtFinalElement (TessPageIteratorIsAtFinalElement).
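IsAtFinalElement is the closest thing to a lookahead among the functions named above. A minimal sketch, assuming tesserocr exposes it on the result iterator as `IsAtFinalElement(level, element)`, that uses it to tell whether the current word is the last one in its text line, which covers many cases where a hasNext()-style check is wanted. `FakeWordIterator` and `boxes_by_line` are hypothetical; the fake models only the `(RIL_TEXTLINE, RIL_WORD)` query, and the constants 2 and 3 mirror the order of Tesseract's `PageIteratorLevel` enum.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]
# Mirrors the order of Tesseract's PageIteratorLevel enum
# (BLOCK=0, PARA=1, TEXTLINE=2, WORD=3, SYMBOL=4).
RIL_TEXTLINE, RIL_WORD = 2, 3


class FakeWordIterator:
    """Hypothetical stand-in for a word-level result iterator.

    Only the (RIL_TEXTLINE, RIL_WORD) query of IsAtFinalElement is modelled.
    """

    def __init__(self, lines: List[List[Box]]) -> None:
        # Flatten to (line_index, index_in_line, box) triples.
        self._words = [(i, j, box) for i, line in enumerate(lines)
                       for j, box in enumerate(line)]
        self._line_lengths = [len(line) for line in lines]
        self._pos = 0

    def BoundingBox(self, level: int) -> Optional[Box]:
        if self._pos < len(self._words):
            return self._words[self._pos][2]
        return None

    def IsAtFinalElement(self, level: int, element: int) -> bool:
        if self._pos >= len(self._words):
            return False
        line, index, _ = self._words[self._pos]
        return index == self._line_lengths[line] - 1

    def Next(self, level: int) -> bool:
        self._pos += 1
        return self._pos < len(self._words)


def boxes_by_line(iterator) -> List[List[Box]]:
    """Group word boxes into text lines without needing a hasNext()."""
    lines: List[List[Box]] = []
    current: List[Box] = []
    if iterator is None:
        return lines
    while True:
        box = iterator.BoundingBox(RIL_WORD)
        if box is not None:  # None means nothing here, e.g. a blank page
            current.append(box)
            # Last word of its line: close the group now instead of looking ahead.
            if iterator.IsAtFinalElement(RIL_TEXTLINE, RIL_WORD):
                lines.append(current)
                current = []
        if not iterator.Next(RIL_WORD):
            break
    if current:  # defensive: flush a group that was never closed
        lines.append(current)
    return lines
```

The design choice is to ask "am I on the last element?" about the current position, which the API does offer, instead of asking "is there another element?" before moving, which it does not.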

If you are interested in bounding boxes, you can check "Getting the bounding box of the recognized words using python-tesseract".

If you are interested in handling the Tesseract iterator in Python, you can find some inspiration in the pyocr or nidaba code.

Zdenko


On Fri, 21 Sep 2018 at 6:42, Kurniawan Kurniawan <kku...@gmail.com> wrote: