Choice Iterator in Tesseract 5 and spaces (word boundaries)


jochen....@gmail.com

Sep 3, 2020, 2:10:53 AM
to tesseract-ocr
Hi all,
I am using the new choice iterator in Tesseract 5 to get the confidences of all choices for each symbol of my text. However, spaces (word boundaries) are not reported, so I have no way of knowing when a space falls between two symbols. Is there a way, for example, to combine the word iterator with the choice iterator, or any other way to know when a new word starts?

jochen....@gmail.com

Sep 3, 2020, 2:18:09 AM
to tesseract-ocr
Forgot to mention that I am using the Tesseract C++ API:

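    tesseract::ResultIterator* res_it = api->GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_SYMBOL;
    // Iterates over all choices for the symbol res_it currently points at.
    tesseract::ChoiceIterator ci(*res_it);
    do {
        if (ci.Confidence() >= 0) {
            // Owned by the iterator; do not free.
            const char* ch = ci.GetUTF8Text();
            // ... store ch and ci.Confidence() in my own Choice record ...
        }
    } while (ci.Next());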

jochen....@gmail.com

Sep 15, 2020, 2:42:23 AM
to tesseract-ocr
I have found a solution, in case anyone else is interested. Here is a sample application:

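#include <iostream>
#include <string>

#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/resultiterator.h>

int main()
{
    PIX* image = pixRead("R:/P12_0.jpg");
    if (image == nullptr) {
        std::cerr << "could not read image" << std::endl;
        return 1;
    }

    tesseract::TessBaseAPI* api = new tesseract::TessBaseAPI();
    if (api->Init(nullptr, "deu", tesseract::OEM_LSTM_ONLY) != 0) {
        std::cerr << "could not initialize tesseract" << std::endl;
        delete api;
        pixDestroy(&image);
        return 1;
    }
    api->SetVariable("tessedit_char_whitelist", "0123456789 .-LBME");
    api->SetImage(image);
    api->SetSourceResolution(300);

    api->SetPageSegMode(tesseract::PSM_AUTO);
    // Must be set before recognition so the LSTM keeps alternative choices.
    api->SetVariable("lstm_choice_mode", "2");

    // GetUTF8Text() runs recognition; the caller must delete[] the result.
    char* text = api->GetUTF8Text();
    std::cout << "text: " << text << std::endl;
    delete[] text;

    // The caller owns the returned iterator.
    tesseract::ResultIterator* res_it = api->GetIterator();
    const tesseract::PageIteratorLevel level = tesseract::RIL_SYMBOL;

    std::size_t i = 0;  // symbols seen so far in the current word
    if (res_it != nullptr) {
        do {
            // Text of the word containing the current symbol (must be delete[]d).
            char* word_text = res_it->GetUTF8Text(tesseract::RIL_WORD);
            const std::string word = word_text ? word_text : "";
            delete[] word_text;

            // Print every choice for the current symbol with confidence > 60.
            tesseract::ChoiceIterator ci(*res_it);
            do {
                if (ci.Confidence() > 60) {
                    const char* ch = ci.GetUTF8Text();  // owned by the iterator
                    if (ch != nullptr) std::cout << ch;
                }
            } while (ci.Next());

            // After the last symbol of a word, print a space. length() counts
            // bytes, so this assumes one byte per symbol (true for the ASCII
            // whitelist above, not for umlauts).
            i++;
            if (i == word.length()) {
                std::cout << " ";
                i = 0;
            }
        } while (res_it->Next(level));
        delete res_it;
    }

    api->End();
    delete api;
    pixDestroy(&image);
    return 0;
}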
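The loop above finds word boundaries by counting symbols and comparing the count against `word.length()`. That counts bytes, so once a word contains multi-byte UTF-8 characters (for example umlauts with the `deu` model and no whitelist) the counter can drift. An alternative is to record, for each symbol, whether it begins a new word: Tesseract's `PageIterator::IsAtBeginningOf(tesseract::RIL_WORD)` gives that flag, and `ResultIterator` inherits it. The sketch below keeps the Tesseract calls out of the code so it compiles on its own; `SymbolChoices` and `JoinWithWordBoundaries` are hypothetical names, and the threshold of 60 simply mirrors the example above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-symbol record, filled while walking a
// tesseract::ResultIterator at RIL_SYMBOL level.
struct SymbolChoices {
    // In Tesseract: res_it->IsAtBeginningOf(tesseract::RIL_WORD)
    bool starts_word;
    // In Tesseract: (ci.GetUTF8Text(), ci.Confidence()) for each ChoiceIterator step
    std::vector<std::pair<std::string, float>> choices;
};

// Hypothetical helper: appends every choice whose confidence exceeds min_conf
// and inserts one space before each symbol that starts a new word (except the
// very first symbol). No byte counting is involved, so UTF-8 text is safe.
std::string JoinWithWordBoundaries(const std::vector<SymbolChoices>& symbols,
                                   float min_conf) {
    std::string out;
    for (std::size_t k = 0; k < symbols.size(); ++k) {
        if (k > 0 && symbols[k].starts_word) {
            out += ' ';
        }
        for (const auto& choice : symbols[k].choices) {
            if (choice.second > min_conf) {
                out += choice.first;
            }
        }
    }
    return out;
}
```

Inside the `do { ... } while (res_it->Next(level));` loop, `starts_word` would be read from `res_it->IsAtBeginningOf(tesseract::RIL_WORD)` and `choices` from `ci.GetUTF8Text()` / `ci.Confidence()` before advancing. Checking `IsAtBeginningOf(tesseract::RIL_TEXTLINE)` the same way would let you emit a newline instead of a space between lines.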
