Hi All,
I am currently running OCR line by line and reading the word details from the ResultIterator, as shown below:
tessAPI->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
// These line boxes are calculated by our own pre-processing and segmentation code.
tessAPI->SetRectangle(iXmin, iYmin, iW, iH);
tessAPI->Recognize(nullptr);
tesseract::ResultIterator* rst_iter = tessAPI->GetIterator();
const tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
if (nullptr != rst_iter)
{
    bool is_bold, is_italic, is_underlined, is_monospace, is_serif, is_smallcaps;
    int pointsize, font_id;
    do
    {
        // GetUTF8Text returns a buffer that the caller must free with delete[].
        const char* text = rst_iter->GetUTF8Text(level);
        rst_iter->WordFontAttributes(&is_bold, &is_italic, &is_underlined, &is_monospace, &is_serif, &is_smallcaps, &pointsize, &font_id);
        // Here I want to get the line and paragraph that the current word belongs to, from the Tesseract API.
        delete[] text;
    } while (rst_iter->Next(level));
    delete rst_iter; // the iterator returned by GetIterator() is owned by the caller
}
I can get paragraphs, lines, and words using the tessAPI->GetComponentImages() function, but for words it only gives me the block and paragraph, not the line. For now I am mapping the words to lines myself, but I am still getting some garbage.
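A minimal sketch of that kind of word-to-line mapping, assuming each word is assigned to the line box it overlaps most in the vertical direction. Box, verticalOverlap, and assignLine are hypothetical names used only for illustration; they are not part of the Tesseract API, and the real mapping code may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box in image coordinates (not a Tesseract type).
struct Box {
    int left, top, right, bottom;
};

// Height of the vertical span shared by two boxes; 0 if they do not overlap.
int verticalOverlap(const Box& a, const Box& b) {
    return std::max(0, std::min(a.bottom, b.bottom) - std::max(a.top, b.top));
}

// Returns the index of the line box that overlaps the word box the most
// vertically, or -1 if no line overlaps it at all.
int assignLine(const Box& word, const std::vector<Box>& lines) {
    int best = -1;
    int bestOverlap = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const int overlap = verticalOverlap(word, lines[i]);
        if (overlap > bestOverlap) {
            bestOverlap = overlap;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A word that straddles two tightly spaced lines, or whose box is slightly off, can land on the wrong line with a rule like this, which is one plausible source of garbage in the mapping.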
Is there any way to get the line and paragraph that the current word belongs to?
Thanks in advance,
Lakshman.