Is there a way to disable the thresholding (binarization) from command line or Tesseract C++ API?


epiphany27

Oct 8, 2018, 6:19:01 PM
to tesseract-ocr
Hi

I have been trying to figure out whether there's a way to disable the default thresholding (binarization) that Leptonica performs during preprocessing. When the scan quality of a PDF is good enough, I suspect the thresholding step actually lowers OCR accuracy, so it doesn't seem necessary in every case. The only way I've found to disable it is by commenting out the following lines in baseapi.cpp. Is there any other way?

/*if (!thresholder_->IsBinary()) {
  tesseract_->set_pix_thresholds(thresholder_->GetPixRectThresholds());
  tesseract_->set_pix_grey(thresholder_->GetPixRectGrey());
} else { */
  tesseract_->set_pix_thresholds(nullptr);
  tesseract_->set_pix_grey(nullptr);
//}
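To illustrate what the thresholding step does to a clean greyscale scan, here is a minimal standalone sketch of a global Otsu binarization over an 8-bit greyscale buffer. It assumes the default method is a global Otsu threshold (my understanding for Tesseract 4.x; check the version you build). `OtsuThreshold` and `Binarize` are hypothetical helper names, not Tesseract or Leptonica API. The point is that every pixel collapses to 0 or 255, so whatever grey-level detail a good scan carries is gone before recognition starts.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: global Otsu threshold for an 8-bit greyscale image.
// Returns t such that pixels <= t become foreground (black) and pixels > t
// become background (white). Illustrative sketch, not Tesseract's code.
int OtsuThreshold(const std::vector<std::uint8_t>& grey) {
  assert(!grey.empty() && "need at least one pixel");

  // Histogram of grey levels.
  std::array<double, 256> hist{};
  for (std::uint8_t v : grey) hist[v] += 1.0;

  const double total = static_cast<double>(grey.size());
  double sum_all = 0.0;
  for (int i = 0; i < 256; ++i) sum_all += i * hist[i];

  double w0 = 0.0;    // pixel count in the dark class (levels <= t)
  double sum0 = 0.0;  // sum of grey levels in the dark class
  double best_var = -1.0;
  int best_t = 0;
  for (int t = 0; t < 256; ++t) {
    w0 += hist[t];
    sum0 += t * hist[t];
    if (w0 == 0.0) continue;        // no dark pixels yet
    const double w1 = total - w0;   // pixel count in the light class
    if (w1 == 0.0) break;           // everything is dark from here on
    const double mu0 = sum0 / w0;
    const double mu1 = (sum_all - sum0) / w1;
    // Between-class variance (unnormalized; the argmax is the same).
    const double between = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
    if (between > best_var) {
      best_var = between;
      best_t = t;
    }
  }
  return best_t;
}

// Hypothetical helper: map each pixel to 0 (foreground) or 255 (background).
std::vector<std::uint8_t> Binarize(const std::vector<std::uint8_t>& grey,
                                   int t) {
  std::vector<std::uint8_t> out;
  out.reserve(grey.size());
  for (std::uint8_t v : grey) {
    out.push_back(static_cast<std::uint8_t>(v <= t ? 0 : 255));
  }
  return out;
}
```

For example, the pixels {10, 12, 15, 200, 210, 220} get a threshold of 15 and come out as {0, 0, 0, 255, 255, 255}: the small differences within the dark and light groups are discarded.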