
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7e3e4683-2c35-42d2-ba72-8df2773d15b9n%40googlegroups.com.

If your environment supports something like OpenCV, and if the numbers are a consistent color, you could try to isolate the bottom numbers by filtering on that color. I have attached a simple example produced with the code below.
Best,
art
---
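import cv2
import numpy as np

image = cv2.imread('test.png')  # loaded as BGR; None if the file cannot be read
if image is None:
    raise FileNotFoundError('test.png')
# Range of near-white pixels (same bounds on all three channels)
lower_white = np.array([200, 200, 200], dtype="uint8")
upper_white = np.array([255, 255, 255], dtype="uint8")
# mask is 255 where a pixel falls inside the range, 0 elsewhere
mask = cv2.inRange(image, lower_white, upper_white)
# Keep only the masked pixels; everything else becomes black
detected_output = cv2.bitwise_and(image, image, mask=mask)
cv2.imwrite("white.png", detected_output)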
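A possible follow-up step, sketched with NumPy only: once the mask exists, the image could be cropped to the bounding box of the masked pixels, so that only the region containing the numbers is passed to OCR. The function name `crop_to_mask` and its `margin` parameter are made up for illustration, and `mask` is assumed to be the single-channel array returned by `cv2.inRange` above.

```python
import numpy as np


def crop_to_mask(image, mask, margin=5):
    """Crop `image` to the bounding box of the non-zero pixels in `mask`.

    `margin` (hypothetical parameter) keeps a few extra pixels around the
    box. The image is returned unchanged if the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return image
    height, width = mask.shape[:2]
    # Clamp the padded box to the image bounds.
    top = max(int(ys.min()) - margin, 0)
    bottom = min(int(ys.max()) + margin + 1, height)
    left = max(int(xs.min()) - margin, 0)
    right = min(int(xs.max()) + margin + 1, width)
    return image[top:bottom, left:right]
```

With the variables from the snippet above this would be called as `crop_to_mask(detected_output, mask)`. Whether cropping helps will depend on how cleanly the color range separates the numbers from everything else.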
From: tesser...@googlegroups.com <tesser...@googlegroups.com>
On Behalf Of Aftab
Sent: Wednesday, December 20, 2023 7:01 AM
To: tesseract-ocr <tesser...@googlegroups.com>
Subject: Re: [tesseract-ocr] Numbers detection
Thank you for the response.
This is my original image. Any pointers on how to remove the non-text parts would be helpful.

On Wednesday, December 20, 2023 at 10:56:16 AM UTC+5:30 pankaj....@gmail.com wrote:
Hello
Can you help us with what steps need to be taken with the image?
Font sizes are unequal.
In some places the background is dark and the text is light.
How can we increase the accuracy in this case?
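For the light-text-on-dark-background case, one common preprocessing idea is to normalize every crop to dark text on a light background before OCR, which is what Tesseract's documentation on improving output quality recommends for version 4 and later, as far as I know. Below is a minimal sketch with NumPy. The function name `ensure_dark_on_light` is made up for illustration, and the heuristic (the median pixel value approximates the background because text usually covers less area than background) is an assumption that will not hold for every image.

```python
import numpy as np


def ensure_dark_on_light(gray):
    """Return a uint8 grayscale image with dark text on a light background.

    Heuristic (assumption): the background covers more area than the text,
    so a median below mid-gray means the background is dark and the image
    should be inverted.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    if np.median(gray) < 128:
        return 255 - gray  # invert: dark background becomes light
    return gray.copy()
```

The input is expected to be a single-channel image, for example the result of `cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)`.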
On Tuesday 19 December 2023 at 21:45:18 UTC+5:30 Zdenko Podobny wrote:
Hello,
For Tesseract you need to remove all non-text parts (graphic elements). IMO the outlined number would also be problematic.
It would be better to post the original image so people can play with preprocessing...
See e.g. this discussion https://groups.google.com/g/tesseract-ocr/c/YqW9XhbWC_8/m/75juLKoJDwAJ (not sure if this is possible with JavaScript)
Zdenko
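One way to approach removing graphic elements, sketched here under the assumption that the graphics form connected regions noticeably larger than any digit: binarize the image, then clear every connected foreground region whose pixel count exceeds a threshold. The function name `remove_large_components` and the `max_area` parameter are hypothetical, and the threshold would have to be tuned per image. This version uses a plain breadth-first flood fill so it only needs NumPy; in OpenCV, `cv2.connectedComponentsWithStats` would likely be a faster route to the same result.

```python
from collections import deque

import numpy as np


def remove_large_components(binary, max_area):
    """Clear connected foreground regions larger than `max_area` pixels.

    `binary` is a 2-D boolean array where True marks foreground pixels.
    Regions are found with 4-connectivity and a breadth-first flood fill.
    """
    binary = np.asarray(binary, dtype=bool)
    height, width = binary.shape
    visited = np.zeros_like(binary)
    cleaned = binary.copy()
    for y in range(height):
        for x in range(width):
            if not binary[y, x] or visited[y, x]:
                continue
            # Collect one connected component starting from (y, x).
            queue = deque([(y, x)])
            visited[y, x] = True
            component = []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Erase the component if it is too large to be a character.
            if len(component) > max_area:
                for cy, cx in component:
                    cleaned[cy, cx] = False
    return cleaned
```

This will not help when a graphic touches a digit, since the two then form a single component.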
On Tue, Dec 19, 2023 at 17:08, Aftab <afta...@gmail.com> wrote:
Hey guys,
I am very new to image processing and OCR, but after a lot of trial and error I have reached this point.
I have a small image, cropped from a larger input, and the image is pre-processed to maximise the visibility of the number.
Tesseract is able to detect the 10000 at the top, but it is not able to detect the number at the bottom.
Here is the processed image I am working with.
I am running this in the browser using the tesseract.js node module. I tried the default pageseg_mode as well as various other modes; 11 worked best of all. Here is my code for the detection:
import { createWorker } from 'tesseract.js';

async function recognizeText(image) {
  const worker = await createWorker('eng');
  await worker.setParameters({
    tessedit_char_whitelist: '0123456789', // digits only
    tessedit_pageseg_mode: '11', // sparse text
  });
  const ret = await worker.recognize(image);
  console.log(ret.data.text);
  await worker.terminate();
}