Reading image from Rubber

Taresh Chaudhari

Nov 25, 2024, 5:22:27 AM

to tesseract-ocr

Hi,
I am trying to read characters from an image in which the characters appear in black against the background. I have attached the code I used for extraction; it currently gives only partial output. Can you help me make it more accurate?

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'

# Paths to your images
image_paths = [
    'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg',
]

# Function to process an image and extract text
def extract_text_from_image(image_path):
    # Open the image
    img = Image.open(image_path)

    # Use pytesseract to perform OCR
    extracted_text = pytesseract.image_to_string(img, config='--psm 6')  # PSM 6 assumes a block of text
    return extracted_text.strip()

# Process all images and print results
for img_path in image_paths:
    text = extract_text_from_image(img_path)
    print(f"Text extracted from {img_path}: {text}")

Taresh Chaudhari

Nov 25, 2024, 7:12:37 AM

to tesseract-ocr

Attaching an image for reference.

crop1.png

محمود محمد‎

Nov 25, 2024, 2:01:29 PM

to tesser...@googlegroups.com

To improve the accuracy of text extraction, you can preprocess the image before passing it to the OCR engine. Preprocessing techniques such as converting the image to grayscale, enhancing contrast, or applying filters can help reduce noise and improve readability. Adjusting the pytesseract settings, for example the --psm (page segmentation mode) value, may also improve the results.

Here’s an updated version of your code with some preprocessing steps:
import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'

# Path to your image
image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'

def extract_text_from_image(image_path):
    # Open the image
    img = Image.open(image_path)

    # Convert the image to grayscale to improve text-background contrast
    img = img.convert('L')  # Convert image to grayscale
    img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast
    img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image

    # Use pytesseract to extract text
    extracted_text = pytesseract.image_to_string(img, config='--psm 6')  # PSM 6 assumes a block of text
    return extracted_text.strip()

# Extract and print text
text = extract_text_from_image(image_path)
print(f"Text extracted from {image_path}: {text}")
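
For the --psm tweak mentioned above, here is a rough sketch (not tested on your image) that runs the same image through several page segmentation modes so the outputs can be compared side by side. The helper names build_config, try_psm_modes and tesseract_ocr_for are made up for illustration, and the list of modes is only a guess at what might suit a short line of characters; treat it as a starting point rather than a fix.

```python
from typing import Callable, Dict, Iterable


def build_config(psm: int, oem: int = 3) -> str:
    """Build a Tesseract config string such as '--oem 3 --psm 7'."""
    return f"--oem {oem} --psm {psm}"


def try_psm_modes(
    ocr: Callable[[str], str],
    psm_values: Iterable[int] = (6, 7, 8, 11, 13),
) -> Dict[int, str]:
    """Call `ocr` once per PSM value and collect the stripped text per mode."""
    results = {}
    for psm in psm_values:
        results[psm] = ocr(build_config(psm)).strip()
    return results


def tesseract_ocr_for(image_path: str) -> Callable[[str], str]:
    """Return a callable that runs pytesseract on one image with a given config.

    The imports are local so the helpers above can be used without
    Tesseract installed. Grayscale plus autocontrast is an assumed,
    minimal preprocessing step, not a recommendation for every image.
    """
    import pytesseract
    from PIL import Image, ImageOps

    img = ImageOps.autocontrast(Image.open(image_path).convert("L"))
    return lambda config: pytesseract.image_to_string(img, config=config)
```

It could be used as: for psm, text in try_psm_modes(tesseract_ocr_for(image_path)).items(): print(psm, repr(text)). Whichever mode gives the most complete output on a few sample crops is the one worth keeping.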

To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com.

Taresh Chaudhari

Nov 26, 2024, 8:01:34 AM

to tesseract-ocr

Thanks, Mahmoud, for sharing. I applied these techniques, but the results are still not good, and I am still trying to solve this problem. Let me see how it goes.

محمود محمد‎

Dec 11, 2024, 8:23:17 AM

to tesser...@googlegroups.com

Hello, I would like to work with you to generate a simple traineddata file for Tesseract using jTessBoxEditor and then test it. Can you let me know a time to discuss the steps? Thanks.

Taresh Chaudhari

Dec 20, 2024, 1:08:38 AM

to tesseract-ocr

Hi,

Sure, can we connect tomorrow around 11:30 AM IST on Google Meet? My ID is "tareshc...@gmail.com".

محمود محمد‎

Dec 20, 2024, 1:14:45 AM

to tesser...@googlegroups.com

OK, thanks.
