Reading image from Rubber

Taresh Chaudhari

Nov 25, 2024, 5:22:27 AM
to tesseract-ocr
Hi,
I am trying to read the characters from an image that has black characters against the background. I am attaching the code I used for extraction; currently it gives only partial output. Could you help me make it more accurate?


import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'

# Paths to your images
image_paths = [
    'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg',
]

# Function to process an image and extract text
def extract_text_from_image(image_path):
    # Open the image
    img = Image.open(image_path)
   
    # Use pytesseract to perform OCR
    extracted_text = pytesseract.image_to_string(img, config='--psm 6')  # PSM 6 assumes a block of text
    return extracted_text.strip()

# Process all images and print results
for img_path in image_paths:
    text = extract_text_from_image(img_path)
    print(f"Text extracted from {img_path}: {text}")

Taresh Chaudhari

Nov 25, 2024, 7:12:37 AM
to tesseract-ocr
Attaching an image for reference.
crop1.png

محمود محمد

Nov 25, 2024, 2:01:29 PM
to tesser...@googlegroups.com

To improve the accuracy of text extraction, you can preprocess the image before passing it to the OCR engine. Converting the image to grayscale, enhancing contrast, or applying filters can reduce noise and improve readability. Changing pytesseract settings, such as the --psm (page segmentation mode) value, may also improve the results.

Here’s an updated version of your code with some preprocessing steps:
import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'

# Path to your image
image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'

def extract_text_from_image(image_path):
    # Open the image
    img = Image.open(image_path)

    # Preprocess: grayscale first, then boost contrast and sharpen edges
    img = img.convert('L')  # Convert image to grayscale
    img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast (factor 2)
    img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image

    # Use pytesseract to extract text
    extracted_text = pytesseract.image_to_string(img, config='--psm 6')  # PSM 6 assumes a block of text
    return extracted_text.strip()

# Extract and print text
text = extract_text_from_image(image_path)
print(f"Text extracted from {image_path}: {text}")
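
Since changing the --psm value is suggested above but not shown, here is a minimal sketch of how several modes could be compared on the same image. The helper name sweep_psm, its ocr parameter, and the default list of modes are assumptions for illustration and are not part of pytesseract. By default it calls pytesseract.image_to_string; which mode works best for this image is not something the sketch can tell you in advance.

```python
def sweep_psm(image, psm_values=(6, 7, 11, 13), ocr=None):
    """Run OCR once per page segmentation mode and collect the results.

    sweep_psm is a hypothetical helper, not a pytesseract function.
    Modes tried by default: 6 (uniform block of text), 7 (single text line),
    11 (sparse text), 13 (raw line).
    """
    if ocr is None:
        # Imported lazily so the helper can be called with a substitute
        # OCR function when Tesseract is not installed.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    for psm in psm_values:
        # Each call passes exactly one --psm flag to Tesseract
        text = ocr(image, config=f"--psm {psm}")
        results[psm] = text.strip()
    return results
```

Calling sweep_psm(img) with the preprocessed image from extract_text_from_image returns a dict keyed by PSM value, so the outputs can be printed side by side and compared by eye.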


Taresh Chaudhari

Nov 26, 2024, 8:01:34 AM
to tesseract-ocr
Thanks, Mahmoud, for sharing. I applied these techniques, but the results are still not good, and I am still trying to solve this problem. Let me see how it goes.

محمود محمد

Dec 11, 2024, 8:23:17 AM
to tesser...@googlegroups.com

Hello, I would like to create a simple traineddata file for Tesseract together with you using jTessBoxEditor, and then test it. Could you let me know a time to discuss the steps? Thanks.


Taresh Chaudhari

Dec 20, 2024, 1:08:38 AM
to tesseract-ocr
Hi,
Sure, can we connect tomorrow around 11:30 AM IST on Google Meet? My ID is "tareshc...@gmail.com".

محمود محمد

Dec 20, 2024, 1:14:45 AM
to tesser...@googlegroups.com