I'm using Tesseract with Python. I have an image containing 1-6 words and need to read the text. Sometimes the character "C", which has the same shape in upper and lower case, is detected as lower-case "c" instead of upper-case "C". I understand why this happens, but given the letters that follow it should be possible to determine the correct case. Is there a configuration option or some other way to improve this?
I looked at the page segmentation option config='-psm x' with different values for x, but none of them solves my problem.
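One way to use the context of the neighbouring letters is a post-processing step on the OCR output. The following is only a minimal sketch, not a Tesseract feature: the names AMBIGUOUS, fix_ambiguous_case and fix_text are hypothetical, and the rule (re-case shape-ambiguous letters when at least two unambiguous letters in the same word agree on the case) is an assumption that will be wrong for some mixed-case words.

```python
# Letters whose upper- and lower-case shapes differ mainly in size,
# so OCR can easily confuse the case (an assumed, non-exhaustive set).
AMBIGUOUS = set("cCoOsSvVwWxXzZ")


def fix_ambiguous_case(word):
    """Re-case ambiguous letters using the unambiguous letters of the word."""
    reliable = [ch for ch in word if ch.isalpha() and ch not in AMBIGUOUS]
    # Only act when at least two reliable letters agree on the case.
    all_upper = len(reliable) >= 2 and all(ch.isupper() for ch in reliable)
    all_lower = len(reliable) >= 2 and all(ch.islower() for ch in reliable)

    fixed = []
    for i, ch in enumerate(word):
        if ch in AMBIGUOUS and all_upper:
            ch = ch.upper()
        elif ch in AMBIGUOUS and all_lower and i > 0:
            # The first letter may legitimately be a capital, so keep it.
            ch = ch.lower()
        fixed.append(ch)
    return "".join(fixed)


def fix_text(text):
    """Apply the heuristic word by word (whitespace is collapsed to spaces)."""
    return " ".join(fix_ambiguous_case(word) for word in text.split())


if __name__ == "__main__":
    print(fix_text("cHAT baCk Chat"))
```

The heuristic deliberately leaves a word untouched when its reliable letters disagree or when there are fewer than two of them, since in those cases the context does not settle the case.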
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ef0e07cc-5f7c-4ff3-bb07-ffdda4c68321%40googlegroups.com.
import argparse

import cv2
import pytesseract

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to input image to be OCR'd")
args = vars(ap.parse_args())

# load the example image and convert it to grayscale
image = cv2.imread(args["image"])
if image is None:
    raise SystemExit("Could not read image: " + args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# pytesseract accepts the grayscale NumPy array directly, so no
# temporary file has to be written to disk
text = pytesseract.image_to_string(gray)
print("Output: " + text)
But as a result I now get empty strings, because the image contains a symbol that Tesseract does not know. I had this problem before as well and, for whatever reason, could fix it with config='--psm 7'. That no longer works. Do you have an idea for this as well? I don't need to detect the symbol; I just want the rest of the string not to be thrown away.
I realized that this also happens for strings without the symbol. The image given below, for example, returns an empty string as well. In this case it is recognized correctly with config='--psm 7', but unfortunately I cannot generally assume that the text is only one line. Maybe the problem is that the text is not a word in the dictionary? I found out that it is possible to disable the dictionary and get back the individual letters with the highest confidence, but I did not understand how to do this. I tried it with this config:
text = pytesseract.image_to_string(gray, config='load_system_dawg=0')
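For comparison, Tesseract's command line sets config variables with the -c VAR=VALUE flag, and pytesseract passes the config string through as command-line arguments. The sketch below builds such a string and falls back through several page segmentation modes until one returns text. The helper names (build_config, first_nonempty, ocr_with_fallback) and the order of modes are assumptions for illustration, and whether disabling the word lists changes the result may depend on the Tesseract version and OCR engine in use.

```python
def build_config(psm=None, use_dictionary=True):
    """Build a Tesseract config string for pytesseract's config= argument."""
    parts = []
    if psm is not None:
        parts.append(f"--psm {psm}")
    if not use_dictionary:
        # Config variables are passed with -c VAR=VALUE.
        parts.append("-c load_system_dawg=0")
        parts.append("-c load_freq_dawg=0")
    return " ".join(parts)


def first_nonempty(ocr, configs):
    """Call ocr(config) for each config; return the first non-empty text."""
    for config in configs:
        text = ocr(config).strip()
        if text:
            return text, config
    return "", None


def ocr_with_fallback(image):
    """Try the default layout analysis first, then stricter layout modes."""
    # Imported here so the helpers above can be used without Tesseract installed.
    import pytesseract

    configs = [
        build_config(use_dictionary=False),         # default page segmentation
        build_config(psm=6, use_dictionary=False),  # single uniform block of text
        build_config(psm=7, use_dictionary=False),  # single text line
    ]
    return first_nonempty(
        lambda cfg: pytesseract.image_to_string(image, config=cfg), configs
    )
```

With the grayscale array from the script above, a call could look like text, used_config = ocr_with_fallback(gray); the second value shows which mode produced the text, which helps to see when the single-line assumption was needed.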