Maybe use a different segmentation mode? Try changing the line:
text = pytesseract.image_to_string(cropped_image, lang='eng').strip()
to:
text = pytesseract.image_to_string(cropped_image, lang='eng', config='--psm 6').strip()
That should help.
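If `--psm 6` alone doesn't do it, it can be worth sweeping a few page segmentation modes and comparing the output. A minimal sketch, assuming pytesseract is installed (the helper names are mine, not from this thread):

```python
def build_config(psm: int) -> str:
    """Build a Tesseract config string for a given page segmentation mode."""
    return f'--psm {psm}'

def ocr_with_psm(image, psm: int, lang: str = 'eng') -> str:
    """Run Tesseract with a specific PSM; requires the pytesseract package."""
    import pytesseract  # imported lazily so build_config stays usable without it
    return pytesseract.image_to_string(image, lang=lang, config=build_config(psm)).strip()

# Modes often worth trying for card-like layouts:
# 3 = fully automatic, 6 = uniform block of text, 7 = single line, 11 = sparse text
# for psm in (3, 6, 7, 11):
#     print(psm, repr(ocr_with_psm(cropped_image, psm)))
```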
art
From: tesser...@googlegroups.com <tesser...@googlegroups.com>
On Behalf Of Paulus Present
Sent: Sunday, October 29, 2023 4:21 PM
To: tesseract-ocr <tesser...@googlegroups.com>
Subject: [tesseract-ocr] Poor results of Tesseract performing a play card evaluation
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
tesseract-oc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/9c2e162e-dce2-4a81-8138-5268b4e16423n%40googlegroups.com.
Hi Paulus,
Yes, I am not sure why Tesseract struggles with the first all-caps region in that section. The colors in that image are so clean that you might be able to use something like OpenCV to extract regions based on color in addition to location. Another idea is to leverage Tesseract's confidence metrics. These are available in the API and also in the hOCR output. For example, the first word "LOOK" is rendered as:
<span class='ocrx_word' id='word_1_1' title='bbox 9 70 85 101; x_wconf 11'>010]</span>
Tesseract doesn't fare well here, but it does give a low confidence value ("11") and the coordinates of the word ("9 70 85 101"). You could consider using those to extract the region for the word(s) and running Tesseract on that region on its own.
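The same confidences are available programmatically through pytesseract's `image_to_data`, which avoids parsing hOCR by hand. A sketch of the re-crop idea, assuming pytesseract and a PIL image; the threshold and helper names are my own:

```python
def low_confidence_boxes(data: dict, threshold: float = 40.0):
    """Return (word, bbox) pairs whose confidence is below `threshold`.

    `data` has the shape returned by pytesseract.image_to_data(..., output_type=Output.DICT):
    parallel lists under 'text', 'conf', 'left', 'top', 'width', 'height'.
    A bbox is (x0, y0, x1, y1), matching the hOCR coordinate order.
    """
    boxes = []
    for i, word in enumerate(data['text']):
        conf = float(data['conf'][i])  # conf is -1 for non-word entries
        if word.strip() and 0 <= conf < threshold:
            bbox = (data['left'][i], data['top'][i],
                    data['left'][i] + data['width'][i],
                    data['top'][i] + data['height'][i])
            boxes.append((word, bbox))
    return boxes

def rerun_low_confidence(image, threshold: float = 40.0):
    """Re-OCR each low-confidence word region on its own, as suggested above."""
    import pytesseract
    from pytesseract import Output
    data = pytesseract.image_to_data(image, lang='eng', output_type=Output.DICT)
    results = []
    for word, bbox in low_confidence_boxes(data, threshold):
        crop = image.crop(bbox)  # PIL image assumed
        retry = pytesseract.image_to_string(crop, config='--psm 8').strip()
        results.append((word, retry))
    return results
```

With the "LOOK" example above (`bbox 9 70 85 101; x_wconf 11`), that word would be flagged and re-cropped.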
Wow, thanks, it will take me a while to parse this but it sounds very promising.
art
From: tesser...@googlegroups.com <tesser...@googlegroups.com>
On Behalf Of Paulus Present
Sent: Monday, November 20, 2023 4:56 AM
To: tesseract-ocr <tesser...@googlegroups.com>
Subject: Re: [tesseract-ocr] Poor results of Tesseract performing a play card evaluation
Now, I suspect that in future development of the OpenAI API and their models it will become possible to query a custom GPT which could be pretrained manually by simply conversing with it about a sample text you provide and telling it what it transcribed wrong and how to correct it. I already tested this in the browser interface, and my finding is that if you present GPT-4 with just one image and ask it to transcribe, it does a fine job. You then proceed by pointing out its errors and asking it to correct them, thereby helping GPT-4 extend its knowledge base within that conversation. You then feed it a second text and repeat the process. After a certain number of images it transcribes perfectly on the first attempt, simply because it has acquired the skill to read the documents based on your specific steering and instructions. If you could then query this model through the API, your prompt would simply need to be an image, and it would know what to do and return the transcribed text perfectly. For now I still need the long prompt ;).
I thought this info might be useful to you, but I can imagine you are well aware of this already, seeing as you are active in this field.
I have attached my code and the docs you need to run it. All paths in the code need to be changed, of course, to the locations where you put the source files.
You also need your own OpenAI API key, which you can get here:
https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key (it is a paid service, so you need to add a minimum of $5 to your account).
Some additional info:
- The script uses a combination of OCR, classic image recognition and GPT-4 Vision to get all the different datapoints. Where OCR or image recognition sufficed, I used it, because a deterministic procedure seems preferable when it is sufficient.
- The special symbols in the text were found with classical image recognition and put in a dictionary keyed by their location in the body text. I then used this dict to replace all symbol placeholders in the GPT-4 transcription with the actual symbols from the image recognition dict. This is overly complex, but it was the only way to get the accuracy I wanted while only prompting vanilla GPT-4 Vision. Once you can query your own custom-trained GPT-4 Vision, the replacement step will not be necessary, because with training it can learn to recognize the symbols itself. I have tested this in the browser interface, and this is the case: when I correct a symbol transcription once, it remembers it for future transcriptions in the same chat.
- The script part for GPT-4 Vision already implements a batching method, but as batching is not yet allowed on the OpenAI API side, the batch size is set to 1. However, in the browser interface you can upload up to 10 images in one prompt, so I suspect this will become available via the API sometime in the future. It then suffices to increase the BATCH_size on line 489 to start using this option.
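The placeholder-replacement and batching steps described in the bullets above can be sketched generically. None of the names below come from the attached script; the placeholder format, `BATCH_SIZE`, and the request function are my assumptions:

```python
def replace_placeholders(transcription: str, symbol_map: dict) -> str:
    """Swap placeholders emitted by GPT-4 (e.g. '<SYM_1>') for the symbols
    that classical image recognition located in the card image."""
    for placeholder, symbol in symbol_map.items():
        transcription = transcription.replace(placeholder, symbol)
    return transcription

BATCH_SIZE = 1  # per the thread, the API currently accepts one image per request

def batched(items, size=BATCH_SIZE):
    """Yield successive chunks of `items`, at most `size` per chunk."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage: two card symbols found by image recognition,
# with the GPT-4 Vision prompt instructed to emit numbered placeholders.
symbols = {'<SYM_1>': '\u2660', '<SYM_2>': '\u2665'}  # spade, heart
print(replace_placeholders('Play <SYM_1>, then <SYM_2>.', symbols))
# for batch in batched(image_paths, size=10):   # once multi-image prompts are allowed
#     send_batch_to_gpt4_vision(batch)          # hypothetical request function
```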
So, this was a very long explanation and I am sorry for that. I am, however, very enthusiastic about the results, and you surely helped me along. Thanks again! I am also curious to see how you would use this in your field :)
Let me know what you think! :)
Kind regards
Paulus