Detect decimal point in amounts with psm 11

Kumar Rajwani

Apr 21, 2021, 12:10:37 PM
to tesseract-ocr
Hey,
I am using Tesseract to identify amounts in my forms; see the image below for a sample. With psm 6 I get the amounts perfectly, decimal point included.
But when I use psm 11, I get the following output. I have to use psm 11 because, on my images, it recognizes more text than psm 6 does.
250,941
00
00
-75,282
175,659
00
-15,072
00
2,860
00
00
163,447
00
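The code I am using:
import pytesseract
from PIL import Image
image = Image.open("form.png")  # full form image; file name is hypothetical
print(pytesseract.image_to_string(image.crop((2000, 1570, 2500, 2000)), lang="eng",
                                  config="-c tessedit_do_invert=0 --psm 11").replace("\n\n", "\n"))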

Is there any change I can make to get the decimal point with psm 11?
download.png
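In the psm 11 output above, each amount lost its decimal point and its cents came back as a separate "00" word, printed out of order. One possible workaround, assuming the cents themselves are still recognized correctly, is to ask pytesseract for word bounding boxes (`image_to_data`) and rejoin each integer part with a two-digit word sitting just to its right on the same row. This is a post-processing sketch, not a Tesseract feature; the function name and the pixel tolerances are hypothetical and would need tuning on real scans.

```python
import re

# Integer part of an amount, e.g. "250,941", "-75,282" or "$2,860".
INTEGER_PART = re.compile(r"^\$?-?\d{1,3}(?:,\d{3})*$")
# Cents that Tesseract split off into their own word, e.g. "00".
CENTS_PART = re.compile(r"^\d{2}$")


def merge_split_amounts(data, max_gap=40, max_dy=10):
    """Rejoin amounts whose decimal point was dropped and cents split off.

    `data` is assumed to have the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT):
    parallel lists under "text", "left", "top" and "width".
    `max_gap` / `max_dy` are hypothetical pixel tolerances.
    Returns amount strings in top-to-bottom, left-to-right order.
    """
    words = [
        (text.strip(), left, top, width)
        for text, left, top, width in zip(
            data["text"], data["left"], data["top"], data["width"]
        )
        if text.strip()
    ]
    # Visit words left to right so each integer part is seen before its cents.
    order = sorted(range(len(words)), key=lambda k: words[k][1])
    used = set()
    found = []  # (top, left, amount)
    for i in order:
        text, left, top, width = words[i]
        if i in used or not INTEGER_PART.match(text):
            continue
        # Closest unused two-digit word just to the right, on the same row.
        best = None
        for j, (t2, l2, top2, _) in enumerate(words):
            gap = l2 - (left + width)
            if (j != i and j not in used and CENTS_PART.match(t2)
                    and 0 <= gap <= max_gap and abs(top2 - top) <= max_dy):
                if best is None or gap < best[0]:
                    best = (gap, j)
        used.add(i)
        if best is not None:
            used.add(best[1])
            text = f"{text}.{words[best[1]][0]}"
        found.append((top, left, text))
    return [amount for _, _, amount in sorted(found)]
```

With a real crop it would be called as `merge_split_amounts(pytesseract.image_to_data(crop, lang="eng", config="--psm 11", output_type=pytesseract.Output.DICT))`. Because the function pairs words by position rather than by printed line order, the out-of-order "00" lines do not matter.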

Zdenko Podobny

Apr 21, 2021, 3:04:20 PM
to tesser...@googlegroups.com
Try using better config parameters, e.g.:

$ tesseract download.png - --psm 6 --oem 0
will produce:
$ 250,941.00
$ -75,282.00
$ 175,659.00
$ -15,072 00
$ 2,860.00
$ 0.00
$ 163,447.00


The legacy engine (--oem 0) could be better for numbers.

Zdenko
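Even with --oem 0, one amount above came back as `$ -15,072 00`, with a space where the decimal point should be. If such near-misses are common, a light normalization pass over each recognized line can repair them. This is a minimal sketch assuming amounts always carry two decimals and use `,` as the thousands separator; `normalize_amount` is a hypothetical helper, not part of pytesseract.

```python
import re
from typing import Optional

# Optional "$", optional minus sign, integer part with comma thousands
# separators, then optionally two cents digits after "." or a stray space.
AMOUNT = re.compile(r"^\$?\s*(-?\d{1,3}(?:,\d{3})*)(?:[.\s](\d{2}))?$")


def normalize_amount(text: str) -> Optional[str]:
    """Return the amount in "-15,072.00" form, or None if it is not an amount.

    Amounts without cents are returned as-is; no ".00" is invented.
    """
    match = AMOUNT.match(text.strip())
    if match is None:
        return None
    integer, cents = match.groups()
    return f"{integer}.{cents}" if cents else integer
```

Applied line by line to the --oem 0 output, it turns `$ -15,072 00` into `-15,072.00` and leaves well-formed lines such as `$ 250,941.00` as `250,941.00`.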


On Wed, 21 Apr 2021 at 14:10, Kumar Rajwani <kumarraj...@gmail.com> wrote:

Kumar Rajwani

Apr 21, 2021, 3:45:12 PM
to tesseract-ocr
Hi Zdenop, as I said, I know psm 6 works better for numbers, but it is not able to get all the text in the image, whereas psm 11 does better. That is why I want to go with psm 11; the wrong amounts are the only problem I am facing with it. Can you tell me how I can achieve the same result as yours with psm 11?
Thanks

Zdenko Podobny

Apr 21, 2021, 5:05:09 PM
to tesser...@googlegroups.com
  1. The result I posted is for the image you provided.
  2. I suggest you use another OEM.
  3. I know that invoice digitizing tools use different parameters for parsing numbers.

Zdenko
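Points 2 and 3 can be tried from pytesseract by assembling the `config` string instead of typing CLI flags. The helper below is hypothetical; `--psm`, `--oem` and `-c name=value` are the standard Tesseract command-line options. `tessedit_char_whitelist` is an added assumption, not a parameter named in the thread: it limits recognition to the listed characters, though the LSTM engine reportedly ignored it before Tesseract 4.1. Also note that `--oem 0` only works with traineddata that still contains the legacy model (the `tessdata` repository has it; `tessdata_fast` and `tessdata_best` are LSTM-only).

```python
from typing import Dict, Optional


def build_config(psm: int, oem: Optional[int] = None,
                 variables: Optional[Dict[str, str]] = None) -> str:
    """Assemble a pytesseract `config` string from CLI-style options."""
    parts = [f"--psm {psm}"]
    if oem is not None:
        # 0 = legacy, 1 = LSTM, 2 = legacy + LSTM, 3 = default
        parts.append(f"--oem {oem}")
    for name, value in (variables or {}).items():
        # pytesseract splits the config string shlex-style,
        # so values here must not contain spaces or quotes.
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


# Hypothetical numbers-only setup: digits plus the symbols seen in the amounts.
NUMBER_VARS = {"tessedit_char_whitelist": "0123456789.,-$"}
```

For example, `pytesseract.image_to_string(Image.open("download.png"), lang="eng", config=build_config(6, oem=0))` mirrors Zdenko's command. A whitelist applies to the whole image, so on a full form read with psm 11 it would also distort ordinary words; it is safer on crops that contain only amounts.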


On Wed, 21 Apr 2021 at 17:45, Kumar Rajwani <kumarraj...@gmail.com> wrote:

Kumar Rajwani

Apr 22, 2021, 6:41:59 AM
to tesseract-ocr
Hey Zdenop, that was only the portion of the full image that Tesseract did not detect properly. The full image contains a lot of information, which is why I didn't share it. All of that information is important, and psm 11 works great there; if I use psm 6, it misses some lines, so I can't use it.
I have tried psm 11 with oem 0, 1, 2 and 3, but none of them works as I want.
For me the best choice is psm 11, but the numbers are an issue. Can you advise something on this?
Thanks
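To compare the psm 11 runs with oem 0 to 3 on something firmer than eyeballing, one could count how many amounts in each output kept their decimal point. A minimal sketch, assuming amounts look like `$ 250,941.00` (optional `$` and minus sign, comma thousands separators, two decimals); the function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# A complete amount: optional "$", optional minus, comma-grouped integer part,
# a decimal point and exactly two decimals.
FULL_AMOUNT = re.compile(r"\$?\s*-?\d{1,3}(?:,\d{3})*\.\d{2}\b")


def count_amounts(text: str) -> int:
    """How many amounts in `text` kept their decimal point."""
    return len(FULL_AMOUNT.findall(text))


def best_config(outputs: Dict[str, str]) -> Tuple[str, int]:
    """Return (config, score) for the output with the most complete amounts.

    `outputs` maps a config string to the text Tesseract returned with it;
    on a tie, the config listed first wins.
    """
    scores = {config: count_amounts(text) for config, text in outputs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice `outputs` could be built as `{cfg: pytesseract.image_to_string(img, config=cfg) for cfg in [f"--psm 11 --oem {oem}" for oem in range(4)]}`, bearing in mind that oem 0 and 2 need legacy traineddata.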

Kumar Rajwani

Apr 23, 2021, 7:47:20 AM
to tesseract-ocr
Can you tell me whether there is any way to make psm 11 recognize numbers well? That would be great.

Kumar Rajwani

Apr 23, 2021, 8:16:44 AM
to tesseract-ocr
Hi, can you please look at this image, so you get a clearer idea of why I want to go with psm 11?
If you try this image with psm 6, it will miss the first line, the date will be wrong, and the number .40 will be converted into AQ; the same image with psm 11 gives better results.
Can you suggest something? That would be great.

download.png

Kumar Rajwani

Apr 23, 2021, 3:40:51 PM
to tesseract-ocr
Hey Zdenop, can you please look at this command? It is working for me:
tesseract /content/img.png out2 --psm 11 -c textord_min_linesize=3
Please tell me: if I change this parameter, will I get better results, or will it mess up something else?
Can you also suggest some parameters that would work for me?
Thanks
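`textord_min_linesize` is a `-c` variable, like `tessedit_do_invert` in the original code. Judging by its `textord_` prefix it belongs to Tesseract's layout-analysis (textord) stage, so a value that helps one form could merge or drop lines on another. One way to answer "will it mess up something else?" empirically is to run the same command with a few values over several sample forms and diff the outputs. The sketch below only builds the command lines; the helper names, the output names such as `out2_linesize_3` and the values tried are hypothetical.

```python
from typing import Dict, Iterable, List


def cli_args(image: str, outbase: str, psm: int,
             variables: Dict[str, str]) -> List[str]:
    """argv for `tesseract IMAGE OUTBASE --psm N -c name=value ...`."""
    args = ["tesseract", image, outbase, "--psm", str(psm)]
    for name, value in variables.items():
        args += ["-c", f"{name}={value}"]
    return args


def linesize_sweep(image: str, values: Iterable[float]) -> List[List[str]]:
    """One psm 11 command per textord_min_linesize value, each writing to
    its own (hypothetical) output base such as out2_linesize_3."""
    return [
        cli_args(image, f"out2_linesize_{v:g}", 11,
                 {"textord_min_linesize": f"{v:g}"})
        for v in values
    ]
```

Each command can then be run with `subprocess.run(cmd, check=True)`; Tesseract appends `.txt` to the output base, so the results land in files such as `out2_linesize_3.txt` that can be compared side by side.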

Kumar Rajwani

Apr 27, 2021, 6:59:34 AM
to tesseract-ocr
Hey, can you please suggest how I can achieve better results?