tesseract shifts UZN coordinates for OCR


Philcat

Apr 26, 2019, 6:13:46 AM
to tesseract-ocr
I am able to create a UZN file from rectangle coordinates drawn around text.
Example (pretend the blue background is a rectangle):

Unwanted text.
This is the text I want    
It could be an address   
Mr. Smith                       
10 Fake Street               
Fake Town                      
Phone: 123456 54545    
More unwanted text.

The result I get will be something like:

Unwanted text.
This is the text I want    
It could be an address   
Mr. Smith                       
10 Fake Street               
Fake Town    

This would be the same for each line in the UZN file. How do I fix this without manually adjusting the UZN coordinates for each text box?

Thanks.             

Zdenko Podobny

Apr 26, 2019, 6:23:00 AM
to tesser...@googlegroups.com
Please provide a test case (image + uzn) and details about your tesseract version and language data.

Zdenko


On Fri, Apr 26, 2019 at 12:13, Philcat <philipl...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1b83b759-6e6b-4f83-b85f-609b8328e9be%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Philcat

Apr 26, 2019, 8:22:39 AM
to tesseract-ocr
Load.tiff
Load.uzn

Philcat

Apr 26, 2019, 8:24:57 AM
to tesseract-ocr
Latest default version of tesseract.
Thanks

Philcat

Apr 28, 2019, 4:20:03 AM
to tesseract-ocr
Any suggestions? I provided an example earlier in the thread.
Thanks

On Friday, April 26, 2019 at 12:23:00 PM UTC+2, zdenop wrote:

Zdenko Podobny

Apr 28, 2019, 8:20:15 AM
to tesser...@googlegroups.com
Your uzn file is wrong. Did you try to visualize it?
Load_visualize_uzn.png
When I tried the attached uzn file (tesseract Load.tiff - --psm 4), I got this result:

LOAD CONFIRMATION
Load# 11928
Date 02042019
Equipment Reefer
Equipment Length ~~ 53'
Temperature 55°F
Weight 28923 Ibs.
Commodity Dry Goods (Food)
Distance 328 miles

9393 W 110th Street
51 Corporate Woods Suite 500 #5093
Overland Park, KS 66210
Docket: MC053431

COYNE INC
32830 IH 10 WEST
BOERNE, TX 75006

RECEIVING / TRAFFIC
Email: carlsleordermat@metroscg. com

Zdenko


On Sun, Apr 28, 2019 at 10:20, Philcat <philipl...@gmail.com> wrote:

Zdenko Podobny

Apr 28, 2019, 8:24:41 AM
to tesser...@googlegroups.com
Which means that if you create a correct uzn file, you will get what you need...
Zdenko


On Sun, Apr 28, 2019 at 14:19, Zdenko Podobny <zde...@gmail.com> wrote:

Philcat

Apr 28, 2019, 8:46:13 AM
to tesseract-ocr
Hey thanks a lot for your help! 
FYI, I get the uzn from a Qt rubber band/rectangle.
It looks like your results are what I want but the visualization is wrong.
For me the visualization is correct but the results are wrong.
Anyway, you put me on the right track.
Thanks again.
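
Editor's note: the thread never establishes why the Qt rectangle and the zones tesseract reads disagree. One possible cause, offered here only as an assumption, is that the rubber band reports widget coordinates while the UZN file expects pixel coordinates in the original image, written (as far as I know) as `left top width height label`, one zone per line. Below is a minimal Python sketch of that conversion. `Rect`, `widget_rect_to_image`, `uzn_line`, and the scale/offset parameters are hypothetical names for illustration, not part of Qt or tesseract.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A rectangle given by its top-left corner plus width and height."""
    x: int
    y: int
    width: int
    height: int


def widget_rect_to_image(rect, scale_x, scale_y, offset_x=0, offset_y=0):
    """Map a widget-space rectangle to image pixel coordinates.

    scale_x / scale_y: displayed size divided by original image size
    (assumed known by the caller, e.g. 0.5 if shown at half size).
    offset_x / offset_y: where the image's top-left corner sits inside
    the widget, in widget pixels (assumed 0 if the image fills it).
    """
    return Rect(
        round((rect.x - offset_x) / scale_x),
        round((rect.y - offset_y) / scale_y),
        round(rect.width / scale_x),
        round(rect.height / scale_y),
    )


def uzn_line(rect, label="Text"):
    """Format one zone as 'left top width height label'."""
    return f"{rect.x} {rect.y} {rect.width} {rect.height} {label}"


if __name__ == "__main__":
    # Rubber band drawn on an image displayed at half its real size.
    band = Rect(100, 50, 200, 30)
    print(uzn_line(widget_rect_to_image(band, 0.5, 0.5)))
```

If the zones still look shifted after a conversion like this, it may be worth checking whether the third and fourth numbers were written as a bottom-right corner rather than as width and height.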

Philcat

Apr 29, 2019, 2:11:02 PM
to tesseract-ocr
By the way, what application are you using to visualize the UZN?
Thanks


On Sunday, April 28, 2019 at 2:24:41 PM UTC+2, zdenop wrote:

Zdenko Podobny

Apr 29, 2019, 2:44:38 PM
to tesser...@googlegroups.com
I did it with Python ;-)


Zdenko


On Mon, Apr 29, 2019 at 20:11, Philcat <philipl...@gmail.com> wrote:

Zdenko Podobny

Apr 29, 2019, 2:57:03 PM
to tesser...@googlegroups.com
Here is the code:

Zdenko


On Mon, Apr 29, 2019 at 20:44, Zdenko Podobny <zde...@gmail.com> wrote:
visualize_uzn.py
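
Editor's note: the attached script is not reproduced in this archive. The following is not Zdenko's code, only a minimal sketch of how such a visualizer might look. It assumes the Pillow library is installed and that each UZN line reads `left top width height label`; `parse_uzn` and `draw_zones` are hypothetical names.

```python
from pathlib import Path


def parse_uzn(text):
    """Parse UZN text into (x, y, width, height, label) tuples."""
    zones = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or incomplete lines
        x, y, w, h = (int(p) for p in parts[:4])
        label = parts[4] if len(parts) > 4 else ""
        zones.append((x, y, w, h, label))
    return zones


def draw_zones(image_path, uzn_path, out_path):
    """Draw every zone from the UZN file onto a copy of the image."""
    # Pillow is imported here so parse_uzn stays usable without it.
    from PIL import Image, ImageDraw

    zones = parse_uzn(Path(uzn_path).read_text())
    with Image.open(image_path) as img:
        canvas = img.convert("RGB")  # color copy, so red outlines show
    draw = ImageDraw.Draw(canvas)
    for x, y, w, h, label in zones:
        # UZN stores width/height; Pillow wants two opposite corners.
        draw.rectangle([x, y, x + w, y + h], outline=(255, 0, 0), width=3)
        draw.text((x + 4, y + 4), label, fill=(255, 0, 0))
    canvas.save(out_path)
    return zones
```

With the files from this thread, a call such as `draw_zones("Load.tiff", "Load.uzn", "Load_visualize_uzn.png")` would write an overlay image in which misplaced zones are easy to spot.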

Philcat

Apr 29, 2019, 3:30:04 PM
to tesseract-ocr
Very nice! You make it look easy.
Maybe I can use it to see why Qt mouse positions are different to what tesseract reads.
Thanks!

