Bad results with Tesseract 4.0 using LSTM


לאה למד

Jun 20, 2017, 6:53:54 AM
to tesseract-ocr

Hi,
* Attached is a line from the original image.

Command: tesseract file.tiff output --oem 2 -l heb --psm 6
Result: "אומדן / שווי ההתקשרות: 6 ₪ לפני מע"מ. ₪"

Command: tesseract file.tiff output --oem 0 -l heb --psm 6
Result: "אןמדן ושווי ההתקשרות: 16,656 ₪ לפניימע"מ. ₪”"

For those who don't read Hebrew: the sentence is extracted better with LSTM enabled (--oem 2), but for some unknown reason the extracted number is completely wrong. It comes out as 6, while the legacy engine (--oem 0) reads 16,656.
Any ideas?

An unrelated question: how can I get hOCR output in the new Tesseract?
Thank you.
Screenshot from 2017-06-20 10-25-16.png

ShreeDevi Kumar

Jun 20, 2017, 7:43:53 AM
to tesser...@googlegroups.com
Your input image quality needs to be improved.

Also test with --oem 1 alone (LSTM only) and see if you get similar results.
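
One way to make that comparison is to run the same image through each engine mode and look at the outputs side by side. The following is only a rough sketch: file.tiff, heb and --psm 6 come from the original post, while the out_oem* output names and the ocr_cmd helper are made up for illustration. It assumes the heb traineddata you have installed contains both the legacy and the LSTM model.

```shell
#!/bin/sh
# Print the tesseract command line for one engine mode.
# 0 = legacy engine only, 1 = LSTM only, 2 = legacy + LSTM combined.
# The out_oemN output base name is a hypothetical choice for this sketch.
ocr_cmd() {
    echo "tesseract file.tiff out_oem$1 --oem $1 -l heb --psm 6"
}

# Run each mode only if tesseract and the input image are actually present.
if command -v tesseract >/dev/null 2>&1 && [ -f file.tiff ]; then
    for oem in 0 1 2; do
        echo "== --oem $oem =="
        # Show the recognized text if the run succeeded.
        sh -c "$(ocr_cmd "$oem")" && cat "out_oem$oem.txt"
    done
fi
```

If the number is still wrong with --oem 1 but correct with --oem 0, that would point at the LSTM model rather than at the combined mode.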

For hOCR, just adding hocr to the end of the command line should work, as long as you have the hocr config file in your tessdata directory.
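
As a rough sketch of what that looks like, reusing the file name and options from the original post (the hocr_cmd helper is hypothetical, and --oem 1 is just the mode suggested above):

```shell
#!/bin/sh
# Print a tesseract command that requests hOCR output.
# $1 = input image, $2 = output base name; "hocr" at the end names the config file.
hocr_cmd() {
    echo "tesseract $1 $2 --oem 1 -l heb --psm 6 hocr"
}

# Run it only if tesseract and the input image are actually present.
if command -v tesseract >/dev/null 2>&1 && [ -f file.tiff ]; then
    sh -c "$(hocr_cmd file.tiff output)"
    # With the hocr config active, the result should be written to output.hocr
    # instead of output.txt.
    ls -l output.hocr
fi
```

If tesseract reports that it cannot read the hocr config, check that a file named hocr exists in the configs folder of your tessdata directory.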

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

