Issue 1500 in tesseract-ocr: Tesseract OCR force pattern

tesser...@googlecode.com

Aug 10, 2015, 3:23:42 AM

to tesserac...@googlegroups.com

Status: New
Owner: ----

New issue 1500 by leopold....@gmail.com: Tesseract OCR force pattern
https://code.google.com/p/tesseract-ocr/issues/detail?id=1500

What steps will reproduce the problem?
1. Follow the bazaar tutorial
2. Test with a simple image and the pattern TEST\A\A\d\d\d
3. The result is not filtered

What is the expected output? What do you see instead?

Expected : TESTAB123

See : TESTAB123
TESTABC12
TESTA1234
TEST12345
TESTABCD1

What version of the product are you using? On what operating system?
Tesseract 3.01
Windows 8

I want to read a specific character sequence with Tesseract, which contains
the word "TEST" followed by 2 characters and 3 digits.

I have tried the pattern matching from the bazaar tutorial, with the pattern

TEST\A\A\d\d\d

but the OCR still recognizes other words that don't match.

I have also tried the "tessedit_char_whitelist" parameter, but it doesn't
let me choose the position of the characters.

I run the command:

tesseract image.jpg result -l eng bazaar

and I get no error message, just:

"Tesseract Open Source OCR Engine v3.01 with Leptonica"

The result: TESTAB123 TESTABC12 TESTA1234 TEST12345 TESTABCD1

So the result is wrong: I only wanted to catch the sequence "TESTAB123".
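
To make the expected filtering concrete, here is a minimal sketch of the same rule applied as a post-processing step on the OCR output, written in Python. It is not part of Tesseract or the bazaar tutorial. The function name filter_tokens is hypothetical, and it assumes that "2 characters" means two ASCII letters (mirroring \A\A) and that candidates are separated by whitespace.

```python
import re

# Assumption: \A\A in the user pattern is meant as two ASCII letters,
# and \d\d\d as three ASCII digits.
PATTERN = re.compile(r"TEST[A-Za-z]{2}[0-9]{3}")


def filter_tokens(ocr_text):
    """Return the whitespace-separated tokens that match PATTERN in full."""
    return [tok for tok in ocr_text.split() if PATTERN.fullmatch(tok)]


if __name__ == "__main__":
    # The text Tesseract currently returns for image.jpg.
    raw = "TESTAB123 TESTABC12 TESTA1234 TEST12345 TESTABCD1"
    print(filter_tokens(raw))  # ['TESTAB123']
```

Using fullmatch rather than search means a token is kept only if the whole token fits the pattern, which is the position-sensitive behaviour that "tessedit_char_whitelist" cannot express.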

Can somebody tell me why the regular expression in my user-patterns file has
no effect? For the configuration, I have STRICTLY followed the bazaar
tutorial.

Attachments:
image.jpg 31.0 KB
