Not able to force a specific sequence length


Fernando

Nov 22, 2019, 4:08:16 AM

to tesseract-ocr

Hello everyone!

I am trying to use tesseract-ocr (through pytesseract) to recognize some specific codes. The input I receive is a single word at a time.

These codes always have the same length (8 characters), and I want the output to contain only sequences of exactly 8 characters.

I have tried all the solutions described in the manual (https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#CONFIGFILE), without success.

In more detail, I tried to:

Create a CONFIGFILE that refers to a user patterns file
Pass the patterns file directly with the --user-patterns option
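To make the second attempt concrete, here is a minimal sketch of how a patterns file and the matching option string could be prepared for pytesseract. The file name `codes.patterns`, the helper names, the eight-digit pattern, and the use of `--psm 8` (single word) are assumptions for illustration, not the exact files I used.

```python
import shlex
import tempfile
from pathlib import Path

# Assumed pattern: eight digits, one pattern per line in the file.
EIGHT_DIGITS = r"\d" * 8


def write_patterns_file(patterns, directory):
    """Write one pattern per line and return the path of the file."""
    path = Path(directory) / "codes.patterns"  # hypothetical file name
    path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return path


def build_config(patterns_path, psm=8):
    """Build the extra-arguments string for pytesseract's `config` parameter."""
    # --psm 8 asks Tesseract to treat the image as a single word.
    return f"--psm {psm} --user-patterns {shlex.quote(str(patterns_path))}"


with tempfile.TemporaryDirectory() as tmp:
    patterns_path = write_patterns_file([EIGHT_DIGITS], tmp)
    config = build_config(patterns_path)
    written = patterns_path.read_text(encoding="utf-8")
    # While the file still exists, `config` would be passed along the lines of:
    #   pytesseract.image_to_string(image, config=config)
```

The sketch only prepares the arguments; it does not show whether Tesseract actually honours the pattern.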

I also tried a few different regular expressions (I read that tesseract supports only a subset of regex syntax).

The ideal regex would be something like ^.{8}$, because I only want to constrain the length, not restrict the output to a specific set of characters (any Unicode character should be allowed).
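For clarity, the behaviour I am after is equivalent to the following check applied to the recognized text. This is only a sketch of a post-processing filter in Python, with a hypothetical helper name; it rejects results of the wrong length after the fact and does not make Tesseract itself produce 8 characters.

```python
from typing import Optional

CODE_LENGTH = 8  # every code has exactly 8 characters


def keep_if_code_length(ocr_text: str, length: int = CODE_LENGTH) -> Optional[str]:
    """Return the stripped OCR text if it has exactly `length` characters, else None."""
    # OCR output usually carries trailing whitespace, so strip it first.
    candidate = ocr_text.strip()
    # len() counts Unicode code points, so any character is accepted.
    return candidate if len(candidate) == length else None
```

Here `ocr_text` stands for whatever string the OCR call returned for one word image.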

I also tried some very general patterns that I read are supported, such as \d, which should return only sequences made of digits, but the pattern seems to be ignored.

Am I missing something, or is it not possible to force the length of the output sequence?

Thank you in advance
