Hello everyone!
I am trying to use tesseract-ocr (via pytesseract) to recognize some specific codes. The input is a single word at a time.
These codes always have the same length (8 characters), and I want the output to contain only sequences of exactly 8 characters.
In more detail, I tried to:
- Create a CONFIGFILE, referring to a user pattern file
- Pass the pattern file directly with the --user-patterns option
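To make the second attempt concrete, this is roughly the kind of call I mean. It is only a sketch: the file name codes.patterns, the helper names build_config and read_code, and the choice of --psm 8 (treat the image as a single word) are assumptions for illustration, not my exact setup.

```python
def build_config(patterns_path, psm=8):
    """Build the extra command-line options handed to tesseract.

    --psm 8 tells tesseract to treat the image as a single word.
    --user-patterns points at the user pattern file.
    """
    return f"--psm {psm} --user-patterns {patterns_path}"


def read_code(image, patterns_path="codes.patterns"):
    """Run OCR on one word image and return the stripped text.

    pytesseract is imported here so that build_config can be used
    (and checked) without tesseract being installed.
    """
    import pytesseract

    text = pytesseract.image_to_string(image, config=build_config(patterns_path))
    return text.strip()
```

Calling read_code(img) with a PIL image of a single word returns whatever tesseract recognizes; in my case the pattern file does not seem to change the result.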
I also tried a few different regular expressions (I read that tesseract supports only a subset of the syntax).
The ideal regex would be something like ^.{8}$, because I only want to constrain the length, not restrict the output to a specific set of characters (any Unicode character is allowed).
I also tried some very general patterns that I read are supported, such as \d, which should return only sequences made of digits, but it seems to be ignored.
Am I missing something, or is it not possible to force the length of the output sequence?
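For clarity, this is the behavior I am after, written as a check in plain Python on the OCR output rather than inside tesseract. It is just a sketch of the constraint; the name keep_if_valid is made up for this post.

```python
import re

CODE_LENGTH = 8  # every code has exactly 8 characters

# Same idea as ^.{8}$: fullmatch anchors the pattern at both ends.
EXACT_LENGTH = re.compile(r".{%d}" % CODE_LENGTH)


def keep_if_valid(ocr_text):
    """Return the stripped text if it has exactly 8 characters, else None."""
    candidate = ocr_text.strip()
    return candidate if EXACT_LENGTH.fullmatch(candidate) else None
```

This only rejects results of the wrong length after the fact; what I would like is for tesseract itself to produce only 8-character candidates.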
Thank you in advance!