I am very new to coding so forgive me.
I have been having an extremely low success rate with tesseract.
Here are 3 examples both pre- and post- processing:
These were scanned as "a", "Ss30", and "moh" respectively.
I consider the yellow one a success, since I can just regex the 30 out of the result, but I still don't understand how it could be so far off for the rest.
I've tried different traineddata files, including one that I trained myself on over 200 examples.
I have three theories as to why I couldn't train it:
1. The different colours are processed differently, producing differently shaped characters (red looks bold and yellow looks thin).
2. The different sizes of the images cause the characters to be slightly differently shaped when cropped.
3. Tesseract assumes that the two lines of text are one, and reads them together.
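For what it's worth, here is the kind of preprocessing sketch I imagine could rule out theories 1 and 3: binarize the image so colour differences disappear before Tesseract sees it, and force single-line page segmentation so two lines can't be merged. The threshold value of 128 and the choice of `--psm 7` are just assumptions to tune, not something I've verified works here.

```python
# Sketch: binarize to remove colour (theory 1) before OCR.
# Threshold 128 is an assumption; it may need tuning per image.
from PIL import Image

def binarize(img: Image.Image, threshold: int = 128) -> Image.Image:
    grey = img.convert("L")  # drop colour information entirely
    # Map every pixel to pure black or pure white.
    return grey.point(lambda p: 255 if p > threshold else 0, mode="1")

# For theory 3 (two lines read as one), Tesseract's page segmentation
# mode 7 treats the whole image as a single text line. A hypothetical
# pytesseract call would look like:
#   text = pytesseract.image_to_string(binarize(img), config="--psm 7")

if __name__ == "__main__":
    # Tiny demo: after binarizing, dark red "ink" on a yellow background
    # becomes plain black-on-white regardless of the original colours.
    img = Image.new("RGB", (4, 4), (255, 255, 0))  # yellow background
    img.putpixel((1, 1), (200, 0, 0))              # dark red "ink" pixel
    out = binarize(img).convert("L")
    print(out.getpixel((1, 1)), out.getpixel((0, 0)))  # prints: 0 255
```

That way red, yellow, and blue samples would all reach Tesseract as identical black-on-white glyphs, which should tell me whether colour is actually the variable that matters.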
Could someone please give me a hint on what to try? I don't want to spend another day training it on just blue ones (for example) only to find that colour isn't the problem.
Thanks