Hey, I am trying to train a Tesseract model that recognizes only a specific set of UTF-8 symbols and ignores everything else, such as English letters. As far as I can tell, there are two ways to do this: fine-tune the standard English model on these symbols and then blacklist the English characters, or train a completely new model that only knows these symbols. Which of these (or some other method) is better in terms of ease, training time, and accuracy?
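For concreteness, here is roughly what I mean by the blacklist option. This is only a sketch: `build_blacklist_config` is a helper name I made up, the page segmentation mode of 6 is a guess, and I am assuming the installed Tesseract version actually honors `tessedit_char_blacklist` with the engine in use.

```python
import string


def build_blacklist_config(psm: int = 6) -> str:
    """Build a Tesseract config string that blacklists ASCII letters and digits.

    The helper name and the default page segmentation mode are assumptions
    made for illustration; adjust both to the actual board layout.
    """
    # Every character listed here is one Tesseract is asked never to output.
    blacklist = string.ascii_letters + string.digits
    return f"--psm {psm} -c tessedit_char_blacklist={blacklist}"


# Hypothetical usage with pytesseract (requires Tesseract to be installed):
#   import pytesseract
#   data = pytesseract.image_to_data(
#       image, lang="eng", config=build_blacklist_config(),
#       output_type=pytesseract.Output.DICT)
```

One thing I am unsure about is whether blacklisted characters are simply dropped or get replaced by the nearest allowed character, which would matter for accuracy.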
For context, I will have various pieces of text on a board, and I want to find the locations of the symbols. I don't want to recognize any of the English text or anything else, as that may interfere with my post-processing. I have tried a few workarounds (such as restricting where the symbols can appear on the board and then only scanning those strips), but I am not satisfied with the results. I can also control the font, the text size, and everything else on the board, except the actual codes.
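The post-processing step I have in mind looks something like the sketch below. It assumes the OCR result is a dict shaped like the one `pytesseract.image_to_data` returns with `output_type=Output.DICT` (parallel lists under `text`, `left`, `top`, `width`, `height`). The symbols in `ALLOWED_SYMBOLS` and the function name `symbol_boxes` are placeholders, not my real codes.

```python
from typing import Dict, List, Set, Tuple

# Placeholder symbol set; the real codes would go here.
ALLOWED_SYMBOLS: Set[str] = set("★☆♠♣")

Box = Tuple[str, int, int, int, int]  # (text, left, top, width, height)


def symbol_boxes(data: Dict[str, list], allowed: Set[str]) -> List[Box]:
    """Keep only OCR tokens made up entirely of allowed symbols.

    `data` is assumed to hold parallel lists, one entry per detected token.
    """
    boxes: List[Box] = []
    for i, text in enumerate(data["text"]):
        token = text.strip()
        # Skip empty tokens and anything containing a non-symbol character.
        if token and all(ch in allowed for ch in token):
            boxes.append((token, data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i]))
    return boxes
```

This works as a filter, but it only helps if the symbols are recognized correctly in the first place, which is why I am asking about the model itself.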
Thanks for the help!