Hi,
I am trying to train Tesseract 4 to recognize the trademark symbol ™. I followed the examples on the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
I use German for testing. With the traineddata from the repository, the trademark symbol is usually recognized as '" or some other variation of quotes. So I created a training text that includes the trademark symbol and started the training process. I replaced only the top layer, as in the example, but the trademark symbol is still not recognized properly. With the newly generated traineddata, the symbol is recognized as TM. I have several questions.
1. Do I need to replace more layers?
2. How large should the training text be? (Mine is based on the one in the langdata/deu directory.)
3. I noticed that the symbols © and ® are included. Why is the trademark symbol missing?
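A possibly related observation, sketched in Python: Unicode compatibility normalization (NFKC) maps ™ to the plain letters TM, while © and ® are left unchanged. Whether the Tesseract training tools apply this normalization to the training text is an assumption I have not verified, but it would be consistent with the TM output described above. The helper name nfkc_report is only illustrative.

```python
import unicodedata


def nfkc_report(symbols):
    """Return a dict mapping each symbol to its NFKC-normalized form."""
    return {s: unicodedata.normalize("NFKC", s) for s in symbols}


# Compare the trademark sign with the copyright and registered signs.
report = nfkc_report(["™", "©", "®"])
for symbol, normalized in report.items():
    # Show the code point, the original symbol, and what NFKC turns it into.
    print(f"U+{ord(symbol):04X} {symbol!r} -> {normalized!r}")
```

If the training text is normalized this way before the unicharset is built, the ™ character would never reach the network as a single symbol, regardless of how many layers are replaced.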
Any other hints would be appreciated.
Thank you for your time,
Martin
P.S. Also, thanks for the great work on Tesseract OCR.