Recognition of trademark symbol

Martin Fadrhons

Mar 13, 2017, 12:03:59 PM

to tesseract-ocr

Hi,

I was trying to train Tesseract 4 to recognize the trademark symbol ™. I followed the examples on the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer

I am using German for testing. With the traineddata from the repository, the trademark symbol is usually recognized as '" or some other variation of quotes. So I created a training text that includes the trademark symbol and started the training process. I replaced only the top layer, as in the example, but the trademark symbol is still not recognized properly: with the newly generated traineddata it is recognized as TM. I have several questions.

1. Do I need to replace more layers?
2. How large should the training text be? (Mine is based on the one in the langdata/deu directory.)
3. I noticed that the symbols © and ® are included. Why is the trademark symbol missing?
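
As a quick sanity check on the training text, the following minimal Python sketch counts how often each symbol of interest occurs and shows its Unicode NFKC form. It uses only the standard library; the function name `symbol_report` and the sample sentence are hypothetical. One known Unicode fact it illustrates is that NFKC maps ™ to the two letters TM, while © and ® are left unchanged. Whether normalization is what produced the TM output described above is an assumption; nothing in this thread confirms it.

```python
import unicodedata
from collections import Counter

# Symbols mentioned in the questions above.
SYMBOLS = ("™", "©", "®")


def symbol_report(training_text, symbols=SYMBOLS):
    """Return {symbol: (occurrence count, NFKC-normalized form)}.

    Hypothetical helper for inspecting a training text; it is not
    part of Tesseract or its training tools.
    """
    counts = Counter(training_text)  # per-character counts
    return {
        s: (counts[s], unicodedata.normalize("NFKC", s))
        for s in symbols
    }


if __name__ == "__main__":
    # Made-up sample; in practice, read the real training text file.
    sample = "Acme™ GmbH, © 2017. Marke®, Acme™."
    for sym, (count, nfkc) in symbol_report(sample).items():
        print(f"{sym!r}: count={count}, NFKC={nfkc!r}")
```

A count of zero, or a very low count, for ™ would suggest the training text gives the network little to learn from, independent of any normalization question.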

Any other hints would be appreciated.

Thank you for your time,
Martin

P.S. Thanks also for the great work on Tesseract OCR.

ShreeDevi Kumar

Mar 17, 2017, 5:24:01 AM

to tesser...@googlegroups.com

Please see https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951 for more details about LSTM training.

ShreeDevi

shree

Jul 24, 2017, 11:37:05 PM

to tesseract-ocr

Martin,

Please test again with the latest code from GitHub. Ray has posted a fix for this.

See https://github.com/tesseract-ocr/tesseract/commit/b0ead95d64a3667339775b2f99ac37e97e90c2a0
