Recognition of trademark symbol


Martin Fadrhons

2017/03/13 12:03:59
To: tesseract-ocr
Hi,

I was trying to train Tesseract 4 to recognize the trademark symbol ™. I was following the examples on the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer

I use German for testing. With the traineddata from the repository, the trademark symbol is usually recognized as '" or some other variation of quotes. So I created a training text that includes the trademark symbol and started the training process. I replaced only the top layer, as in the example, but the trademark symbol is still not recognized properly: with the newly generated traineddata it is recognized as TM. I have several questions.

1. Do I need to replace more layers?
2. How large should the training text be? (Mine is based on the one in the langdata/deu directory.)
3. I noticed that the symbols © and ® are included. Why is the trademark symbol missing?
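
For anyone reproducing this, here is a rough sanity check for a training text. It counts how often each symbol occurs and shows what Unicode NFKC normalization does to it. NFKC folds ™ into the two letters TM while leaving © and ® unchanged, which matches the symptom described above. Whether the Tesseract training tools apply this kind of normalization is only an assumption here and is not confirmed in this thread. The function name `symbol_report` and the sample string are made up for illustration.

```python
import unicodedata


def symbol_report(text, symbols=("™", "©", "®")):
    """Count each symbol in `text` and record its NFKC-normalized form."""
    report = {}
    for sym in symbols:
        report[sym] = {
            # How many times the symbol appears in the training text.
            "count": text.count(sym),
            # What the symbol becomes under compatibility normalization.
            "nfkc": unicodedata.normalize("NFKC", sym),
        }
    return report


# Hypothetical sample line; in practice, read your own training text file.
sample = "Produkt™ und Marke® © 2017, Neu™"

for sym, info in symbol_report(sample).items():
    print(f"U+{ord(sym):04X} count={info['count']} nfkc={info['nfkc']!r}")
```

If the count for ™ in your real training text is zero or very low, the network has little chance to learn the symbol no matter how many layers are replaced.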

Any other hints would be appreciated.

Thank you for your time,
Martin

P.S. Also, thanks for the great work on Tesseract OCR.

ShreeDevi Kumar

2017/03/17 5:24:01
To: tesser...@googlegroups.com

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com


shree

2017/07/24 23:37:05
To: tesseract-ocr
Martin,

Please test again with the latest code from GitHub. Ray has posted a fix for this.
