Can a user-words file contain punctuation, and should it?

Mark
Nov 23, 2015, 3:34:51 PM
to tesseract-ocr
Hello,

I use Tesseract 3.04 on Ubuntu 12.04.

I know every word that appears in the papers I need to scan, and I'm going to put all of them in the user-words file. Many of them are enclosed in parentheses "()" or contain "-", and I also have words like these:

Anti-HAV
(Angoron)
AMH/MIS
Aντισ.Εναντι
(B-HCG)
C1
+CD56-NK
DHEA’S
HPL(
MTHFR-G20210-FV-LEIDEN
Resistance(VLEIDEN)
(TIB.C)
V(H1299R(R2))
β-FIBRINOGEN(-455G-A)
Pallidum)
ΤΟΞΟΠΛΑΣΜΑ(Τ.gondii)-ΑΝΙΧΝΕΥΣΗ
μgr/dl
Women"

Should I strip the punctuation from these words, or should I leave them as they are? They will only ever appear in my documents with this exact punctuation.

Are all of the examples above valid entries for the user-words file?

Also, because I need to scan papers in both English and Greek, I'm using the parameter "-l eng+ell", so I also put the Greek words in my eng.user-words file. Is that OK?
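
To make the two options concrete, here is a minimal Python sketch of how I could generate the word list either way. It assumes the user-words file is plain UTF-8 text with one word per line, and it does not settle whether Tesseract actually matches entries that contain punctuation. The function names (`strip_outer_punctuation`, `build_word_list`, `write_user_words`) are my own, hypothetical helpers, not part of Tesseract.

```python
import unicodedata
from pathlib import Path


def strip_outer_punctuation(word: str) -> str:
    """Remove punctuation and symbol characters from both ends of a word.

    Interior characters such as the "-" in "B-HCG" are left untouched.
    """
    def is_punct(ch: str) -> bool:
        # Unicode general categories starting with P (punctuation) or S (symbol)
        return unicodedata.category(ch)[0] in ("P", "S")

    start, end = 0, len(word)
    while start < end and is_punct(word[start]):
        start += 1
    while end > start and is_punct(word[end - 1]):
        end -= 1
    return word[start:end]


def build_word_list(words, keep_original=True):
    """Return a de-duplicated list of entries, preserving input order.

    With keep_original=True each word is listed as written, followed by
    its stripped form when that differs. With keep_original=False only
    the stripped forms are kept.
    """
    seen, result = set(), []
    for raw in words:
        word = raw.strip()
        candidates = [word] if keep_original else []
        candidates.append(strip_outer_punctuation(word))
        for candidate in candidates:
            if candidate and candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


def write_user_words(words, path):
    """Write one entry per line, UTF-8 encoded (assumed file format)."""
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
```

For example, `build_word_list(["(B-HCG)", "C1", "HPL("])` yields both the original and the stripped spellings, so I could test the same scans against either list (or the combined one) and compare the results.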

