Hello,
I use Tesseract 3.04 on Ubuntu 12.04.
I know every word that appears in the papers I need to scan, and I plan to put all of them in the user-words file. However, many of them are enclosed in brackets "()" or contain "-", and I also have words like these:
Anti-HAV
(Angoron)
AMH/MIS
Aντισ.Εναντι
(B-HCG)
C1
+CD56-NK
DHEA’S
HPL(
MTHFR-G20210-FV-LEIDEN
Resistance(VLEIDEN)
(TIB.C)
V(H1299R(R2))
β-FIBRINOGEN(-455G-A)
Pallidum)
ΤΟΞΟΠΛΑΣΜΑ(Τ.gondii)-ΑΝΙΧΝΕΥΣΗ
μgr/dl
Women"
Should I strip the punctuation from these words, or should I leave them as they are? In the scans, these words will only ever appear with this exact punctuation.
Are all of the examples above valid entries for the user-words file?
Also, because I need to scan papers containing both English and Greek, I'm using the parameter "-l eng+ell", so I also put the Greek words in my eng.user-words file. Is that OK?
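
To make the punctuation question concrete, here is a minimal Python sketch of what I mean by "cleaning". It only strips punctuation and symbol characters from the two ends of each word and leaves inner characters such as "-" untouched. The helper name strip_outer_punctuation is my own, and I am not assuming this is what Tesseract expects; it is just the alternative I am asking about.

```python
import unicodedata


def strip_outer_punctuation(word):
    """Drop punctuation/symbol characters from both ends of a word.

    Hypothetical helper for illustration only. Inner punctuation
    (for example the "-" in "Anti-HAV") is kept as-is.
    """
    start, end = 0, len(word)
    # Unicode general categories starting with "P" (punctuation) or
    # "S" (symbol) cover characters such as ( ) + " at the edges.
    while start < end and unicodedata.category(word[start])[0] in "PS":
        start += 1
    while end > start and unicodedata.category(word[end - 1])[0] in "PS":
        end -= 1
    return word[start:end]


# A few of the words from my list, exactly as they appear on paper.
words = ["Anti-HAV", "(Angoron)", "(B-HCG)", "+CD56-NK", "HPL(", "Pallidum)", 'Women"']
cleaned = [strip_outer_punctuation(w) for w in words]

for original, stripped in zip(words, cleaned):
    print(original, "->", stripped)
```

So the choice is between listing "(B-HCG)" as it is printed and listing the stripped form "B-HCG".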