Why are there three different forms of each word in eng.lstm-word-dawg?

Hongyu Zhou

Apr 23, 2019, 10:19:54 AM
to tesseract-ocr
It seems that each word in eng.lstm-word-dawg is stored in three forms.
For example, the word 'book' appears in lower case (book), upper case (BOOK), and capitalized (Book).
When Tesseract checks whether a word is in the dictionary, does the case of the word actually matter?
And when we add new words to a custom word list, do we need to do the same thing and include all three forms of each word?
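
To make the question concrete, here is a minimal sketch of how the three forms could be generated for a custom word list, assuming they do turn out to be needed (that is exactly what I am unsure about). The function names `case_variants` and `expand_wordlist` are my own and are not part of Tesseract.

```python
def case_variants(word):
    """Return the lower-case, upper-case and capitalized forms of a word.

    Duplicates are dropped (e.g. a one-letter word has only two distinct
    forms), and the original order is preserved.
    """
    forms = [word.lower(), word.upper(), word.capitalize()]
    unique = []
    for form in forms:
        if form not in unique:
            unique.append(form)
    return unique


def expand_wordlist(words):
    """Expand each word into its case variants, skipping blanks and repeats."""
    expanded = []
    for word in words:
        word = word.strip()
        if not word:
            continue
        for form in case_variants(word):
            if form not in expanded:
                expanded.append(form)
    return expanded


if __name__ == "__main__":
    # Hypothetical custom words; the output is one word per line.
    print("\n".join(expand_wordlist(["book"])))
```

The result is a plain list with one word per line, which is the form a custom word list would be written in before converting it to a dawg.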
