The Output Using Multiple Languages


Layne Wang

Jun 7, 2018, 4:36:03 AM
to tesseract-ocr
Hi,
I'm using Tesseract 4.0.0-alpha on Ubuntu 16.04.
I'm referring to the "Using Multiple Languages" section of https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage.
The wiki says that the order of the languages in the <lang1+lang2> argument affects the output, and that the languages are prioritized.

My questions are:
  • What does "primary language" mean? I understand that it affects the spacing and probably which characters are output, but I'm not sure how it actually works.
  • How does Tesseract choose the 'best' character among all the languages? Is it based on a confidence score? And how does the order of the languages in the <lang1+lang2> argument affect the output?
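
To make the second question concrete, one way to observe the effect of the ordering would be to run the same image through both orderings and compare the text. This is only a minimal sketch: it assumes the `tesseract` binary is on the PATH, and the language codes `eng` and `deu` and the file name `page.png` are placeholders, not something taken from the wiki.

```python
import os
import shutil
import subprocess


def build_command(image_path, languages):
    """Build a tesseract CLI call; languages are joined with '+' in the given order."""
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", "+".join(languages)]


def run_ocr(image_path, languages):
    """Run tesseract and return the recognized text (raises if the call fails)."""
    result = subprocess.run(
        build_command(image_path, languages),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    image = "page.png"  # placeholder image name (assumption)
    # Only attempt the comparison when both the binary and the image are available.
    if shutil.which("tesseract") and os.path.exists(image):
        first = run_ocr(image, ["eng", "deu"])
        second = run_ocr(image, ["deu", "eng"])
        print("identical" if first == second else "outputs differ")
```

If the two outputs differ, the differing lines show where the language order changed the result, although this does not by itself explain how the choice is made internally.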
Thanks in advance!
Layne