On 19/03/2021 10:11, Charles Cho wrote:
> I'm working on an OCR Android app based on Tesseract.
> I want to add a feature that detects the language automatically and
> recognizes at least 2 languages at once.
> I have investigated this for a while, so I know that I have to specify
> the language for Tesseract.
> Then how can I implement automatic language detection?
Not exactly a mobile use case, but you can read how the Internet Archive
does this (I coined it "autonomous mode", where the software just
figures out the scripts and languages):
The code is available here (I plan to split the archive.org-specific
code out of the Python code that invokes Tesseract and performs
heuristics like script detection):
The tl;dr: first perform script detection and use the detected script
to OCR the page, then use language detection libraries to guess the
languages on the page.
> And Tesseract on the Google Play Store can recognize 3 languages at
> once. Is that the maximum?
I am not sure what you're finding on the Google Play Store, but I have
found there to be no limit on the number of languages that can be used
during OCR. Keep in mind that using more languages will slow down the
OCR.
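For reference, Tesseract takes several languages as a single `-l` value
joined with `+` (e.g. `eng+kor`), and pytesseract's `lang=` argument is
passed through the same way. A small sketch, assuming a local tessdata
directory; `available_languages` and `build_lang_spec` are hypothetical
helper names, not part of any library:

```python
from pathlib import Path


def available_languages(tessdata_dir):
    """Model names for the *.traineddata files directly in tessdata_dir."""
    return {p.stem for p in Path(tessdata_dir).glob("*.traineddata")}


def build_lang_spec(languages, available=None):
    """Join language names with '+', the form Tesseract's -l option expects.

    Duplicates are dropped while keeping order. If `available` is given, any
    name without an installed model raises ValueError up front instead of
    failing later inside Tesseract.
    """
    ordered = list(dict.fromkeys(languages))
    if not ordered:
        raise ValueError("at least one language is required")
    if available is not None:
        missing = [lang for lang in ordered if lang not in available]
        if missing:
            raise ValueError("no traineddata for: " + ", ".join(missing))
    return "+".join(ordered)
```

Usage would be something like
`pytesseract.image_to_string("page.png", lang=build_lang_spec(["eng", "kor"]))`.
Every extra entry in that string is another model Tesseract has to load
and consult, which is why more languages cost time.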
> Any help and advice would be really appreciated.
Hope this helps.