I am trying to train Tesseract for a language that it does not currently support.
I am working with Python on Ubuntu 16.04 LTS, with Tesseract 3.04.01 (installed with sudo apt install tesseract-ocr, and working perfectly for English).
I tested it with the following command:
tesseract procssed_image.png stdout -l vie
The output is about 90% correct, except for some characters that are not part of the Vietnamese language.
Then I created a config file named bazaar (in /usr/share/tesseract-ocr/tessdata/configs/) with the following contents:
load_system_dawg F
load_freq_dawg F
user_words_suffix user-words
I also created a text file with my custom word list (around 150 words, one word per line) and named it vie.user-words.
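For reference, the two files described above can be generated from Python. This is only a minimal sketch: the helper name write_user_words_setup and the two sample words are hypothetical, and it assumes that Tesseract looks for the word list in the tessdata directory under the name <lang>.user-words (the language code plus the suffix given by user_words_suffix). The demo writes to a temporary directory rather than to /usr/share/tesseract-ocr/tessdata, which would need root permissions.

```python
import os
import tempfile

# The three config lines from the bazaar file shown above.
CONFIG_LINES = [
    "load_system_dawg F",
    "load_freq_dawg F",
    "user_words_suffix user-words",
]


def write_user_words_setup(tessdata_dir, lang, words, config_name="bazaar"):
    """Write the config file and the user word list; return both paths.

    Assumption: the word list lives directly in tessdata_dir and is
    named "<lang>.user-words" to match user_words_suffix.
    """
    configs_dir = os.path.join(tessdata_dir, "configs")
    os.makedirs(configs_dir, exist_ok=True)

    config_path = os.path.join(configs_dir, config_name)
    with open(config_path, "w", encoding="utf-8") as f:
        f.write("\n".join(CONFIG_LINES) + "\n")

    words_path = os.path.join(tessdata_dir, lang + ".user-words")
    with open(words_path, "w", encoding="utf-8") as f:
        # One word per line, as in the hand-written file.
        f.write("\n".join(words) + "\n")

    return config_path, words_path


# Demo in a temporary directory; the two words are placeholders,
# not the real 150-word list.
demo_dir = tempfile.mkdtemp()
config_path, words_path = write_user_words_setup(demo_dir, "vie", ["xin", "chào"])
print(config_path)
print(words_path)
```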
Then I ran the following command:
tesseract procssed_image.png stdout -l vie bazaar
The result was the same.
Then I tried:
tesseract procssed_image.png stdout -l vie bazaar -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî
In tessedit_char_whitelist I am trying to list all the characters that are present in my language, plus the other symbols present in the image file.
This command prints the following errors and then the output (the result is the same as before):
read_params_file: Can't open c
read_params_file: Can't open tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789àâêî
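One thing that may be worth checking: the usage line printed by tesseract has the form "tesseract imagename outputbase [options...] [configfile...]", and the two errors above look as if -c and its value were read as config file names because they come after bazaar. Below is a minimal Python sketch that assembles the same command with the -c option placed before the config file name. The helper name build_tesseract_args is hypothetical, and whether this ordering removes the errors on 3.04.01 is an assumption, not a confirmed fix.

```python
import string


def build_tesseract_args(image, lang, whitelist, config_name):
    """Return an argument list with the -c option before the config file name."""
    return [
        "tesseract", image, "stdout",
        "-l", lang,
        "-c", "tessedit_char_whitelist=" + whitelist,
        config_name,  # config file names go last
    ]


# Same character set as in the command above.
whitelist = (
    string.ascii_lowercase + string.ascii_uppercase + string.digits + "àâêî"
)
args = build_tesseract_args("procssed_image.png", "vie", whitelist, "bazaar")
print(" ".join(args))

# To actually run it (needs tesseract and the image on disk):
# import subprocess
# print(subprocess.check_output(args).decode("utf-8"))
```

Passing the arguments as a list avoids any shell quoting problems with the accented characters in the whitelist.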
How can I fix this issue? Thank you for your time.