Hello,
When I try to use a custom dictionary containing non-English characters on an Ubuntu 16.04 Linux box, I get an error that makes it seem like the aligner cannot match the special characters.
Both the dictionary and the transcription contain the word "PÅ", which happens to be the first instance of a special character in my test data.
Below I'm including my command line and some file type information. I'm concerned that inconsistencies in the text file format might affect the results, so I have tried some variations.
Any pointers appreciated.
mcgarrah@fave:~/FAVE/FAVE-align$ python FAAValign.py -v -i ~/DATA/CUSTOM_DIC.txt ~/DATA/READING.wav ~/DATA/READING.txt ~/DATA/READING.TextGrid
Read dictionary from file model/dict.
Added all entries in file CUSTOM_DIC.txt to CMU dictionary.
Read dictionary from file added_dict_entries.txt.
Added new entries from file CUSTOM_DIC.txt to file added_dict_entries.txt.
Encoding is UTF-16!
Encoding is UTF-8!
Read transcription file READING.txt.
Checking format of input transcription file...
Checking dictionary entries for all words in the input transcription...
Please enter the Arpabet transcription of word PÅ, or enter [s] to skip.
mcgarrah@fave:~/DATA$ file READING.txt
READING.txt: UTF-8 Unicode text
READING_GMAIL.txt: ISO-8859 text
mcgarrah@fave:~/DATA$ file CUSTOM_DIC*.txt
CUSTOM_DIC.txt: ISO-8859 text
CUSTOM_DIC_DropBox.txt: Non-ISO extended-ASCII text, with CR line terminators
CUSTOM_DIC_GMAIL.txt: ISO-8859 text
CUSTOM_DIC_ORIG.txt: Non-ISO extended-ASCII text
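
One thing the `file` output above points to is that READING.txt is UTF-8 while CUSTOM_DIC.txt is ISO-8859, so "PÅ" is stored as different bytes in the two files (`50 C5` in Latin-1 versus `50 C3 85` in UTF-8). Below is a minimal sketch of how the dictionary could be converted to UTF-8 for a retest. It assumes the ISO-8859 file is really ISO-8859-1 (Latin-1), which `file` does not confirm, and the helper names `to_utf8_bytes` and `normalize_file` are made up for illustration. I have not confirmed that this is what FAAValign.py trips over.

```python
def to_utf8_bytes(raw, fallback="latin-1"):
    """Return `raw` as UTF-8 bytes.

    If `raw` is already valid UTF-8 it is returned unchanged; otherwise it is
    decoded with `fallback` (assumed here to be Latin-1) and re-encoded.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")


def normalize_file(src, dst, fallback="latin-1"):
    """Copy `src` to `dst`, converting the contents to UTF-8 if needed."""
    with open(src, "rb") as f:
        raw = f.read()
    with open(dst, "wb") as f:
        f.write(to_utf8_bytes(raw, fallback))
```

Calling `normalize_file("CUSTOM_DIC.txt", "CUSTOM_DIC_utf8.txt")` (the output name is only an example) and then running `file` on the result should report UTF-8 if the Latin-1 assumption holds. The converted file could then be passed to `-i` in place of the original.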