Here's a list of the problematic language files in the UDHR corpus:
1. The following raise UnicodeDecodeError when calling udhr.words("language"):
Arabic_Alarabia-Arabic
Burmese_Myanmar-UTF8
Chinese_Mandarin-UTF8
Czech_Cesky-UTF8
Gujarati-UTF8
Hungarian_Magyar-Unicode
Lao-UTF8
Magahi-UTF8
Marathi-UTF8
Tamil-UTF8
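
A minimal sketch of how a list like the one above could be reproduced. The function name find_undecodable and its read_words parameter are hypothetical, introduced only for illustration; the intended use is to pass udhr.fileids() and udhr.words from nltk.corpus, which requires the udhr corpus to be installed.

```python
def find_undecodable(fileids, read_words):
    """Return {fileid: error message} for files whose words cannot be decoded.

    read_words is any callable mapping a fileid to an iterable of words.
    Assumption: in practice this would be nltk.corpus.udhr.words.
    """
    failures = {}
    for fileid in fileids:
        try:
            # Corpus views are lazy, so walk the whole file to force decoding.
            for _ in read_words(fileid):
                pass
        except UnicodeDecodeError as err:
            failures[fileid] = str(err)
    return failures


# Intended usage (not run here, needs NLTK and the udhr corpus):
#   from nltk.corpus import udhr
#   for fileid in sorted(find_undecodable(udhr.fileids(), udhr.words)):
#       print(fileid)
```

The whole file is consumed on purpose: a bad byte sequence may only show up well past the first few words.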
2. The following return a list of str, not unicode:
Abkhaz-Cyrillic+Abkh
Amahuaca
Amharic-Afenegus6..60375
Armenian-DallakHelv
Azeri_Azerbaijani_Cyrillic-Az.Times.Cyr.Normal0117
Azeri_Azerbaijani_Latin-Az.Times.Lat0117
Bhojpuri-Agra
Burmese_Myanmar-WinResearcher
Chinese_Mandarin-HZ
Czech-Latin2-err
Esperanto-T61
Japanese_Nihongo-EUC
Japanese_Nihongo-JIS
Lithuanian_Lietuviskai-Baltic
Magahi-Agra
Navaho_Dine-Navajo-Navaho-font
Russian_Russky-UTF8~
Tigrinya_Tigrigna-VG2Main
Turkish_Turkce-Turkish
Vietnamese-TCVN
Vietnamese-VIQR
Vietnamese-VPS
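
A rough sketch of a check for the second problem. The names TEXT_TYPE and find_non_unicode are hypothetical, and read_words again stands in for udhr.words. TEXT_TYPE resolves to unicode on Python 2 and to str on Python 3, so on Python 3 the check flags byte strings instead.

```python
# unicode on Python 2, str on Python 3
TEXT_TYPE = type(u"")


def find_non_unicode(fileids, read_words):
    """Return the fileids whose words are not all decoded text strings.

    read_words is any callable mapping a fileid to an iterable of words.
    Assumption: in practice this would be nltk.corpus.udhr.words.
    """
    offenders = []
    for fileid in fileids:
        try:
            words = list(read_words(fileid))
        except UnicodeDecodeError:
            # Files that fail to decode at all are a separate problem.
            continue
        if any(not isinstance(word, TEXT_TYPE) for word in words):
            offenders.append(fileid)
    return offenders


# Intended usage (not run here, needs NLTK and the udhr corpus):
#   from nltk.corpus import udhr
#   print(find_non_unicode(udhr.fileids(), udhr.words))
```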
3. The following languages appear more than once. I know some of these are there because the language has several scripts or versions, but I doubt that explains all of them. E.g., there are 5 different versions of Czech, which seems implausible.
- Abkhaz-Cyrillic+Abkh
Abkhaz-UTF8
- Amahuaca
Amahuaca-Latin1
- Azeri_Azerbaijani_Cyrillic-Az.Times.Cyr.Normal0117
Azeri_Azerbaijani_Latin-Az.Times.Lat0117
- Belorus_Belaruski-Cyrillic
Belorus_Belaruski-UTF8
- Bosnian_Bosanski-Cyrillic
Bosnian_Bosanski-Latin2
Bosnian_Bosanski-UTF8
- Bulgarian_Balgarski-Cyrillic
Bulgarian_Balgarski-UTF8
- Burmese_Myanmar-UTF8
Burmese_Myanmar-WinResearcher
- Cashibo-Cacataibo-Latin1
Cashinahua-Latin1
- Catalan-Latin1
Catalan_Catala-Latin1
- Chinanteco-Ajitlan-Latin1
Chinanteco-UTF8
- Chinese_Mandarin-GB2312
Chinese_Mandarin-HZ
Chinese_Mandarin-UTF8
- Czech-Latin2
Czech-Latin2-err
Czech-UTF8
Czech_Cesky-Latin2
Czech_Cesky-UTF8
- Esperanto-T61
Esperanto-UTF8
- Farsi_Persian-UTF8
Farsi_Persian-v2-UTF8
- Greek_Ellinika-Greek
Greek_Ellinika-UTF8
- HaitianCreole_Kreyol-Latin1
HaitianCreole_Popular-Latin1
- Hebrew_Ivrit-Hebrew
Hebrew_Ivrit-UTF8
- Hindi-UTF8
Hindi_web-UTF8
- Hmong_Miao-Sichuan-Guizhou-Yunnan-Latin1
Hmong_Miao-SouthernEast-Guizhou-Latin1
Hmong_Miao_Northern-East-Guizhou-Latin1
- Hungarian_Magyar-Latin1
Hungarian_Magyar-Latin2
Hungarian_Magyar-UTF8
Hungarian_Magyar-Unicode
- Italian-Latin1
Italian_Italiano-Latin1
- Japanese_Nihongo-EUC
Japanese_Nihongo-JIS
Japanese_Nihongo-SJIS
Japanese_Nihongo-UTF8
- Kazakh-Cyrillic
Kazakh-UTF8
- Kinyamwezi_Nyamwezi-Latin1
Kinyarwanda-Latin1
- Latin_Latina-Latin1
Latin_Latina-v2-Latin1
- Magahi-Agra
Magahi-UTF8
- Mapudungun_Mapuzgun-Latin1
Mapudungun_Mapuzgun-UTF8
- Mongolian_Khalkha-Cyrillic
Mongolian_Khalkha-UTF8
- Norwegian-Latin1
Norwegian_Norsk-Bokmal-Latin1
Norwegian_Norsk-Nynorsk-Latin1
- Nyanja_Chechewa-Latin1
Nyanja_Chinyanja-Latin1
- OccitanAuvergnat-Latin1
OccitanLanguedocien-Latin1
- Polish-Latin2
Polish_Polski-Latin2
- Romani-Latin1
Romani-UTF8
Romanian-Latin2
Romanian_Romana-Latin2
- Russian-Cyrillic
Russian-UTF8
Russian_Russky-Cyrillic
Russian_Russky-UTF8
Russian_Russky-UTF8~
- Serbian_Srpski-Cyrillic
Serbian_Srpski-Latin2
Serbian_Srpski-UTF8
- Slovak-Latin2
Slovak_Slovencina-Latin2
- Spanish-Latin1
Spanish_Espanol-Latin1
- Tonga-Latin1
Tongan_Tonga-Latin1
- Turkish_Turkce-Turkish
Turkish_Turkce-UTF8
- Uighur_Uyghur-Latin1
Uighur_Uyghur-UTF8
- Ukrainian-Cyrillic
Ukrainian-UTF8
- Vietnamese-ALRN-UTF8
Vietnamese-TCVN
Vietnamese-UTF8
Vietnamese-VIQR
Vietnamese-VPS
- Zapoteco-Latin1
Zapoteco-SanLucasQuiavini-Latin1
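
One possible way to look for such duplicates automatically, as a sketch only. The function group_by_language is hypothetical, and it assumes that the language name is whatever precedes the first "-" or "_" in a file name. That heuristic is stricter than the grouping above, so pairs such as Tonga/Tongan_Tonga or OccitanAuvergnat/OccitanLanguedocien would not be grouped by it.

```python
import re
from collections import defaultdict


def group_by_language(fileids):
    """Group fileids by the text before the first '-' or '_'.

    Only groups with more than one member are returned, since those are
    the candidates for duplicated languages.
    """
    groups = defaultdict(list)
    for fileid in fileids:
        language = re.split(r"[-_]", fileid, maxsplit=1)[0]
        groups[language].append(fileid)
    return {lang: sorted(ids) for lang, ids in groups.items() if len(ids) > 1}


# Intended usage (not run here, needs NLTK and the udhr corpus):
#   from nltk.corpus import udhr
#   for language, ids in sorted(group_by_language(udhr.fileids()).items()):
#       print(language, ids)
```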
/Peter