UDHR unicode problems
From: peter ljunglöf <peter.ljung...@heatherleaf.se>
Date: Thu, 12 Jan 2012 09:29:59 +0100
Subject: UDHR unicode problems
Here's a list of the problematic languages in the UDHR corpus:

1. The following raise a UnicodeDecodeError when passed to udhr.words():

Arabic_Alarabia-Arabic
Burmese_Myanmar-UTF8
Chinese_Mandarin-UTF8
Czech_Cesky-UTF8
Gujarati-UTF8
Hungarian_Magyar-Unicode
Lao-UTF8
Magahi-UTF8
Marathi-UTF8
Tamil-UTF8
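
A minimal way to reproduce this kind of failure outside NLTK is to decode each file's raw bytes with the codec its name suggests and collect the ones that fail. This is a Python 3, stdlib-only sketch: `SUFFIX_CODECS`, `codec_for` and `find_decode_failures` are hypothetical names, the suffix-to-codec table is an assumption rather than the table NLTK's UDHR reader actually uses, and the sample bytes are made up.

```python
import re

# Hypothetical suffix -> codec table, for illustration only;
# this is NOT the encoding table that NLTK's UDHR reader uses.
SUFFIX_CODECS = [
    (r"-UTF8~?$", "utf_8"),  # also matches the odd "Russian_Russky-UTF8~"
    (r"-Latin1$", "latin_1"),
    (r"-Latin2$", "iso8859_2"),
]


def codec_for(fileid):
    """Return the codec implied by a file ID's suffix, or None if unknown."""
    for pattern, codec in SUFFIX_CODECS:
        if re.search(pattern, fileid):
            return codec
    return None


def find_decode_failures(files):
    """Return the file IDs whose raw bytes do not decode with the implied codec.

    `files` maps file ID -> raw bytes; IDs with no known codec are skipped.
    """
    failures = []
    for fileid, data in sorted(files.items()):
        codec = codec_for(fileid)
        if codec is None:
            continue
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            failures.append(fileid)
    return failures


# Made-up sample: the first entry pretends to be a mislabelled file
# (Windows-1250 bytes under a "-UTF8" name); the second is genuine UTF-8.
sample = {
    "Czech_Cesky-UTF8": "Všichni lidé".encode("cp1250"),
    "Czech-UTF8": "Všichni lidé".encode("utf_8"),
}
print(find_decode_failures(sample))  # ['Czech_Cesky-UTF8']
```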

2. The following return a list of str instead of unicode:

Abkhaz-Cyrillic+Abkh
Amahuaca
Amharic-Afenegus6..60375
Armenian-DallakHelv
Azeri_Azerbaijani_Cyrillic-Az.Times.Cyr.Normal0117
Azeri_Azerbaijani_Latin-Az.Times.Lat0117
Bhojpuri-Agra
Burmese_Myanmar-WinResearcher
Chinese_Mandarin-HZ
Czech-Latin2-err
Esperanto-T61
Japanese_Nihongo-EUC
Japanese_Nihongo-JIS
Lithuanian_Lietuviskai-Baltic
Magahi-Agra
Navaho_Dine-Navajo-Navaho-font
Russian_Russky-UTF8~
Tigrinya_Tigrigna-VG2Main
Turkish_Turkce-Turkish
Vietnamese-TCVN
Vietnamese-VIQR
Vietnamese-VPS
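
My guess (not verified) is that some of these suffixes name fonts rather than encodings Python knows (e.g. Agra, VG2Main, WinResearcher), so there is no codec to decode with and the reader hands back undecoded byte strings, while others (EUC, JIS, HZ) do have Python codecs and may simply be missing from the reader's encoding table. In Python 3 terms the symptom is `bytes` instead of `str`. The stdlib-only sketch below makes that case explicit by raising instead of returning bytes; `CANDIDATE_CODECS`, `has_codec`, `decode_udhr` and `undecodable` are hypothetical names, and the codec choices (e.g. `cp1257` for Baltic, `cp1254` for Turkish) are assumptions that would need checking against the actual files.

```python
import codecs

# Hypothetical mapping from a UDHR file-ID suffix to a Python codec name.
# None marks suffixes that look like font names, with no standard codec.
CANDIDATE_CODECS = {
    "EUC": "euc_jp",
    "JIS": "iso2022_jp",
    "HZ": "hz",
    "Baltic": "cp1257",
    "Turkish": "cp1254",
    "Agra": None,
    "VG2Main": None,
    "WinResearcher": None,
}


def suffix(fileid):
    """The part after the last '-', e.g. 'EUC' for 'Japanese_Nihongo-EUC'."""
    return fileid.rsplit("-", 1)[-1]


def has_codec(fileid):
    """True if the suffix maps to a codec that this Python actually provides."""
    name = CANDIDATE_CODECS.get(suffix(fileid))
    if name is None:
        return False
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def decode_udhr(fileid, data):
    """Decode raw bytes to text, raising instead of silently returning bytes."""
    if not has_codec(fileid):
        raise ValueError(f"no known codec for {fileid!r}")
    return data.decode(CANDIDATE_CODECS[suffix(fileid)])


def undecodable(fileids):
    """Return the file IDs for which no usable codec could be chosen."""
    return [fileid for fileid in fileids if not has_codec(fileid)]


print(undecodable(["Japanese_Nihongo-EUC", "Magahi-Agra", "Turkish_Turkce-Turkish"]))
# ['Magahi-Agra']
```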

3. The following languages appear under more than one file ID. I know some of these exist because the language is written in different scripts or has several versions of the translation, but I doubt that this explains all of them. E.g., there are 5 different files for Czech, which seems implausible.

- Abkhaz-Cyrillic+Abkh
  Abkhaz-UTF8
- Amahuaca
  Amahuaca-Latin1
- Azeri_Azerbaijani_Cyrillic-Az.Times.Cyr.Normal0117
  Azeri_Azerbaijani_Latin-Az.Times.Lat0117
- Belorus_Belaruski-Cyrillic
  Belorus_Belaruski-UTF8
- Bosnian_Bosanski-Cyrillic
  Bosnian_Bosanski-Latin2
  Bosnian_Bosanski-UTF8
- Bulgarian_Balgarski-Cyrillic
  Bulgarian_Balgarski-UTF8
- Burmese_Myanmar-UTF8
  Burmese_Myanmar-WinResearcher
- Cashibo-Cacataibo-Latin1
  Cashinahua-Latin1
- Catalan-Latin1
  Catalan_Catala-Latin1
- Chinanteco-Ajitlan-Latin1
  Chinanteco-UTF8
- Chinese_Mandarin-GB2312
  Chinese_Mandarin-HZ
  Chinese_Mandarin-UTF8
- Czech-Latin2
  Czech-Latin2-err
  Czech-UTF8
  Czech_Cesky-Latin2
  Czech_Cesky-UTF8
- Esperanto-T61
  Esperanto-UTF8
- Farsi_Persian-UTF8
  Farsi_Persian-v2-UTF8
- Greek_Ellinika-Greek
  Greek_Ellinika-UTF8
- HaitianCreole_Kreyol-Latin1
  HaitianCreole_Popular-Latin1
- Hebrew_Ivrit-Hebrew
  Hebrew_Ivrit-UTF8
- Hindi-UTF8
  Hindi_web-UTF8
- Hmong_Miao-Sichuan-Guizhou-Yunnan-Latin1
  Hmong_Miao-SouthernEast-Guizhou-Latin1
  Hmong_Miao_Northern-East-Guizhou-Latin1
- Hungarian_Magyar-Latin1
  Hungarian_Magyar-Latin2
  Hungarian_Magyar-UTF8
  Hungarian_Magyar-Unicode
- Italian-Latin1
  Italian_Italiano-Latin1
- Japanese_Nihongo-EUC
  Japanese_Nihongo-JIS
  Japanese_Nihongo-SJIS
  Japanese_Nihongo-UTF8
- Kazakh-Cyrillic
  Kazakh-UTF8
- Kinyamwezi_Nyamwezi-Latin1
  Kinyarwanda-Latin1
- Latin_Latina-Latin1
  Latin_Latina-v2-Latin1
- Magahi-Agra
  Magahi-UTF8
- Mapudungun_Mapuzgun-Latin1
  Mapudungun_Mapuzgun-UTF8
- Mongolian_Khalkha-Cyrillic
  Mongolian_Khalkha-UTF8
- Norwegian-Latin1
  Norwegian_Norsk-Bokmal-Latin1
  Norwegian_Norsk-Nynorsk-Latin1
- Nyanja_Chechewa-Latin1
  Nyanja_Chinyanja-Latin1
- OccitanAuvergnat-Latin1
  OccitanLanguedocien-Latin1
- Polish-Latin2
  Polish_Polski-Latin2
- Romani-Latin1
  Romani-UTF8
  Romanian-Latin2
  Romanian_Romana-Latin2
- Russian-Cyrillic
  Russian-UTF8
  Russian_Russky-Cyrillic
  Russian_Russky-UTF8
  Russian_Russky-UTF8~
- Serbian_Srpski-Cyrillic
  Serbian_Srpski-Latin2
  Serbian_Srpski-UTF8
- Slovak-Latin2
  Slovak_Slovencina-Latin2
- Spanish-Latin1
  Spanish_Espanol-Latin1
- Tonga-Latin1
  Tongan_Tonga-Latin1
- Turkish_Turkce-Turkish
  Turkish_Turkce-UTF8
- Uighur_Uyghur-Latin1
  Uighur_Uyghur-UTF8
- Ukrainian-Cyrillic
  Ukrainian-UTF8
- Vietnamese-ALRN-UTF8
  Vietnamese-TCVN
  Vietnamese-UTF8
  Vietnamese-VIQR
  Vietnamese-VPS
- Zapoteco-Latin1
  Zapoteco-SanLucasQuiavini-Latin1
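
To check which of these groups are real duplicates rather than different scripts or translations, one could decode every file and compare the texts. A minimal sketch, assuming the files have already been decoded to Python 3 `str`: it first groups file IDs by the name before the first `-` or `_` (a rough, hypothetical heuristic), then hashes the NFC-normalised, whitespace-collapsed text so that files differing only in encoding or Unicode composition fall together. All function names are hypothetical and the sample strings are made up.

```python
import hashlib
import re
import unicodedata
from collections import defaultdict


def base_name(fileid):
    """Rough language key: the part before the first '-' or '_' (a heuristic)."""
    return re.split(r"[-_]", fileid, maxsplit=1)[0]


def candidate_groups(fileids):
    """Map each base name shared by two or more file IDs to those IDs, sorted."""
    groups = defaultdict(list)
    for fileid in fileids:
        groups[base_name(fileid)].append(fileid)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


def content_key(text):
    """Hash of the NFC-normalised text with whitespace runs collapsed."""
    normalized = " ".join(unicodedata.normalize("NFC", text).split())
    return hashlib.sha1(normalized.encode("utf_8")).hexdigest()


def same_content_groups(decoded):
    """Group file IDs (from a fileid -> decoded text dict) whose texts match."""
    groups = defaultdict(list)
    for fileid, text in decoded.items():
        groups[content_key(text)].append(fileid)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)


# Made-up sample texts; the second uses combining accents (decomposed form).
decoded = {
    "Czech-Latin2": "Všichni lidé",
    "Czech-UTF8": "Vs\u030cichni lide\u0301",
    "Czech_Cesky-UTF8": "Všichni  lidé\n",
    "Slovak-Latin2": "Všetci ľudia",
}
print(candidate_groups(decoded))
# {'Czech': ['Czech-Latin2', 'Czech-UTF8', 'Czech_Cesky-UTF8']}
print(same_content_groups(decoded))
# [['Czech-Latin2', 'Czech-UTF8', 'Czech_Cesky-UTF8']]
```

Groups that show up in the first result but not the second are the ones where a genuine script or translation difference is plausible; groups in the second result differ only in encoding or Unicode normalisation.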

/Peter


 