Unicode 6.0 is released

Roozbeh Pournader

Oct 11, 2010, 8:50:35 PM
to Persian Computing
Unicode 6.0 was released today.

Here is links to the announcement:
http://www.unicode.org/press/pr-6.0.htm

These are of special interest to the Persian Computing community:

* Two characters have been encoded in the Arabic script block for use
in Kashmiri, one of the official languages of Jammu and Kashmir, the
Indian-administered part of Kashmir. The language is written in both
Arabic and Devanagari, along religious lines of Muslims and Hindus.

The two new characters are U+0620 Arabic Letter Kashmiri Yeh and
U+065F Arabic Wavy Hamza Below. Also, U+0673 Arabic Letter Alef With
Wavy Hamza Below has been deprecated (the first Arabic script
character to ever get deprecated in Unicode), and the character
sequence <U+0627, U+065F> should be used instead of it.

Unicode proposal (I'm a coauthor):
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3673.pdf

Updated Unicode chart:
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-0600.pdf
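
For anyone who needs to migrate existing text, the replacement of the
deprecated U+0673 described above could be sketched roughly as follows.
This is only a minimal illustration in Python; the function name
replace_deprecated_alef is hypothetical and not part of any library.

```python
# Hypothetical helper for illustration; not part of any library.
DEPRECATED_ALEF = "\u0673"    # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
REPLACEMENT = "\u0627\u065F"  # ARABIC LETTER ALEF + ARABIC WAVY HAMZA BELOW


def replace_deprecated_alef(text: str) -> str:
    """Return text with every U+0673 replaced by the sequence <U+0627, U+065F>."""
    return text.replace(DEPRECATED_ALEF, REPLACEMENT)


if __name__ == "__main__":
    sample = "\u0673\u0628"  # deprecated alef followed by ARABIC LETTER BEH
    fixed = replace_deprecated_alef(sample)
    # Show the resulting code points.
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Whether the new sequence displays correctly still depends on the font
and rendering engine supporting U+065F.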

* Sixteen symbols have been encoded in the Arabic Presentation
Forms-A block for use in pedagogical materials and documents
discussing the features of the Arabic script.

Please note that these are not combining characters but stand-alone
symbols. These should only be used to display the dots and diacritics
in isolation, and not for making new letters. For example, one can
*not* use a Seen and add U+FBB6 Arabic Symbol Three Dots Above to get
a Sheen. If you type that, you will get a Seen followed by three dots.
According to the standard, "These are spacing symbols representing
Arabic letter diacritics considered in isolation, as for example as in
discussions about the Arabic script."

Updated Unicode chart:
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-FB50.pdf
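
The difference between these stand-alone symbols and real combining
marks can be checked from the character properties. The following is a
small Python sketch, assuming a Python whose unicodedata module is based
on Unicode 6.0 or later; is_combining is a hypothetical helper name.

```python
import unicodedata

SEEN = "\u0633"              # ARABIC LETTER SEEN
SHEEN = "\u0634"             # ARABIC LETTER SHEEN
THREE_DOTS_ABOVE = "\uFBB6"  # ARABIC SYMBOL THREE DOTS ABOVE
FATHA = "\u064E"             # ARABIC FATHA, a true combining mark


def is_combining(ch: str) -> bool:
    """True if the general category of ch is a Mark (Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


if __name__ == "__main__":
    # The symbol is not a mark, while the fatha is.
    print(is_combining(THREE_DOTS_ABOVE), is_combining(FATHA))
    # Normalization does not turn Seen + symbol into Sheen.
    composed = unicodedata.normalize("NFC", SEEN + THREE_DOTS_ABOVE)
    print(composed == SHEEN)
```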

* Mandaic has been encoded. Mandaic is the script used by the
Mandaeans (mostly living in southern Iraq and southwestern Iran,
especially Khouzestan) for liturgical purposes. This is the community
that some people believe the Qur'an refers to as Sabians, the third
member group of the People of the Book (next to Jews and Christians).

Michael Everson's proposal:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3485.pdf

Unicode chart:
http://www.unicode.org/charts/PDF/U0840.pdf

* Unicode Standard Annex #9, The Unicode Bidirectional Algorithm, has
been updated to include more information and some clarifications. Note
that the algorithm has not changed. The update just explains the
original intentions in more detail. For the list of informational
changes to the text, see the following link (Behdad Esfahbod and I
have contributed to this and previous versions of the standard annex):

http://www.unicode.org/reports/tr9/tr9-23.html#Modifications

* A new data file has been added to the Unicode character database,
listing some characters that are used with several scripts (and which
scripts those are). For example, from the data file one can learn that
the Arabic Tatweel and some of the Arabic harakat are also used with
the Syriac script, the Arabic-Indic digits are also used with Thaana,
and the Arabic comma, semicolon, and question mark are also used with
both Syriac and Thaana:

http://www.unicode.org/Public/UNIDATA/ScriptExtensions.txt
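
A rough sketch of how such a data file might be read, in Python. It
assumes the usual Unicode character database layout (a code point or
range, a semicolon, then space-separated script codes, with "#"
starting a comment). The SAMPLE lines below are illustrative only, based
on the examples mentioned above and not copied from the real file, and
the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

# Illustrative lines in the assumed layout; NOT copied from the real file.
SAMPLE = """\
# Illustrative sample only
0640          ; Arab Syrc       # ARABIC TATWEEL
060C          ; Arab Syrc Thaa  # ARABIC COMMA
0660..0669    ; Arab Thaa       # ARABIC-INDIC DIGITS
"""


def parse_script_extensions(text: str) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (first, last, scripts) for each data line in text."""
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cp_field, scripts_field = (part.strip() for part in data.split(";", 1))
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        yield start, end, scripts_field.split()


def scripts_for(code_point: int, text: str) -> List[str]:
    """Return the script codes listed for code_point, or an empty list."""
    for start, end, scripts in parse_script_extensions(text):
        if start <= code_point <= end:
            return scripts
    return []


if __name__ == "__main__":
    print(scripts_for(0x0665, SAMPLE))  # ARABIC-INDIC DIGIT FIVE
```

To use the real data, the downloaded contents of ScriptExtensions.txt
would be passed in place of SAMPLE.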

* More than a thousand new symbols have been added, including lots of
symbols that you can find on electronics, maps, menus, signs, etc.
Most of these were added to support Emoji, symbols mostly used on
Japanese mobile phones for text messages, emails, chat, and even
cellphone novels:
http://en.wikipedia.org/wiki/Emoji
http://www.unicode.org/faq/emoji_dingbats.html

For you chart browsers over there, here are some of the blocks that
contain the new symbols (color-coded yellow):
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-2300.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-2600.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-2700.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F0A0.pdf (playing cards)
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F100.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F300.pdf (lots of
interesting new symbols, including symbols for beverage containers)
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf
(emoticons, also known as smileys)
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F680.pdf (transport
and map symbols)

Please note that Unicode encodes beverage containers, but not
alcoholic beverages (I personally made sure of that, to reduce
possible objections). For example, there is no BEER encoded, but only
BEER MUG (which is also used for non-alcoholic beer, among other
uses).

Religiously devout people who may object to some game characters or
musical instruments getting encoded should note that Unicode
implementations are not required to support any specific character,
and are allowed to choose their own set of characters to support. The
game symbols are encoded only for the sake of Unicode implementations
(especially those in East Asia) that need them to support their users.

* And finally, the official detail of additions and changes to the
standard, for the hard core:
http://www.unicode.org/versions/Unicode6.0.0/

Your friendly Unicode liaison [;)],
Roozbeh

Roozbeh Pournader

Oct 11, 2010, 8:52:56 PM
to Persian Computing
On Mon, Oct 11, 2010 at 5:50 PM, Roozbeh Pournader <roo...@gmail.com> wrote:
> Here is links [sic] to the announcement:
> http://www.unicode.org/press/pr-6.0.htm

Oops. Here it is: http://www.unicode.org/press/pr-6.0.html

Roozbeh

Roozbeh Pournader

Oct 11, 2010, 9:41:35 PM
to Persian Computing
I forgot to mention:

* Brahmi is also encoded, which is of use to Iranianists (some Iranian
languages like Khotanese have been written in Brahmi).

The most detailed proposal (although not the final one that got encoded):
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3491.pdf

Final Unicode chart:
http://www.unicode.org/charts/PDF/U11000.pdf

Roozbeh

Roozbeh Pournader

Oct 12, 2010, 4:07:07 PM
to Persian Computing
Yet another improvement in Unicode 6.0 that I forgot to mention:

* The Qur'anic character U+06DE ARABIC START OF RUB EL HIZB has had
its glyph and properties changed.

For some unknown historical reason, the character was mistakenly
classified as a combining character instead of just a symbol, which
made it unusable. The character is now a normal spacing symbol and is
usable as originally intended.

Background document for the change (which I authored):
http://unicode.org/review/pr-171-rub-el-hizb.pdf
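
The property change to U+06DE can be observed with a short Python
check. This is a sketch that assumes the unicodedata module is based on
Unicode 6.0 or later; is_spacing_symbol is a hypothetical helper name.

```python
import unicodedata

RUB_EL_HIZB = "\u06DE"  # ARABIC START OF RUB EL HIZB
FATHA = "\u064E"        # ARABIC FATHA, a combining mark, for comparison


def is_spacing_symbol(ch: str) -> bool:
    """True if ch is in a Symbol category and has combining class 0."""
    return unicodedata.category(ch).startswith("S") and unicodedata.combining(ch) == 0


if __name__ == "__main__":
    # Version of the Unicode database this Python was built with.
    print(unicodedata.unidata_version)
    print(unicodedata.category(RUB_EL_HIZB), is_spacing_symbol(RUB_EL_HIZB))
    print(unicodedata.category(FATHA), is_spacing_symbol(FATHA))
```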

Roozbeh

Behdad Esfahbod

Oct 12, 2010, 4:15:39 PM
to Roozbeh Pournader, Persian Computing
On 10/11/10 20:50, Roozbeh Pournader wrote:
> Your friendly Unicode liaison [;)],

What, no “[😉]”?

b
