Comments on some emoji symbols e-000 ... e-521


Karl Pentzlin

Jan 6, 2009, 2:29:26 PM
to uni...@unicode.org, emoji4...@googlegroups.com
Quick comments on some Emoji symbols
- Karl Pentzlin 2009-01-06

Reference:
http://www.unicode.org/~scherer/emoji4unicode/snapshot/full.html
as of 2009-01-06

The basis for some of the comments is:
- Symbols which are not merely glyph variants of each other should not
be unified; if someone can assign different semantics to two
symbols, they are different symbols, even when they are used
interchangeably in the Japanese Telco context. Once encoded in
Unicode, the context is no longer limited in that way.
- Symbols should be named as they appear as emoji, not according
to the black-and-white fallback glyph which is associated with
them to print the Unicode charts. This means:
· Symbols with an inherent color shall bear this color in their
name unless the entity denoted by the name already identifies the color
(e.g., a BANANA is uniquely yellow and therefore does
not need to be called YELLOW BANANA, while a RED APPLE must be
named so, as there are also green apples).
· Symbols whose semantics include animation shall have ANIMATED
as part of their names (this does not apply to symbols where
animation is a feature of glyph variance only).

All symbol names are relative to any generic prefixes which are applied
to the set of emoji symbols or subsets of it during the ongoing
discussion.

Any comment starting with "KDDI is", "DoCoMo is", or "SoftBank is"
is a request to not unify this with the other symbols of the same row.
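
[Editor's illustration, not part of the original message.] The convention just stated can be sketched as a small filter over the comment list below. The function names and the assumption that every entry is a single line of the form "e-XXX comment text" are hypothetical; continuation lines and multi-carrier entries would need extra handling.

```python
import re

# Carrier prefixes that, per the convention above, mark a disunification request.
CARRIERS = ("KDDI", "DoCoMo", "SoftBank")
_REQUEST = re.compile(r"^(%s) is\b" % "|".join(CARRIERS))


def split_entry(line: str):
    """Split 'e-004 KDDI is ...' into (symbol id, comment text)."""
    symbol_id, _, comment = line.strip().partition(" ")
    return symbol_id, comment


def is_disunification_request(comment: str) -> bool:
    """Return True if the comment text starts with '<carrier> is'."""
    return _REQUEST.match(comment.strip()) is not None
```

Entries worded differently, such as "DoCoMo and SoftBank are ..." or "at first glance: KDDI is ...", are deliberately not matched by this strict reading of the rule.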

e-004 KDDI is THUNDERSTORM WITH RAIN
e-008 SoftBank is NIGHT WITH FALLING STAR
e-014 should be named otherwise e.g. MOON-LIKE CRESCENT,
as a crescent moon must have its tips strictly opposite on
the enclosing circle. Naming this CRESCENT MOON is an offence
to anybody who knows the astronomical mechanisms.
e-02b...e-037: General comments sent by a previous mail.
e-036 The KDDI symbol shows one fish, while PISCES is plural.
Therefore, to complete the pictorial Zodiac set a picture
of two fishes is needed, while the KDDI symbol is "fish".
e-038 SoftBank is TSUNAMI (??)
e-03A should be named ERUPTING VOLCANO (in contrast to the Mount
Fuji symbol which may be required to be named VOLCANO to
avoid geographical preferences).
e-040 DoCoMo and SoftBank are PINK CHERRY BLOSSOM or JAPANESE CHERRY BLOSSOM
(Some European cherry trees blossom in white)
KDDI is PINK BLOSSOMING CHERRY TREE or JAPANESE BLOSSOMING CHERRY
TREE
e-044 should not be listed under "nature"; the symbol seems unequivocally
to be the newly licensed driver plate.
The name JAPANESE NEW LICENSED DRIVER SIGN seems preferable.
e-051 is RED APPLE
e-057 is WATER MELON - most melons sold in Europe are yellow and oval
e-05B is GREEN APPLE
e-190 is COMIC EYES or EYEBALLS
e-193 seems to be RED LIPS rather than generic MOUTH
e-197 is ANIMATED FACE MESSAGE
e-198 SoftBank is HAIRCUT (??)
e-19F is MAN WOMAN PAIR
e-1A1 is POLICEMANS HEAD WITH FLAT CAP
(in other countries, police caps may look definitely different)
if there is a police cap by SoftBank, this is a different FLAT POLICE CAP
e-1A2 KDDI is WOMANS HEAD WITH BUNNY EARS
SoftBank is TWO DANCING WOMEN WITH BUNNY EARS
e-1A3 is BRIDES HEAD WITH VEIL
e-1A4 at first glance: KDDI is BLOND WOMAN, SoftBank is BLOND MAN
It seems appropriate to recategorize:
e-19D DARK-HAIRED MANS HEAD
e-19E DARK-HAIRED WOMANS HEAD
e-1A4 (KDDI) BRIGHT-HAIRED WOMANS HEAD
e-1A4 (SoftBank) BRIGHT-HAIRED MANS HEAD
e-19B DARK-HAIRED BOYS HEAD
e-19B (variant) BRIGHT-HAIRED BOYS HEAD
e-19C (SoftBank) DARK-HAIRED GIRLS HEAD
e-19C (KDDI) BRIGHT-HAIRED GIRLS HEAD
e-1A5 is MANS HEAD WITH LONG MOUSTACHE
For reasons of political correctness, there must be two characters:
DARK-HAIRED MANS HEAD WITH LONG MOUSTACHE
BRIGHT-HAIRED MANS HEAD WITH LONG MOUSTACHE
Otherwise, some traditional Bavarians who wear long blond
moustaches may be offended.
e-1A6 is MANS HEAD WITH TURBAN
*** I *STRONGLY* object to showing this icon with a different skin color
than the others ***
Alternatively, it has to be scrutinized whether ALL person and head
symbols have to be differentiated by BRIGHT SKINNED, BROWN SKINNED
and DARK SKINNED versions in a politically correct way which is acceptable
to all people in the world.
e-1A7 is OLDER MANS HEAD
e-1A8 is OLDER WOMANS HEAD
e-1A9 is BABYS HEAD
e-1AA is CONSTRUCTION WORKERS HEAD WITH HELMET
e-1AB is YOUNG BRIGHT-HAIRED PRINCESS HEAD or BRIGHT-HAIRED GIRLS HEAD WITH CROWN
e-1AC is RED FACED OGRES HEAD
e-1AD is LONG-NOSED GOBLINS HEAD
e-1AF is PUTTO ANGEL (simply ANGEL may be offensive to some religious people)
e-1B0 KDDI is ALIEN SPACESHIP, SoftBank is BIG-EYED ALIEN FACE
e-1B2 is FACE WITH DEVILS HORNS (simply DEVIL may be offensive to some
religious or superstitious people)
e-1B6 KDDI is ANIMATED MALE DISCO DANCER,
SoftBank is ANIMATED FEMALE FLAMENCO DANCER
e-1B7 is DOG FACE, SoftBank is PUPPY FACE, similarly
e-1B8,1BF,1C0,1C1,1C2,1CA,1D1,1D2,1D7,154 add " FACE" like it is done for e-1C4
MONKEY FACE
e-1BD see comment for e-036
e-1C8 is SITTING WHITE BIRD
e-1D0 is FOX HEAD
e-353 is ANIMATED BOWING FACE
e-357 is ANIMATED PERSON RAISING ONE HAND, SoftBank is PALM OF HAND
e-358 is ANIMATED PERSON RAISING BOTH HANDS, SoftBank is
ANIMATED PAIR OF HANDS OPENING AND CLOSING
e-359 is ANIMATED PERSON FROWNING
e-35A is ANIMATED PERSON MAKING POUTING FACE
e-35B SoftBank is PAIR OF RAISED FOLDED HANDS
e-4B0 is SMALL HOUSE
e-4B4 is HOSPITAL DENOTED BY CROSS SYMBOL
e-4B5 is ALPHABETIC BANK SYMBOL
e-4B6 is AUTOMATIC TELLER MACHINE SYMBOL
e-4B7 DoCoMo is LATIN LETTER H ENCLOSED IN A HOUSE SYMBOL
e-4C2 is RED LANTERN DENOTING JAPANESE IZAKAYA RESTAURANT
("red lantern" is a symbol for two totally different concepts in European
culture: a. brothel, b. being the last one in a sports competition)
e-4CA is WORKERS MALLET (as it looks different from the common household hammer)
e-4CC is MENS LOW SHOE
e-4D2 is TRIDENT (the listing under Clothing/Wearables is wrong)
e-4D5 is LADIES FORMAL DRESS
e-4D7 is MULE SHOE or similar, to denote it is not the animal called mule
e-4DD should be encoded as an enclosing combining mark MONEY BAG, which can
be applied to any currency symbol
e-4DE is DOLLAR YEN CURRENCY EXCHANGE
e-4DF is CHART WITH RISING CURVE AND YEN SYMBOL
e-4EF is SINGLE-LENS REFLEX STILL PICTURE CAMERA
e-4F4 is FAECES or PICTORIAL EXPRESSION OF DISDAIN
e-4F7 is CRYSTAL BALL ON RACK
e-4FA is MEAT CLEAVER
e-4FB is TORCH
e-4FD this is a nonstandard symbol for window scrolling and must be named in
a way that it is not mistaken for any ISO 7000 or similar symbol;
thus it must get a prefix like JAPANESE TELCO SYMBOL if it gets
no generic name prefix for the emoji set or a subset
e-4FE is ELECTRIC PLUG WITH CABLE
e-4FF is GREEN CLOSED BOOK LYING WITH BACK TO THE RIGHT
in this way applicable to books to be read from right to left
e-500 is BLUE CLOSED BOOK LYING WITH BACK TO THE RIGHT
e-501 is ORANGE CLOSED BOOK LYING WITH BACK TO THE RIGHT
e-502 is FRONT OF GREEN BOOK WITH LABEL or FRONT OF GREEN NOTEBOOK WITH LABEL
e-503 is STACK OF BOOKS LYING WITH BACK TO THE LEFT
e-505 KDDI is WOMANS HEAD WITH BATHING CAP, SoftBank is PERSON TAKING A BATH
e-506 is LADIES AND GENTS RESTROOMS SIGN
e-509 is SYRINGE WITH DROP OF BLOOD
e-50B/C/D/E: This depends on how coloring is eventually treated for those
emoji which remain unique when color is disregarded. If the black-and-white
equivalents are to be encoded, these are:
KDDI: new U+1F130 SQUARED LATIN LETTER A
existing U+1F131 SQUARED LATIN LETTER B
new U+1F1xx SQUARED DIGIT 0
new U+1F1xx SQUARED AB
SoftBank: new U+1F150 WHITE ON BLACK CIRCLED LATIN CAPITAL LETTER A
new U+1F151 WHITE ON BLACK CIRCLED LATIN CAPITAL LETTER B
new U+1F1xx WHITE ON BLACK CIRCLED DIGIT 0
(probably to be unified with U+24FF NEGATIVE CIRCLED DIGIT ZERO)
new U+1F1xx WHITE ON BLACK CIRCLED AB
e-513 is SANTA CLAUS FACE
e-515 is ANIMATED NIGHT SKY WITH FIREWORKS
e-517 is ANIMATED PARTY POPPER
e-51D is ANIMATED NIGHT SKY WITH JAPANESE SPARKLER
e-520 is ANIMATED OPENING CONFETTI BALL
----------- Comments for emoji symbols starting from e-522 may follow later.

John Cowan

Jan 6, 2009, 2:46:18 PM
to Karl Pentzlin, uni...@unicode.org, emoji4...@googlegroups.com
Karl Pentzlin scripsit:

> · Symbols with an inherent color shall bear this color in their
> name unless the entity denoted by the name already identifies the color
> anyway (e.g., a BANANA is uniquely yellow and therefore does
> not need to be called YELLOW BANANA,

Beware parochialism: not all bananas are Cavendishes. I eat
short red bananas regularly, and think they taste better. See
http://www.allaboutyou.com/?module=images&func=display&fileId=51780 .
Wikipedia says there are also purple varieties, and of course there are
less-sweet bananas that always remain green, usually called "plantains"
in English (not to be confused with the herbs of genus _Plantago_ which
share that name).

--
John Cowan    http://www.ccil.org/~cowan    co...@ccil.org
My corporate data's a mess!
It's all semi-structured, no less.
But I'll be carefree
Using XSLT
On an XML DBMS.

Karl Pentzlin

Jan 6, 2009, 3:26:38 PM
to John Cowan, uni...@unicode.org, emoji4...@googlegroups.com
On Tuesday, 6 January 2009 at 20:46, John Cowan wrote:

JC> Karl Pentzlin scripsit:
>> ...(e.g., a BANANA is uniquely yellow and therefore does
>> not need to be called YELLOW BANANA,

JC> ... I eat short red bananas regularly ...

As I wrote in my original mail, my comments were *quick* comments.
I should have written something like: the abstract concept of a banana
is linked to the default color "yellow" for most people, as long as they do
not consider a specific subset of the set of all bananas.

I admit that this also applies to cherry blossoms: Thus, there is no
need to use PINK CHERRY BLOSSOM rather than simply CHERRY BLOSSOM.

>> · Symbols with an inherent color shall bear this color in their
>> name unless the entity denoted by the name already identifies the color
>> anyway

I change this to:

· Symbols with an inherent color shall bear this color in their
name if there is another symbol (already encoded, or proposed,
or reasonably not excluded to occur in a future proposal) which
differs in color only (e.g. RED APPLE, GREEN APPLE).

Thus, no bananas anymore.
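
[Editor's illustration, not part of the original message.] A minimal sketch of the revised rule, assuming each symbol is described by a hypothetical (base name, color) pair. It only looks at the symbols passed in, so it cannot model the "reasonably not excluded to occur in a future proposal" clause.

```python
from collections import defaultdict


def assign_names(symbols):
    """Name symbols given as (base_name, color) pairs.

    The color is kept in the name only when another symbol in the same
    list shares the base name but has a different color.
    """
    colors_by_base = defaultdict(set)
    for base, color in symbols:
        colors_by_base[base].add(color)

    names = []
    for base, color in symbols:
        if len(colors_by_base[base]) > 1:
            names.append(f"{color} {base}")
        else:
            names.append(base)
    return names
```

With the apples and the banana from this thread as input, the two apples keep their colors and the banana does not.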

- Karl Pentzlin

Markus Scherer

Jan 6, 2009, 3:44:02 PM
to emoji4...@googlegroups.com, uni...@unicode.org
On Tue, Jan 6, 2009 at 11:29 AM, Karl Pentzlin <karl-p...@acssoft.de> wrote:
> Quick comments on some Emoji symbols

Wow, that's a long list!
I copied this into project issue 64 for now. We will have to discuss these.

Some general comments:

On colors: We considered symbol colors for disunification but rarely for character names. Instead, with UTC guidance, we unified a number of symbols with existing characters which have black/white/striped... glyphs and names. For newly proposed symbols, we followed the precedent and chose similar character names, matching the glyphs in the font that is being worked on.

On disunifications: At a glance, it looks like many of the suggested disunifications assume more specific and precise meanings and shapes than are intended by the cell phone carriers. For example,
- If a symbol generally looks like a crescent moon (e-014) and is described or named by the carriers to represent one, it makes little sense to give it a different meaning based on an imprecise symbol shape. (What we can do is design a better glyph.)
- If a carrier clearly intends a certain meaning, and shows that in name, shape, context of surrounding symbols and maybe other available information, we should follow that meaning and not artificially invent a separate symbol and meaning. (e-036 pisces vs. KDDI single fish)
- The carriers' understanding of "glyph variants", as expressed in symbol names and cross-mapping tables, is clearly broader than your sense of "glyph variants". For interoperability, we usually try to follow the carriers' cross-mappings, except when they are way off (as in e-7E0 subway vs. e-7E1 metro sign, which has been discussed by the UTC before).

Many thanks, and best regards,
markus

Kenneth Whistler

Jan 6, 2009, 4:00:47 PM
to karl-p...@acssoft.de, uni...@unicode.org, emoji4...@googlegroups.com, ke...@sybase.com

> Quick comments on some Emoji symbols
> - Karl Pentzlin 2009-01-06
>
> Reference:
> http://www.unicode.org/~scherer/emoji4unicode/snapshot/full.html
> as of 2009-01-06
>
> The basis for some of the comments is:
> - Symbols which are not merely glyph variants of each other should not
> be unified; if someone can assign different semantics to two
> symbols, they are different symbols, even when they are used
> interchangeably in the Japanese Telco context. Once encoded in
> Unicode, the context is no longer limited in that way.

I disagree. It is true that encoding a character for a symbol in
Unicode puts it in a context where it might not always be
limited to transcoding for the Japanese wireless sets, so that
due consideration must be given to how this is done. However,
when what we are encoding is a compatibility character for an emoji
which is *already* unified by de facto mappings between the
various carrier sets, it is not helpful -- in fact is disruptive --
to disunify glyph variants simply because the telcos use different
glyphs to display the cross-mapped character in question.

In such cases, as for the zodiac symbols which you wrote a
separate note on (and which Markus responded to), the correct
encoding solution here is to treat the cross-mapped emoji as
a *single* character for encoding, and then to either encode
a new single Unicode character (if no existing Unicode character
is appropriate) or to map to a single Unicode character if
one already exists -- as for the zodiac signs.
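
[Editor's illustration, not part of the original message.] The decision described above can be sketched as follows. The class, field names, and output strings are hypothetical; the carrier glyph descriptions are taken from this thread, and U+2653 is the existing PISCES character.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CrossMappedSymbol:
    symbol_id: str                             # row id, e.g. "e-036"
    carrier_glyphs: Dict[str, str]             # carrier name -> glyph description
    existing_code_point: Optional[int] = None  # set when Unicode already has a match


def encoding_decision(sym: CrossMappedSymbol) -> str:
    """One row of cross-mapped carrier glyphs yields exactly one character."""
    if sym.existing_code_point is not None:
        return f"{sym.symbol_id}: map to existing U+{sym.existing_code_point:04X}"
    return f"{sym.symbol_id}: propose one new character"
```

The number of distinct carrier glyphs in a row never enters the decision, which is the point of the argument: glyph differences alone do not create additional characters.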

If a separate need occurs in the future to distinguish
animal-pictorial representations of zodiac signs, for example,
from traditional astrological symbolic representations of
zodiac signs, that needs to be done in a separate context
and be separately argued from the current emoji set -- because
separately encoding them on the basis merely of the distinct
glyphs used by the wireless carriers would *not* be a helpful
or useful solution to the emoji cross-mapping to Unicode problem.

> - Symbols should be named as they appear as emoji, not according
> to the black-and-white fallback glyph which is associated with
> them to print the Unicode charts. This means:
> · Symbols with an inherent color shall bear this color in their
> name unless the entity denoted by the name already identifies the color
> (e.g., a BANANA is uniquely yellow and therefore does
> not need to be called YELLOW BANANA, while a RED APPLE must be
> named so, as there are also green apples).

I disagree. This principle is simply not helpful. It perpetuates
the notion that colors are *inherently* a part of the character
identity here. And that does not serve the purpose of providing
a cross-mapping set for interoperability with the emoji characters.
It would be far, far better to simply have some abstracted
compatibility characters identified as EMOJI SYMBOL FOR BOOK-1,
EMOJI SYMBOL FOR BOOK-2, EMOJI SYMBOL FOR BOOK-3, etc., rather
than to insist on encoding RED BOOK SYMBOL, BLUE BOOK SYMBOL,
ORANGE BOOK SYMBOL, and then jump off the deep end in insisting
that the associated glyphs actually need to support color
distinctions.

> · Symbols whose semantics include animation shall have ANIMATED
> as part of their names (this does not apply to symbols where
> animation is a feature of glyph variance only).

I disagree. This is the same issue as for the colored glyphs,
only more so. It is simply not helpful to insist that
"ANIMATED" be part of the character name, when that is a
description of the animated glyphs used on phones, rather
than a useful identifying label for the *character* we are
going to encode to represent the symbol in question.

> All symbol names are relative to any generic prefixes which are applied
> to the set of emoji symbols or subsets of it during the ongoing
> discussion.
>
> Any comment starting with "KDDI is", "DoCoMo is", or "SoftBank is"
> is a request to not unify this with the other symbols of the same row.

And I will simply put my comment in as opposing *all* such
disunifications across the board, without objecting to each
individual suggestion one-by-one below. I think this whole
approach is a very deep semiotic trap that completely
misconstrues both the problem and the nature of the solution
required for cross-mapping the emoji sets in Unicode.

> e-1A5 is MANS HEAD WITH LONG MOUSTACHE
> For reasons of political correctness, there must be two characters:
> DARK-HAIRED MANS HEAD WITH LONG MOUSTACHE
> BRIGHT-HAIRED MANS HEAD WITH LONG MOUSTACHE
> Otherwise, some traditional Bavarians who wear long blond
> moustaches may be offended.

This is an example of the kind of dead end that this approach
results in. The problem here is to create a standard mapping
code point in Unicode for the emoji symbol listed at e-1A5.
The problem is *not* to solve some generic issue of how to
represent all races, skin colors, and masculine facial hair
styles politically correctly via character codes.

> e-1B6 KDDI is ANIMATED MALE DISCO DANCER,
> SoftBank is ANIMATED FEMALE FLAMENCO DANCER

That is another example of a completely unhelpful disunification,
as well as an example of the inappropriate application
of "ANIMATED" to a character name. The symbolic concept
being represented here is of a dancer. The glyphs chosen
on the phones to display that concept are animated and
designed differently. But encoding distinct characters and
making them overly specific to glyph designs is simply not
a useful direction to take for the character encoding for
the purpose intended here.

I could make similar comments one-by-one, but it should be clear
that I object to the complete set of comments in principle,
rather than just here and there on its details.

--Ken


Andrew West

Jan 7, 2009, 5:21:42 AM
to Kenneth Whistler, uni...@unicode.org, emoji4...@googlegroups.com
2009/1/6 Kenneth Whistler <ke...@sybase.com>:

>
> I disagree. It is true that encoding a character for a symbol in
> Unicode puts it in a context where it might not always be
> limited to transcoding for the Japanese wireless sets, so that
> due consideration must be given to how this is done. However,
> when what we are encoding is a compatibility character for an emoji

I wish that people would stop using the term "compatibility character"
in relation to the proposed emoji characters. As has been discussed
already, these are not compatibility characters in the normal Unicode
sense of the term, i.e. referring to characters that have a
compatibility mapping to an existing Unicode character or character
sequence. Calling them compatibility characters seems confusing or
even disingenuous.

> It would be far, far better to simply have some abstracted
> compatibility characters identified as EMOJI SYMBOL FOR BOOK-1,
> EMOJI SYMBOL FOR BOOK-2, EMOJI SYMBOL FOR BOOK-3, etc., rather

This is by far the best suggestion I have heard so far in this debate.
The major problem I see with the emoji proposal is its open-endedness
-- many people are rightly concerned that if you encode 10 flag
symbols, you open up the possibility of encoding an indefinite number
of flag symbols. If we name the proposed emoji flag symbols as EMOJI
SYMBOL FOR FLAG-1, etc. we are no longer encoding specific flag
symbols for a select few countries, but just encoding emoji flag
symbols 1-10. Thus, we are not setting a precedent for encoding flag
symbols for this or that country who feels snubbed by the omission of
their national flag. Equally, if in the future a decision is taken to
encode flag symbols for all members of the UN, then that can be done
without any reference to the existing emoji flag symbols.

Likewise, simple character names such as ANGEL and DEVIL for emoji
characters worry me, as they appropriate a character name that may
have more generic associations. I would rather see character names
such as EMOJI SYMBOL FOR ANGEL and EMOJI SYMBOL FOR DEVIL, leaving
open the possibility of encoding a generic ANGEL or DEVIL character if
ever required.

And the problems that Karl has pointed out with regards to skin colour
could be dispensed with if we use less descriptive names such as EMOJI
SYMBOL FOR MANS HEAD-1, EMOJI SYMBOL FOR MANS HEAD-2, etc.

At present I am still undecided about the emoji proposal, but like
many people here I see serious problems with it as it stands. However,
if the character naming convention for emoji characters were to be
changed to EMOJI SYMBOL FOR XXX(-N) it would, in my opinion, be an
important step in the right direction.
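
[Editor's illustration, not part of the original message.] A sketch of how names following the EMOJI SYMBOL FOR XXX(-N) pattern could be generated, assuming a hypothetical input list of concept labels in chart order. The numeric suffix is added only when a concept occurs more than once.

```python
from collections import Counter


def generic_names(concepts):
    """Return 'EMOJI SYMBOL FOR XXX' names, numbering repeated concepts."""
    totals = Counter(concepts)
    seen = Counter()
    names = []
    for concept in concepts:
        seen[concept] += 1
        if totals[concept] > 1:
            names.append(f"EMOJI SYMBOL FOR {concept}-{seen[concept]}")
        else:
            names.append(f"EMOJI SYMBOL FOR {concept}")
    return names
```

Because the suffix is only an index, nothing in the name records which flag, book color, or skin color a given glyph shows.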

Andrew

John M. Fiscella

Jan 7, 2009, 2:00:23 PM
to Andrew West, John M. Fiscella, Kenneth Whistler, [unknown], [unknown]
Message text written by "Andrew West"

>This is by far the best suggestion I have heard so far in this debate.
The major problem I see with the emoji proposal is its open-endedness
-- many people are rightly concerned that if you encode 10 flag
symbols, you open up the possibility of encoding an indefinite number
of flag symbols. If we name the proposed emoji flag symbols as EMOJI
SYMBOL FOR FLAG-1, etc. we are no longer encoding specific flag
symbols for a select few countries, but just encoding emoji flag
symbols 1-10. Thus, we are not setting a precedent for encoding flag
symbols for this or that country who feels snubbed by the omission of
their national flag. Equally, if in the future a decision is taken to
encode flag symbols for all members of the UN, then that can be done
without any reference to the existing emoji flag symbols.<

If the UTC decides to encode flag symbols, it must be done with the
realization that the collection would have to be open ended. Obviously, one
country could divide into more than one, and all the additional flag
symbols would need to be encoded. Extrapolating this line of thought, the
entire concept and handling of "planes" ultimately will need reexamination,
including the limitation of UTF-32, which should really be reset to UCS-32,
the sooner the better. But software developers will also have involvement
with this.
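
[Editor's illustration, not part of the original message.] For scale, the Unicode code space consists of 17 planes of 65,536 code points each, ending at U+10FFFF. The sketch below only restates that arithmetic; the helper names are hypothetical, and it assumes CPython 3.3 or later, where sys.maxunicode is 0x10FFFF.

```python
import sys

PLANES = 17                      # plane 0 (BMP) through plane 16
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points per plane


def total_code_points() -> int:
    """Size of the Unicode code space, U+0000 through U+10FFFF."""
    return PLANES * CODE_POINTS_PER_PLANE


def is_valid_code_point(value: int) -> bool:
    """True if the integer lies inside the code space."""
    return 0 <= value <= sys.maxunicode
```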

>Likewise, simple character names such as ANGEL and DEVIL for emoji
characters worry me, as they appropriate a character name that may
have more generic associations. I would rather see character names
such as EMOJI SYMBOL FOR ANGEL and EMOJI SYMBOL FOR DEVIL, leaving
open the possibility of encoding a generic ANGEL or DEVIL character if
ever required.<

This only makes common sense. After all, U+0041 is "LATIN CAPITAL LETTER A"
not just "CAPITAL LETTER A." But clear thinking must decide what
classification Emoji characters fall into (SYMBOL, IDEOGRAPH, SYLLABLE,
etc.), perhaps inventing a new category. I personally do not swallow the
idea of Emoji being classified as "SYMBOL" because a symbol simply imparts
notification, not an expression of human thought or a component of an
expression of human thought coded into data.

>And the problems that Karl has pointed out with regards to skin colour
could be dispensed with if we use less descriptive names such as EMOJI
SYMBOL FOR MANS HEAD-1, EMOJI SYMBOL FOR MANS HEAD-2, etc.<

I haven't studied all the Emoji characters well enough to determine if
there are any racially-distinctive Emoji (BLACK MANS HEAD, WHITE MANS HEAD,
ASIAN MANS HEAD, etc.).

>At present I am still undecided about the emoji proposal, but like
many people here I see serious problems with it as it stands. However,
if the character naming convention for emoji characters were to be
changed to EMOJI SYMBOL FOR XXX(-N) it would, in my opinion, be an
important step in the right direction.
<

Agreed, with possibly some other descriptive term replacing SYMBOL.

The UTC needs to get back to basics and start scientifically examining and
possibly redefining the terms GRAPHIC, SYMBOL, LOGO, IDEOGRAPH, HIEROGLYPH,
SYLLABLE, GRAPHEME, PHONEME, etc. in a more complete and satisfactory
manner before Emoji characters are slapped with a classification label.

John F.

Michael Everson

Jan 7, 2009, 4:37:43 PM
to unicore UnicoRe Discussion, emoji4...@googlegroups.com, sym...@unicode.org
I begin: I like symbols. I like encoding them. I've encoded many.

On 7 Jan 2009, at 21:16, James Kass wrote:

> We've been given explanations about pragmatic interoperability requirements for Japanese cell phone users, but the Japanese cell phone users have no such requirement related to Unicode plain-text. Their interoperability needs have already been met by the vendors selling the services.

It DOESN'T MATTER.

It doesn't matter that these things are used on proprietary networks right now. It can be understood that there is some leakage in text content from those networks into other forms of text, which are encoded in the Universal Character Set.

So from that standpoint I can understand why e.g. Google is concerned with transmitting that text without loss, or without too much loss.

**HOWEVER**:

On 7 Jan 2009, at 10:21, Andrew West wrote:

> Likewise, simple character names such as ANGEL and DEVIL for emoji characters worry me, as they appropriate a character name that may have more generic associations. I would rather see character names such as EMOJI SYMBOL FOR ANGEL and EMOJI SYMBOL FOR DEVIL, leaving open the possibility of encoding a generic ANGEL or DEVIL character if ever required.

Here is where I disagree with Andrew. (Andrew and I rarely disagree.)

It may be the case that these symbols are derived from a particular environment. However, once a character used as Emoji is encoded in the Universal Character Set, IT CEASES TO BE AN EMOJI CHARACTER.

It becomes a character like any other. It becomes a symbol that anyone can use for ANY PURPOSE at all.

It is in that light that I am now attempting to evaluate the names and mappings and representative glyphs, now that a font representation (which shows what the Emoji committee is thinking) is available. 

I would start by saying that many if not most of the proposed glyphs in that font are not by any means generic enough for the final code charts, but at least we now have a sounder position from which to evaluate the draft proposal, its names, unifications, and glyphs.

Michael Everson * http://www.evertype.com

Asmus Freytag

Jan 7, 2009, 6:10:49 PM
to Andrew West, Kenneth Whistler, uni...@unicode.org, emoji4...@googlegroups.com
A general comment on terminology, and then some more detailed feedback
on (general) naming issues.

On 1/7/2009 2:21 AM, Andrew West wrote:
> 2009/1/6 Kenneth Whistler <ke...@sybase.com>:
>
>> I disagree. It is true that encoding a character for a symbol in
>> Unicode puts it in a context where it might not always be
>> limited to transcoding for the Japanese wireless sets, so that
>> due consideration must be given to how this is done. However,
>> when what we are encoding is a compatibility character for an emoji
>>
>
> I wish that people would stop using the term "compatibility character"
> in relation to the proposed emoji characters. As has been discussed
> already, these are not compatibility characters in the normal Unicode
> sense of the term, i.e. referring to characters that have a
> compatibility mapping to an existing Unicode character or character
> sequence. Calling them compatibility characters seems confusing or
> even disingenuous.
>

I respectfully disagree. If you read the discussion in chapter two of
the Standard, it states that what you are concerned with is a "second
narrow sense" of "compatibility character" and goes on to define a term
for this restricted category: "compatibility decomposable character".

Compatibility characters, in the broad sense, are all characters that
are not ordinary characters (my term) but were included in the standard
because of the 10th design principle, *convertibility*, which is also
described in chapter 2.

In the current proposal, a large number of characters clearly could
qualify as "ordinary" characters. I agree that these would not be
compatibility characters under any definition, and that it is therefore
incorrect to consider the *entire* set of emoji as compatibility
characters, even if it is true that the set as a whole is proposed for
reasons of convertibility (10th principle).

The remainder of the set, however, are compatibility characters as that
term has been understood in Unicode from the beginning, even though
none, or at most very few, would be "compatibility decomposable
characters".
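
[Editor's illustration, not part of the original message.] The narrow sense discussed above can be checked mechanically: a compatibility decomposable character is one whose decomposition mapping in the Unicode Character Database carries a formatting tag such as <circle> or <compat>. The sketch below uses Python's standard unicodedata module; the function name is hypothetical. The broad sense (included for convertibility) has no formal property, so it cannot be tested this way.

```python
import unicodedata


def is_compatibility_decomposable(ch: str) -> bool:
    """True if the character's decomposition mapping starts with a <tag>."""
    return unicodedata.decomposition(ch).startswith("<")
```

For example, U+2460 CIRCLED DIGIT ONE decomposes to "<circle> 0031" and qualifies, while U+00E9 has only a canonical decomposition and U+2653 PISCES has none at all.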


>
>> It would be far, far better to simply have some abstracted
>> compatibility characters identified as EMOJI SYMBOL FOR BOOK-1,
>> EMOJI SYMBOL FOR BOOK-2, EMOJI SYMBOL FOR BOOK-3, etc., rather
>>
>
> This is by far the best suggestion I have heard so far in this debate.
>

This is appropriate primarily for those symbols where there is no
clear-cut semantic differentiation that can be gleaned from existing
documentation, or where it's questionable that similar symbols are
consistently used with contrasting semantics.

> if the character naming convention for emoji characters were to be
> changed to EMOJI SYMBOL FOR XXX(-N) it would, in my opinion, be an
> important step in the right direction.

To roll out this type of generic naming to the entire set is unhelpful.
As Michael Everson pointed out, when characters are encoded they are not
artificially limited in how they are used in the Unicode context. That,
by the way, is the motivation why there's no formal property for the
subset of compatibility characters that are not also compatibility
decomposable characters.


> The major problem I see with the emoji proposal is its open-endedness
> -- many people are rightly concerned that if you encode 10 flag
> symbols, you open up the possibility of encoding an indefinite number
> of flag symbols. If we name the proposed emoji flag symbols as EMOJI
> SYMBOL FOR FLAG-1, etc. we are no longer encoding specific flag
> symbols for a select few countries, but just encoding emoji flag
> symbols 1-10. Thus, we are not setting a precedent for encoding flag
> symbols for this or that country who feels snubbed by the omission of
> their national flag. Equally, if in the future a decision is taken to
> encode flag symbols for all members of the UN, then that can be done
> without any reference to the existing emoji flag symbols.
>

That's a real concern, but not one that should be addressed by playing
tricks with character naming.


> Likewise, simple character names such as ANGEL and DEVIL for emoji
> characters worry me, as they appropriate a character name that may
> have more generic associations. I would rather see character names
> such as EMOJI SYMBOL FOR ANGEL and EMOJI SYMBOL FOR DEVIL, leaving
> open the possibility of encoding a generic ANGEL or DEVIL character if
> ever required.
>

I share that concern - but "EMOJI SYMBOL FOR" is not my preferred
solution. I would want to make sure that an "ANGEL FACE" is properly
distinguished from an "ANGEL" (whole body), etc. Also, for map markers
(church, hospital), for which non-EMOJI versions exist that use unrelated
symbols, I would want the names to be qualified somehow. But this should be done in
a way that doesn't tie these symbols to their use as EMOJI (because many
of the EMOJI are derived from other sources of symbols).

A./

Christopher Fynn

Jan 9, 2009, 4:01:29 AM
to emoji4...@googlegroups.com
On 08/01/2009, Asmus Freytag <asm...@ix.netcom.com> wrote:

> A general comment on terminology, and then some more detailed feedback
> on (general) naming issues.

> On 1/7/2009 2:21 AM, Andrew West wrote:
>> 2009/1/6 Kenneth Whistler <ke...@sybase.com>:
>>
>>> I disagree. It is true that encoding a character for a symbol in
>>> Unicode puts it in a context where it might not always be
>>> limited to transcoding for the Japanese wireless sets, so that
>>> due consideration must be given to how this is done. However,
>>> when what we are encoding is a compatibility character for an emoji

>> I wish that people would stop using the term "compatibility character"
>> in relation to the proposed emoji characters. As has been discussed
>> already, these are not compatibility characters in the normal Unicode
>> sense of the term, i.e. referring to characters that have a
>> compatibility mapping to an existing Unicode character or character
>> sequence. Calling them compatibility characters seems confusing or
>> even disingenuous.

> I respectfully disagree. If you read the discussion in chapter two of
> the Standard, it states that what you are concerned with is a "second
> narrow sense" of "compatibility character" and goes on to define a term
> for this restricted category "compatibility decomposable character".
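
The narrow sense mentioned here, a character with a compatibility mapping, can be checked from the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `is_compatibility_decomposable` is made up for illustration, and the test relies on the convention that compatibility mappings carry a formatting tag such as `<compat>` or `<circle>` while canonical mappings carry none.

```python
import unicodedata

def is_compatibility_decomposable(ch):
    """True if ch has a tagged (compatibility) decomposition mapping.

    unicodedata.decomposition() returns "" when there is no mapping,
    "0065 0301"-style text for a canonical mapping, and
    "<tag> ..."-style text for a compatibility mapping.
    """
    return unicodedata.decomposition(ch).startswith("<")

# U+FB01 LATIN SMALL LIGATURE FI has a <compat> mapping to "f" + "i".
# U+00E9 LATIN SMALL LETTER E WITH ACUTE has only a canonical mapping.
for ch in ("\uFB01", "\u00E9", "A"):
    print(f"U+{ord(ch):04X}", is_compatibility_decomposable(ch))
```

A newly encoded symbol with no decomposition mapping would come out as `False` here, which is the distinction Andrew is drawing; the broader sense of the term that Asmus refers to is not something this check can detect.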

...

Why not call those emoji that are not simply symbols,
"interoperability characters"?

- C

Michael Everson

Jan 9, 2009, 4:56:43 AM1/9/09
to unicore UnicoRe Discussion, emoji4...@googlegroups.com, sym...@unicode.org
I begin: I like symbols. I like encoding them. I've encoded many.

On 7 Jan 2009, at 21:16, James Kass wrote:

We've been given explanations about pragmatic interoperability requirements for Japanese cell phone users, but the Japanese cell phone users have no such requirement related to Unicode plain-text.  Their interoperability needs have already been met by the vendors selling the services.

It DOESN'T MATTER.

It doesn't matter that these things are used on proprietary networks right now. It can be understood that there is some leakage in text content from those networks into other forms of text, which are encoded in the Universal Character Set.

So from that standpoint I can understand why e.g. Google is concerned with transmitting that text without loss, or without too much loss.

**HOWEVER**:

On 7 Jan 2009, at 10:21, Andrew West wrote:

Likewise, simple character names such as ANGEL and DEVIL for emoji characters worry me, as they appropriate a character name that may have more generic associations. I would rather see character names such as EMOJI SYMBOL FOR ANGEL and EMOJI SYMBOL FOR DEVIL, leaving open the possibility of encoding a generic ANGEL or DEVIL character if
ever required.

Here is where I disagree with Andrew. (Andrew and I rarely disagree.)

It may be the case that these symbols are derived from a particular environment. However, once a character used as Emoji is encoded in the Universal Character Set, IT CEASES TO BE AN EMOJI CHARACTER.

It becomes a character like any other. It becomes a symbol that anyone can use for ANY PURPOSE at all.

It is in that light that I am now attempting to evaluate the names and mappings and representative glyphs, now that a font representation (which shows what the Emoji committee is thinking) is available. 

I would start by saying that many if not most of the proposed glyphs in that font are not by any means generic enough for the final code charts, but at least we now have a sounder position from which to evaluate the draft proposal, its names, unifications, and glyphs.

Michael Everson

Jan 9, 2009, 5:23:49 AM1/9/09
to Michael Everson, unicore UnicoRe Discussion, emoji4...@googlegroups.com, sym...@unicode.org
The names for the proposed U+23EA are not conformant. To avoid the
apostrophe problem in O'CLOCK/OCLOCK/O-CLOCK, I propose the following:

CLOCK FACE H0100
CLOCK FACE H0200
CLOCK FACE H0300
CLOCK FACE H0400
CLOCK FACE H0500
CLOCK FACE H0600
CLOCK FACE H0700
CLOCK FACE H0800
CLOCK FACE H0900
CLOCK FACE H1000
CLOCK FACE H1100
CLOCK FACE H1200

Otherwise it ought to be

CLOCK FACE ONE OCLOCK
or
CLOCK FACE ONE O-CLOCK
or
CLOCK FACE ONE OF THE CLOCK

In any case 1 OCLOCK is disallowed.
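
The naming constraints behind these alternatives can be sketched as a small checker. This is an illustration only, based on my reading of the character-name syntax rules in the Unicode Standard (only A-Z, 0-9, SPACE and HYPHEN-MINUS; no word starting with a digit; restrictions on where spaces and hyphens may sit). The function name `name_problems` is hypothetical, and the rule set should be checked against the Standard before relying on it.

```python
import string

# Characters permitted in a character name (assumed rule set, see above).
ALLOWED = set(string.ascii_uppercase + string.digits + " -")

def name_problems(name):
    """Return a list of rule violations; an empty list means the name passes."""
    if not name:
        return ["name is empty"]
    problems = []
    bad = sorted(set(name) - ALLOWED)
    if bad:
        problems.append("disallowed characters: " + " ".join(repr(c) for c in bad))
    # A digit may not start the name or directly follow a space.
    if any(word and word[0] in string.digits for word in name.split(" ")):
        problems.append("a word starts with a digit")
    if name[0] == " " or name[-1] == " " or "  " in name:
        problems.append("leading, trailing, or doubled space")
    if name[0] == "-" or name[-1] == "-" or "--" in name or " - " in name:
        problems.append("misplaced hyphen")
    return problems

for candidate in (
    "CLOCK FACE H0100",
    "CLOCK FACE ONE OCLOCK",
    "CLOCK FACE ONE O-CLOCK",
    "CLOCK FACE ONE O'CLOCK",
    "CLOCK FACE 1 OCLOCK",
):
    print(candidate, "->", name_problems(candidate) or "ok")
```

Under these assumed rules the H0100 form and the spelled-out forms pass, while the apostrophe form and the form with a bare digit word are rejected.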

katm...@gmail.com

Jan 9, 2009, 5:34:09 AM1/9/09
to emoji4unicode
Thanks, Michael,

This was filed as Issue 75: http://code.google.com/p/emoji4unicode/issues/detail?id=75

- Kat

Karl Pentzlin

Jan 11, 2009, 7:38:58 PM1/11/09
to Kenneth Whistler, uni...@unicode.org, emoji4...@googlegroups.com
On Tuesday, 6 January 2009 at 22:00, Kenneth Whistler wrote:

(Karl Pentzlin 2009-01-06 20:29):


>> - Symbols should be named as they appear as emoji, not according
>> to the black-and-white fallback glyph which is associated to
>> them to print the Unicode charts. This means:
>> · Symbols with an inherent color shall bear this color in their
>> name unless the entity denoted by the name has identifies the color

(should have been "implies" or "usually or implicitly is
associated with" rather than "identifies")
KW> I disagree. This principle is simply not helpful. It perpetuates
KW> the notion that colors are *inherently* a part of the character
KW> identity here.

If the colors are inherently a part of the character identity, they
are. We are talking about text entities to be primarily displayed
onto a medium which inherently enables color and animation.

KW> And that does not serve the purpose of providing
KW> a cross-mapping set for interoperability with the emoji characters.

This is plainly false.
The emojis are presented without any unique or Latin-written names.
Thus, the Unicode names are completely irrelevant for interoperability,
they have to be correct on their own.

KW> It would be far, far better to simply have some abstracted
KW> compatibility characters identified as EMOJI SYMBOL FOR BOOK-1,
KW> EMOJI SYMBOL FOR BOOK-2, EMOJI SYMBOL FOR BOOK-3, etc., rather
KW> than to insist on encoding RED BOOK SYMBOL, BLUE BOOK SYMBOL,
KW> ORANGE BOOK SYMBOL ...

It is at least better than BOOK WITH HORIZONTAL FILL etc.

If you encode a "purple heart" (e-B16) in contrast to other colored
hearts (e-B0C red heart, e-B15 yellow or golden heart, etc.),
and name it:
- PURPLE HEART, you name it according to its real character identity.
- EMOJI SYMBOL FOR HEART-1, you obviously and visibly apply a trick,
but this is at least not dishonest.
- STRIPED HEART, you use a trick which is not obvious as long as you
look at the name and the representative glyph, thus it is a dirty trick.

If Unicode had required all characters to be carvable into stone using
hammer and chisel only, you would not be able to encode a purple heart.
Fortunately, there is no Unicode principle requiring this.
The fact that all (not-control) Unicode characters hitherto encoded are
representable by black and white printing (with the exception of
U+2591...U+2593 which in fact contain their shade in their name
without resorting to dirty tricks) does not imply this for the future.
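
The names of the three shade characters mentioned here can be read directly from the Unicode Character Database. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `shade_names` is made up for illustration.

```python
import unicodedata

def shade_names():
    """Return (code point label, character name) pairs for U+2591..U+2593."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp)))
        for cp in range(0x2591, 0x2594)
    ]

for label, name in shade_names():
    print(label, name)
```

The names returned are LIGHT SHADE, MEDIUM SHADE and DARK SHADE, so the degree of shading is stated in the name itself.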

Times are changing.
Electronic media which inherently use color and animation lead to
the emergence of characters which rely on these features.

If you want to encode a STRIPED HEART, please show evidence of a
STRIPED HEART in carving, writing, print, or display.
A "purple heart" does not count as evidence for this.

KW> and then jump off the deep end in insisting that the associated
KW> glyphs actually need to support color distinctions.

As I said: If they do, they do.
What else are the distinctions between e-B0C, e-B13, e-B14, e-B15, e-B16?

(Karl Pentzlin 2009-01-06 20:29):


>> Symbols whose semantics include animation shall have ANIMATED
>> as part of their names (this does not apply to symbols where
>> animation is a feature of glyph variance only).

KW> ... This is the same issue as for the colored glyphs,

Yes.

KW> only more so. It is simply not helpful ...

Neither helpful nor obstructive with regard to interoperability,
but in fact simply helpful for the integrity of Unicode itself!

KW> ..., to insist that
KW> "ANIMATED" be part of the character name, when that is a
KW> description of the animated glyphs used on phones, rather
KW> than a useful identifying label for the *character* we are
KW> going to encode to represent the symbol in question.

It is not "rather than", but "likewise". A name is useful as an
identifying label for the character if it in fact is a description for
all common glyphs of its glyph variance spectrum which occur on the
medium for which it was created.

- Karl Pentzlin


Christopher Fynn

Jan 12, 2009, 1:08:45 AM1/12/09
to emoji4...@googlegroups.com
On 12/01/2009, Karl Pentzlin <karl-p...@acssoft.de> wrote:

> On Tuesday, 6 January 2009 at 22:00, Kenneth Whistler wrote:

> (Karl Pentzlin 2009-01-06 20:29):
>>> - Symbols should be named as they appear as emoji, not according
>>> to the black-and-white fallback glyph which is associated to
>>> them to print the Unicode charts. This means:
>>> · Symbols with an inherent color shall bear this color in their
>>> name unless the entity denoted by the name has identifies the color
> (should have been "implies" or "usually or implicitly is
> associated with" rather than "identifies")
> KW> I disagree. This principle is simply not helpful. It perpetuates
> KW> the notion that colors are *inherently* a part of the character
> KW> identity here.
>
> If the colors are inherently a part of the character identity, they
> are. We are talking about text entities to be primarily displayed
> onto a medium which inherently enables color and animation.

If colors (and/or animation) are inherently part of emoji characters
- should any of them be unified with existing UCS characters that
have no color (or are presumed B&W)? It might be better to encode the
whole lot of emoji as a block of interoperability or compatibility
characters.
- C

Markus Scherer

Jan 15, 2009, 7:58:11 PM1/15/09
to emoji4...@googlegroups.com, karl-p...@acssoft.de, uni...@unicode.org, ke...@sybase.com
On Tue, Jan 6, 2009 at 1:00 PM, Kenneth Whistler <ke...@sybase.com> wrote:
> Quick comments on some Emoji symbols
>   - Karl Pentzlin 2009-01-06

On reviewing these suggestions more closely, I agree with Ken that the suggestions are based on an overly pedantic interpretation of the carrier images, disregarding both common practice of naming Unicode symbols as well as the carriers' cross-mappings.

Best regards,
markus

Karl Pentzlin

Jan 16, 2009, 5:43:22 AM1/16/09
to Markus Scherer, emoji4...@googlegroups.com, uni...@unicode.org, ke...@sybase.com
On Friday, 16 January 2009 at 01:58, Markus Scherer wrote:

MS> On Tue, Jan 6, 2009 at 1:00 PM, Kenneth Whistler <ke...@sybase.com> wrote:
>> Quick comments on some Emoji symbols
>> - Karl Pentzlin 2009-01-06

MS> On reviewing these suggestions more closely, I agree with Ken
MS> that the suggestions are based on an overly pedantic
MS> interpretation of the carrier images

Ken did not say the latter, at least not in his public answer to my
original mail.
If you (Markus) regard an exact descriptive naming of the symbols
based on the specimens given in your list as inappropriate in any way,
please explain this instead of going to the brink of offensiveness.

MS> disregarding both common practice of naming Unicode symbols

Woolliness?

MS> as well as the carriers' cross-mappings.

Please be specific. I presume the carriers do not want the characters
to have semantics other than what can be deduced from the pictures, as
long as you give us (who do not speak Japanese) translations of the
Japanese phrases contained in the table, and as long as you do not cite
other specific information you have got from the carriers.

Also, please distinguish two things clearly:

1. There are some characters for which the encoding is required for
interoperability reasons. Such a request is simply granted by
assigning code points. The exact naming is completely irrelevant
as long as the characters can be identified uniquely within the
range of characters used in the interoperation.

2. If code points are assigned for whatever reason, they have to be
named according to the Unicode practice and rules, regarding the
fact that they now can be used by all Unicode users, especially
outside of the scope of the original interoperability.
These characters may be stored in documents which will be read
long after some symbols have come out of fashion, and then they
can only be understood if the semantics of these symbols were
defined clearly and do not leave room for a broader semantic
spectrum than was originally intended.

I see that some people associated with the UTC have a strong interest
in encoding the emojis to provide the interoperability.
This is acceptable as in fact Unicode is there to serve the community.

But I also see that some people show a strong interest in doing this in a
specific way, especially when it comes to the problematic points of
defining semantics, handling color, representative glyphs,
unification, or inclusion of flags.
This sometimes gives the impression that there is an attitude like
"We are the Borg. You will be assimilated. Resistance is futile."

I, however, am convinced that all these issues can be settled
eventually in a friendly and constructive discussion.

I look forward to participating in such discussions at the upcoming
SC2/WG2 discussions in Dublin and Tokushima, as part of the German
delegation.

Best wishes
Karl Pentzlin


Markus Scherer

Jan 16, 2009, 2:06:22 PM1/16/09
to Karl Pentzlin, emoji4...@googlegroups.com, uni...@unicode.org, ke...@sybase.com
On Fri, Jan 16, 2009 at 2:43 AM, Karl Pentzlin <karl-p...@acssoft.de> wrote:
On Friday, 16 January 2009 at 01:58, Markus Scherer wrote:

MS> On Tue, Jan 6, 2009 at 1:00 PM, Kenneth Whistler <ke...@sybase.com> wrote:
>> Quick comments on some Emoji symbols
>>   - Karl Pentzlin 2009-01-06
MS> On reviewing these suggestions more closely, I agree with Ken
MS> that the suggestions are based on an overly pedantic
MS> interpretation of the carrier images

Ken did not say the latter, at least not in his public answer to my
original mail.

He did not use these same words, but he explained in detail what I summarized here.

If you (Markus) regard an exact descriptive naming of the symbols
based on the specimens given in your list as inappropriate in any way,
please explain this instead of going to the brink of offensiveness.

No offense intended. I simply agree with Ken that taking every little glyph/image difference as a reason for disunification, and extreme narrowing of character usage by super-specific character naming, are not desirable.

We have published the guiding principles for our work on this proposal in several drafts, and discussed at several UTC meetings, and so far the guidance has been that the approach is largely sound. Feedback based on a very different view of the repertoire therefore does not fit well with how we have developed it so far. Either we go back to square one, for which I don't see any consensus, or we concentrate on feedback that takes into account the general direction of the proposal.

Best regards,
markus