e-039 glyph40 GREEN EARTH, etc

Rick McGowan

Dec 18, 2008, 12:16:29 PM

to emoji4unicode

Unless contrasted with a red earth or scorched earth or some other,
this shouldn't be called by a color name. It should be "earth symbol"
or such.

e-044 glyph51 YOUNG GREEN LEAF -- I think would be better called
"newly licensed driver symbol" or some such.

"White Apple" is a misnomer for something that is documented as being
"usually red", likewise "Black Apple" for something "usually green".

Many of the symbols would be better prefixed with something
descriptive, such as "emoji symbol" blah...

Spiral shell should not be unified with snail (katatsumuri)

Hamster should not be aliased or xrefed to "guinea pig"

Move "hatching chick" near the other chicks?

The open-toed sandal should not be called "mule". Yeah, some people
call it that, but as a character name it's ambiguous amongst all the
other emoji animals.

"Poop" should be called something less slang, such as "feces". Please.

Please make "mini disc" and "floppy disk" have consistent spelling.

Haven't finished looking through the proposal.

Markus Scherer

Dec 18, 2008, 6:21:54 PM

to emoji4...@googlegroups.com

Hi Rick,

Thanks for your feedback -- and keep it coming :-)

(I will collect it in project issues, probably an issue per symbol.)

On Thu, Dec 18, 2008 at 9:16 AM, Rick McGowan <ri...@unicode.org> wrote:

> "White Apple" is a misnomer for something that is documented as being
> "usually red", likewise "Black Apple" for something "usually green".

Yes. We have done this systematically, using customary black/white names and representative glyphs for Unicode characters that are mapped to colored Emoji. This was discussed and advised before in the UTC.

> Many of the symbols would be better prefixed with something
> descriptive, such as "emoji symbol" blah...

I don't see precedent for that with old or recent (ARIB) symbols.

> "Poop" should be called something less slang, such as "feces". Please.

This has been discussed more than a year ago, including in the UTC, and "poop" was generally accepted.

> Please make "mini disc" and "floppy disk" have consistent spelling.

The Mini Disc standard format is called exactly that, while everyone uses "floppy disk" as far as I know. Inconsistent with each other, but consistent with customary spellings of each.

markus

Rick McGowan

Dec 19, 2008, 5:07:52 PM

to emoji4unicode

More specific comments on
http://www.unicode.org/~scherer/emoji4unicode/snapshot/utc.html

e-1C8 glyph131 should not be xreffed to "eagle" - none of the sources
seem to mention any eagles. Just "bird".

e-1DE glyph153 - In general, the Chinese zodiacal signs should be
grouped. This appears to be the only one that has a name labeling it as
such, and this seems inconsistent and confusing.

e-1E0 glyph155 - PIGS NOSE should be either "PIG'S NOSE" or "PIG NOSE"
because the current name is incorrect ("s" without an apostrophe).

e-355 glyph209 - name is mis-spelled: money -> monkey.

e-35A glyph214 is glossed "pouting" but all of the sources say "angry"
(okotta). The name would be better as PERSON MAKING ANGRY FACE

And... as a general rule: what is the difference between "X Y-verbing"
and "X making an adjective Y"? For example: "person with angry face"
versus "person making angry face"? A rule needs to be sorted out for
naming such things. E.g., if one is animated.

There is also e-320 glyph 158 "angry face". How is this distinguished
from e-35A? And why are they disunified?

I think the separation of e-4B0 and e-4B1 is insufficiently justified
based merely on one source having two little pictures virtually
indistinguishable.

e-4D2 glyph247 - I think the name "emblem" is far too generic.
"Crown-like emblem" would be better.

e-4DB glyph256 WOMANS CLOTHES - should have an apostrophe in it. And
should probably also be plural: women's clothes. Same with e-4CC, "Men's
Shoe".

e-4DD glyph258 PRIZE BAG - should be called MONEY BAG.

I find the entire set of flag symbols basically objectionable, and I'm
fairly certain they'll cause controversy.

e-4F8 glyph285 CONSTELLATION - The name should not be "constellation".
"Uranai" is fortune telling. But you already have one of those at e-4F7
glyph 284. These should be unified, and called "fortune telling symbol".

e-4FA glyph287 KNIFE -- "cooking knife" is a better name for this.

I find the set of books:
e-4FF through e-503 and e-545, e-546, e-547, e-54F
somewhat objectionable and too numerous, and insufficiently motivated in
a *character* encoding. They should all be unified as a single "book
symbol". Or, at most, vertical book, diagonal book, and horizontal book.

e-502 The name BOOK WITH VERICAL FILL has an error -> VERTICAL

e-504 glyph297 NAME PLATE - would be better called "name badge".

I think DoCoMo #147 would be better unified with the generic Unicode hot
spring symbol (U+2668), and rename e-505 glyph298 BATH to "person taking
a bath".

I think these should be unified: e-510 glyph309 PRESENT and e-535
glyph344 PACKAGE

To be consistent with nearby characters, e-B7F glyph348 SYMBOLS SIGN
should be called "symbols symbol".

e-538 glyph351 PC - name is too generic and abbreviated. Should be
called "personal computer symbol".

e-B09, e-B0A, and e-B0B are all objectionable - they should be unified
with existing question/exclamation mark.

e-B13 through e-B19... should all be unified... with U+2665.

e-B23 glyph534 EXCLAMATION MARK IN TRIANGLE - This is already encoded at
U+26A0.
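For reference, the existing characters cited in the comments above as unification targets can be looked up by code point. This is a minimal Python sketch using only the standard `unicodedata` module; the notes in the dictionary are illustrative summaries, and the names printed come from whatever Unicode version is bundled with the interpreter.

```python
import unicodedata

# Existing code points cited in the review as unification targets.
# The notes are illustrative summaries of the comments, not official data.
CITED_TARGETS = {
    0x2668: "suggested target for DoCoMo #147 (hot spring)",
    0x2665: "suggested target for e-B13 through e-B19",
    0x26A0: "said to already encode e-B23",
}


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX NAME' using the interpreter's UCD."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp, note in CITED_TARGETS.items():
        print(f"{describe(cp)}: {note}")
```

U+2665 turns out to be named BLACK HEART SUIT, an instance of the customary black/white naming Markus describes for characters mapped to colored emoji.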

e-54D glyph370 ROLODEX - Oops! "Rolodex" is a trademark of Sanford, so
this name really MUST be changed. Perhaps "card filer" or something like
that.

e-7D5 glyph622 SKI - should be called "boot with ski" or "skiing" or
something like that.

e-7DD glyph387 FOOTBALL - should be called "American Football" to
distinguish, because in most of the world "football" is "soccer".

e-7E2 and e-7E3 should really be unified. I see no point in separation.

e-7F4 glyph408 PATROL CAR - I would rather see this specified as "police
car".

e-801 and e-802 should be unified. I see no point in separation.

That completes my basic run-through...

Rick

Mark Davis

Dec 19, 2008, 6:45:06 PM

to emoji4...@googlegroups.com, ri...@unicode.org

Thanks for the detailed reading and feedback.

Some quick comments below.

Mark

On Fri, Dec 19, 2008 at 14:07, Rick McGowan <ri...@unicode.org> wrote:

> More specific comments on
> http://www.unicode.org/~scherer/emoji4unicode/snapshot/utc.html
>
> e-1C8 glyph131 should not be xreffed to "eagle" - none of the sources
> seem to mention any eagles. Just "bird".
>
> e-1DE glyph153 - In general, the Chinese zodiacal signs should be
> grouped. This appears to be the only one that has a name labeling it as
> such, and this seems inconsistent and confusing.
>
> e-1E0 glyph155 - PIGS NOSE should be either "PIG'S NOSE" or "PIG NOSE"
> because the current name is incorrect ("s" without an apostrophe).

On the apostrophe: it isn't allowed.

Section 4.8 Name—Normative
All Unicode characters have unique names that serve as formal, unique identifiers for each character. Unicode character names contain only uppercase Latin letters A through Z, digits, space, and hyphen-minus.

We can use PIG NOSE in this case. In other cases, I think the best choice might still be the genitive without an apostrophe, but we should avoid it where there are reasonable alternatives.
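The character-repertoire rule quoted above is easy to check mechanically. Below is a minimal Python sketch that tests only that rule (uppercase A through Z, digits, space, hyphen-minus); the function name `is_valid_name_repertoire` is hypothetical, and the standard places further constraints on names that this check does not cover.

```python
import re

# Permitted characters per the quoted rule: uppercase Latin A-Z, digits,
# space, and hyphen-minus. Other naming constraints are NOT checked here.
_NAME_CHARS = re.compile(r"[A-Z0-9 \-]+")


def is_valid_name_repertoire(name: str) -> bool:
    """Return True if every character of `name` is in the permitted set."""
    return bool(_NAME_CHARS.fullmatch(name))


if __name__ == "__main__":
    for candidate in ["PIG NOSE", "PIGS NOSE", "PIG'S NOSE", "WOMANS CLOTHES"]:
        print(f"{candidate!r}: {is_valid_name_repertoire(candidate)}")
```

With this check, PIG'S NOSE fails because of the apostrophe, while PIGS NOSE and PIG NOSE both pass, so the practical choice is between the apostrophe-less genitive and a rephrasing such as PIG NOSE.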

> e-355 glyph209 - name is mis-spelled: money -> monkey.
>
> e-35A glyph214 is glossed "pouting" but all of the sources say "angry"
> (okotta). The name would be better as PERSON MAKING ANGRY FACE

There is only one source for this. Remember to read the key: only the white ones have round trips; the others are fallbacks and should be considered in that light. So the key here is:

#822
Character (angry in a cute way) [キャラクター（かわいく怒る）]
U+EB88
SJIS-F48D JIS-7B6D

vs

#258
Face 2 (angry face) [顔２（おこったカオ）]
U+E472
SJIS-F64A JIS-752B

> And... as a general rule: what is the difference between "X Y-verbing"
> and "X making an adjective Y"? For example: "person with angry face"
> versus "person making angry face"? A rule needs to be sorted out for
> naming such things. E.g., if one is animated.

There was no attempt to make animated items use verbs in that way. If you have specific recommendations for changes, it'd be good to raise them.

> There is also e-320 glyph 158 "angry face". How is this distinguished
> from e-35A? And why are they disunified?

They have to be, for source separation.

> I think the separation of e-4B0 and e-4B1 is insufficiently justified
> based merely on one source having two little pictures virtually
> indistinguishable.

Again, source separation.

> e-4D2 glyph247 - I think the name "emblem" is far too generic.
> "Crown-like emblem" would be better.
>
> e-4DB glyph256 WOMANS CLOTHES - should have an apostrophe in it. And
> should probably also be plural: women's clothes. Same with e-4CC, "Men's
> Shoe".
>
> e-4DD glyph258 PRIZE BAG - should be called MONEY BAG.
>
> I find the entire set of flag symbols basically objectionable, and I'm
> fairly certain they'll cause controversy.

This was discussed at some considerable length in the UTC, and we're following the guidelines laid out in that discussion. If you have some specific suggestions...

> e-4F8 glyph285 CONSTELLATION - The name should not be "constellation".
> "Uranai" is fortune telling. But you already have one of those at e-4F7
> glyph 284. These should be unified, and called "fortune telling symbol".

I think this is a possible unification.

> e-4FA glyph287 KNIFE -- "cooking knife" is a better name for this.
>
> I find the set of books:
> e-4FF through e-503 and e-545, e-546, e-547, e-54F
> somewhat objectionable and too numerous, and insufficiently motivated in
> a *character* encoding. They should all be unified as a single "book
> symbol". Or, at most, vertical book, diagonal book, and horizontal book.

Source separation.

> e-502 The name BOOK WITH VERICAL FILL has an error -> VERTICAL
>
> e-504 glyph297 NAME PLATE - would be better called "name badge".
>
> I think DoCoMo #147 would be better unified with the generic Unicode hot
> spring symbol (U+2668), and rename e-505 glyph298 BATH to "person taking
> a bath".
>
> I think these should be unified: e-510 glyph309 PRESENT and e-535
> glyph344 PACKAGE
>
> To be consistent with nearby characters, e-B7F glyph348 SYMBOLS SIGN
> should be called "symbols symbol".

I can see why you'd say that. I think in TUS we don't have any consistent distinction between SIGN and SYMBOL. Compare:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:name=/\bSIGN$/:]
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:name=/\bSYMBOL$/:]
and even
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:name=/\bMARK$/:]

And I don't know where this came from, but I can see why whoever came up with it wanted to avoid "symbols symbol"; that sounds odd to me.
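The CLDR utility queries linked above can be approximated locally with Python's standard `unicodedata` module. This is a sketch only: it counts assigned characters whose names end in SIGN, SYMBOL, or MARK according to the Unicode version bundled with the interpreter, so its numbers may differ from the 2008 repertoire and from the CLDR tool's output. The function name `count_name_suffixes` is hypothetical.

```python
import re
import sys
import unicodedata
from collections import Counter


def count_name_suffixes(words=("SIGN", "SYMBOL", "MARK")) -> Counter:
    """Count code points whose character name ends with each given word.

    Mirrors the regex style of the linked queries, e.g. /\\bSIGN$/.
    """
    patterns = {w: re.compile(rf"\b{w}$") for w in words}
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if not name:
            continue  # unassigned, or no name in this Python's UCD
        for word, pattern in patterns.items():
            if pattern.search(name):
                counts[word] += 1
    return counts


if __name__ == "__main__":
    for word, n in count_name_suffixes().items():
        print(word, n)
```

Each of the three suffixes occurs many times (DOLLAR SIGN, QUESTION MARK, PEACE SYMBOL, and so on), which supports the point that the existing names draw no consistent line between SIGN and SYMBOL.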

Rick McGowan

Dec 19, 2008, 8:09:12 PM

to emoji4unicode

This is a separate topic from various nits and observations about the
repertoire itself...

I would like to set out some opinions here that are *only* personal
opinions, not official in any way. And I don't want to argue about these
opinions on the list, I just wish to state them, and suggest a course of
action for the authors of the proposal.

1. What is being encoded, and why?

Most of the contents of these emoji sets I find merely silly and
faddish, but I accept that there is a definite need for these things by
some people, and they are being interchanged in public settings that
have leaked onto the web. So it is being considered appropriate to make
some provision for them. Except that I haven't seen the Japanese
*vendors* of these come to us about them. I'm not entirely convinced
that they need to be encoded in Unicode at all. The final proposal
should try to make convincing arguments for their encoding in the first
place, because at least some of these characters might be rather a hard
sell in WG2. How, for instance, do these sets really differ from the
large number of "font hack" sets that already infest the web, many
elements of which aren't going to be encoded in Unicode, even though
they are publicly interchanged?

2. The proposal cannot be just a chart

What we've seen so far, in the subcommittee and in UTC, is not a fully
worked-out proposal. It's a set of possible unifications in a chart, so
there needs to be some real text with substantive argumentation to go
along with that before it's ready to be presented to the committees.

The authors should also try to get some solid evidence of buy-in from
the vendors who are using these sets and users who are leaking them onto
the web -- e.g., letters of support to UTC and WG2 -- otherwise it's not
clear that the "community" is really interested in having them encoded
in an international standard, or that they see a need.

And even if UTC votes for the proposal because a lot of members have
already seen drafts and been involved in discussions, WG2 might still
see it as a controversial proposal of dubious merit, and that could
either slow it down or derail it.

Also, in the final proposal any remaining "anomalies" should be
explained. For example, the spelling of "mini disc" versus "floppy
disk". If that's left inconsistent in the final proposal, for whatever
reason, it will raise red flags unless explained.

3. Source separation

Personally, I am unconvinced about the supposed need for source
separation of this entire repertoire when the set (column 0 of the
working draft chart) is transferred from the realm of "cute pictures in
cell phones" to that of "encoded text characters" in Unicode. One good
example of overkill in the character domain is the set of "book"
characters. Another example is the "angry face" set, e-320 versus e-35A.
In my opinion it is unreasonable and unnecessary in a character encoding
to have both e-320 and e-35A as encoded characters.

I don't think it's been established that we really have different
characters for all of these, once you strip away colors and animation.
Yeah, we have 3 or 4 semi-overlapping "semi-private" repertoires of
pictures that we're proposing to be smooshed into the character encoding
space so that when this stuff leaks into UTF-8 text from the
semi-private sources it doesn't end up as white boxes and can be indexed
by Google and others. But to do that I see absolutely no real need for
source separation. Why is it important to be able to round trip any of
this stuff? The vendors have mappings to each other, presumably, and
it's not Unicode's job to provide those vendors with an all-singing
all-dancing solution to mapping issues between other "standards".

Others may think there is a need for source separation. If so, I suggest
that it be carefully argued in the actual proposal before UTC and WG2,
because I think it will be difficult to convince the committees that
source separation is actually required, or even a reasonable principle
for this stuff, once divested of color and animation.

Again, just my opinions for the authors to consider in making their
proposal.

Rick

Mark Davis

Dec 19, 2008, 9:22:49 PM

to emoji4...@googlegroups.com

Some comments.

Mark

On Fri, Dec 19, 2008 at 17:09, Rick McGowan <ri...@unicode.org> wrote:

> This is a separate topic from various nits and observations about the
> repertoire itself...
>
> I would like to set out some opinions here that are *only* personal
> opinions, not official in any way. And I don't want to argue about these
> opinions on the list, I just wish to state them, and suggest a course of
> action for the authors of the proposal.
>
> 1. What is being encoded, and why?
>
> Most of the contents of these emoji sets I find merely silly and
> faddish, but I accept that there is a definite need for these things by
> some people, and they are being interchanged in public settings that
> have leaked onto the web. So it is being considered appropriate to make
> some provision for them. Except that I haven't seen the Japanese
> *vendors* of these come to us about them. I'm not entirely convinced
> that they need to be encoded in Unicode at all. The final proposal
> should try to make convincing arguments for their encoding in the first
> place, because at least some of these characters might be rather a hard
> sell in WG2. How, for instance, do these sets really differ from the
> large number of "font hack" sets that already infest the web, many
> elements of which aren't going to be encoded in Unicode, even though
> they are publicly interchanged?

These issues have been discussed at some length in the UTC meeting and/or symbol subcommittee ad hocs. While Japanese mobile phone vendors certainly stand to benefit from this, it is Unicode member companies that are proposing it, because it solves a current problem with data interchange. The emoji symbols are currently encoded in the Private Use Area (PUA), which has nasty problems for interchange. The symbols are no more problematic than many we have already incorporated into Unicode.
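To make the interchange problem concrete: the vendor values shown in the key earlier in this thread (U+E472, U+EB88) lie in the Private Use Area, so the code points themselves carry no standard name or meaning. A minimal Python sketch, using only the standard `unicodedata` module, that flags such characters in text; the function name `private_use_code_points` and the sample message are hypothetical.

```python
import unicodedata


def private_use_code_points(text: str) -> list[str]:
    """Return U+XXXX labels for every private-use character in `text`."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]


if __name__ == "__main__":
    # Hypothetical message mixing ordinary text with the two PUA values
    # that appear in the key quoted earlier in the thread.
    message = "See you soon \ue472 \ueb88"
    print(private_use_code_points(message))
    # PUA code points have no standard name, so a receiver cannot tell
    # from the code point alone which vendor's emoji, if any, was meant.
    print(unicodedata.name("\ue472", "<no name>"))
```

A receiver that does not share the sender's private agreement sees only unnamed, uninterpretable code points, which is the interchange problem that encoding the symbols is meant to solve.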

I think so much of this tempest in a teapot is because we don't yet have a black-and-white font.

Perhaps it was a mistake to open this up for public comment, just because there are so many people getting such wrong impressions of the proposal.

> 2. The proposal cannot be just a chart
>
> What we've seen so far, in the subcommittee and in UTC, is not a fully
> worked-out proposal.

I don't know how very many times we have to say that this IS NOT the proposal that we will be making to the UTC; it is preliminary work that we wanted to get comments on. In part, that is because as we do a font, we don't want to produce unnecessary or unrepresentative glyphs. So if there are areas where we can make changes earlier because of feedback (such as the feedback that you gave), that's all to the good.

> It's a set of possible unifications in a chart, so
> there needs to be some real text with substantive argumentation to go
> along with that before it's ready to be presented to the committees.

And there will be, following the pattern of the ARIB proposal.

> The authors should also try to get some solid evidence of buy-in from
> the vendors who are using these sets and users who are leaking them onto
> the web -- e.g., letters of support to UTC and WG2 -- otherwise it's not
> clear that the "community" is really interested in having them encoded
> in an international standard, or that they see a need.

That isn't necessary. While we suspect and hope that the vendors will switch from their PUA usage once these are encoded, that is not the primary value to the proposers.

> And even if UTC votes for the proposal because a lot of members have
> already seen drafts and been involved in discussions, WG2 might still
> see it as a controversial proposal of dubious merit, and that could
> either slow it down or derail it.

Only if there is a lot of unnecessary FUD around the proposal.

> Also, in the final proposal any remaining "anomalies" should be
> explained. For example, the spelling of "mini disc" versus "floppy
> disk". If that's left inconsistent in the final proposal, for whatever
> reason, it will raise red flags unless explained.

Good point.

> 3. Source separation
>
> Personally, I am unconvinced about the supposed need for source
> separation of this entire repertoire when the set (column 0 of the
> working draft chart) is transferred from the realm of "cute pictures in
> cell phones" to that of "encoded text characters" in Unicode. One good
> example of overkill in the character domain is the set of "book"
> characters. Another example is the "angry face" set, e-320 versus e-35A.
> In my opinion it is unreasonable and unnecessary in a character encoding
> to have both e-320 and e-35A as encoded characters.

No. We need source separation in order to prevent degrading data. If we take in data from vendor X with their version of SJIS, and convert to Unicode, we need to be able to preserve the data so that if we were to send the data back to vendor X, none of the data is destroyed.

This is the same principle as source separation for Han, or for any of the other encodings for which we had source separation.
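The round-trip argument can be made concrete: data converted from a vendor encoding to Unicode survives conversion back only if no two source codes land on the same character. A minimal Python sketch, assuming a mapping is just a dict from source codes to target characters; the function names are hypothetical, the two Shift_JIS codes are the ones shown in the key earlier in the thread, and the target labels are placeholders, not real Unicode assignments.

```python
from collections import defaultdict


def round_trip_conflicts(to_unicode: dict[str, str]) -> dict[str, list[str]]:
    """Map each target character to the source codes that collide on it.

    A target reached from more than one source code makes the reverse
    (Unicode -> source) conversion ambiguous, so the original data cannot
    be restored exactly.
    """
    sources_by_target = defaultdict(list)
    for source_code, target in to_unicode.items():
        sources_by_target[target].append(source_code)
    return {t: codes for t, codes in sources_by_target.items() if len(codes) > 1}


def round_trips(to_unicode: dict[str, str]) -> bool:
    """True if the mapping is one-to-one, so it can be inverted losslessly."""
    return not round_trip_conflicts(to_unicode)


if __name__ == "__main__":
    # Shift_JIS codes from the key; target labels are hypothetical placeholders.
    unified = {"SJIS-F48D": "<angry face>", "SJIS-F64A": "<angry face>"}
    separated = {"SJIS-F48D": "<angry face 1>", "SJIS-F64A": "<angry face 2>"}

    print(round_trips(unified), round_trip_conflicts(unified))
    print(round_trips(separated), round_trip_conflicts(separated))
```

Unifying the two codes makes the mapping many-to-one, so a converter sending data back to the vendor would have to guess which original code was meant; keeping them as separate characters keeps the mapping invertible, which is the property source separation protects.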

> I don't think it's been established that we really have different
> characters for all of these, once you strip away colors and animation.
> Yeah, we have 3 or 4 semi-overlapping "semi-private" repertoires of
> pictures that we're proposing to be smooshed into the character encoding
> space so that when this stuff leaks into UTF-8 text from the
> semi-private sources it doesn't end up as white boxes and can be indexed
> by Google and others. But to do that I see absolutely no real need for
> source separation. Why is it important to be able to round trip any of
> this stuff? The vendors have mappings to each other, presumably, and
> it's not Unicode's job to provide those vendors with an all-singing
> all-dancing solution to mapping issues between other "standards".
>
> Others may think there is a need for source separation. If so, I suggest
> that it be carefully argued in the actual proposal before UTC and WG2,
> because I think it will be difficult to convince the committees that
> source separation is actually required, or even a reasonable principle
> for this stuff, once divested of color and animation.

This was one of the key principles discussed in the UTC committee, multiple times. Perhaps you were nodding off during those discussions ;-)

> Again, just my opinions for the authors to consider in making their
> proposal.

Frankly, there is far more need for emoji encoding than there is for archaic scripts. If you really want to argue against proposals that have firm industry requirements, you'll find a lot less support in the UTC for the encoding of exotic characters.

> Rick

Rick McGowan

Dec 20, 2008, 12:54:55 PM

to emoji4...@googlegroups.com

Mark Davis wrote:
> I think so much of this tempest in a teapot is because we don't yet
> have a black-and-white font.
> Perhaps it was a mistake to open this up for public comment, just
> because there are so many people getting such wrong impressions of the
> proposal.

Yes, possibly. It might have been better to wait until the proposal had
some accompanying explanatory text. People are definitely not
understanding it. However, this is part of the reason I post my comments
here as direct feedback to the authors, and have no intention of
contributing to any tempest on the Unicode list.

> I don't know how very many times we have to say that this IS NOT the
> proposal that we will be making to the UTC; it is preliminary work

Yes... I just wanted to get my 2 cents in to this official feedback
forum, to make sure that the text isn't forgotten.