"White Apple" is a misnomer for something that is documented as being
"usually red", likewise "Black Apple" for something "usually green".
Many of the symbols would be better prefixed with something
descriptive, such as "emoji symbol" blah...
"Poop" should be called something less slang, such as "feces". Please.
Please make "mini disc" and "floppy disk" have consistent spelling.
e-1C8 glyph131 should not be xreffed to "eagle" - none of the sources
seem to mention any eagles. Just "bird".
e-1DE glyph153 - In general, the Chinese zodiacal signs should be
grouped. This appears to be the only one that has a name labeling it as
such, and this seems inconsistent and confusing.
e-1E0 glyph155 - PIGS NOSE should be either "PIG'S NOSE" or "PIG NOSE"
because the current name is incorrect ("s" without an apostrophe).
e-355 glyph209 - name is mis-spelled: money -> monkey.
e-35A glyph214 is glossed "pouting" but all of the sources say "angry"
(okotta). The name would be better as PERSON MAKING ANGRY FACE.
And... as a general rule: what is the difference between "X Y-verbing"
and "X making an adjective Y"? For example: "person with angry face"
versus "person making angry face"? A rule needs to be sorted out for
naming such things. E.g., if one is animated.
There is also e-320 glyph158 "angry face". How is this distinguished
from e-35A? And why are they disunified?
I think the separation of e-4B0 and e-4B1 is insufficiently justified
based merely on one source having two little pictures virtually
indistinguishable.
e-4D2 glyph247 - I think the name "emblem" is far too generic.
"Crown-like emblem" would be better.
e-4DB glyph256 WOMANS CLOTHES - should have an apostrophe in it. And
should probably also be plural: women's clothes. Same with e-4CC, "Men's
Shoe".
e-4DD glyph258 PRIZE BAG - should be called MONEY BAG.
I find the entire set of flag symbols basically objectionable, and I'm
fairly certain they'll cause controversy.
e-4F8 glyph285 CONSTELLATION - The name should not be "constellation".
"Uranai" is fortune telling. But you already have one of those at e-4F7
glyph284. These should be unified, and called "fortune telling symbol".
e-4FA glyph287 KNIFE -- "cooking knife" is a better name for this.
I find the set of books:
e-4FF through e-503 and e-545, e-546, e-547, e-54F
somewhat objectionable and too numerous, and insufficiently motivated in
a *character* encoding. They should all be unified as a single "book
symbol". Or, at most, vertical book, diagonal book, and horizontal book.
e-502 The name BOOK WITH VERICAL FILL has an error -> VERTICAL
e-504 glyph297 NAME PLATE - would be better called "name badge".
I think DoCoMo #147 would be better unified with the generic Unicode hot
spring symbol (U+2668), and rename e-505 glyph298 BATH to "person taking
a bath".
I think these should be unified: e-510 glyph309 PRESENT and e-535
glyph344 PACKAGE
To be consistent with nearby characters, e-B7F glyph348 SYMBOLS SIGN
should be called "symbols symbol".
e-538 glyph351 PC - name is too generic and abbreviated. Should be
called "personal computer symbol".
e-B09, e-B0A, and e-B0B are all objectionable - they should be unified
with existing question/exclamation mark.
e-B13 through e-B19... should all be unified... with U+2665.
e-B23 glyph534 EXCLAMATION MARK IN TRIANGLE - This is already encoded at
U+26A0.
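For reference, the existing code points cited in these comments (U+2668,
U+2665, U+26A0) can be checked against the Unicode Character Database. A
minimal sketch, assuming Python's standard unicodedata module; the helper
name existing_name is just for illustration:

```python
import unicodedata

# Existing code points named above as unification targets.
CANDIDATES = [0x2668, 0x2665, 0x26A0]


def existing_name(cp: int) -> str:
    """Return the character name for a code point, or a placeholder if it has none."""
    return unicodedata.name(chr(cp), "<no name>")


for cp in CANDIDATES:
    # Print each code point in U+XXXX form next to its current name.
    print(f"U+{cp:04X} {existing_name(cp)}")
```

Comparing those names with the proposed ones is a quick way to see where a
new character would only duplicate something already in the standard.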
e-54D glyph370 ROLODEX - Oops! "Rolodex" is a trademark of Sanford, so
this name really MUST be changed. Perhaps "card filer" or something like
that.
e-7D5 glyph622 SKI - should be called "boot with ski" or "skiing" or
something like that.
e-7DD glyph387 FOOTBALL - should be called "American Football" to
distinguish, because in most of the world "football" is "soccer".
e-7E2 and e-7E3 should really be unified. I see no point in separation.
e-7F4 glyph408 PATROL CAR - I would rather see this specified as "police
car".
e-801 and e-802 should be unified. I see no point in separation.
That completes my basic run-through...
Rick
More specific comments on
http://www.unicode.org/~scherer/emoji4unicode/snapshot/utc.html
This is a separate topic from various nits and observations about the
repertoire itself...
I would like to set out some opinions here that are *only* personal
opinions, not official in any way. And I don't want to argue about these
opinions on the list, I just wish to state them, and suggest a course of
action for the authors of the proposal.
1. What is being encoded, and why?
Most of the contents of these emoji sets I find merely silly and
faddish, but I accept that there is a definite need for these things by
some people, and they are being interchanged in public settings that
have leaked onto the web. So it is being considered appropriate to make
some provision for them. Except that I haven't seen the Japanese
*vendors* of these come to us about them. I'm not entirely convinced
that they need to be encoded in Unicode at all. The final proposal
should try to make convincing arguments for their encoding in the first
place, because at least some of these characters might be rather a hard
sell in WG2. How, for instance, do these sets really differ from the
large number of "font hack" sets that already infest the web, many
elements of which aren't going to be encoded in Unicode, even though
they are publicly interchanged?
2. The proposal cannot be just a chart
What we've seen so far, in the subcommittee and in UTC, is not a fully
worked-out proposal. It's a set of possible unifications in a chart, so
there needs to be some real text with substantive argumentation to go
along with that before it's ready to be presented to the committees.
The authors should also try to get some solid evidence of buy-in from
the vendors who are using these sets and users who are leaking them onto
the web -- e.g., letters of support to UTC and WG2 -- otherwise it's not
clear that the "community" is really interested in having them encoded
in an international standard, or that they see a need.
And even if UTC votes for the proposal because a lot of members have
already seen drafts and been involved in discussions, WG2 might still
see it as a controversial proposal of dubious merit, and that could
either slow it down or derail it.
Also, in the final proposal any remaining "anomalies" should be
explained. For example, the spelling of "mini disc" versus "floppy
disk". If that's left inconsistent in the final proposal, for whatever
reason, it will raise red flags unless explained.
3. Source separation
Personally, I am unconvinced about the supposed need for source
separation of this entire repertoire when the set (column 0 of the
working draft chart) is transferred from the realm of "cute pictures in
cell phones" to that of "encoded text characters" in Unicode. One good
example of overkill in the character domain is the set of "book"
characters. Another example is the "angry face" set, e-320 versus e-35A.
In my opinion it is unreasonable and unnecessary in a character encoding
to have both e-320 and e-35A as encoded characters.
I don't think it's been established that we really have different
characters for all of these, once you strip away colors and animation.
Yeah, we have 3 or 4 semi-overlapping "semi-private" repertoires of
pictures that we're proposing to be smooshed into the character encoding
space so that when this stuff leaks into UTF-8 text from the
semi-private sources it doesn't end up as white boxes and can be indexed
by Google and others. But to do that I see absolutely no real need for
source separation. Why is it important to be able to round trip any of
this stuff? The vendors have mappings to each other, presumably, and
it's not Unicode's job to provide those vendors with an all-singing
all-dancing solution to mapping issues between other "standards".
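To make the round-trip question concrete, here is a small sketch of what
source separation actually buys. Everything in it is hypothetical: the
vendor names, the source codes, and the private-use code points are
placeholders, not entries from the working draft chart. It only shows that
a mapping stops round-tripping when two codes from the *same* source land
on one code point; unifying across sources loses nothing.

```python
from collections import defaultdict

# Hypothetical (vendor, source code) -> code point tables.
# U+F0000 and U+F0001 are private-use placeholders, not proposed values.
UNIFIED = {
    ("vendor_a", "A-01"): 0xF0000,
    ("vendor_b", "B-17"): 0xF0000,  # unified across vendors
    ("vendor_b", "B-18"): 0xF0000,  # near-identical second picture, also unified
}

SEPARATED = {
    ("vendor_a", "A-01"): 0xF0000,
    ("vendor_b", "B-17"): 0xF0000,
    ("vendor_b", "B-18"): 0xF0001,  # kept apart only to preserve round-tripping
}


def round_trip_losses(mapping):
    """Return groups of codes from one vendor that share a single code point."""
    groups = defaultdict(list)
    for (vendor, code), cp in mapping.items():
        groups[(vendor, cp)].append(code)
    # A group of two or more cannot be mapped back to the vendor unambiguously.
    return {key: sorted(codes) for key, codes in groups.items() if len(codes) > 1}


print(round_trip_losses(UNIFIED))
print(round_trip_losses(SEPARATED))
```

In this toy case the only cost of unifying is that vendor_b can no longer
tell B-17 from B-18 on the way back, which is exactly the case the proposal
would need to argue is worth an extra character.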
Others may think there is a need for source separation. If so, I suggest
that it be carefully argued in the actual proposal before UTC and WG2,
because I think it will be difficult to convince the committees that
source separation is actually required, or even a reasonable principle
for this stuff, once divested of color and animation.
Again, just my opinions for the authors to consider in making their
proposal.
Rick
Yes, possibly. It might have been better to wait until the proposal had
some accompanying explanatory text. People are definitely not
understanding it. However, this is part of the reason I post my comments
here as direct feedback to the authors, and have no intention of
contributing to any tempest on the Unicode list.
> I don't know how very many times we have to say that this IS NOT the
> proposal that we will be making to the UTC; it is preliminary work
Yes... I just wanted to get my 2 cents in to this official feedback
forum, to make sure that the text isn't forgotten.
Rick