Flag Symbols

Kenneth Whistler

Dec 22, 2008, 9:04:22 PM

to emoji4...@googlegroups.com, ke...@sybase.com

The 10 flag symbols should be reinterpreted (and
renamed) specifically as common language/locale
symbols.

This avoids all the political issues of trying to
encode symbols *for* national flags and the need
for designing an open-ended scheme for encoding and
naming any national flag.

The whole "encoding of flags" problem could sink the
proposal otherwise, or require them to be left out
of the encoding altogether. This will only get worse
when the proposal gets to WG2, so I advise redefining
the issue as what it should be: symbols for common
language/locales, so as to avoid those problems now.

--Ken

Christopher Fynn

Jan 10, 2009, 2:10:25 AM

to emoji4...@googlegroups.com, Unicode Mailing List

On 23/12/2008, Kenneth Whistler <ke...@sybase.com> wrote:

> The whole "encoding of flags" problem could sink the
> proposal otherwise, or require them to be left out
> of the encoding altogether. This will only get worse
> when the proposal gets to WG2, so I advise redefining
> the issue as what it should be: symbols for common
> language/locales, so as to avoid those problems now.

> --Ken

But there are probably locales that are more common than some of those
included in this set. IMO, if any national flags are included, there
is no way you are going to be able to limit them to ten currently
"needed for interoperability" in the long term.

Either you have to reject national flag symbols outright as logos; set
aside enough code points for every national flag (as Mark mentioned);
use something like a generic flag character + a variation selector; or
come up with a way of representing them in traditional plain text as
Skype seems to be doing
<http://factoryjoe.com/projects/emoticons/#flags>.

- Chris

Christopher Fynn

Jan 11, 2009, 5:42:48 AM

to Unicode Mailing List, emoji4unicode, Michael Everson

A block of 256 characters would be sufficient to handle flags of all
countries with existing ISO 3166-1-alpha-2 codes ~ with enough space
left over for the flags of organizations such as the UN, the IRC, the
EC and a few others.

- Chris

Michael Everson wrote:
> On 11 Jan 2009, at 01:55, Michael D'Errico wrote:

>> Michael Everson wrote:
>>> Do NOT think that encoding even ONE of the ten flags proposed will
>>> not lead to a huge number of requests for additional characters. And
>>> do NOT think that those requests will not be reasonable.

>> Flags are actually a perfect example that could use the method I've been
>> talking about. If you had FLAG_A through FLAG_Z you could specify any
>> and all country flags with two code points, e.g. FLAG_C FLAG_A for the
>> Canadian flag.

> I would not favour such a scheme. Karl's suggestion of encoding a block
> of two-letter codes makes more sense. (FLAG AA to FLAG ZZ.)

> Michael Everson * http://www.evertype.com
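
[Editorial sketch] The two schemes debated above (a pair of letter
symbols per flag versus one code point per two-letter code) can be
compared side by side. The Python below is only a minimal illustration:
the base values LETTER_BASE and PAIR_BASE are hypothetical placeholders
in a private use area, not code points anyone in this thread proposed,
and the function names are invented for the example.

```python
import string

# Hypothetical bases, chosen from a private use area for illustration only.
LETTER_BASE = 0xF0000  # scheme 1: FLAG_A .. FLAG_Z, 26 code points
PAIR_BASE = 0xF0100    # scheme 2: FLAG AA .. FLAG ZZ, 26 * 26 = 676 code points


def _index(letter: str) -> int:
    """Return 0..25 for an uppercase ASCII letter A..Z."""
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"expected one letter A-Z, got {letter!r}")
    return ord(letter) - ord("A")


def flag_as_letter_pair(code: str) -> str:
    """Scheme 1: two code points, e.g. FLAG_C FLAG_A for 'CA'."""
    if len(code) != 2:
        raise ValueError("expected a two-letter code")
    return "".join(chr(LETTER_BASE + _index(c)) for c in code)


def flag_as_single_code_point(code: str) -> str:
    """Scheme 2: one code point per two-letter code, e.g. FLAG CA."""
    if len(code) != 2:
        raise ValueError("expected a two-letter code")
    return chr(PAIR_BASE + 26 * _index(code[0]) + _index(code[1]))
```

Scheme 1 costs 26 code points and two code points per flag; scheme 2
costs 676 code points and one code point per flag. Neither addresses
what a given two-letter code means at a given date.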

Kent Karlsson

Jan 11, 2009, 7:01:27 AM

to emoji4...@googlegroups.com, Unicode Mailing List, Michael Everson

I think it would be best to handle the flags as logos. After all, they
have much more in common with logos than with characters (at least IMO).
Note also that flags often have requirements on exact proportions/
placements, and exact colours (at least in certain contexts). Much
like a logo...

/kent k

Karl Pentzlin

Jan 11, 2009, 7:04:45 AM

to Christopher Fynn, Unicode Mailing List, emoji4unicode

Christopher Fynn wrote 2009-01-11 11:42:

> A block of 256 characters would be sufficient to handle flags of all
> countries with existing ISO 3166-1-alpha-2 codes ~ with enough space
> left over for the flags of organizations such as the UN, the IRC, the
> EC and a few others.

As pointed out before, L2/08-305 (also available at
http://www.pentzlin.com/Nationalflags1.pdf ) deals with this
question in more detail.
In fact, 632 characters (i.e. a 40 column block) is needed for a
solution which avoids the need to incorporate all future changes of
ISO 3166 manually.

- Karl Pentzlin
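
[Editorial sketch] The block sizes mentioned in this exchange can be
checked with simple arithmetic, assuming the usual code chart layout of
16 code points per column. The helper name columns_needed is invented
for this example, and the figure 676 is simply 26 * 26, the count of
all possible two-letter codes, not a number taken from L2/08-305.

```python
import math

CODE_POINTS_PER_COLUMN = 16  # one code chart column holds 16 code points


def columns_needed(code_points: int) -> int:
    """Number of 16-code-point columns required to hold `code_points`."""
    return math.ceil(code_points / CODE_POINTS_PER_COLUMN)


for size in (256, 632, 26 * 26):
    print(size, "code points ->", columns_needed(size), "columns")
```

Under that assumption, 632 characters round up to the 40 columns cited
above, with 8 code points left unused in the last column.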

John Hudson

Jan 11, 2009, 10:53:35 AM

to Kent Karlsson, emoji4...@googlegroups.com, Unicode Mailing List, Michael Everson

Kent Karlsson wrote:

> I think it would be best to handle the flags as logos. After all, they
> have much more in common with logos than with characters (at least IMO).
> Note also that flags often have requirements on exact proportions/
> placements, and exact colours (at least in certain contexts). Much
> like a logo...

They also tend to change when regimes change. Further, encoding flags is
a major political hot potato, since it implies choices about recognition
of states and non-state authorities. Not only countries have flags, but
also every group of people who *want* to have a country. Some of those
people end up getting countries. Not all of those countries are
universally recognised by other states. It's a mess.

JH

--

Tiro Typeworks www.tiro.com
Gulf Islands, BC ti...@tiro.com

The Lord entered her to become a servant.
The Word entered her to keep silence in her womb.
The thunder entered her to be quiet.
-- St Ephrem the Syrian

Philippe Verdy

Jan 11, 2009, 1:02:23 PM

to Karl Pentzlin, Christopher Fynn, Unicode Mailing List

Would that be enough? Remember that ISO 3166 is NOT stable: some
countries change their codes, and codes are reused after some time. In
addition, some countries have several flags with distinct usages (national,
ensign, civil, military...), and even these flags change over time. There
will then be a need for historical flags, some of which are still displayed
and used, like the flag of the former USSR, because not all flags are
politically neutral.

Given the changes that occur in this political area over time, this
contradicts the need for stability in the UCS: you cannot assume the
correct semantics of flags if they are just encoded with names like "FLAG
AA" .. "FLAG AZ" without being more precise about what they actually
designate. Even if you restrict yourself to ISO 3166-1 for modern
flags, you'll have to track the version history of ISO 3166-1 to make the
encoding consistent. Just think about "FLAG CS": which country, and which
flag among the respective variants?

A vexillology site will show you that there are already many more
country/nation flags than you would expect: 256 or 632 will not suffice, or
will be used in a way that wastes space (with unused codes) and also
produces conflicting codes (used in different epochs, so needing a date
field, not just the code, as well as a way to handle flags tied to various
epochs).
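
[Editorial sketch] The point about needing a date field alongside the
code can be illustrated with the "FLAG CS" case. The Python below is a
minimal sketch, not a proposal from this thread: the table and the
function name resolve are invented, and the years are approximate
values included for illustration only.

```python
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    code: str
    entity: str
    first_year: int  # approximate, for illustration only
    last_year: int   # approximate, for illustration only


# The same two-letter code has designated different entities over time.
ASSIGNMENTS = [
    Assignment("CS", "Czechoslovakia", 1974, 1993),
    Assignment("CS", "Serbia and Montenegro", 2003, 2006),
]


def resolve(code: str, year: int) -> Optional[str]:
    """Return the entity a code designated in a given year, if any."""
    for a in ASSIGNMENTS:
        if a.code == code and a.first_year <= year <= a.last_year:
            return a.entity
    return None
```

A lookup keyed on the code alone cannot choose between the two rows;
only the (code, year) pair identifies which entity, and hence which
flag, is meant.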

In a book or article about history, even among the most recent
publications, or in publications about sport, I see the proposed solution
(treating them as variants) as completely unacceptable: changing a flag in
such an article, completely out of the context of the text that needed to
refer to a specific flag for a country at a relevant date, is certainly not
the way to go. Swapping the flags of different countries or political
regimes is certainly not acceptable. Would you accept seeing the current
German flag associated with the Nazi regime, or the Nazi flag used for
modern Germany? Would you accept seeing the former Yugoslav flag used in
articles about Czechoslovakia?

No, these distinct flags are definitely NOT acceptable graphic variants of
the same character, if they get encoded as characters! And even though
France and Italy use the same tricolor composition, displaying the glyphs
in monochrome does not mean that the two flags must be unified under the
same encoded character. Color is in fact essential to distinguish the two
flags. You may find some publications where such a distinction is
impossible, but that is not the best solution and can only serve as a
palliative, just as we have compatibility mappings in NFKC/NFKD: the loss
of color distinction is lossy in terms of information, and unified
colorless flags cannot represent countries/nations distinctly.

And there's no solution for other flags needed in sports, which do not
cover exactly the same nations as those recognized at the UN (see Ireland
as a single sporting nation, or England, which is not the UK). Sport is
certainly one domain where flags are very frequently used and displayed;
you can't ignore it. These flags do not directly represent the national
political entity, but the federation that governs a sport in some area.

Those federations do not necessarily follow political or administrative
divisions, even if they are linked to some region of the world (another
assumption that is starting to be wrong as well, given that some sports
federations compete with each other and have cross-border activities,
notably in North America for some professional sports with large
audiences, but also for some sports in groups of small nations organized
under a single confederation, with no national federation really working
everywhere). For some federations, the flag is not the one used for the
country. This has included some federations represented at the
International Olympic Committee as well, notably new ad hoc committees
created after a major change of political regime and a reorganization of
the federations representing the various countries: see the participation
of the countries of the former USSR in the "Unified Team".

Kenneth Whistler

Jan 12, 2009, 7:08:16 PM

to chris...@gmail.com, emoji4...@googlegroups.com, ke...@sybase.com

[Not cc'ing to the entire unicode list, as pointless.]

> > The whole "encoding of flags" problem could sink the
> > proposal otherwise, or require them to be left out
> > of the encoding altogether. This will only get worse
> > when the proposal gets to WG2, so I advise redefining
> > the issue as what it should be: symbols for common
> > language/locales, so as to avoid those problems now.
>
> > --Ken
>
> But there are probably locales that are more common than some of those
> included in this set.

This is missing the point yet again.

I do not think it makes any sense to encode flags (national
or otherwise) as characters in the Unicode Standard.

I do not think it makes any sense to encode locales as
characters in the Unicode Standard.

Either one of those is just madness, and would lead to the
kind of public pushback and extensibility problems that
Chris and others have pointed out. Trying to encode them
will sink the proposal in WG2 if not earlier.

I am not advocating encoding either.

I *am* advocating encoding exactly 10 compatibility characters
to represent the 10 emoji in the Japanese telco set, which
any reasonable analysis would indicate are there as symbols
reflecting the graphical UI usage of well-known country flags
as icons representing common locales.

This is *clearly* a case where encoding anything that looks
like an open-ended set of well-known symbols or whose names
imply the same will lead to trouble. It already *HAS* led
to trouble.

Hence, if the proponents of this proposal want to avoid further
trouble, then the only reasonable course I see is to

a. change the glyphs to make them NON-flags. Or flaggy in
a way that avoids all the criticism. Easiest way to do this
is to follow the kind of arbitrary graphics used in
the 24XX block for the graphics for control pictures.
Just treat the damn things as compatibility symbols
for graphics for UI controls. In fact stick them in the 243X
column, if you want, to make it clear they are *NOT*
extensible.

b. change the names to make them NON-flags. I suggested
EMOJI SYMBOL FOR RUSSIAN LOCALISATION, but since it
seems obvious now that even *that* will be misinterpreted,
then how about:

U+2430 SYMBOL FOR EMOJI UI GRAPHIC FOR RUSSIAN LOCALISATION

Or U+1F3XX, I don't care.

c. change the proposal to *explicitly* indicate that these are
not proposed for encoding either as flags or as locales,
to indicate that objections to encoding flags and to
encoding locales have already been noted (and need not
be repeated ad nauseam), and that the 10 compatibility
characters in question are only intended for interoperability
with the Japanese telco gaiji, and will not be extended to
make them flaggy or localely in any way.

> IMO, if any national flags are included, there
> is no way you are going to be able to limit them to ten currently
> "needed for interoperability" in the long term.

Which is correct, but utterly beside the point if the
proposal does the right thing -- because then it will
not be including "any national flags".

> Either you have to reject national flag symbols outright as logos; set
> aside enough code points for every national flag (as Mark mentioned);
> use something like a generic flag character + a variation selector;

None of those are viable options which address the requirements
of the proposal.

> or
> come up with a way of representing them in traditional plain text as
> Skype seems to be doing
> <http://factoryjoe.com/projects/emoticons/#flags>.

This is a nice summary of one very common approach to dealing
with emoticons in bulletin boards, instant messaging, and other
similar contexts. But this is not "a way of representing them
in traditional plain text", but rather a low-overhead way
of using plain text markup embedding in plain text as a protocol
for indicating the request for substituting graphic pictures.

In Skype, interpreting the sequence of ASCII characters "8)" or
"8=)" or "(cool)" etc. as instructions to switch in the
sunglasses-wearing Mr. Cool smiley icon for display is effectively
no different than the next layer of markup convention used on
that very page, using the HTML markup:

to display the same thing directly in an HTML context.
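
[Editorial sketch] The distinction drawn here, between plain text and
a markup convention layered on top of it, can be shown with a small
substitution pass. The Python below is a minimal sketch under stated
assumptions: the function name render and the file name cool.png are
hypothetical, and the emitted tag is not the markup used on the page
cited above. The stored text stays as ASCII; only the display layer
swaps in a picture.

```python
import html
import re

# Shortcodes quoted in the message above, all naming the same icon.
SHORTCODES = {"8)": "cool", "8=)": "cool", "(cool)": "cool"}

# Try longer shortcodes first so "8=)" is not partially matched.
_PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(SHORTCODES, key=len, reverse=True))
)


def render(plain_text: str) -> str:
    """Escape the text for HTML, then replace shortcodes with image tags."""

    def to_image(match: re.Match) -> str:
        icon = SHORTCODES[match.group(0)]
        # Hypothetical file naming; the alt text preserves the plain text.
        return f'<img src="{icon}.png" alt="{match.group(0)}">'

    return _PATTERN.sub(to_image, html.escape(plain_text))
```

The character data exchanged is still just "8)"; whether it turns into
an icon depends entirely on the receiving application agreeing to the
same convention.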

To the extent that people around the world can agree on what
these conventions mean, you could well end up with more
interoperability for use of emoticons, and even non-emoticon
emoji of a more generic type. (Note that the Skype set includes
icons for "Pizza", "Cake", "Beer" and "[Cocktail glass] Drink",
so folks upset about the Japanese telco use of "FISH CAKE WITH
SWIRL DESIGN" are simply showing cultural biases, IMO.)

But these kinds of conventions are not the business
of the Unicode Standard -- this is markup *above* the level
of plain text, and an area which I submit is not yet quite ready
for standardization. And in any case, this is *not* what
the encoding of a set of emoji symbols *as characters*
in Unicode to represent the SJIS gaiji *characters* in
Japanese telco sets is about.

--Ken

Christopher Fynn

Jan 13, 2009, 1:07:36 AM

to Kenneth Whistler, emoji4unicode, Michael Everson

Ken

Do you want this to be treated as a proposal for characters needed
for interoperability or as a proposal for a bunch of additional
symbols?

To many people if something looks like a flag it is a flag - no matter
how rational your argument, it is going to be difficult to change that
perception - especially if this is being considered as if it is a
proposal for symbols.

In fact many of these emoji characters, not only the flags, are
similar to symbols which form parts of other much larger and well
established sets of symbols (weather symbols, hand signs, map symbols
and so on) which it probably will make sense to encode at some time.
If we are going to treat some emoji as normal characters and encode
them in existing symbol blocks - and/or unify some of them with
existing symbols - but treat others only as compatibility characters
of some kind then the problem is: Where to draw the line?

Those symbol characters which form parts of larger well established
sets of symbols then need consideration in that context.

That is one reason why I have suggested elsewhere that it might be an
idea to consider encoding the whole lot as a special block of
interoperability characters with names like EMOJIXXX and forget about
unifying them with existing characters or encoding any of them in
existing blocks and so on. After all it has been repeatedly stated the
primary purpose of this proposal is to get a bunch of characters
necessary for interoperability encoded - *not* particularly to
get a bunch of additional symbol characters encoded.

Looked at as a proposal for symbol characters it contains a bunch of
fairly random subsets of various larger symbol sets, an assortment of
dingbats with little or no symbolic value, and some highly
questionable things which I think would normally never be considered
as acceptable candidates for encoding.

In the end I suspect that keeping the emoji together as a
"compatibility set" might be the most practical and least
controversial way of ensuring the complete set of emoji characters
needed for interoperability gets encoded quickly. Otherwise it seems
quite likely there may continue to be lengthy discussion about
particular characters, unifications, flags, etc. etc. as the proposal
moves forward.

After all what is being proposed is a bunch of interoperability
characters - why not simply call them that?

- Chris

Kenneth Whistler

Jan 13, 2009, 2:04:37 PM

to chris...@gmail.com, emoji4...@googlegroups.com, eve...@evertype.com, ke...@sybase.com

Chris Fynn asked:

> Do you want this to be treated as a proposal for characters needed
> for interoperability or as a proposal for a bunch of additional
> symbols?

Both. Trying to paint the issues in a black and white,
either/or choice like that is one of the continuing difficulties
with the feedback on this proposal.

The *entire* set (minus the corporate logos) is needed for
interoperability with the character data from the Japanese
telco networks.

But it makes no sense to encode the *entire* set as a compatibility
block somewhere, because that would ignore the obvious fact that
many of the characters in the SJIS gaiji sets in question
represent characters that are *already* encoded in the standard.
Once you recognize that fact, then all the rest of the characters
also need to be evaluated as to whether they need unification,
whether they make sense as extensions of sets of symbols already
encoded in the standard, or whether they should simply be
treated as chunks of compatibility characters serving no additional
purpose beyond the interoperability conversion requirement.

To do any less would be to avoid due diligence on this set --
and I think the UTC, as well as the proposers themselves, are
well beyond thinking that a simple choice between option A and
option B is what is required here.

> To many people if something looks like a flag it is a flag - no matter
> how rational your argument, it is going to be difficult to change that
> perception - especially if this is being considered as if it is a
> proposal for symbols.

Sure. Most people are not trained as linguists or semioticians,
and cannot make subtle distinctions in symbolic status. Even most
character encoders seem to be deficient in this regard.

So I have already granted that if the standard starts encoding
symbols called FLAG SYMBOL FOR JP, using a representative glyph
that looks like the flag of Japan, that people *will* interpret
that as you suggest. A flag is a flag, and we don't want to
be bothered with trying to parse out what the meaning of "is"
is, doggonit.

So please go back and re-read one more time what I suggest. We
need 10 compatibility characters for the gaiji sets. Since
encoding flags will get us into trouble, we should *not* encode
flags. We encode something that neither looks like nor is called
a "flag", as a compatibility character, in *this* case.

> In fact many of these emoji characters, not only the flags, are
> similar to symbols which form parts of other much larger and well
> established sets of symbols (weather symbols, hand signs, map symbols
> and so on) which it probably will make sense to encode at some time.

See above. I agree that they do come in sets. The proposal makes
that clear as well, by the way it is organized into semantic groups.
I disagree about the advisability of encoding more "hand signs",
which I think is a useless rathole, but we already have a bunch
of weather symbols and map symbols, and will no doubt encode a
few more of them in the future.

> If we are going to treat some emoji as normal characters and encode
> them in existing symbol blocks - and/or unify some of them with
> existing symbols - but treat others only as compatibility characters
> of some kind then the problem is: Where to draw the line?

That is why smart people need to get together and work on proposals.
Because lines need to be drawn, and they don't draw themselves.

>
> Those symbol characters which form parts of larger well established
> sets of symbols then need consideration in that context.

Which they are getting.

>
> That is one reason why I have suggested elsewhere that it might be an
> idea to consider encoding the whole lot as a special block of
> interoperability characters with names like EMOJIXXX and forget about
> unifying them with existing characters or encoding any of them in
> existing blocks and so on.

As a blanket solution, this is a bad idea.

For perhaps 80% of the emoji symbols not unified with existing
characters, this is fine -- and is *exactly* what the proposal
currently does, with the significant majority of the emoji symbols
just lumped in a big block at U+1F300..U+1F5XX.

The other 20% are where some more careful assessment of allocation
is in order, and where placement in other blocks probably makes
sense. For example, given the ARIB extension of squared ideographic
symbols, it makes no sense whatsoever not to add the few additional
squared ideographic symbols from the Japanese telco gaiji sets
into that block at this point.

> In the end I suspect that keeping the emoji together as a
> "compatibility set" might be the most practical and least
> controversial way of ensuring the complete set of emoji characters
> needed for interoperability gets encoded quickly.

It is the 80% solution -- not the 100% solution.

> Otherwise it seems
> quite likely there may continue to be lengthy discussion about
> particular characters, unifications, flags, etc. etc. as the proposal
> moves forward.

Why, if the 20% (which doesn't include the 10 EMOJI SYMBOLS WHICH
HAVE USED FLAG ICONS IRRESPONSIBLY TO REFLECT WIDESPREAD BUT
DENIGRATED PRACTICE OF INDICATING COMMON LOCALES OF INTEREST
TO THE PHONE COMPANIES IN QUESTION) don't include any of
the characters that people have been digging in their heels
to oppose (the horror!) encoding for?

> After all what is being proposed is a bunch of interoperability
> characters - why not simply call them that?

Personally, I don't much care what the chunk of compatibility
emoji in the U+1F300..U+1F5XX block are called. I'd be fine
with prefixing them all with EMOJI SYMBOL FOR... if that would
be what it would take to get people to stop obsessing about
their potential ontological status as the UCS WORLD STANDARD
SYMBOL FOR MEAT ON A BONE, or whatever it is that is particularly
bothering people about particular ones.

--Ken
