Dead key for Unicode flags?

Andrew Archibald

Sep 23, 2020, 3:04:50 AM
to Ukelele Users
Is it possible to create a dead key for Unicode flags that accepts the two-letter country code? For example "<dead> d e" would insert 🇩🇪. This would be similar to the built-in macOS Unicode hex input keyboard that takes four keys and outputs a Unicode character. As there are 676 two-letter combinations that could make up a flag, is there a way to specify the dead key combinations programmatically or in text?

Thanks,
Andrew

Gé van Gasteren

Sep 23, 2020, 4:10:33 AM
to ukelel...@googlegroups.com
Hi Andrew,

A key can be either a regular key OR a dead key, not both. So if you want a combination like  d e  to produce another character, you would have to make the  d  a dead key and in the process it would lose its normal function.

The simplest way to implement this would be to set aside 1 key on the keyboard to initiate such flag sequences and make that key a dead key.
That way, the keyboard will still function like before, except for that 1 key.
It could also be a key combination, like Shift-Option-`
That dead key would give access to its own set of key assignments, and in that set, all the alphabet keys would be defined as dead keys, so that three-key sequences would each define a language flag. In this example:  Shift-Option-`   d      e      where the first two keystrokes are dead keys and the third isn't.

Obviously, you would end up with a massive amount of dead key states in that layout, each with its own keyset.
The example assumes you want to use two-letter abbreviations for the languages. If you want to use three letter abbreviations, you'll obviously need another level of dead keys and things will become really complicated.

In principle, such a thing can be programmed, because custom keyboard layout files are in XML, i.e. just text, but in practice, I suspect writing the code might be more work than assigning the keys manually in Ukelele.

One thing I want to add: There is no clear relationship between languages and countries, and therefore between languages and country flags. Some countries have multiple languages, while some languages are spoken in multiple countries. So at some point in this project, you'll have to decide how you want to cut through that chaos…

--
You received this message because you are subscribed to the Google Groups "Ukelele Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ukelele-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ukelele-users/8028622a-b10f-429b-88d4-c25caeb55055n%40googlegroups.com.

Andrew Archibald

Sep 23, 2020, 5:17:36 AM
to ukelel...@googlegroups.com
Hello,

Thanks for your response. Let me provide some more details.

Each Unicode flag consists of two codepoints: two letters representing the country code. For example, the text 🇩🇪🇩🇰 consists of these four characters:
U+1F1E9 : REGIONAL INDICATOR SYMBOL LETTER D
U+1F1EA : REGIONAL INDICATOR SYMBOL LETTER E
U+1F1E9 : REGIONAL INDICATOR SYMBOL LETTER D
U+1F1F0 : REGIONAL INDICATOR SYMBOL LETTER K

So if ` is the dead key, then I’d want to be able to press `de for 🇩🇪 or `dk for 🇩🇰. It would be trivial to do it as `d`e or `d`k instead and trigger the dead key for each letter, but that isn’t as elegant.
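
The mapping from a two-letter code to the pair of regional indicator symbols listed above is plain arithmetic on code points. A minimal Python sketch (the helper name `flag` is hypothetical, chosen only for illustration):

```python
# Distance between "A" (U+0041) and REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
RI_OFFSET = 0x1F1E6 - ord("A")


def flag(country_code: str) -> str:
    """Return the two regional indicator symbols for a two-letter code."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two letters A-Z")
    return "".join(chr(ord(c) + RI_OFFSET) for c in code)


if __name__ == "__main__":
    # Lists the four code points behind the text "DE flag, DK flag".
    print([f"U+{ord(c):04X}" for c in flag("de") + flag("dk")])
```

Whether the pair is drawn as a flag or as two boxed letters depends on the platform's fonts, not on the keyboard layout.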

The Unicode Hex Input keyboard allows you to type option+2+2+0+0 to get ∀, for example. In that case, option is not a dead key and if four keys are not pressed while option is held, nothing is output. Here’s a paste of the keylayout: https://pastebin.com/UeArUxFf

In the Unicode hex keylayout, the actions are multiplied by 16, evidently to move to the next digit. However, I’m not sure why the ranges are needed. The output field also doesn’t make sense to me because not all 2 billion Unicode codepoints are specified, so why these specific ones?

<action id="15">
    <when state="none" next="4368"/>
    <when state="1" through="256" output="&#x000F;" multiplier="16"/>
    <when state="257" through="512" output="&#x100F;" multiplier="16"/>
    <when state="513" through="768" output="&#x200F;" multiplier="16"/>
...
    <when state="3585" through="3840" output="&#xE00F;" multiplier="16"/>
    <when state="3841" through="4096" output="&#xF00F;" multiplier="16"/>
    <when state="4097" through="4352" next="16" multiplier="16"/>
    <when state="4353" through="4368" next="4112" multiplier="16"/>
</action>
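
One way to read the snippet above is that each ranged `when` computes its value as base + (current state - range start) × multiplier. That reading is an assumption inferred from the numbers in the snippet, and extending the rule from the F key (action 15) to the other fifteen digits is also an assumption. Under it, a small Python simulation reproduces the option+2+2+0+0 example, and it shows that the listed outputs step by 0x1000 from one chunk of 256 states to the next:

```python
from typing import Optional, Tuple

NONE = 0  # stands in for the keylayout state "none"


def press(state: int, digit: int) -> Tuple[int, Optional[str]]:
    """Apply one hex-digit key and return (next_state, output)."""
    if state == NONE:                    # first digit: 4353 + 15 = 4368 for F
        return 4353 + digit, None
    if 4353 <= state <= 4368:            # second digit: base 4112 for F
        return (4097 + digit) + (state - 4353) * 16, None
    if 4097 <= state <= 4352:            # third digit: base 16 for F
        return (1 + digit) + (state - 4097) * 16, None
    if 1 <= state <= 4096:               # fourth digit: emit the character
        return NONE, chr(digit + (state - 1) * 16)
    raise ValueError(f"unexpected state {state}")


def chunked_output(state: int, digit: int) -> str:
    """Same output, computed per chunk of 256 states as the file stores it."""
    chunk = (state - 1) // 256
    start = 1 + 256 * chunk
    base = digit + 0x1000 * chunk        # 0x000F, 0x100F, 0x200F, ... for F
    return chr(base + (state - start) * 16)


def type_hex(digits: str) -> str:
    """Feed a string of hex digits through the simulated state machine."""
    state, typed = NONE, ""
    for ch in digits:
        state, produced = press(state, int(ch, 16))
        if produced is not None:
            typed += produced
    return typed


if __name__ == "__main__":
    print(f"U+{ord(type_hex('2200')):04X}")
```

The sketch only shows that the chunked outputs agree with a single formula; it does not explain why the file splits the range into chunks of 256 states.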

Thanks,
Andrew

You received this message because you are subscribed to a topic in the Google Groups "Ukelele Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ukelele-users/WEhcd4dzoHI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ukelele-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ukelele-users/CAOH1hL_DtienkCDdiAtEizP_vtgO2d-Ebv18yDuJ_%2BVn9mQN%3DA%40mail.gmail.com.

Message has been deleted

Tom Gewecke

Sep 23, 2020, 5:58:50 AM
to ukelel...@googlegroups.com


On Sep 23, 2020, at 5:50 AM, Gé van Gasteren <gevang...@gmail.com> wrote:



> The Unicode Hex Input keyboard layout "only" addresses the Unicode BMP,

You can produce characters beyond the BMP by holding down Option and typing the 8 hex digits of their UTF-16 sequence.

Geke

Sep 23, 2020, 6:03:50 AM
to Ukelele Users
Hi Andrew,

I don't think you understand the concept of dead keys, to be honest.
You can read all about it in the Ukelele PDF manual, available from the Help menu.

The Unicode Hex Input keyboard layout "only" addresses the Unicode BMP, which contains 16^4 = 65,536 code points.
In other words, it uses a sequence of 4 hex symbols (range: 0123456789abcdef) and therefore takes 4 keystrokes to enter.
Of these, the first 3 are dead keys. In your example:
1. Option-2
2. Option-2
3. Option-0
4. Non-dead key 0

In this Unicode Hex Input layout, the output is produced by calculation, which is different from how a Ukelele-made keyboard layout does it: by looking up each output in a table.
Unfortunately, the connection between language abbreviations and country flag code points is not as straightforward as that, to put it mildly.
So it'll be a big job either way.

An alternative approach would be to use an "auto-correct" feature like MS Word has, which replaces certain strings by others.
E.g. you could define abbreviations: cfde, cffr, cfen, etc. to be replaced by the flag characters.

Hope this helps…

Geke

Sep 23, 2020, 6:08:39 AM
to Ukelele Users
I guess this works by using surrogates? Or are there planes that need more than two utf16 code points already?!

Thanks for the addition, and sorry for the deletion :)
I had written something about Ukelele's way to address higher planes that wasn't correct and I couldn't edit the post, so I deleted it and posted the new version.

Tom Gewecke

Sep 23, 2020, 6:37:35 AM
to ukelel...@googlegroups.com


> On Sep 23, 2020, at 6:08 AM, Geke <gevang...@gmail.com> wrote:
>
> I guess this works by using surrogates? Or are there planes that need more than two utf16 code points already?

Yes, a pair of UTF-16 surrogates (two 16-bit code units) will cover all of Unicode beyond the BMP.
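
For reference, the surrogate pair for a given code point can be worked out with the standard UTF-16 arithmetic. A short Python sketch (the function name `utf16_surrogates` is made up for this example); the resulting 8 hex digits are what one would type with Option held in the Unicode Hex Input source:

```python
from typing import Tuple


def utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point beyond the BMP into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    # REGIONAL INDICATOR SYMBOL LETTER D, U+1F1E9
    high, low = utf16_surrogates(0x1F1E9)
    print(f"{high:04X} {low:04X}")
```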

Tom Gewecke

Sep 23, 2020, 7:45:33 AM
to ukelel...@googlegroups.com


On Sep 23, 2020, at 5:17 AM, Andrew Archibald <and...@aarchibald.com> wrote:


> In the Unicode hex keylayout, the actions are multiplied by 16, evidently to move to the next digit. However, I’m not sure why the ranges are needed. The output field also doesn’t make sense to me because not all 2 billion Unicode codepoints are specified, so why these specific ones?

Unicode doesn’t have 2 billion code points, only 1,114,112.
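
The figure follows from Unicode's 17 planes of 65,536 code points each, as this small Python check shows (the constant names are only illustrative):

```python
PLANES = 17                      # plane 0 (the BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

TOTAL = PLANES * CODE_POINTS_PER_PLANE
HIGHEST = TOTAL - 1

if __name__ == "__main__":
    print(f"{TOTAL:,} code points, highest is U+{HIGHEST:X}")
```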

Tom Gewecke

Sep 23, 2020, 8:30:22 AM
to ukelel...@googlegroups.com
On Sep 23, 2020, at 5:17 AM, Andrew Archibald <and...@aarchibald.com> wrote:

> It would be trivial to do it as `d`e or `d`k instead and trigger the dead key for each letter.

That’s the simplest way, I think. Another option would be to create a different kind of custom input method, as mentioned in


or create a lot of text replacements in System Preferences > Keyboard > Text > Replace/With


Gé van Gasteren

Sep 23, 2020, 8:57:51 AM
to ukelel...@googlegroups.com
Hey, I just discovered that the hard work has been done for you:
All you need to do is to create one level in your keyboard layout where a keypress produces not a letter, but a "Regional Indicator Symbol Letter".
When you type two of those in sequence, the OS replaces the pair by the corresponding country’s flag symbol.

Typing those Regional indicator symbol letters could be done by holding down Option and typing the keys, if you don’t need the standard Option set. Or use a less-used combination like Shift-Option + the letter keys.

The beauty of the above method is that you can hold down the modifier key(s) while you type the two letters.
With a dead key sequence, you can certainly do this as well, using – indeed – some 4-keystroke sequence like ` d ` e
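
As a rough illustration of that one-level approach, the 26 key entries for such a modifier level could be generated rather than typed by hand. The Python sketch below is hedged: it assumes the usual macOS ANSI virtual key codes for the letter keys and assumes the entries take the form `<key code="…" output="…"/>` inside a key map, and the function name `regional_indicator_keys` is hypothetical. The generated lines would still have to be placed in the right key map of a real layout.

```python
# Assumed macOS ANSI virtual key codes for the letter keys.
ANSI_KEY_CODES = {
    "a": 0, "s": 1, "d": 2, "f": 3, "h": 4, "g": 5, "z": 6, "x": 7,
    "c": 8, "v": 9, "b": 11, "q": 12, "w": 13, "e": 14, "r": 15,
    "y": 16, "t": 17, "o": 31, "u": 32, "i": 34, "p": 35, "l": 37,
    "j": 38, "k": 40, "n": 45, "m": 46,
}


def regional_indicator_keys() -> list:
    """Return one <key> line per letter key, ordered by key code."""
    lines = []
    for letter, code in sorted(ANSI_KEY_CODES.items(), key=lambda kv: kv[1]):
        code_point = 0x1F1E6 + ord(letter) - ord("a")
        lines.append(f'<key code="{code}" output="&#x{code_point:X};"/>')
    return lines


if __name__ == "__main__":
    print("\n".join(regional_indicator_keys()))
```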


Sorin Paliga

Sep 28, 2020, 1:56:11 PM
to ukelel...@googlegroups.com
Ukelele triggers a mechanism that can invoke characters that exist in a given font. Many fonts are now large. Your wish may be achieved if a given font includes flags in its repertoire: I think I saw such a font somewhere. Flags are not codified in Unicode, so such a font would use the Private Use Area (PUA), and the keylayout must invoke precisely those code points.


Patrick Chew

Sep 28, 2020, 2:12:51 PM
to ukelel...@googlegroups.com
All:

Andrew:
In reviewing your ask, you might want to consider a "sticky modifier" (possibly: ALL CAP) that would set each of your standard keyboard letters as the corresponding flag-related letter codepoints.

cheers,
- Patrick

Tom Gewecke

Sep 28, 2020, 2:19:51 PM
to ukelel...@googlegroups.com


> On Sep 23, 2020, at 3:10 AM, Sorin Paliga <sorin....@gmail.com> wrote:
>
> Flags are not codified in Unicode,

I think they were added in Unicode 6.0, ten years ago.

Sorin Paliga

Sep 28, 2020, 3:16:11 PM
to ukelel...@googlegroups.com
Really? It seems I lost that train, but it also seems it has not stopped somewhere so I still can catch it.


Gé van Gasteren

Sep 29, 2020, 7:04:32 AM
to ukelel...@googlegroups.com
Dear Sorin,

There’s lots of stuff in the "new" planes beyond the basic multilingual plane, including these flags.
Interestingly, these flags do not have code points of their own: the standard is to type/enter them as two "country letter" (regional indicator) code points, e.g. for Romania 🇷 🇴 (with a space between the two characters) and 🇷🇴 (if typed without a space):
https://apps.timwhitlock.info/unicode/inspect/hex/1F1F7/1F1F4
Some platforms don’t have the conversion routine built in and will show the two letters, while others show the flag.

Please note that these code points have 5 hex digits, whereas all code points in the BMP have only 4.

Thanks to Patrick Chew for the weblink! Here’s another one:

Andrew Archibald

Sep 29, 2020, 10:24:54 AM
to ukelel...@googlegroups.com
A benefit of the two letter input is that the codes for all 26*26 country codes already exist, so if a new country comes into existence, that country’s country code can be rendered as a flag without adding any new codepoints to Unicode.

Andrew


Gé van Gasteren

Sep 29, 2020, 1:06:34 PM
to ukelel...@googlegroups.com
On Tue, Sep 29, 2020 at 4:24 PM Andrew Archibald <and...@aarchibald.com> wrote:
> A benefit of the two letter input is that the codes for all 26*26 country codes already exist, so if a new country comes into existence, that country’s country code can be rendered as a flag without adding any new codepoints to Unicode.

Right, good idea. I also read something about backward compatibility or round-trip conversion with existing encoding schemes.

But what I wanted to ask: Have you made progress with this keyboard layout? If you like, please share it for our enjoyment :) 

Andrew Archibald

Sep 29, 2020, 1:59:33 PM
to ukelel...@googlegroups.com
Since originally posting, I haven’t had much free time, so I haven’t figured out the country codes yet. Though once I make progress on my layout, I’ll certainly post it here.

As I don’t want to create bindings for all 26*26 possible country codes by hand, I’ve been familiarizing myself with the XML keylayout format. Coming from a computer science background, it looks to me like a state machine, with a state for each modifier, in which keypresses either transition between states or output a character sequence. Thus it seems like I should be able to make a binding such that I could press option+f d e for 🇩🇪, that is, without holding option the whole time. Option+f would transition to the “flag” state. In the flag state, letters A-Z (without needing a modifier) would transition to states “flagA” through “flagZ”. In the flagA state, letters A-Z would output «letter A»«letter A» through «letter A»«letter Z». That would be 1+26 states and 1+26+26*26 transitions. I’ve been drafting a program in the Go programming language that will generate these states.
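
To make the size of that state machine concrete, here is a small Python sketch that builds the transition table as a dictionary and counts its entries. It is not the Go program mentioned above; the names `build_flag_table`, `"none"` and `"option+f"` are placeholders for illustration. Counting the option+f entry, the 26 first-letter transitions and the 26*26 outputs gives 703 table entries over 27 dead-key states.

```python
import string

LETTERS = string.ascii_uppercase
RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicator(letter: str) -> str:
    """Map A-Z to the matching regional indicator symbol."""
    return chr(RI_A + ord(letter) - ord("A"))


def build_flag_table() -> dict:
    """Map (state, key) to ("next", state) or ("output", text)."""
    table = {("none", "option+f"): ("next", "flag")}
    for first in LETTERS:
        table[("flag", first)] = ("next", f"flag{first}")
        for second in LETTERS:
            pair = regional_indicator(first) + regional_indicator(second)
            table[(f"flag{first}", second)] = ("output", pair)
    return table


if __name__ == "__main__":
    table = build_flag_table()
    dead_states = {value for kind, value in table.values() if kind == "next"}
    print(len(dead_states), "states,", len(table), "transitions")
```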

As a stopgap, I made a quick US-based layout with German characters to avoid the need for the option+u umlaut dead key and assign fancy quotes to sensible keys.

A → ä Ä
O → ö Ö
U → ü Ü
S → ß ẞ
< → ‹ «
> → › »
[ → ‘ “
] → ’ ”
\ → ‚ „

Question: can caps lock be used as a non-sticky modifier like a second option key? I want to use it like the compose key on Linux systems (i.e. compose+AE → Æ).

I plan to set up option+x for 4-digit Unicode input and option+shift+x for 6-digit Unicode input. I may also make bindings for the modifier key symbols ⌘⇧⌥⌃⇪.

Andrew


Tom Gewecke

Sep 29, 2020, 2:51:28 PM
to ukelel...@googlegroups.com


> On Sep 29, 2020, at 1:59 PM, Andrew Archibald <and...@aarchibald.com> wrote:
>
> Thus it seems like I should be able to make a binding such that I could press option+f d e for 🇩🇪, that is, without holding option the whole time. Option+f would transition to the “flag” state. In the flag state, letters A-Z (without needing a modifier) would transition to states “flagA” through “flagZ”. In the flagA state, letters A-Z would output «letter A»«letter A» through «letter A»«letter Z». That would be 1+26 states and 1+26+26*26 transitions. I’ve been drafting a program in the Go programming language that will generate these states.

That would be a very interesting tool if you could automate the generation of the XML code needed for so many two-level dead keys.

>
> I plan to setup option+x for 4-digit Unicode input and option+shift+x for 6-digit Unicode input.

So the Unicode Hex input source would be a subset of your layout? That would also be interesting.

What are you referring to by 6-digit unicode? Most of Unicode has either 4 or 5 hex digits. For characters with more than 4 digits (those beyond the BMP), Apple’s unicode hex input source makes them by holding down option and typing the 8 digits of the utf16 version of the character.

Gé van Gasteren

Sep 29, 2020, 5:43:08 PM
to ukelel...@googlegroups.com

Wow, that’s a lot of big projects!

As Tom already hinted at ("two-level dead keys"), your idea with the example Option-f D E works only if the D is a dead key as well.
The reason is that you can enter a dead-key state by Option-f, but that state is automatically left after the next keypress, unless that is a dead key again.
In other words: any dead-keypress enters the state assigned to it, and any non-dead-keypress leaves that state again after producing output.
Without this automatism, one would need a mechanism for leaving the dead-key state, and I don’t know of any. At least not in the standard keyboard layout; it’s different for other input methods, of course.
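
The rule described above (a dead keypress enters its state, any other keypress produces output and drops back to no state) can be mimicked in a few lines of Python. This is a simplified model, not how macOS implements it: terminators are ignored, unknown keys simply type themselves, and all names (`run`, `two_level`, `one_level`) are invented for the example.

```python
NONE = "none"


def run(keys, table):
    """Feed keys through a dead-key table and return the typed text."""
    state, typed = NONE, []
    for key in keys:
        # Keys with no entry for the current state just type themselves.
        kind, value = table.get((state, key), ("output", key))
        if kind == "next":       # dead key: enter the named state
            state = value
        else:                    # regular key: output, then leave the state
            typed.append(value)
            state = NONE
    return "".join(typed)


# "d" is itself a dead key inside the flag state.
two_level = {
    (NONE, "option+f"): ("next", "flag"),
    ("flag", "d"): ("next", "flagD"),
    ("flagD", "e"): ("output", "\U0001F1E9\U0001F1EA"),
}

# "d" produces output directly, so the state is gone before "e" arrives.
one_level = {
    (NONE, "option+f"): ("next", "flag"),
    ("flag", "d"): ("output", "\U0001F1E9"),
}

if __name__ == "__main__":
    print(run(["option+f", "d", "e"], two_level))
    print(run(["option+f", "d", "e"], one_level))
```

With the second table, the sequence yields a lone regional indicator D followed by an ordinary "e", which is why the first letter has to be a dead key too.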

> Question: can caps lock be used as a non-sticky modifier like a second option key? I want to use it like the compose key on Linux systems (i.e. compose+AE → Æ).

You can make the CapsLock key work like the Command key, the Option key, or the Control key (in the System Preferences > Keyboard > Modifier keys…) but that would just be a duplicate of the existing one(s), not an *extra* modifier key. 
To access more levels (modifier sets), I think there’s no other way than to use combinations of two or more modifier keys, like Shift-Option.

Andrew Archibald

Sep 29, 2020, 6:22:00 PM
to ukelel...@googlegroups.com
> What are you referring to by 6-digit unicode?  Most of Unicode has either 4 or 5 hex digits.  For characters with more than 4 digits (those beyond the BMP), Apple’s unicode hex input source makes them by holding down option and typing the 8 digits of the utf16 version of the character.

The highest codepoint is U+10FFFF, 6 hex digits. I prefer to use the UTF-32 codepoints rather than splitting non-BMP codepoints into UTF-16 surrogate pairs. If, however, the keylayout format does not support UTF-32 codepoints, it won’t be a simple case of extending the multipliers for two more digits, and there would have to be character encoding math to convert to the surrogate pairs behind the scenes.


> As Tom already hinted at ("two-level dead keys"), your idea with the example Option-f D E works only if the D is a dead key as well.

Yeah, this is what I’m thinking in pseudo-XML/pseudo-state machine. I had left it implicit in my earlier email, but the first letter would be a dead key and it would only be dead if the option+f dead key is active. Notice that although 🇦🇦 (AA) is not a valid flag, it and all others would still be generated. If, for some reason, you want just a single letter of the country code, you could press space after the first letter.

state “bare”:
a → output “a”
state “option”:
a → output “ä” (for example)
f → dead “flag”
dead “flag”:
a → dead “flagA”
b → dead “flagB”
...
z → dead “flagZ”
dead “flagA”:
a → output “🇦🇦”
b → output “🇦🇧”
c → output “🇦🇨”
d → output “🇦🇩”
...
z → output “🇦🇿”
“ ” → output “🇦”
dead “flagB”:
a → output “🇧🇦”
b → output “🇧🇧”
...
z → output “🇧🇿”
“ ” → output “🇧”
...
dead “flagZ”:
a → output “🇿🇦”
b → output “🇿🇧”
...
z → output “🇿🇿”
“ ” → output “🇿”
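
In the keylayout XML quoted earlier in the thread, `when` elements are grouped per key (one `action` per key) rather than per state, so the table above would be turned sideways when written out. A hedged Python sketch of that step follows; it is not the Go tool, the action ids and state names are arbitrary choices, and the surrounding key maps that point each key at its action are left out.

```python
import string

RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def ri(letter: str) -> str:
    """Numeric character reference for the regional indicator of A-Z."""
    return f"&#x{RI_A + ord(letter) - ord('A'):X};"


def flag_action(letter: str) -> str:
    """Build the <action> for one letter key (lowercase a-z)."""
    upper = letter.upper()
    whens = [
        f'<when state="none" output="{letter}"/>',      # plain typing
        f'<when state="flag" next="flag{upper}"/>',     # first letter: dead
    ]
    # Second letter: one output per possible first letter.
    whens += [
        f'<when state="flag{first}" output="{ri(first)}{ri(upper)}"/>'
        for first in string.ascii_uppercase
    ]
    body = "\n".join("    " + w for w in whens)
    return f'<action id="{letter}">\n{body}\n</action>'


if __name__ == "__main__":
    print(flag_action("e"))
```

Calling `flag_action` for each of the 26 letters gives the 26 + 26*26 letter transitions; the option+f entry that enters the “flag” state belongs to the action for the f key on the option level.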

> Wow, that’s a lot of big projects!

There’s more too! In this hypothetical tool that I’ll make, the user specifies the keys to trigger accented dead keys, then it computes all the accented letters that could be produced using your keyboard’s alphabet. This would also stack accented dead keys. For example, if you defined dead keys for diaeresis (option+u) and macron (option+m), then you could produce ǟ (Latin a with diaeresis and macron) with option+u option+m a or with option+m option+u a.
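
One possible way to compute such accented letters is Unicode normalization: append the combining marks to the base letter and let NFC pick the precomposed character if one exists. This is only a sketch of that idea in Python using the standard `unicodedata` module; the helper name `compose` is hypothetical. It also shows that the order of the marks matters to NFC, so a tool that accepts both key orders would need to map them to the same mark order itself.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS
MACRON = "\u0304"     # COMBINING MACRON


def compose(base: str, *marks: str) -> str:
    """Return the NFC form of a base letter followed by combining marks."""
    return unicodedata.normalize("NFC", base + "".join(marks))


if __name__ == "__main__":
    for marks in ((DIAERESIS, MACRON), (MACRON, DIAERESIS)):
        result = compose("a", *marks)
        print([f"U+{ord(c):04X}" for c in result])
```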

> You can make the CapsLock key work like the Command key, the Option key, or the Control key (in the System Preferences > Keyboard > Modifier keys…) but that would just be a duplicate of the existing one(s), not an *extra* modifier key.

Alternatively, I could bind caps lock to some key combo like option+q using Karabiner and my keyboard layout could have option+q be the dead key that starts the compose key.

Andrew


John Brownie

Sep 30, 2020, 2:20:56 AM
to ukelel...@googlegroups.com
Lots of good comments here, but first a caution from me.

The XML standard says that you can encode any valid Unicode character, with the exception of the null character, U+0000. So any XML parser will remove any nulls. Since I have not written a custom parser to handle what is technically not well-formed XML, Ukelele removes all null characters, including those represented as code points (&#x0000; and similar). Normally this is not a problem, but it is with the Unicode Hex Input keyboard layout, since this has an explicit null in it.

The takeaway: If you edit a keyboard layout including hex input style features, you will likely need to edit the XML by hand afterwards to put the null back in where it has been removed.
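
The underlying problem is easy to reproduce outside Ukelele. As an illustration only, Python's standard XML parser refuses a null character reference outright (other parsers may drop it or behave differently); the `when` element here is just a stand-in and `parses` is a made-up helper:

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Report whether Python's standard XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses('<when state="none" output="&#x0041;"/>'))  # ordinary letter
    print(parses('<when state="none" output="&#x0000;"/>'))  # null reference
```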

More comments below.

Andrew Archibald wrote on 30/9/20 01:21:

>> What are you referring to by 6-digit unicode?  Most of Unicode has either 4 or 5 hex digits.  For characters with more than 4 digits (those beyond the BMP), Apple’s unicode hex input source makes them by holding down option and typing the 8 digits of the utf16 version of the character.
>
> The highest codepoint is U+10FFFF, 6 hex digits. I prefer to use the UTF-32 codepoints rather than splitting non-BMP codepoints into UTF-16 surrogate pairs. If, however, the keylayout format does not support UTF-32 codepoints, it won’t be a simple case of extending the multipliers for two more digits, and there would have to be character encoding math to convert to the surrogate pairs behind the scenes.

You won't have to generate the surrogate pairs; that is done automatically in the XML conversion process.

As for your pseudo-state machine: that's how I would approach it. But you don't need those last transitions (space producing the single letter), as the terminator mechanism will handle them. So for your flagA state, the terminator would be "🇦", and it is output when you press any key other than those specified to have output in that state.
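
A sketch of what generating those terminators could look like, in Python. The `<terminators>` wrapper with one `when` per state is my reading of the keylayout format, so treat the element layout as an assumption to check against a layout saved by Ukelele; the state names follow the flagA through flagZ convention used in this thread, and `flag_terminators` is a hypothetical name.

```python
import string

RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_terminators() -> str:
    """Build one terminator per flagA..flagZ state: the lone first letter."""
    lines = ["<terminators>"]
    for letter in string.ascii_uppercase:
        code_point = RI_A + ord(letter) - ord("A")
        lines.append(
            f'    <when state="flag{letter}" output="&#x{code_point:X};"/>'
        )
    lines.append("</terminators>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(flag_terminators())
```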

>> Wow, that’s a lot of big projects!
>
> There’s more too! In this hypothetical tool that I’ll make, the user specifies the keys to trigger accented dead keys, then it computes all the accented letters that could be produced using your keyboard’s alphabet. This would also stack accented dead keys. For example, if you defined dead keys for diaeresis (option+u) and macron (option+m), then you could produce ǟ (Latin a with diaeresis and macron) with option+u option+m a or with option+m option+u a.

Looks a lot like the scenario I describe in the manual!


>> You can make the CapsLock key work like the Command key, the Option key, or the Control key (in the System Preferences > Keyboard > Modifier keys…) but that would just be a duplicate of the existing one(s), not an *extra* modifier key.
>
> Alternatively, I could bind caps lock to some key combo like option+q using Karabiner and my keyboard layout could have option+q be the dead key that starts the compose key.

The caps lock key can't be handled as a non-sticky modifier. To do something like this, you need to use Karabiner or similar, as you suggested.

John
--
John Brownie
Mussau-Emira language, New Ireland Province, Papua New Guinea
Kouvola, Finland

Andrew Archibald

Sep 30, 2020, 2:26:26 AM
to ukelel...@googlegroups.com
John,

> Ukelele removes all null characters, including those represented as code points (&#x0000; and similar)

Thanks for the warning and many thanks for building Ukelele!

Andrew

On 30 Sep 2020, at 00:20, John Brownie <john_b...@sil.org> wrote:


