New From Current Input Source removes functionality

W Groleau

Feb 17, 2020, 5:08:38 PM
to Ukelele Users
Or maybe it's caused by adding a language.

Someone used a version of Ukelele to create a bundle from Unicode Hex Input (UHI).  He then added "en" to it as the language, which made the Press & Hold popups available.

But then I discovered that some of the original functionality of UHI doesn't work on the copy.  With UHI, if I hold down Alt while I press 011f011e01310130, I get ğĞıİ.  With the new one, which I renamed UHI+, the same input gets ğĞı.  In other words, at least one key sequence is now unsupported.
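For reference, the expected result of that key sequence can be checked outside any keyboard layout. This is only a minimal Python sketch: the function name decode_hex_input is made up for illustration, and it assumes every four-digit group is a code point in the BMP.

```python
def decode_hex_input(digits):
    """Split a run of hex digits into 4-digit groups and map each to a character."""
    groups = [digits[i:i + 4] for i in range(0, len(digits), 4)]
    return "".join(chr(int(group, 16)) for group in groups)

typed = "011f011e01310130"
print(decode_hex_input(typed))  # ğĞıİ
```

The last group, 0130, is the dotted capital I that the copied layout fails to produce.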

I then thought maybe he had a different keyboard or OS version or something, so I did the same New From Current Input Source and added "en" (and changed the icon).  I also edited the names (Unicode Hex Input copy → UHI+), which should have no effect.  Same result: the dotted capital I is still unavailable.  It's not major, because I can easily add it to the 'I' in the Press & Hold popup (or add them as state-machine sequences in Ukelele).

As a retired software engineer, I understand the state-machine approach of a keyboard layout, and have even hand-edited my own before I knew about Ukelele.

The one given me and the one I created have very different XML, but the differences are things that should not affect the functionality (and apparently don't).  Interestingly, I cannot find any of the four characters in either XML file.

Any insight appreciated.

Tom Gewecke

Feb 17, 2020, 5:32:16 PM
to ukelel...@googlegroups.com


On Feb 17, 2020, at 3:08 PM, W Groleau <Go...@Lang-Learn.org> wrote:

 In other words, at least one key sequence is now unsupported.

Very strange, my duplicated copy of Unicode Hex apparently fails to produce any character whose code ends in zero...

W Groleau

Feb 17, 2020, 5:37:33 PM
to Ukelele Users
Is it possible that Ukelele or kluchrtoxml_64 altered one or more key codes?

Since Apple no longer provides the XML version, I can't check.

Gé van Gasteren

Feb 17, 2020, 5:58:27 PM
to ukelel...@googlegroups.com
Good you found at least some system to the chaos…
Didn’t John mention a few times that there were problems with Apple’s conversion of (binary) keyboard layouts to XML?
You may have just found one of those.

--
You received this message because you are subscribed to the Google Groups "Ukelele Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ukelele-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ukelele-users/C2053EBD-6F68-4AD5-A313-BEC84DE67C57%40gmail.com.

Tom Gewecke

Feb 17, 2020, 8:01:11 PM
to ukelel...@googlegroups.com
On Feb 17, 2020, at 3:08 PM, W Groleau <Go...@Lang-Learn.org> wrote:

  Interestingly, I cannot find any of the four characters in either XML file.

I think some other part of the OS does the actual work of the Unicode Hex input system, and how the XML file relates to that is not clear (perhaps similar to the IMs for CJK scripts).

W Groleau

Feb 17, 2020, 8:20:45 PM
to Ukelele Users
Actually, the XML file that Ukelele creates (using the embedded kluchrtoxml_64) looks very similar to the one Apple provided back before they started using the binary version.  It's a complicated state machine, where each key code either specifies another state or specifies a specific output according to the previous state.

If it were another part of the system, how could it be broken by the conversion?

Tom Gewecke

Feb 18, 2020, 12:07:51 AM
to ukelel...@googlegroups.com


On Feb 17, 2020, at 6:20 PM, W Groleau <Go...@Lang-Learn.org> wrote:

Actually, the XML file that Ukelele creates (using the embedded kluchrtoxml_64) looks very similar to the one Apple provided back before they started using the binary version.  It's a complicated state machine, where each key code either specifies another state or specifies a specific output according to the previous state.

The XML seems strangely incomplete.  I can see about 1000 lines, with some Latin and some Chinese, but most of Unicode appears to be missing.  I wonder how that gets generated?

W Groleau

Feb 18, 2020, 1:29:48 AM
to Ukelele Users
If you're looking at it with a recent version of TextEdit, it initially doesn't load the whole file.  You have to scroll to the bottom to trigger it to load more lines.  It should have one keyboard element, with the end tag /keyboard the last visible thing in the file.  It's more than 930 lines.

John Brownie

Feb 18, 2020, 9:36:45 AM
to ukelel...@googlegroups.com
I've attached the original version (I think) of the Unicode Hex Input
keyboard layout as XML.

The magic of the layout is in the states 0-15, with things like:

            <when state="2561" through="2816" multiplier="16" output="&#xa00d;" />

What it tells the system to do is to take the current state number, S,
subtract the first state of the range, F (the "state" attribute),
multiply by the multiplier, M, and add that to the output value, O, to
produce a character, i.e. (S-F)*M+O. This allows the system to produce
256 different outputs with a single line in the XML file, as you'll see
that the range from "state" to "through" covers 256 states. If you then
look at the key elements for keyMap 3, you'll see references to the
actions for all the variants of 0-9 and a-f on a US keyboard.
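
As a rough illustration of that arithmetic, here is a minimal Python sketch using the values from the line quoted above. The function and constant names are invented for the example, and it only mimics the formula; it is not how the system itself is implemented.

```python
def when_output(state, first, multiplier, base_output):
    """Character produced by a ranged <when> element: (S - F) * M + O."""
    return chr((state - first) * multiplier + base_output)

# Values taken from the <when> line quoted above.
FIRST, THROUGH, MULTIPLIER, BASE = 2561, 2816, 16, 0xA00D

outputs = [when_output(s, FIRST, MULTIPLIER, BASE) for s in range(FIRST, THROUGH + 1)]
print(len(outputs))                 # 256
print(f"U+{ord(outputs[0]):04X}")   # U+A00D
print(f"U+{ord(outputs[-1]):04X}")  # U+AFFD
```

So that one line covers every code point from U+A00D to U+AFFD whose last hex digit is D.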

The cool idea someone had was to put this keyboard layout into a bundle
(keyboard layout collection) and give it a language code, which enables
the press and hold functionality.

Ukelele should be able to make changes to this keyboard layout, but you
can't edit the magic parts with Ukelele, as there's no real point in
trying to change them. Though I think that you can only get the BMP code
points with it, as there are only four digits allowed. It's probably
possible to extend it to five or six to get the whole of Unicode, but I
don't really want to wrap my brain around how to handle varying numbers
of digits.

John

W Groleau wrote on 18/2/20 08:29:
--
John Brownie
Mussau-Emira language, New Ireland Province, Papua New Guinea
Kouvola, Finland
Unicode Hex Input.keylayout

Tom Gewecke

Feb 18, 2020, 10:01:35 AM
to ukelel...@googlegroups.com


> On Feb 18, 2020, at 7:36 AM, John Brownie <john_b...@sil.org> wrote:
>
> I think that you can only get the BMP code points with it, as there are only four digits allowed.

You can get the whole of Unicode by typing in the eight digits of the two UTF-16 hex codes that correspond to code points beyond the BMP.

Thanks so much for explaining how this thing works!

John Brownie

Feb 18, 2020, 10:05:32 AM
to ukelel...@googlegroups.com
OK, that's cool. You can use the surrogate pairs to access the whole of
Unicode with that keyboard layout, then.

John

Tom Gewecke wrote on 18/2/20 17:01:

Tom Gewecke

Feb 18, 2020, 10:09:39 AM
to ukelel...@googlegroups.com
I’m still curious why my copy made with Ukelele messes up when the last of the 4 digits is a zero.

John Brownie

Feb 18, 2020, 10:12:38 AM
to ukelel...@googlegroups.com
Tom Gewecke wrote on 18/2/20 17:09:
> I’m still curious why my copy made with Ukelele messes up when the last of the 4 digits is a zero.
No idea on that one. It may be an issue with the conversion tool, since
I am not sure that one has been extensively tested with that type of
keyboard layout. I'll experiment when I get some time.

John

John Brownie

Feb 18, 2020, 10:41:31 AM
to ukelel...@googlegroups.com, Tom Gewecke
I have some clarity now on the problem. Technically, a null character
can't be present in an XML file, even if it's coded as &#x0000;, and
Ukelele respects that. However, the original keyboard layout has the
null for the last character output when the last digit is zero, and
Ukelele changes that to an empty string, which causes it to fail.

Time to dig into the XML specification and see if I have it correct!
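
One quick way to see the restriction in practice is to feed such a reference to a conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which is not the parser Ukelele uses, so Ukelele's own behaviour may well differ; the helper name parse_output is made up for the example.

```python
import xml.etree.ElementTree as ET

def parse_output(attribute_value):
    """Parse a one-element document and return its decoded output attribute."""
    element = ET.fromstring(f'<when output="{attribute_value}" />')
    return element.get("output")

print(repr(parse_output("&#x0130;")))  # an ordinary character reference parses fine

try:
    parse_output("&#x0000;")
except ET.ParseError as error:
    print("rejected:", error)  # this parser refuses the null reference outright
```

XML 1.0 excludes U+0000 from the set of legal characters, including when it is written as a character reference, which is why this parser reports an error instead of returning a null.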

John

W Groleau

Feb 18, 2020, 4:36:47 PM
to Ukelele Users
Many thanks for the original layout.  I no longer have access to a MacOS that old.

Seems Apple must have made a lot of changes, unless the converter is responsible for the differences:
The old one starts with
<keyboard group="126" id="-1" name="Unicode Hex Input">
    <layouts>
        <layout first="0" last="0" modifiers="28" mapSet="a0" />
    </layouts>

and the converted one with
<keyboard group="0" id="7297" name="UHI+" maxout="1">
    <layouts>
        <layout first="0" last="17" mapSet="164" modifiers="ec"/>
        <layout first="18" last="18" mapSet="680" modifiers="ec"/>
        <layout first="21" last="23" mapSet="680" modifiers="ec"/>
        <layout first="30" last="30" mapSet="680" modifiers="ec"/>
        <layout first="194" last="194" mapSet="680" modifiers="ec"/>
        <layout first="197" last="197" mapSet="680" modifiers="ec"/>
        <layout first="200" last="201" mapSet="680" modifiers="ec"/>
        <layout first="206" last="207" mapSet="680" modifiers="ec"/>
    </layouts>


W Groleau

Feb 18, 2020, 5:15:10 PM
to Ukelele Users
Tom wrote:
You can get the whole of Unicode by typing in the eight digits of the two UTF-16 hex codes that correspond to code points beyond the BMP.

Can you give me a specific eight digits you've tried?  So far all the ones I tried (with Apple's current binary) gave me an "unknown" glyph for the first four and a Chinese character for the second four.
 

W Groleau

Feb 18, 2020, 5:19:01 PM
to Ukelele Users
Regarding the converted version not being able to handle a last digit of zero: it does still work for U+5FE0 to print 忠
(Let's get right to the center of the heart of the matter!)

Tom Gewecke

Feb 18, 2020, 5:38:37 PM
to ukelel...@googlegroups.com
Try d83dde00   
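
To see which code point those eight digits stand for, and to work out the digits for any other character beyond the BMP, the standard UTF-16 surrogate arithmetic can be sketched in Python. The function names here are invented for illustration.

```python
def surrogates_to_code_point(digits):
    """Combine eight hex digits (a UTF-16 surrogate pair) into one code point."""
    high, low = int(digits[:4], 16), int(digits[4:], 16)
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def code_point_to_surrogates(code_point):
    """Return the eight hex digits to type for a code point beyond the BMP."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not beyond the BMP")
    offset = code_point - 0x10000
    return f"{0xD800 + (offset >> 10):04x}{0xDC00 + (offset & 0x3FF):04x}"

code_point = surrogates_to_code_point("d83dde00")
print(f"U+{code_point:X}")                   # U+1F600
print(code_point_to_surrogates(code_point))  # d83dde00
```

Eight digits that do not form a valid high/low pair are rejected by the sketch, since they would be two separate four-digit inputs rather than one character.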

Tom Gewecke

Feb 18, 2020, 5:41:24 PM
to ukelel...@googlegroups.com


On Feb 18, 2020, at 3:19 PM, W Groleau <Go...@Lang-Learn.org> wrote:

Regarding the converted version not being able to handle a last digit of zero: it does still work for U+5FE0 to print 忠
(Let's get right to the center of the heart of the matter!)


Very nice!  I guess that must be one of the Chinese characters you can see in the XML.

I used the old xml provided by John to produce another copy which should hopefully fix that zero problem.


John Brownie

Feb 19, 2020, 1:48:46 AM
to ukelel...@googlegroups.com, W Groleau
W Groleau wrote on 18/2/20 23:36:
That's a converter artefact. The multiple mapSet elements are to handle JIS keyboards more correctly.

John Brownie

Feb 19, 2020, 3:19:43 AM
to ukelel...@googlegroups.com
I've been looking into how Ukelele handles the null character that is
actually present in the Unicode Hex Input file. It seems that the XML
parser silently ignores the null, returning an empty string when it
decodes &#x0000;, so it seems like there's no way to handle this apart
from hacking the XML parser, which is a very complicated piece of C.

The upshot of this is that Ukelele will currently (with no apparent
work-around) cripple the Unicode Hex Input keyboard layout. The only
thing to do (unless I can find a fix) is to manually edit the XML after
it has been saved by Ukelele.
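
For anyone attempting that manual edit, it could in principle be scripted as a plain-text substitution. The Python sketch below rests on assumptions that have not been checked against a real saved file: that the affected lines are ranged <when> elements (those with a multiplier attribute) and that Ukelele writes the lost null as output="". The function name and the sample attribute values are hypothetical. It works on raw text because XML libraries cannot write &#x0000; themselves.

```python
import re

# Matches a whole <when ...> tag that carries a multiplier attribute.
WHEN_TAG = re.compile(r'<when\b[^>]*\bmultiplier="[^"]*"[^>]*>')

def restore_null_outputs(xml_text):
    """Put &#x0000; back into ranged <when> elements whose output is empty."""
    def fix(match):
        return match.group(0).replace('output=""', 'output="&#x0000;"')
    return WHEN_TAG.sub(fix, xml_text)

# Hypothetical sample line, not copied from a real file.
sample = '<when state="1" through="256" multiplier="16" output=""/>'
print(restore_null_outputs(sample))
```

Elements without a multiplier are left alone, so an intentionally empty output elsewhere in the file is not touched. Keep a backup and compare against the original layout before installing the result.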

Sorry!

Tom Gewecke

Feb 19, 2020, 10:24:58 AM
to ukelel...@googlegroups.com


> On Feb 19, 2020, at 1:19 AM, John Brownie <john_b...@sil.org> wrote:
>
> The upshot of this is that Ukelele will currently (with no apparent work-around) cripple the Unicode Hex Input keyboard layout. The only thing to do (unless I can find a fix) is to manually edit the XML after it has been saved by Ukelele.

The “original version” which you attached to an email yesterday may be a good substitute. I used it to create a Unicode Hex bundle that does honor zeros and also has the popup menu. Perhaps it would be useful to include that file with the other stuff in the Ukelele download.