double quotation marks assigned to keys can corrupt keyboard xml code

rno...@ling.upenn.edu

Aug 30, 2016, 8:03:02 PM
to Ukelele Users
I assigned the double quotation curly marks (left and right) —  u+201C and u+201D — to two different keys on a keyboard by typing them in (from a keyboard which already had them) in the “Enter the new output string” dialog text entry box.

After this change the keyboard file in question was no longer visible when I tried to add it to the keyboard input menu. The file still opened in Ukelele though, so wasn’t totally corrupt. It’s just that the Mac didn’t think it was a keyboard file (I added the .keylayout label for good measure, but no change).

I assumed the file had been corrupted somehow and would have to be given up for lost. But then I remembered there is this problem with double quotation marks. 

I opened the xml file as text and found that the relevant line of code had “ and ” inside " and ", 

i.e.

(keycode blah blah blah …)  = "”"   
and
(keycode blah blah blah …)  = "“"  

Evidently the character being mentioned — namely “ (or ”)  — was interpreted as the closing quotation mark of the xml, so the line was interpreted such that the key was bound to null, i.e. to "".

Of course the rest of the code was then parsed improperly.

Changing “ and ” to the character references &#x201C; and &#x201D; within the xml file solved the problem.

I don’t think entering the text &#x201C; and &#x201D; within Ukelele itself solves the problem. I tried this, then opened the xml file as text and found that it had “ and ” in it.

I am not sure if double quotation marks in general (or just curly ones) create the problem, or if double quotation marks always cause a problem. Maybe it’s just the closing quotation mark (that is at least conceivable, but I haven’t tested it.)

This is not the first keyboard design where I had the problem. Fortunately I had already found this bug several years ago and I remembered it again this time.

It’s a vicious little bug, but should be easy to fix: just make sure that when the xml is generated by Ukelele it will always substitute &#x201C; and &#x201D; for “ and ” when the latter are entered by the user.

 (OS 10.11.6 on MacBook Pro)

John Brownie

Sep 5, 2016, 3:17:54 AM
to ukelel...@googlegroups.com
This can't be true, as I have valid keyboard layouts with those characters, and nearly all keyboard layouts would have them. There must be something else going on that is causing the issue. The only way to check it is to see the error message (if any) produced by the keyboard layout compiler. It would be in the system's console log, and searching for uchr should turn it up.
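One way to check this claim independently of Apple's compiler is to feed one of the reported lines to a standard XML parser. The sketch below uses Python's xml.etree.ElementTree, which is an assumption made purely for illustration and is not the parser the keyboard layout compiler uses. It only shows that, in well-formed XML, U+201D between straight quotes is an ordinary character and does not end the attribute value.

```python
import xml.etree.ElementTree as ET

# One <key> element as quoted in the thread; the attribute value
# is the literal character U+201D (right double quotation mark).
snippet = '<key code="30" output="\u201d"/>'

key = ET.fromstring(snippet)
output = key.get("output")

# The value is a single character, not an empty string.
print(len(output), f"U+{ord(output):04X}")  # prints: 1 U+201D
```

If a conformant parser accepts the line, the curly quote by itself is unlikely to be what makes the file invalid.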

Anyway, it would be helpful to me to see the keyboard layout and experiment a bit with it myself, so could you mail it to me, please?
--
You received this message because you are subscribed to the Google Groups "Ukelele Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ukelele-user...@googlegroups.com.
To post to this group, send email to ukelel...@googlegroups.com.
Visit this group at https://groups.google.com/group/ukelele-users.
For more options, visit https://groups.google.com/d/optout.

--
John Brownie
In Finland on furlough from SIL Papua New Guinea

Gé van Gasteren

Sep 5, 2016, 5:17:50 AM
to ukelel...@googlegroups.com
There's something fishy going on here, indicated by the OP's sentence: "I added the .keylayout label for good measure, but no change."

That extension is added automatically, as far as I know. Maybe you edited the XML after Ukelele had produced it, in a program like Tex-Edit that doesn't support Unicode, or some such thing?

rno...@ling.upenn.edu

Sep 5, 2016, 6:17:22 AM
to Ukelele Users
Thank you for your interest in this issue. 

Let me explain that it was only when the keyboard was not recognized by the Mac in Keyboard Preferences (when I tried to add it to the Inputs menu) that I opened the file as a text file.

I believe I had to remove the “.keylayout” extension, or else Ukelele would open it instead. (I now realize I probably could have changed the preferred application to open the file with.) I probably added the extension back after editing it as a text file, at which point it opened in Ukelele again.

In any case, it was when I examined the xml as a text file that I discovered the problem (or what I assume is the problem) with the double quotation marks, and, as I said in the original post, when I changed _”_ and _“_ to &#x201D; and &#x201C; the problem was solved.

I did not open the file in question as a text file *before* it ceased working. Therefore it cannot possibly have been because I *later* opened it as a text file that it did not work. 

The issue of whether I did or did not add “.keylayout” to the file name in question is immaterial. It is however significant that I opened the file as a text file to examine its contents, because that is when I found what I think was the problem. 

In any event, if it was not the problem, then it was a truly remarkable coincidence that when I changed the text, as I described, the keyboard worked again.

I agree entirely that it is peculiar that this problem, whatever its exact nature, has gone unnoticed, although I insist that I have had this problem in the past as well, so this instance was not unique for me. 

Most keyboards do have double quotation marks, as John said, so why has this not been a problem before? However, the double quotation marks in question are curly ones. While it is true that most keyboards have double quotation marks, not all have curly ones. Many users rely on the software to make the automatic substitution of curly for straight quotation marks. My keyboards have dedicated keys for the curly ones. Could this be relevant?

The problem is entirely reproducible, as I’ll explain now:

STEP 1

I have a keyboard “Pahlavi.keylayout”. It currently works correctly and is recognized by the OS.

I duplicate the keyboard, open it, and call the new copy “Pahlavi2.keylayout”. I also change the ID number and “language” text (it is a Unicode keyboard), so that the OS should think it is a different keyboard than the original one.

Then I go to the keys that have _“_ and _”_ assigned to them. I click on each of them, and assign _“_ and _”_ to them, respectively, by typing the characters _“_ and _”_ in the “Enter a new output string” dialogue box. I type the characters using a keyboard which has dedicated keys for curly double quotes. I mention this because I am not relying on any system-magic for changing straight quotes to curly quotes, for example: nothing like that is happening here.

I save the keyboard, close it, and move it to the Keyboard Layouts folder with all the other keyboards I have. I go to Keyboard Preferences and try to add it to the input menu. 

It is not on the list. 

STEP 2

I duplicate (in the Finder) the file “Pahlavi2.keylayout” and call the new one “Pahlavi3.keylayout”. (I say this so you will not think I am toying with the original file, which is not working.) I remove the suffix “.keylayout” so the file is now called “Pahlavi3”. I open this *duplicate file* as a text file. It opens in Xcode.

I see in the text file the following lines:

<key code="30" output="”"/>

.....

<key code="33" output="“"/>

....


I replace the lines in question with the following lines


<key code="30" output="&#x201D;"/>

....

<key code="33" output="&#x201C;"/>


I save the file and close the file.


I add the suffix “.keylayout” to the filename again. 
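For what it is worth, the replacement made in this step should not change the meaning of the XML. The sketch below assumes Python's xml.etree.ElementTree as a stand-in for a conformant parser (it is not Apple's parser) and checks that the literal character and the &#x201D; reference yield the same attribute value.

```python
import xml.etree.ElementTree as ET

# The line as Ukelele wrote it (literal U+201D) and the hand-edited
# version using a numeric character reference.
literal = ET.fromstring('<key code="30" output="\u201d"/>')
escaped = ET.fromstring('<key code="30" output="&#x201D;"/>')

# A conformant parser resolves the reference to the same character.
same = literal.get("output") == escaped.get("output")
print(same)  # prints: True
```

So if the OS accepts one form and rejects the other, the difference lies in how that particular parser reads the file, not in the content of the layout.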



STEP 3


I remove “Pahlavi2.keylayout” from the Keyboard Layouts folder. 


I move “Pahlavi3.keylayout” into the Keyboard Layouts folder.


I open Keyboard Preferences. 


In the “Other” list there now appears a keyboard with the name “Pahlavi2”.


I have done this twice now. It is the same thing each time. Once the double quotation marks are replaced in the xml file, the keyboard is recognized again by the OS.


You can see for yourselves. I am attaching the three files to this post.


“Pahlavi.keylayout” — this one works.


“Pahlavi2.keylayout” — this one does not work. It was produced by the process explained in STEP 1.


“Pahlavi3.keylayout” — this one works. It was produced from “Pahlavi2.keylayout” by the process explained in STEP 2.


Be aware that many of the keys in the keyboard have been assigned unicode points which are not officially designated yet. The keyboard is for Book Pahlavi, and there is no unicode standard yet. I am using for the most part the unicode points which are assigned to Book Pahlavi in the proposal of Pournader 2013, for use with a font of my own design. I have no idea if the extensive use of unassigned unicode points has anything to do with the problem, but I thought I should mention it.


I hope that you find this helpful and that I have now made everything entirely clear.











Pahlavi.keylayout
Pahlavi2.keylayout
Pahlavi3.keylayout

Sorin Paliga

Sep 5, 2016, 6:27:51 AM
to ukelel...@googlegroups.com
Hello

If I may: when you post an issue, you must describe correctly and completely what you did. When Gé correctly noted "There's something fishy going on here, indicated by the OP's sentence: 'I added the .keylayout label for good measure, but no change,'" he made you describe what you did correctly, or at least in more detail. When you hide details, or consider them irrelevant, you mislead all those wishing to help you. In fact, you corrupted the file by modifying it manually; that was the issue, if I interpret your last message correctly.


rno...@ling.upenn.edu

Sep 5, 2016, 6:36:56 AM
to Ukelele Users
I did not corrupt the file by modifying it manually! How many times do I have to explain this fact? 

I apologize if my first message appeared to suggest that I did.

If you read my second message — carefully — you will see that that is not the case.

I am not hiding anything. I have now made the process leading up to the problem entirely explicit. I have provided three files for everyone to look at who might be interested in the problem. 

I insist, I did not open the file as a text file prior to it not working. 

I never said I did this, and I did not do it. 

I opened the file in question as a text file *after* it stopped working. I fixed the problem by fixing the xml code. 

Is this now entirely clear??

John Brownie

Sep 5, 2016, 7:03:04 AM
to ukelel...@googlegroups.com
I have done a bit of digging around, and the problem doesn't appear to be the curly quotes. The XML parser chokes on a different line, where the output string is U+10BBF. Turning that into an entity makes it work. There is something weird going on in Apple's XML parser, in that it rejects some valid XML, and some fiddling with the XML makes it work again.

The solution would be to make all the non-ASCII characters into escaped entities, but I see now that simply doesn't work at present! I'll work on making that happen correctly.
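As a rough illustration of what escaping every non-ASCII character means, here is a minimal Python sketch. The function name escape_non_ascii is hypothetical, and this is not Ukelele's code; it simply rewrites each character above U+007F as a hexadecimal numeric character reference and leaves ASCII (including the XML markup itself) untouched.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every character above U+007F with a &#xHHHH; reference."""
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};"
        for ch in text
    )

# Hypothetical sample lines; the key codes are taken from the thread.
line = '<key code="30" output="\u201d"/>'
print(escape_non_ascii(line))          # <key code="30" output="&#x201D;"/>
print(escape_non_ascii("\U00010BBF"))  # &#x10BBF;
```

Because the references are plain ASCII, a file rewritten this way no longer depends on how the parser decodes multi-byte characters.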

BTW, I suspect Sorin replied to your previous message, but his reply came after your reply to Gé.

John

rno...@ling.upenn.edu

Sep 5, 2016, 7:14:17 AM
to Ukelele Users
So in a sense it *was* a remarkable coincidence about the double quotation marks...

If I understand you correctly, any harmless modification of the xml file would have fixed the problem, or maybe just opening it as xml and closing it again? It’s kind of ironic then, that I imagined that it was the double quotation marks, just because I had done the same thing to “fix” the problem once before.

This makes me wonder if some other output string besides U+10BBF can cause the problem, because when I had this problem once before, a number of years ago, I am sure I was not using that codepoint. 

And everyone ... I am sorry if I got a little testy, I just felt rather misunderstood! 

I am glad in the end this may have been helpful somehow.

rno...@ling.upenn.edu

Sep 5, 2016, 7:45:51 AM
to Ukelele Users
Still, I am puzzled by the following fact: U+10BBF was assigned to a key in an earlier version of this keyboard which was working fine.

If the parser chokes on U+10BBF, why didn’t it choke on the same output string in an earlier version? Or does the problem have nothing to do with U+10BBF in particular?

It was only when I added punctuation (including, as it turned out, the curly quotes) to the keyboard that it stopped being recognized by the OS. 

(You see, I didn’t just pull the connection with punctuation out of thin air ... this was partly why I assumed that it was the curly quotes that was the problem.)

This still seems utterly mysterious to me. But this is far beyond what I know anything about.

John Brownie

Sep 5, 2016, 8:07:08 AM
to ukelel...@googlegroups.com
Yes, it's a real mystery, because it relies on internal details of the XML parser that Apple uses when compiling keyboard layout files, and that's not visible to anyone but Apple. I'll file a bug with them, but any change will be a long time coming, I suspect.

It's not that particular code point, it's more or less random, I suspect. I wonder if it's to do with code points outside the BMP, but it's really hard to know what's happening without access to the code of the actual parser.
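To make the BMP idea concrete: code points above U+FFFF need two 16-bit units (a surrogate pair) in UTF-16, while the curly quotes need only one. The Python sketch below just classifies the characters mentioned so far; the helper name utf16_units is hypothetical, and whether this distinction matters to Apple's parser is only a guess.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode one character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

for cp in (0x201C, 0x201D, 0x10BBF):
    plane = "outside BMP" if cp > 0xFFFF else "BMP"
    print(f"U+{cp:04X}", plane, utf16_units(chr(cp)))
```

The curly quotes come out as BMP characters with one unit each, and U+10BBF as a non-BMP character with two.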

I think I have a solution worked out, but need to test it some more. Hopefully I'll have a new version (3.1b3) in the next day or two.

rno...@ling.upenn.edu

Sep 9, 2016, 5:34:45 PM
to Ukelele Users
I am still having some issues of a similar nature with Ukelele Version 3.1b3 (3.1.0.105). I am not sure if they were supposed to be solved by the latest update or not (although the Pahlavi keyboard I posted before seems to compile just fine now).

In this particular case the keyboard (a different one) was affected when I added U+2E31 “WORD SEPARATOR MIDDLE DOT”. Again, when I opened the non-functioning keyboard in Xcode and replaced “⸱” with “&#x2E31;” the keyboard was recognized by the OS and I was able to add it to the inputs menu.

This time I feel relatively confident that U+2E31 must be, at least in part, responsible for the problem because it was the *only* character I added to the keyboard in question before it stopped working. (I did, to be completely clear, remove another output from the same key, but I do not think this matters, because the problem happened to me again with a third keyboard when all I did was add U+2E31).

Because of my old theory about the double quotation marks the first thing I did to the file was to make the same change as last time (see previous posts). This maneuver however did not work. So obviously they are not the real issue, at least not always. 

But when I changed “⸱” to “&#x2E31;” the keyboard worked again. This happened two times, so it was not an accident. 

From this I conclude that not just *any* innocuous change to the code suffices to make the XML parser happy again, it must be more complicated than that.

Completely off the top of my head I am guessing that maybe the parser does not choke on specific code points but maybe on certain (totally mysterious) combinations of them, when they happen to be in the same keyboard?
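One way to narrow down such a guess is to compare which non-ASCII code points a working and a non-working version of a layout contain. The Python sketch below is only an illustration: the function name non_ascii_code_points and the two sample strings are made up, and it counts literal characters only, not ones already written as &#x...; references.

```python
def non_ascii_code_points(text: str) -> set:
    """Return the distinct non-ASCII code points in text, as U+XXXX labels."""
    return {f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0x7F}

# Made-up stand-ins for the contents of two versions of a layout file.
working = '<key code="1" output="\U00010B00"/>'
broken = working + '<key code="2" output="\u2e31"/>'

added = non_ascii_code_points(broken) - non_ascii_code_points(working)
print(sorted(added))  # prints: ['U+2E31']
```

Running this on the real files (read as UTF-8 text) would list the characters added between versions, which gives a short list of candidates to escape one at a time.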

I am attaching here the non-functioning keyboard in case that would be helpful. It is mostly for Avestan script (U+10B00 to U+10B3F), but U+2E31, now recommended for use with Avestan, is from the Supplemental Punctuation block (U+2E00 to U+2E7F).

Thanks
Avestan2.keylayout

John Brownie

Sep 10, 2016, 1:31:44 AM
to ukelel...@googlegroups.com
Well, the offending character that the keyboard layout compiler chokes
on is actually &#x10B0C;. I think that it is indeed some obscure
combination of characters that causes the issue.

To get around it in general, turn on the option (which now works in the
most recent beta of 3.1) of converting non-ASCII characters to hex codes
in the XML file.

John