Solving the problem of gibberish in Word 2011

Ralph Hancock

Feb 28, 2015, 9:22:10 PM
to ukelel...@googlegroups.com
Some time ago I mentioned a problem on this forum, and no one could provide an answer. Now I think I have got one, but it concerns the output of Ukelele.

I had written some keyboards for classical Greek, and had found that they worked properly in most applications, but *not* in MS Word 2011 (although they did work in older versions of Word). In Word 2011, if you typed a dead key for a classical Greek accent and then mistyped the following letter, for example hitting a consonant instead of a vowel, Word would throw a line of gibberish on to the screen which was very hard to delete. In short, the 'terminator' mechanism was not working. Also, several keyboard shortcuts didn't work, such as Command-b, Command-z and Command-w.

The popular classical Greek program GreekKeys, which works by providing ordinary Mac keyboard bundles, was also having this problem. And the people at GreekKeys have solved it, and have provided a fix at
This fix consists of replacing the Info.plist file with one in which one instruction is revised, as follows:

<dict>
<key>TISInputSourceID</key>
<string>org.sil.ukelele.keyboardlayout.papyrogreek (us).papyrogreek (us)</string>
<key>TISIntendedLanguage</key>
<string>el</string>
</dict>

The important part of this revision is in the second-last line, where the keyboard language is declared as 'el' -- that is, simply as ordinary Greek.

In bundles made by Ukelele, if you select classical Greek as the language, the Info.plist file in the bundle has this instruction:

<dict>
<key>TISInputSourceID</key>
<string>org.sil.ukelele.keyboardlayout.papyrogreek (us).papyrogreek (us)</string>
<key>TISIntendedLanguage</key>
<string>grc-Grek-GR-poly</string>
</dict>

-- that is, the keyboard language is declared as 'grc-Grek-GR-poly'. Word 2011 doesn't understand this declaration, and misbehaves. Evidently other applications simply ignore the instruction and don't throw up a problem.
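For anyone with several bundles to fix, the change can be scripted rather than made by hand. What follows is only a minimal sketch in Python using the standard `plistlib` module. It assumes nothing about where the dictionary sits inside Info.plist; it simply walks the whole file and rewrites every `TISIntendedLanguage` string it finds. The function names are my own, not part of Ukelele or GreekKeys.

```python
import plistlib
from pathlib import Path

KEY = "TISIntendedLanguage"


def set_intended_language(node, new_lang):
    """Recursively set every TISIntendedLanguage string to new_lang.

    Returns the number of values that were actually changed.
    """
    changed = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == KEY and isinstance(v, str):
                if v != new_lang:
                    node[k] = new_lang  # replacing a value keeps the dict size fixed
                    changed += 1
            else:
                changed += set_intended_language(v, new_lang)
    elif isinstance(node, list):
        for item in node:
            changed += set_intended_language(item, new_lang)
    return changed


def patch_info_plist(path, new_lang="el"):
    """Load an Info.plist, patch it in memory, and write it back only if changed."""
    path = Path(path)
    with path.open("rb") as f:
        data = plistlib.load(f)
    count = set_intended_language(data, new_lang)
    if count:
        with path.open("wb") as f:
            plistlib.dump(data, f)
    return count
```

You would call `patch_info_plist(...)` with the path to the Info.plist inside your own bundle. Keep a copy of the original file first, since the script overwrites it in place.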

However, the fact that this fix works raises the possibility that Ukelele may be producing Info.plist files that contain wrong information.

Well, I know how to sort out my Greek keyboards now. But there is a further problem, because I am also putting out Coptic keyboards. Here Ukelele gives this instruction in Info.plist:

<dict>
<key>TISInputSourceID</key>
<string>org.sil.ukelele.keyboardlayout.papyrocoptic (us).papyrocoptic (us)</string>
<key>TISIntendedLanguage</key>
<string>cop-Copt-EG</string>
</dict>

-- that is, the keyboard language is declared as 'cop-Copt-EG'. There are no dead keys in the Coptic drivers to go wrong in Word 2011, but the problem with the Command combinations occurs. I can't declare Coptic as 'el' because the new Unicode page for Coptic no longer uses the standard Greek page as a base; it is completely separate. And as far as I know, the Mac OS doesn't recognise Coptic as a language anyway.

Can anyone suggest what the Info.plist instruction should specify here?

Thanks very much in advance for any help.

John Brownie

Feb 28, 2015, 11:22:10 PM
to ukelel...@googlegroups.com
OK, then the fix is to make it Modern Greek, which will put "el" as the language code.

I don't know, but perhaps saying just Coptic would work. But, if OS X doesn't know about Coptic, then specifying Coptic won't help with the press and hold feature, so there's probably not a good reason to specify a language at all.
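If you want to try a bundle with no language declared at all, the key can be stripped out by script. This is a rough sketch in Python with the standard `plistlib` module, and the function names are made up for illustration. Whether Word 2011 is happy with a bundle that declares no language is an assumption that would need testing.

```python
import plistlib


def drop_intended_language(node, key="TISIntendedLanguage"):
    """Remove every occurrence of `key` from nested dicts and lists.

    Returns the number of entries removed.
    """
    removed = 0
    if isinstance(node, dict):
        if key in node:
            del node[key]
            removed += 1
        for value in node.values():
            removed += drop_intended_language(value, key)
    elif isinstance(node, list):
        for item in node:
            removed += drop_intended_language(item, key)
    return removed


def strip_language(plist_bytes):
    """Take the raw bytes of an Info.plist and return new bytes without the key."""
    data = plistlib.loads(plist_bytes)
    drop_intended_language(data)
    return plistlib.dumps(data)
```

Read the Info.plist as bytes, pass it through `strip_language`, and write the result to a copy of the bundle so the original stays untouched.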

John
--
John Brownie, john_b...@sil.org or j.br...@sil.org.pg
Summer Institute of Linguistics      | Mussau-Emira language, Mussau Is.
Ukarumpa, Eastern Highlands Province | New Ireland Province
Papua New Guinea                     | Papua New Guinea

Sorin Paliga

Mar 1, 2015, 2:29:18 AM
to ukelel...@googlegroups.com
Hello

The problem is that, unlike Linux or perhaps Windows, OS X does not need any information of this type. The only essential detail is the ID, and even that is of minor importance. In your case, if you specify a Unicode encoding, i.e. one with a negative number, it should work like a charm with any such keylayout, with or without diacriticals.
Put another way, the information you put there is redundant; the system does not need it in order to work properly.

Behnam Rassi

Mar 1, 2015, 10:07:30 AM
to ukelel...@googlegroups.com
This is totally unrelated, but since you mentioned Coptic I thought I would mention my experience, which may have some implications in your case.
I am making a font for all languages of Arabic script covered in Unicode. There is a newly added code, U+0605, which is a date sign. The catch is that the date numbers used for that sign are supposed to be Coptic numbers. This is the only reason your topic reminded me of this!
So I tried to implement this in my font. But the OpenType instructions I wrote for it didn’t work, although similar instructions for a similar sign work without any problem. After a while I noticed that the ‘directionality’ of U+0605 is not implemented in the OS X text engine (or in Windows 7, which I also tried). This means that you can type the code and it promptly shows up if available in the font, but ‘contextual manipulation’ on it doesn’t work because, I guess, in order to apply OpenType instructions to a code, the text engine has to recognize the BiDi and other behavioural patterns assigned to that code, and this doesn’t seem to be the case as yet.
-b

Ralph Hancock

Mar 1, 2015, 3:39:21 PM
to ukelel...@googlegroups.com
Thank you very much John and Cattus for this information. I should now be able to make Coptic keyboards that pacify Word 2011 and stop its misbehaviour.

Behnam: I am all too well acquainted with this kind of problem. When I made my first Hebrew font with OpenType instructions to set cantillation marks as well as vowel points, Unicode had no special code for the dot under letters that is mysteriously used to mark certain words in the Hebrew Bible, and which is not a cantillation mark and has to be kept distinct from these. (These words also have overdots, but there was already one in the Hebrew set.) Therefore it was necessary to use the underdot 0323 from the ordinary zero-width mark set, which of course doesn't belong to the class of right-to-left letters. The result was constant and apparently random rebellion by Word for Windows, and completely different malfunctions in other Windows-based applications -- keyboards jammed, irrelevant fonts substituted, the whole gamut of Windows horror. And there seemed to be no way out of it.

Luckily the Unicoders added the missing symbol in a later revision, and Microsoft quite promptly included it in their list of recognised symbols in Uniscribe, and the problem is cured. Have you contacted Microsoft about this? My contact when I was doing the OpenType instructions for the forthcoming Biblical Hebrew fonts for Microsoft was Ali Basit, alib[at]microsoft.com , and I'm sure he wouldn't mind me giving you his address for a substantial matter such as this.

Behnam Rassi

Mar 1, 2015, 4:24:39 PM
to ukelel...@googlegroups.com
Thanks Ralph for the tip. I will certainly give it a try. It’s still too early in my project, though. I’ll keep the address for when I have a handful of issues to discuss!
-b

Ralph Hancock

Mar 2, 2015, 8:53:53 PM
to ukelel...@googlegroups.com
The Coptic keyboards are now working fine, thanks to excellent advice. They declare themselves falsely as Greek, and neither the operating system nor Word is bothered by this.

Behnam: I had further thoughts on the problem of the rebellious character. It can't be made to work if inserted directly, but maybe it can be made to work if inserted by a substitution. For example, suppose it is the little s-shaped thing that is the Coptic one-half symbol. You can add a substitution so that ن (standing for نصف) followed by the zero-width joiner character (or any pair of recognised characters that wouldn't occur in a word) turns into the symbol. I think that any symbol, even if it is not an R to L one, will behave properly if inserted in this way. And if, in the future, Uniscribe is altered so that the actual Coptic symbol is recognised, the symbol can be inserted directly and there is no need to change the font.

Behnam Rassi

Mar 2, 2015, 10:07:09 PM
to ukelel...@googlegroups.com
Thanks Ralph for the suggestion. I’ll give it a try. My preference is to make it as user-friendly as possible. You type the date sign, then you type 3 or 4 digits as the date, and they go on top of it. This is how it is supposed to work. The problem is not the Coptic numbers, mind you. They will be used as ‘shapes’ that substitute for Arabic-Indic digits, so the Coptic numbers are already produced by recognized codes. But this substitution occurs only if the digits are typed after the date sign. This unrecognized date sign, I guess, doesn’t give any indication that anything is being typed after it. The problem is the date sign itself, which I think has no bidi support.
BTW do you have a good reference for the shape of Coptic numbers?
-b

Ralph Hancock

Mar 2, 2015, 10:46:42 PM
to ukelel...@googlegroups.com
I think that inserting the date sign by the indirect method I described is likely to do the trick: arrange a substitution so that some sequence of recognised characters brings up this sign. The point of this roundabout method is that the trouble is caused by having an unrecognised character in the text string that goes into Uniscribe. What Uniscribe puts on the screen in consequence does not cause trouble. So a legitimate sequence can insert an illegitimate character, which will then behave properly.

I am talking about Windows Uniscribe, of course, on a forum for Mac users. But obviously the Mac AAT system can't cope with this, and as far as I know the only Mac application that can cope fully with OpenType instructions is Adobe InDesign. (Not sure how much the Mellel word processor can manage.) Or have things been improved by Mavericks?

Absolutely the only reference I have for Coptic numerals is the official Unicode chart of them here:

Behnam Rassi

Mar 3, 2015, 7:11:21 PM
to ukelel...@googlegroups.com
I will certainly try your trick. OS X supports OpenType: not all of it, but most of it, and certainly the features used in Arabic script. But its support is not based on Uniscribe. Apple uses its own ‘CoreText’. In a way, the OT instructions are ‘translated’ to AAT instructions on the fly (that’s my understanding). But if the trick works for Uniscribe, it should work for CoreText too. In both cases, the issue is the lack of directionality identification for this new code.
Thanks for the reference. Yes that is the one I used as my guide for shaping the Coptic numbers.
-b