Keyboard input & Tavultesoft Keyman (windows)...

George Petasis

Jul 3, 2006, 11:53:12 AM

to

Hi all,

A user of one of my applications uses software that simulates a
keyboard. The software is called Tavultesoft Keyman
(http://www.tavultesoft.com/keyman/), and the user wants to use it to
enter Unicode characters in an African language.

The problem is that this software does not work with Tk. When a Tk app
is selected and configured through the keyboard application to accept
its input, a warning is issued that the (Tk) application does not
support Unicode input.

So, does anybody know why Tk cannot work with this program?
What input methods does Tk use to accept keyboard input?
Are there additional (newer) input methods that could be used but are
not yet implemented? I tried changing the system encoding of wish, but
it made no difference. Any ideas?

George

George Petasis

Jul 4, 2006, 2:04:02 AM

to George Petasis

George Petasis wrote:

Is this happening because Tk does not process the WM_UNICHAR event?
Looking in
http://tktoolkit.cvs.sourceforge.net/tktoolkit/tk/win/tkWinX.c?revision=1.53&view=markup,
only the WM_CHAR event is processed.
It seems that a relevant request is at:
http://www.codecomments.com/archive234-2004-12-342830.html

George

George Petasis

Jul 4, 2006, 2:15:13 AM

to George Petasis

George Petasis wrote:

There is also an article at:

http://www.tavultesoft.com/keyman/documentation/unicodeinput.php

Does anybody know how Tk creates its windows?

George

Jeff Hobbs

Jul 4, 2006, 11:37:16 AM

to George Petasis

The wrapper windows will use CreateWindowExW where available, but many
other windows are created with CreateWindow(Ex) (without A or W). I'm
surprised this subtlety makes a difference, as you usually only need the
W API when you are *calling* the window with specific unicode data (like
to create one with a unicode class name).

--

Jeff Hobbs, The Tcl Guy, http://www.activestate.com/

George Petasis

Jul 5, 2006, 1:42:00 AM

to Jeff Hobbs

Jeff Hobbs wrote:

According to the last reference, the data that is placed in WM_CHAR
messages is different:

"If the window is created as an ANSI window, WM_CHAR will always be
received by the window class as codepage characters. If the window is
created as a Unicode window, all WM_CHAR messages will contain Unicode
(UTF-16) characters. The IsWindowUnicode(HWND) function will tell you
whether the window supports Unicode input. Note that windows cannot be
created as Unicode windows in Windows 95, 98 or Me."

I will try to create a patch for Tk that creates all windows with
CreateWindowExW (if available) and also supports WM_UNICHAR, and see
how it goes :-)

George

mcdu...@tavultesoft.com

Jul 6, 2006, 6:27:26 AM

to

George Petasis wrote:

> Jeff Hobbs wrote:
> > George Petasis wrote:
> >> George Petasis wrote:
> >>> George Petasis wrote:

George - if you would like assistance or validation of any changes,
please don't hesitate to contact me; I've created patches for a number
of applications to address this Unicode input limitation.

Marc Durdin
Tavultesoft Pty Ltd

George Petasis

Jul 7, 2006, 5:08:56 AM

to mcdu...@tavultesoft.com

mcdu...@tavultesoft.com wrote:

Thank you for your offer :-) I have already written some code related to
the WM_UNICHAR event. Keyman no longer complains about the application
not accepting unicode input. However, I have questions about the
WM_UNICHAR message:

From http://www.tavultesoft.com/keyman/documentation/unicodeinput.php
and
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/userinput/keyboardinput/keyboardinputreference/keyboardinputmessages/wm_unichar.asp,
I figured out that the UTF-32 code is stored in the WPARAM wParam of the
message. Isn't WPARAM a 16 bit parameter? How can I retrieve the 4 bytes
from wParam?

George
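
For readers following along: WPARAM is a pointer-sized unsigned integer (32 bits on 32-bit Windows), so the UTF-32 value can be read with a plain cast. The MSDN page for WM_UNICHAR also describes a probe: the sender may post the message with wParam set to UNICODE_NOCHAR (0xFFFF), and a window that handles WM_UNICHAR should return TRUE for it. Below is a minimal portable sketch of that decision, not the code from the Tk patch. The names classify_unichar, unichar_kind and the SKETCH_ constants are hypothetical; the two numeric values mirror WM_UNICHAR and UNICODE_NOCHAR from the Windows headers.

```c
#include <stdint.h>

/* Numeric values of WM_UNICHAR and UNICODE_NOCHAR in the Windows SDK.
 * Redefined under hypothetical names so the sketch builds anywhere. */
#define SKETCH_WM_UNICHAR     0x0109u
#define SKETCH_UNICODE_NOCHAR 0xFFFFu

typedef enum {
    UNICHAR_IGNORED,   /* not a WM_UNICHAR message at all */
    UNICHAR_PROBE,     /* sender asks "do you handle WM_UNICHAR?" */
    UNICHAR_CHARACTER  /* *cp now holds a UTF-32 code point */
} unichar_kind;

/*
 * Classify an incoming message. wparam stands in for WPARAM, which is
 * an unsigned pointer-sized integer and therefore wide enough to hold
 * any UTF-32 value.
 */
unichar_kind classify_unichar(unsigned int msg, uintptr_t wparam, uint32_t *cp)
{
    if (msg != SKETCH_WM_UNICHAR) {
        return UNICHAR_IGNORED;
    }
    if (wparam == SKETCH_UNICODE_NOCHAR) {
        /* A real window procedure would return TRUE here. */
        return UNICHAR_PROBE;
    }
    *cp = (uint32_t) wparam;
    return UNICHAR_CHARACTER;
}
```

In a real window procedure the UNICHAR_PROBE case corresponds to returning TRUE, and the UNICHAR_CHARACTER case to queuing a key event for the decoded character.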

George Petasis

Jul 7, 2006, 5:17:04 AM

to George Petasis, mcdu...@tavultesoft.com

George Petasis wrote:

False alarm :-( WPARAM is 32-bit, not 16-bit.
However, is there a standard way to convert UTF-32 to UTF-8?

George
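
On the UTF-32 to UTF-8 question: inside Tcl the thread goes on to use Tcl_UniCharToUtf, but the conversion itself is simple bit packing. The following is a minimal standalone sketch under that assumption; utf32_to_utf8 is a hypothetical helper, not a Tcl or Windows function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into buf and returns the number of bytes
 * written, or 0 if cp is a surrogate or lies above U+10FFFF.
 */
size_t utf32_to_utf8(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      /* lone surrogates are not characters */
        }
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        buf[0] = (unsigned char) (0xF0 | (cp >> 18));
        buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+1ECB (a dotted i used in Igbo) comes out as the three bytes E1 BB 8B.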

George Petasis

Jul 7, 2006, 6:35:27 AM

to

George Petasis wrote:

Actually, from the Tcl code it seems that Tcl_UniCharToUtf is
able to handle more than 16-bit Unicode characters, at least up to
24-bit ones, with TCL_UTF_MAX defined to 3 (which is the case on my
Windows system). Keyman (with the keyboard I use) also sends characters
that fit into 3-byte UTF-8 sequences, so in theory it is enough.

So, now Keyman sends me a Unicode character, which is translated into a
3-byte UTF-8 character with Tcl_UniCharToUtf. I generate a KeyPress
XEvent where event.xkey.nbytes=3, and event.xkey.trans_chars[] is
filled with the 3 bytes of the UTF-8 character. Then the event is
placed in the queue with Tk_QueueWindowEvent.

But what I get in the output is 3 characters. It seems that the
decoding is not done as expected. Any ideas on where to look for
this decoding? (bind sources?)

A similar approach is taken for the WM_CHAR message, where similar
code exists (win/tkWinX.c):

    event.type = KeyPress;
    event.xany.send_event = -1;
    event.xkey.keycode = 0;
    event.xkey.nbytes = 1;
    event.xkey.trans_chars[0] = (char) wParam;
    if (IsDBCSLeadByte((BYTE) wParam)) {
        MSG msg;

        if ((PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE) != 0)
                && (msg.message == WM_CHAR)) {
            GetMessage(&msg, NULL, 0, 0);
            event.xkey.nbytes = 2;
            event.xkey.trans_chars[1] = (char) msg.wParam;
        }
    }
    Tk_QueueWindowEvent(&event, TCL_QUEUE_TAIL);

When a lead byte is detected and another WM_CHAR event follows,
the two events are merged, and the two bytes are placed in the
event.xkey.trans_chars array. I do the same but with 3 bytes.
However, the result is not the expected one...

George
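
One plausible reading of the symptom above (three characters instead of one) is that the consumer of trans_chars treats each byte as a character in its own right instead of decoding the three bytes as one UTF-8 sequence. The sketch below only illustrates that difference; it is not the Tk code, and utf8_decode and utf8_count_chars are hypothetical helpers. For brevity the decoder does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the first UTF-8 sequence in s (len bytes available).
 * Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the sequence is malformed or truncated.
 */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need = 0, i;
    uint32_t c = 0;

    if (len == 0) {
        return 0;
    }
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; c = s[0] & 0x07;
    } else {
        return 0;                /* stray continuation or invalid lead byte */
    }
    if (len < need) {
        return 0;
    }
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;            /* not a continuation byte */
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return need;
}

/* Count characters in a buffer; a malformed byte counts as one character. */
size_t utf8_count_chars(const unsigned char *s, size_t len)
{
    size_t count = 0, pos = 0;

    while (pos < len) {
        uint32_t cp;
        size_t used = utf8_decode(s + pos, len - pos, &cp);
        pos += (used > 0) ? used : 1;
        count++;
    }
    return count;
}
```

Decoded as UTF-8, the bytes E1 BB 8B are a single character (U+1ECB); read one byte at a time in a single-byte encoding, the same buffer is three characters.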

George Petasis

Jul 7, 2006, 6:46:46 AM

to

Problem solved! I had to modify TkpGetString (by adding another
special case).

Now, input from keyman (at least through WM_UNICHAR) works as expected.

George

George Petasis

Jul 7, 2006, 7:41:54 AM

to George Petasis, Jeff Hobbs, Marc Durdin

George Petasis wrote:

>
> Problem solved! I had to modify TkpGetString (by adding another
> special case).
>
> Now, input from keyman (at least through WM_UNICHAR) works as expected.
>
> George

The patch (which adds support for WM_UNICHAR) for Tk can be found at:

http://sourceforge.net/tracker/index.php?func=detail&aid=1518677&group_id=12997&atid=312997

Is there any chance of seeing this patch applied to the Tk 8.4 series? :-)
As soon as possible? :-)

George

Jeff Hobbs

Jul 7, 2006, 11:55:36 AM

to George Petasis, Marc Durdin

So this is verified to work with keyman? I will have to examine it and
put it through the paces to make sure it doesn't regress any existing
behavior. Can you think of cases where that might happen?

George Petasis

Jul 7, 2006, 12:15:49 PM

to Jeff Hobbs, Marc Durdin

Jeff Hobbs wrote:

Yes, this works with Keyman. The patch is actually *very* small, and
does not interfere in any way with earlier Tk behaviour. What the
patch does is simple: when WM_UNICHAR is received, a
KeyPress/KeyRelease pair is put in Tk's queue, and the UTF-8 character
sent is stored in the events. This is exactly what is done for WM_CHAR.
But now, instead of storing at most two bytes (WM_CHAR), I can store up
to TCL_UTF_MAX bytes (as defined by Tcl). The conversion from the
four-byte Unicode code to UTF-8 is done with Tcl_UniCharToUtf, which
according to its implementation can handle Unicode codes up to 0xFFFF.
Theoretically, WM_UNICHAR can deliver codes up to 0x1FFFFF, but in order
to handle this, TCL_UTF_MAX must be defined to 4 (and not to 3, as it
currently is). But I have no idea which languages use these characters :-)

So, my impression is that this patch doesn't introduce any problem to Tk
:-)

George

Donal K. Fellows

Jul 11, 2006, 11:10:48 AM

to

George Petasis wrote:
> Theoretically,
> WM_UNICHAR can deliver codes up to 0x1FFFFF, but in order to handle
> this, TCL_UTF_MAX must be defined to 4 (and not to 3 that is currently
> defined). But I have no idea which languages use these characters :-)

It's for obscure stuff like Mongolian and Sindarin if I remember right.
Good luck finding a keyboard and fonts to support them. Even more luck
finding an app that can lay out Mongolian correctly (it requires
vertical layout IIRC). :-)

Donal.

George Petasis

Jul 11, 2006, 12:29:04 PM

to Donal K. Fellows

Donal K. Fellows wrote:

Well, we have already found a keyboard :-) Keyman with a suitable
keyboard layout :D
To tell the truth, it never crossed my mind that I would ever use 3-byte
UTF-8 characters, until a user of my Ellogon NLP platform wanted to
support an African language named Igbo (from Nigeria, if I
remember correctly). I knew that Tcl could represent these characters
(as I displayed Igbo texts in a Tk text widget). While working on
support for the WM_UNICHAR message, I found that these characters
require three bytes. It was a surprise, because I always thought that
Tcl supported up to 2 bytes. Then, looking into the code, I saw that
Tcl is ready for up to 6 bytes!

However, there is a definition (TCL_UTF_MAX) that limits the usable
number of bytes to 3. I still have no clear idea why this is needed
(to limit the memory allocation during UTF-8 - Unicode conversions?),
but it seems a reasonable limit for the time being. Let's hope that
no Mongolian tries to use my app :-)

George

PS: My patch for supporting WM_UNICHAR does not check whether the
Unicode character delivered by the event can be represented by the Tcl
core. If a four-byte UTF-8 character is entered and TCL_UTF_MAX is set
to 3, I suspect that it will be converted to another character. No
error/warning will be issued.
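
The missing check described above could look roughly like the sketch below. It is an illustration only, not part of the patch: utf8_bytes_needed and clamp_to_utf_max are hypothetical helpers, utf_max stands in for TCL_UTF_MAX, and substituting U+FFFD (the Unicode replacement character) is just one possible policy. The sketch assumes utf_max is at least 3, since U+FFFD itself needs three bytes.

```c
#include <stdint.h>

/* Number of UTF-8 bytes needed for cp, or 0 if cp is not encodable
 * (a surrogate, or a value above U+10FFFF). */
int utf8_bytes_needed(uint32_t cp)
{
    if (cp < 0x80) {
        return 1;
    }
    if (cp < 0x800) {
        return 2;
    }
    if (cp < 0x10000) {
        return (cp >= 0xD800 && cp <= 0xDFFF) ? 0 : 3;
    }
    if (cp <= 0x10FFFF) {
        return 4;
    }
    return 0;
}

/*
 * Return cp unchanged if it fits in utf_max UTF-8 bytes; otherwise
 * return U+FFFD so the caller never silently produces a different
 * character. Assumes utf_max >= 3.
 */
uint32_t clamp_to_utf_max(uint32_t cp, int utf_max)
{
    int n = utf8_bytes_needed(cp);

    if (n == 0 || n > utf_max) {
        return 0xFFFDu;
    }
    return cp;
}
```

With utf_max set to 3, a three-byte character such as U+1ECB passes through untouched, while anything above U+FFFF is replaced rather than mangled.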

But I think that there is also a similar problem in the way Tk works
right now with input from the WM_CHAR message. As the code stands,
it can concatenate up to 2 successive WM_CHAR messages to form a
2-byte character. Isn't this limit of only two events somewhat
arbitrary? Aren't there any IMEs that deliver 3-byte characters?
(I don't know, I am asking :-))

Donal K. Fellows

Jul 11, 2006, 7:18:36 PM

to

George Petasis wrote:
> However, there is a definition (TCL_UTF_MAX) that limits the usable
> number of bytes to 3. I still have no clear idea why this is needed
> (to limit the memory allocation during utf-8 - unicode conversions?),
> but it seems a reasonable limit for the time being.

Three-byte UTF-8 is sufficient for describing any UNICODE character from
U+000000 to U+00FFFF, and corresponds to Tcl_UniChar's definition as an
unsigned short. Beyond that, the unicode-string representation has to
use ints.

Maybe converting to a sequence of Tcl_UniChars where one is a surrogate
pair would be a workaround?

> Let's hope that no Mongolian tries to use my app :-)

Not many apps cope with vertical layout well, so I'd not worry about
another one. ;-)

Donal.
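
The surrogate-pair workaround suggested above relies on the standard UTF-16 encoding of code points beyond U+FFFF. A minimal sketch follows, assuming a 16-bit unit comparable to the unsigned short Tcl_UniChar mentioned in the post; to_utf16 is a hypothetical helper, not a Tcl function.

```c
#include <stdint.h>

/*
 * Encode one code point as UTF-16.
 * Writes 1 or 2 units into out and returns the number written,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 */
int to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t) cp;      /* fits in a single 16-bit unit */
        return 1;
    }
    cp -= 0x10000;                   /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00, while anything up to U+FFFF stays a single unit.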