A user of one of my applications uses a keyboard-mapping program called
Tavultesoft Keyman (http://www.tavultesoft.com/keyman/) and wants to use
it to enter Unicode characters in an African language.
The problem is that this software does not work with Tk. When a Tk app
is selected and configured through the keyboard application to accept
its input, a warning is issued that the (Tk) application does not
support Unicode input.
So, does anybody know why Tk cannot work with this program?
What input methods are used by Tk to accept keyboard input?
Are there additional (newer) input methods that can be used, but are not
yet implemented? I tried to play with the system encoding of wish, but
without any change. Any ideas?
George
Is this happening because Tk does not process the WM_UNICHAR event?
Looking in
http://tktoolkit.cvs.sourceforge.net/tktoolkit/tk/win/tkWinX.c?revision=1.53&view=markup,
only the WM_CHAR event is processed.
It seems that a relevant request is at:
http://www.codecomments.com/archive234-2004-12-342830.html
George
There is also an article at:
http://www.tavultesoft.com/keyman/documentation/unicodeinput.php
Does anybody know how Tk creates its windows?
George
The wrapper windows will use CreateWindowExW where available, but many
other windows are created with CreateWindow(Ex) (without A or W). I'm
surprised this subtlety makes a difference, as you usually only need the
W API when you are *calling* the window with specific unicode data (like
to create one with a unicode class name).
--
Jeff Hobbs, The Tcl Guy, http://www.activestate.com/
According to the last reference, the data that is placed in WM_CHAR
messages is different:
"If the window is created as an ANSI window, WM_CHAR will always be
received by the window class as codepage characters. If the window is
created as a Unicode window, all WM_CHAR messages will contain Unicode
(UTF-16) characters. The IsWindowUnicode(HWND) function will tell you
whether the window supports Unicode input. Note that windows cannot be
created as Unicode windows in Windows 95, 98 or Me."
I will try to create a patch for Tk that creates all windows with
CreateWindowExW (if available) and also support WM_UNICHAR, and see
how it goes :-)
George
George - if you would like assistance or validation of any changes,
please don't hesitate to contact me; I've created patches for a number
of applications to address this Unicode input limitation.
Marc Durdin
Tavultesoft Pty Ltd
Thank you for your offer :-) I have already written some code related to
the WM_UNICHAR event. Keyman no longer complains about the application
not accepting unicode input. However, I have questions about the
WM_UNICHAR message:
From http://www.tavultesoft.com/keyman/documentation/unicodeinput.php
and
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/userinput/keyboardinput/keyboardinputreference/keyboardinputmessages/wm_unichar.asp,
I figured out that the UTF-32 code is stored in the WPARAM wParam of the
message. Isn't WPARAM a 16 bit parameter? How can I retrieve the 4 bytes
from wParam?
George
False alarm :-( WPARAM is 32-bit, and not 16-bit.
However, is there a standard way to convert UTF-32 to UTF-8?
George
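For reference, the conversion itself is mechanical. The following is a minimal standalone sketch in C, not Tcl's implementation; the function name utf32_to_utf8 is made up for illustration, and it assumes the input is a single UTF-32 code point such as the one WM_UNICHAR carries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Tcl or Tk: encode one Unicode code
 * point as UTF-8. Writes up to 4 bytes into buf and returns the number
 * of bytes written, or 0 if cp is not a valid Unicode scalar value. */
size_t utf32_to_utf8(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx + 2 more */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                     /* lone surrogates are invalid */
        }
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx + 3 more */
        buf[0] = (unsigned char) (0xF0 | (cp >> 18));
        buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

With this sketch, U+1ECD comes out as the three bytes E1 BB 8D, and anything above U+FFFF needs four bytes.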
Actually, from the Tcl code it seems that Tcl_UniCharToUtf can produce
UTF-8 sequences of up to 3 bytes when TCL_UTF_MAX is defined as 3 (which
is the case on my Windows system). Three bytes cover code points up to
U+FFFF, and Keyman (with the keyboard I use) also sends characters
that fit into 3-byte UTF-8 sequences, so in theory it is enough.
So, now Keyman sends me a Unicode character, which is translated into a
3-byte UTF-8 sequence with Tcl_UniCharToUtf. I generate a KeyPress
XEvent where event.xkey.nbytes=3, and event.xkey.trans_chars[] is
filled with the 3 bytes of the UTF-8 sequence. Then the event is
placed in the queue with Tk_QueueWindowEvent.
But what I get in the output, is 3 characters. It seems that the
decoding is not done as expected. Any ideas on where to look for
this decoding? (bind sources?)
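The symptom can be illustrated with a small standalone sketch. It assumes (this is a guess at the cause, not something read from the Tk sources) that whatever consumes trans_chars treats every byte as one character of a single-byte encoding, whereas a UTF-8 aware reader counts only lead bytes. Both function names are made up for the example.

```c
#include <stddef.h>

/* Hypothetical: a reader that assumes a single-byte code page sees one
 * character per byte, whatever the bytes are. */
size_t count_as_single_byte(const unsigned char *s, size_t nbytes)
{
    (void) s;
    return nbytes;
}

/* Hypothetical: a UTF-8 aware reader counts every byte that is not a
 * continuation byte (continuation bytes have the form 10xxxxxx). */
size_t count_as_utf8(const unsigned char *s, size_t nbytes)
{
    size_t i, count = 0;

    for (i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the bytes E1 BB 8D (the UTF-8 form of U+1ECD), the first reader reports three characters and the second reports one, which matches the output described above.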
A similar approach is taken for the WM_CHAR message, where similar
code exists (win/tkWinX.c):
    event.type = KeyPress;
    event.xany.send_event = -1;
    event.xkey.keycode = 0;
    event.xkey.nbytes = 1;
    event.xkey.trans_chars[0] = (char) wParam;
    if (IsDBCSLeadByte((BYTE) wParam)) {
        MSG msg;
        if ((PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE) != 0)
                && (msg.message == WM_CHAR)) {
            GetMessage(&msg, NULL, 0, 0);
            event.xkey.nbytes = 2;
            event.xkey.trans_chars[1] = (char) msg.wParam;
        }
    }
    Tk_QueueWindowEvent(&event, TCL_QUEUE_TAIL);
When a lead byte is detected and another WM_CHAR event follows,
the two events are merged, and the two bytes are placed in the
event.xkey.trans_chars array. I do the same but with 3 bytes.
However, the result is not the expected one...
George
Problem solved! I had to modify TkpGetString (by adding another
special case).
Now, input from keyman (at least through WM_UNICHAR) works as expected.
George
The patch (which adds support for WM_UNICHAR) for Tk can be found at:
http://sourceforge.net/tracker/index.php?func=detail&aid=1518677&group_id=12997&atid=312997
Is there any chance of seeing this patch applied to the Tk 8.4 series? :-)
As soon as possible? :-)
George
So this is verified to work with keyman? I will have to examine it and
put it through the paces to make sure it doesn't regress any existing
behavior. Can you think of cases where that might happen?
Yes, this works with Keyman. The patch is actually *very* small, and
does not interfere in any way with earlier Tk behaviour. What the
patch does is simple: when WM_UNICHAR is received, KeyPress/KeyRelease
events are put in Tk's queue, and the UTF-8 character sent
is stored in the events. This is exactly what is done for WM_CHAR.
But now, instead of storing at most two bytes (WM_CHAR), I can store up
to TCL_UTF_MAX bytes (as defined by Tcl). The conversion from the four-byte
Unicode code to UTF-8 is done with Tcl_UniCharToUtf, which according to
its implementation can handle Unicode codes up to 0xFFFF. Theoretically,
WM_UNICHAR can deliver codes up to 0x10FFFF, but in order to handle
these, TCL_UTF_MAX must be defined as 4 (and not 3, as it currently
is). But I have no idea which languages use these characters :-)
So, my impression is that this patch doesn't introduce any problems to Tk
:-)
George
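The dispatch described in this message can be sketched without windows.h. This is not the patch itself. The two message constants mirror the Windows SDK values for WM_UNICHAR and UNICODE_NOCHAR; everything else (the enum, the function name, using uintptr_t as a stand-in for WPARAM) is hypothetical. The probe branch reflects the documented convention that an input tool may first send WM_UNICHAR with UNICODE_NOCHAR to ask whether the window understands the message, and the window procedure answers TRUE.

```c
#include <stdint.h>

/* Values as defined in the Windows SDK, repeated here so the sketch
 * compiles on any platform. */
#define SKETCH_WM_UNICHAR      0x0109u
#define SKETCH_UNICODE_NOCHAR  0xFFFFu

/* Hypothetical outcome codes for the example. */
typedef enum {
    UNICHAR_NOT_HANDLED,   /* some other message: leave it alone */
    UNICHAR_PROBE_ACK,     /* capability probe: window proc returns TRUE */
    UNICHAR_CHAR_READY     /* a real character: convert and queue it */
} UnicharOutcome;

/* Hypothetical helper: decide what to do with a message. wparam stands
 * in for WPARAM, which is pointer-sized and therefore wide enough to
 * carry a full UTF-32 code point. */
UnicharOutcome classify_unichar(unsigned msg, uintptr_t wparam,
                                uint32_t *codepoint)
{
    if (msg != SKETCH_WM_UNICHAR) {
        return UNICHAR_NOT_HANDLED;
    }
    if (wparam == SKETCH_UNICODE_NOCHAR) {
        return UNICHAR_PROBE_ACK;
    }
    *codepoint = (uint32_t) wparam;   /* the UTF-32 value itself */
    return UNICHAR_CHAR_READY;
}
```

In a real window procedure, the UNICHAR_CHAR_READY case is where the code point would be converted with Tcl_UniCharToUtf and the key events queued.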
It's for obscure stuff like Mongolian and Sindarin if I remember right.
Good luck finding a keyboard and fonts to support them. Even more luck
finding an app that can lay out Mongolian correctly (it requires
vertical layout IIRC). :-)
Donal.
Well, we have already found a keyboard :-) Keyman with a suitable
keyboard layout :D
To tell the truth, it never crossed my mind that I would ever use 3-byte
UTF-8 characters, until a user of my Ellogon NLP platform
wanted to support an African language named Igbo (from Nigeria if I
remember correctly). I knew that Tcl could represent these characters
(as I displayed Igbo texts in a Tk text widget). Working to support
the WM_UNICHAR message, I found that these characters required three
bytes and it was a surprise, because I always thought that Tcl supported
up to 2 bytes. Then, looking into the code, I saw that Tcl is ready for
up to 6 bytes!
However, there is a definition (TCL_UTF_MAX) that limits the usable
number of bytes to 3. I still have no clear idea why this is needed
(to limit the memory allocation during UTF-8 to Unicode conversions?),
but it seems a reasonable limit for the time being. Let's hope that
no Mongolian tries to use my app :-)
George
PS: My patch for supporting WM_UNICHAR does not check whether the Unicode
character given through the event can be represented by the Tcl core.
If a four-byte UTF-8 character is entered and TCL_UTF_MAX is set to 3, I
suspect that it will be converted to another character. No error/warning
will be issued.
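A guard of the kind this PS describes as missing could look roughly like the sketch below. It is not part of the patch. SKETCH_UTF_MAX stands in for TCL_UTF_MAX so the example compiles without tcl.h, and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for TCL_UTF_MAX (3 in the build discussed here). */
#define SKETCH_UTF_MAX 3

/* Hypothetical helper: number of UTF-8 bytes needed for a code point,
 * or 0 if the value is not a valid Unicode scalar value. */
int utf8_length(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                 /* surrogate code points */
    }
    if (cp < 0x80) {
        return 1;
    }
    if (cp < 0x800) {
        return 2;
    }
    if (cp < 0x10000) {
        return 3;
    }
    if (cp <= 0x10FFFF) {
        return 4;
    }
    return 0;                     /* beyond the Unicode range */
}

/* Hypothetical check: returns 1 if the code point fits within the
 * configured byte limit, 0 if it should be rejected or reported
 * instead of being silently converted to something else. */
int fits_in_tcl_core(uint32_t cp)
{
    int n = utf8_length(cp);

    return (n > 0 && n <= SKETCH_UTF_MAX) ? 1 : 0;
}
```

With the limit at 3, everything up to U+FFFF passes and any code point that needs four bytes is flagged before conversion.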
But I think that there is also a similar problem in the way Tk works
right now, with input from the WM_CHAR message. As the code is right
now, it can concatenate at most 2 successive WM_CHAR messages to form a
2-byte (DBCS) character. Isn't this limit of only two events somewhat
arbitrary? Aren't there any IMEs that deliver 3-byte characters?
(I don't know, I am asking :-))
Three-byte UTF-8 is sufficient for describing any Unicode character from
U+000000 to U+00FFFF, and corresponds to Tcl_UniChar's definition as an
unsigned short. Beyond that, the unicode-string representation has to
use ints.
Maybe converting to a sequence of Tcl_UniChars that form a surrogate
pair would be a workaround?
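The surrogate arithmetic behind that suggestion is standard UTF-16. A minimal sketch follows, using uint16_t as a stand-in for a 16-bit Tcl_UniChar; the function name is made up, and whether the rest of Tcl and Tk would treat such a pair as one character is a separate question that this sketch does not answer.

```c
#include <stdint.h>

/* Hypothetical helper: split one code point into 16-bit units.
 * Returns the number of units written (1 or 2), or 0 if cp is not a
 * valid Unicode scalar value. */
int utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t) cp;           /* fits in one unit as-is */
        return 1;
    }
    cp -= 0x10000;                        /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 becomes the pair D83D DE00, while U+1ECD stays a single unit.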
> Let's hope that no Mongolian tries to use my app :-)
Not many apps cope with vertical layout well, so I'd not worry about
another one. ;-)
Donal.