In ScilabGTK 4.1 (on Fedora 6), SciPad fails to handle Chinese
characters: they simply disappear.
After tracing the Tcl source in "tcl/scipadsources/commonbindings.tcl",
I changed the following command
bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W %A}}
to
bind Text <KeyPress> { puttext %W %A}
and my Chinese characters show up.
The original code may fail for other multi-byte languages as well.
> bind Text <KeyPress> { puttext %W %A}
> , and my Chinese characters show up.
>
> Original code may fail for other multi-byte language,
Thank you for pointing out the problem. However, your proposed fix is
poor: just select some text and press the control key (as if for
example you wanted to issue a command), the selection is instantly
removed (because puttext is called with an empty string for %A), and
this is certainly not what is desired. A better workaround is due.
I have no access at the moment to a multibyte input system, so I rely
on your suggestions. Do you have any idea, for instance, why your
Chinese characters do not provide a Unicode %A, and what they provide
instead? Or could the problem rather be that what is needed is
something like
bind Text <KeyPress> {if {{"%A"} != {{}}} {puttext %W "%A"}}
or some other way to escape the first byte?
Enrico
> Thank you for pointing out the problem. However, your proposed fix is
> poor:
Agreed. I just wanted to point out the problem; there must be a better
way to attack it.
This bug in fact has nothing to do with Chinese characters as such, but
having to handle Chinese characters makes the problem harder to solve.
> Do you have any idea, for instance, why your
> chinese characters do not provide an unicode %A, and what could be
> instead?
Chinese documents are encoded in UTF-8, BIG5 (Taiwan), or GB
(China). A Chinese character needs 2 bytes in BIG5 and GB,
and usually 3 bytes in UTF-8. Your Tcl code handles a
Chinese character as 2 or 3 separate characters, which may keep
Chinese characters from being displayed properly. I think we need a
buffer (say $B) to collect a Chinese character (or any multi-byte
character) and call "puttext $W $B" just once.
I have some experience modifying Scilab for BIG5 encoding; the
following is a sample C function from wsci/wtext.c
for your reference:
--------------------Scilab code for Chinese BIG5 encoding--------------------
int twobyteMode = 0;  // set while waiting for the second byte of a two-byte character

EXPORT int WINAPI TextPutCh (LPTW lptw, BYTE ch)
{
  int pos;
#ifdef BIG5ENC  // Chinese BIG5 encoding
  if (twobyteMode == 1) {  // this is the second byte
    pos = lptw->CursorPos.y * lptw->ScreenSize.x + lptw->CursorPos.x;
    lptw->ScreenBuffer[pos+1] = ch;
    lptw->AttrBuffer[pos+1] = lptw->Attr;
    UpdateText (lptw, 2);  // handle both bytes at once
    twobyteMode = 0;
    return ch;
  }
  if (ch > 0x7F) {  // > 0x7F: first byte of a two-byte character
    twobyteMode = 1;
    pos = lptw->CursorPos.y * lptw->ScreenSize.x + lptw->CursorPos.x;
    lptw->ScreenBuffer[pos] = ch;  // save to buffer
    lptw->AttrBuffer[pos] = lptw->Attr;
    return ch;
  }
#endif
  ........................
  ......................
  return ch;
}
-------------------------------------------------------------
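The same buffering idea can be sketched for UTF-8, where the lead byte
tells how many bytes the character occupies. This is only a minimal
illustration, not code from Scilab or SciPad: the names utf8_seq_len,
CharBuf and charbuf_feed are hypothetical, and the sketch checks only
the lead and continuation byte patterns (it does not reject overlong or
out-of-range encodings).

```c
/* Number of bytes in the UTF-8 sequence that starts with `lead`,
   or 0 if `lead` cannot start a sequence (continuation or invalid byte). */
static int utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx: common Chinese characters */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical accumulator; must start zero-initialized. */
typedef struct {
    unsigned char bytes[4];  /* bytes of the character being collected */
    int have;                /* bytes collected so far */
    int need;                /* total bytes expected, 0 when idle */
} CharBuf;

/* Feed one byte. Returns the length of a completed character (its bytes
   are then in cb->bytes), 0 if more bytes are needed, or -1 on malformed
   input (the buffer is reset in that case). */
static int charbuf_feed(CharBuf *cb, unsigned char ch)
{
    if (cb->need == 0) {
        int n = utf8_seq_len(ch);
        if (n == 0) return -1;            /* stray continuation byte */
        cb->need = n;
        cb->have = 0;
    } else if ((ch & 0xC0) != 0x80) {     /* expected 10xxxxxx */
        cb->need = 0;
        cb->have = 0;
        return -1;
    }
    cb->bytes[cb->have++] = ch;
    if (cb->have == cb->need) {
        int n = cb->need;
        cb->need = 0;
        cb->have = 0;
        return n;                         /* whole character is ready */
    }
    return 0;
}
```

Feeding the three bytes 0xE4 0xB8 0xAD one at a time returns 0, 0 and
then 3, at which point the whole character could be passed on in a
single call, in the spirit of "puttext $W $B" above.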
I have no hands-on experience with Tcl/Tk coding, but I hope my C
experience helps.
Yung-Jang Lee
A quick search on wiki.tcl.tk turned up few, but not zero, leads:
http://www.tclchina.com/
http://wiki.tcl.tk/_search?S=chinese&_charset_=UTF-8
It is difficult for me, though, to check them, as I lack a way of
experimenting with Chinese input (I once had a kanji terminal at
hand, long ago, but alas) and of understanding it.
However, as a first step, could you confirm if replacing the line in
commonbindings.tcl with
bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
(quotes around the second "%A" only) doesn't affect the problem (I
suspect it could)?
Enrico
>
> However, as a first step, could you confirm if replacing the line in
> commonbindings.tcl with
>
> bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
>
> (quotes around the second "%A" only) doesn't affect the problem (I
> supsect it could)?
I tested the suggested code, and the text area shows
{}
after I input several Chinese characters.
Can you explain the meaning of the statement "{%A} != {{}} " to me?
Maybe I can find a way out.
SciPad works fine for Chinese characters with Tcl 8.4 + MinGW + Scilab 4.1,
but fails with Tcl 8.5a + Fedora 6 + ScilabGTK; the bug seems to come
from a Tcl or OS difference.
> Can you explain the meaning of the statement "{%A} != {{}} " to me ?
> May be I can find a way out.
directly from http://www.tcl.tk/man/tcl8.4/TkCmd/bind.htm#M42 :
"%A Substitutes the UNICODE character corresponding to the event, or
the empty string if the event doesn't correspond to a UNICODE
character (e.g. the shift key was pressed)".
The test is precisely for this: "if {{%A} != {{}}}" means that we pass
the keypress to puttext only when it doesn't produce an empty string. One
reason is that if the SciPad buffer contains a selection and puttext
receives an empty string, the selection is erased.
In principle we could remove the test there and trap the empty
argument case in puttext (IIRC there are anyway other scipad
procedures dealing with erasing selections), but the basic question is
what is really going on with your input.
> Scipad work fine for chinese characters in TCL8.4+MingW+Scilab4.1 ,
can you detail with which variant of the bind line?
bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
(given that removing completely {%A} != {{}} does not seem an
acceptable solution)
> but fail to work for TCL8.5a+Fedora6+ScilabGTK, the bug seem come
> form TCL or OS difference.
It might be either. Notably, the input engines for Chinese might be
different on the two systems, and this is outside Tcl. Also, Tcl 8.5 is
alpha, and bugs are continuously being sorted out in the
successive alphas (try to get the latest). The places to check are
comp.lang.tcl and the Tk bug tracker
(http://sourceforge.net/tracker/?group_id=12997&atid=112997&func=add).
If you post there, there is a good chance you will get support, as the
activity is high. But I leave posting to you, as you know the details
of your input system(s), which I don't have.
A bare-bones Tcl test snippet might help you better understand what
is passed (I owe this to Francois Vogel):
#------beginning of script----
toplevel .t
bind .t <Key> [list keystate %A %K %N %k %s]
proc keystate {A K N k s} {
    set bits [list]
    foreach {bit id} {
        1024 "1024-bit RMB"
        512 "512-bit MMB"
        256 "256-bit LMB"
        128 "128-bit ??"
        64 "64-bit ??"
        32 SCROLL_LOCK(WIN)
        16 "16-bit ??"
        8 NUM_LOCK
        4 CONTROL
        2 CAPS_LOCK
        1 SHIFT
    } {
        if {$s & $bit} {
            lappend bits $id
        }
    }
    puts [list KEY $A SYM $K $N CODE $k STATE $s ($bits)]
}
#------end of script----
Enrico
I didn't invent it myself: http://wiki.tcl.tk/4238
--
Francois Vogel
> > Scipad work fine for chinese characters in TCL8.4+MingW+Scilab4.1 ,
>
> can you detail with which variant of the bind line?
> bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
> (given that removing completely {%A} != {{}} does not seem an
> acceptable solution)
>
With Tcl 8.4 + MinGW + Scilab 4.1, I don't need to modify the Tcl code;
the bind line
bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W %A}}
works properly for Chinese.
> > but fail to work for TCL8.5a+Fedora6+ScilabGTK, the bug seem come
> > form TCL or OS difference.
>
> might be either. Notably, the input engines for chinese might be
> different in the two systems, and this is out of tcl. Also, Tcl 8.5 is
> alpha, and there are bugs being continuosly sorted out in the
> successive alphas (try to get the latest).
Further tests show that it might be due to the input engines, which
generate locale-encoded text (BIG5, UTF-8, GB) rather than Unicode on
Fedora 6 (or Linux). On Windows the input engines generate Unicode,
so your Tcl code works.
If I change the bind line to
bind Text <KeyPress> {
puttext %W %A;
if {{%A} != {{}}} { puttext %W %A; }
puttext %W %A;
}
and input "abc " characters (with a UTF-8 locale), the output should
be "aaabbbcccc " but becomes
"aaabbbcccc " . The last two "puttext %W %A;" fail for Chinese.
Since on Linux most input engines for Chinese (or other multi-byte
languages) follow the same standards (XIM, GTK IM, ...), I
think it is better to accept locale encodings other than Unicode in
the "bind Text <KeyPress>" bind line.
Yung-Jang Lee
>
> and input "abc " characters (with locale UTF8 ), the output shoud
> be "aaabbbcccc " but becomes
> "aaabbbcccc " . The last two "puttext %W %A;" fail for chinese.
>
Sorry, the Chinese characters did not show up; what I meant is
> and input "abc XY" characters (with locale UTF8 ), the output shoud
> be "aaabbbcccc XXXYYY " but becomes
> "aaabbbcccXY" . The last two "puttext %W %A;" fail for chinese.
where X and Y stand for any Chinese characters.
> Yung-Jang Lee
If you know how... As far as I understand, Tcl works in
Unicode. %A seems to be the only relevant bind substitution variable.
> > and input "abc XY" characters (with locale UTF8 ), the output shoud
> > be "aaabbbcccc XXXYYY " but becomes
> > "aaabbbcccXY" .
I guess you meant three times c; anyway:
>The last two "puttext %W %A;" fail for chinese.
First, are you sure that it is the last two that fail (not the
first two, for instance)? Second, why do they fail? Do the partial
bytes sent mangle the recognition of the character?
You could for example put a
tk_messageBox -message $text
in the beginning of proc puttext (file inputtext.tcl, around line 320)
and see what happens.
Enrico
Tcl 8.4 and 8.5a in fact work with UTF-8, BIG5 and GB input as well; it
seems to me that "{%A}" can only be used once,
after which %A becomes empty. So I suggest not using "{%A}" before
"puttext %W %A;".
This way, your code can accept Unicode, UTF-8, BIG5 and GB input.
> First, are you sure that it is the last two those who fail (not the
> first two, for instance?); Second, why do they fail? Partial bytes
> sent mangle the recognition of the character?
I am sure it is the last two that fail; another test proves it:
bind Text <KeyPress> {
puttext %W %A; puttext %W "<-1";
if {{%A} != {{}}} { puttext %W %A; }
puttext %W "<-2";
puttext %W %A;
puttext %W "<-3";
}
produces a<-1a<-2a<-3<-1<-2<-3X<-1<-2<-3 when I input "aX" (again, X is
a Chinese character).
I guess it is "{%A}" that empties the input buffer (because it is not
in Unicode), so the last two puttext calls fail.
> You could for example put a
> tk_messageBox -message $text
> in the beginning of proc puttext (file inputtext.tcl, around line 320)
> and see what happens.
>
As suggested, I did that and changed the bind line back to
bind Text <KeyPress> {
if {{%A} != {{}}} { puttext %W %A; }
}
Then, if I input 'a', it shows a pop-up dialog with the message 'a', but
if I input 'X' (a Chinese character) the pop-up message is empty.
But putting tk_messageBox -message $text in proc puttext seems to
interrupt the input engine: the dialog
pops up when I use the <control-space> key to enable Chinese input, and
the following Chinese characters
are not handled by the input engine; they seem to be eaten by this
dialog.
Yung-Jang Lee
looks rather unconvincing.
Anyway, I have to say that I found reproducible (for me) problems with
Hebrew input in SciPad: I could enter it on one Windows system but not on
a Linux one. There %A is constantly {}, %s==8192, the keysyms match the
unmapped keyboard, and so on (all tested with the snippet Francois
forwarded me from the Tcl wiki). IIUC, Hebrew is a matter of keyboard
mapping alone, without an input engine. I'm probably missing something
basic about i18n input; someday I might try to educate myself.
As a last resort, I could suggest:
bind Text <KeyPress> {puttext %W %A} ;# in commonbindings.tcl
and
if {$text == {}} {return} ;# at the beginning of proc puttext
IIUC that should be just equivalent to the present code, and it certainly
doesn't show my Hebrew, but see if YMMV. If it does, I wouldn't
consider that a fix, anyway.
Good luck, Enrico
OK, for that, setenv LANG he_IL did the trick for me. I don't know if
something similar might help you. Enrico
It works for LANG=zh_TW.UTF-8 or LANG=zh_TW.BIG5 only if I move
the boolean test into proc puttext as described.
YungLee