In ScilabGTK 4.1 (on Fedora 6), SciPad fails to handle Chinese characters: they just disappear.
After tracing the Tcl source in "tcl/scipadsources/commonbindings.tcl", I changed the command

    bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W %A}}

to

    bind Text <KeyPress> { puttext %W %A}

and my Chinese characters show up.
The original code may also fail for other multi-byte languages.
> bind Text <KeyPress> { puttext %W %A}
> , and my Chinese characters show up.
> Original code may fail for other multi-byte language,
Thank you for pointing out the problem. However, your proposed fix is poor: just select some text and press the Control key (as if, for example, you wanted to issue a command), and the selection is instantly removed (because puttext is called with an empty string for %A), which is certainly not what is desired. A better workaround is needed. I have no access to a multibyte input system at the moment, so I rely on your suggestions. Do you have any idea, for instance, why your Chinese characters do not provide a Unicode %A, and what they provide instead? Or couldn't the problem rather be that something like

    bind Text <KeyPress> {if {{"%A"} != {{}}} {puttext %W "%A"}}

is needed, or some other way to escape the first byte?
On Jun 19, 3:36 pm, Enrico Segre <s...@athena.polito.it> wrote:
> Thank you for pointing out the problem. However, your proposed fix is
> poor:
Agreed. I just wanted to point out the problem; there must be a better way to attack it. The bug in fact has nothing to do with Chinese characters as such, but handling Chinese characters makes it hard to solve.
> Do you have any idea, for instance, why your
> chinese characters do not provide an unicode %A, and what could be
> instead?
Chinese documents are encoded in UTF-8, BIG5 (Taiwan) or GB (China). A Chinese character needs 2 bytes in BIG5 and GB, and usually 3 bytes in UTF-8. Your Tcl code handles a Chinese character as 2 or 3 separate characters, so it may not display properly. I think we need a buffer (say $B) that collects a Chinese character (or any multi-byte character) and then calls "puttext $W $B" just once.
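A minimal sketch of that buffering idea, written in C like the wtext.c excerpt rather than in Tcl, and assuming UTF-8 input: the lead byte announces how many bytes the character needs, so the buffer knows when it is complete and can hand the whole character over once. The names mb_buffer, utf8_seq_len and mb_feed are hypothetical, not part of Scipad or Scilab, and the sketch does not reject every invalid lead byte (e.g. 0xC0, 0xC1, 0xF5-0xF7).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for one UTF-8 encoded character. */
typedef struct {
    unsigned char bytes[4]; /* a UTF-8 sequence is at most 4 bytes long */
    size_t have;            /* bytes collected so far */
    size_t need;            /* bytes expected for this character; 0 = idle */
} mb_buffer;

/* Sequence length announced by a lead byte, or 0 if the byte cannot
   start a character (continuation byte 10xxxxxx or invalid lead). */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx: most Chinese characters */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Feed one byte. Returns 1 and stores the complete, NUL-terminated
   character in out once all its bytes have arrived, 0 while more bytes
   are expected, and -1 (resetting the buffer) on a malformed sequence. */
static int mb_feed(mb_buffer *b, unsigned char byte, char out[5]) {
    assert(b != NULL && out != NULL);
    if (b->need == 0) {                   /* idle: this must be a lead byte */
        size_t n = utf8_seq_len(byte);
        if (n == 0) return -1;
        b->need = n;
        b->have = 0;
    } else if ((byte & 0xC0) != 0x80) {   /* continuation bytes are 10xxxxxx */
        b->need = b->have = 0;
        return -1;
    }
    b->bytes[b->have++] = byte;
    if (b->have < b->need) return 0;      /* still incomplete: emit nothing */
    memcpy(out, b->bytes, b->have);       /* whole character: emit it once */
    out[b->have] = '\0';
    b->need = b->have = 0;
    return 1;
}
```

Feeding the three bytes E4 B8 AD (U+4E2D in UTF-8) returns 0, 0 and then 1 with the full character in out; only at that point would the equivalent of "puttext $W $B" be called.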
I have some experience modifying Scilab for BIG5 encoding; for reference, here is a sample from a C function in wsci/wtext.c:
    /* Scilab code for Chinese BIG5 encoding */
    int twobyteMode = 0;   // to handle two-byte characters

    EXPORT int WINAPI TextPutCh (LPTW lptw, BYTE ch)
    {
        int pos;
    #ifdef BIG5ENC   // Chinese BIG5 encoding
        if (twobyteMode == 1) {   // second byte of a two-byte character
            pos = lptw->CursorPos.y * lptw->ScreenSize.x + lptw->CursorPos.x;
            lptw->ScreenBuffer[pos+1] = ch;
            lptw->AttrBuffer[pos+1] = lptw->Attr;
            UpdateText (lptw, 2);   // to handle 2 bytes
            twobyteMode = 0;
            return ch;
        }
        if (ch > 0x7F) {   // > 0x7F: first byte of a two-byte character
            twobyteMode = 1;
            pos = lptw->CursorPos.y * lptw->ScreenSize.x + lptw-
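Since the excerpt is truncated and depends on Scilab internals (LPTW, UpdateText), here is a self-contained C sketch of the same two-byte logic with the screen buffer replaced by a plain output array. The names big5_state and big5_put are hypothetical. It keeps the excerpt's 0x7F threshold for the lead byte (real BIG5 lead bytes lie in 0x81-0xFE). The second byte is checked first, as in the excerpt, because BIG5 trail bytes can also fall in 0x40-0x7E and must not be mistaken for ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder state mirroring twobyteMode in the excerpt. */
typedef struct {
    int two_byte_mode;    /* 1 after a lead byte has been seen */
    unsigned char lead;   /* the pending lead byte */
} big5_state;

/* Feed one byte; returns how many bytes were written to out:
   0 while a lead byte is pending, 1 for ASCII, 2 for a BIG5 pair. */
static size_t big5_put(big5_state *s, unsigned char ch, unsigned char out[2]) {
    assert(s != NULL && out != NULL);
    if (s->two_byte_mode) {   /* second byte: emit the complete pair */
        out[0] = s->lead;
        out[1] = ch;
        s->two_byte_mode = 0;
        return 2;
    }
    if (ch > 0x7F) {          /* lead byte: wait for the trail byte */
        s->lead = ch;
        s->two_byte_mode = 1;
        return 0;
    }
    out[0] = ch;              /* single-byte (ASCII) character */
    return 1;
}
```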
Well, I'd be very wary of implementing a 2-byte buffer by hand (and why not 3 bytes sometimes? How would we decide? A hardcoded flag that then prevents 1-byte input?), considering that, though I lack direct experience, Chinese IS supported by Tcl. The proof is that with a simple hack you say you're able to input Chinese characters. IIUC, for %A Tcl internally converts its input to Unicode, and passes it along to our proc puttext only once a full Unicode character has been built. So no partial bytes are sent, but the resulting Unicode string may well contain some offending character that needs to be properly wrapped or escaped.
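For UTF-8, the "how to decide?" question has a mechanical answer: the high bits of the lead byte give the sequence length (2 bytes for 110xxxxx, 3 for 1110xxxx, 4 for 11110xxx), and every continuation byte looks like 10xxxxxx. The C sketch below (utf8_decode is a hypothetical name, not Tcl's own code) decodes one such sequence into a Unicode code point and refuses partial input, which is the kind of conversion assumed above to happen before %A is substituted. Overlong forms and surrogates are not rejected here.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from s (n bytes available). On success,
   store the code point in *cp and return the sequence length; return 0
   if the input is incomplete or malformed. */
static size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp) {
    size_t len, i;
    unsigned long v;
    assert(cp != NULL);
    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return 0;                    /* continuation byte or invalid lead */
    if (n < len) return 0;            /* partial character: nothing to pass on */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        v = (v << 6) | (s[i] & 0x3F); /* 6 payload bits per continuation byte */
    }
    *cp = v;
    return len;
}
```

For the bytes E4 B8 AD it returns 3 with the code point 0x4E2D; given only the first two bytes it returns 0.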
A quick search on wiki.tcl.tk gave me few, but nonzero, leads.
It is difficult for me to check and understand them, though, as I have no way of experimenting with Chinese input (once, long ago, I had a kanji terminal at hand, but alas).
As a first step, though, could you check whether replacing the line in commonbindings.tcl with

    bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}

(quotes around the second "%A" only) affects the problem (I suspect it could)?
> (quotes around the second "%A" only) doesn't affect the problem (I
> supsect it could)?
I tested the suggested code, and the text area shows

    {}

after I input several Chinese characters.
Can you explain the meaning of the expression "{%A} != {{}}" to me? Maybe I can find a way out.
Scipad works fine for Chinese characters with Tcl 8.4 + MinGW + Scilab 4.1, but fails with Tcl 8.5a + Fedora 6 + ScilabGTK, so the bug seems to come from a Tcl or OS difference.
"%A Substitutes the UNICODE character corresponding to the event, or the empty string if the event doesn't correspond to a UNICODE character (e.g. the shift key was pressed)".
The test is there precisely for this: "if {{%A} != {{}}}" means that the keypress is passed to puttext only when it doesn't produce an empty string. One reason is that if the Scipad buffer contains a selection and puttext receives an empty string, the selection is erased. In principle we could remove the test there and trap the empty-argument case in puttext (IIRC there are other Scipad procedures dealing with erasing selections anyway), but the basic question is what is really going on with your input.
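Part of the puzzle is how Tk inserts %A into the script: it quotes each %-replacement so that it parses as a single Tcl word, and an empty %A is written as {} (the {} seen in the text area above is consistent with this). Assuming that, an empty %A turns the test into {{}} != {{}}, which is why the right-hand side reads {{}}, while "%A" in double quotes hands puttext the literal two characters {}. The C function below is a simplified, hypothetical model of that substitution, not Tk's code: it expands only %A (leaving %W and other sequences alone) and backslash-escapes only a small set of Tcl-special characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model: replace every "%A" in script with value, quoted so
   it stays one Tcl word. An empty value becomes {}; a few characters
   special to the Tcl parser are backslash-escaped. Returns a malloc'd
   string (caller frees) or NULL on allocation failure. */
static char *expand_percent_A(const char *script, const char *value) {
    assert(script != NULL && value != NULL);
    size_t vlen = strlen(value);
    /* upper bound: each 2-char "%A" grows to at most 2 * vlen chars */
    size_t cap = strlen(script) * (vlen + 1) + 1;
    char *out = malloc(cap), *p = out;
    if (out == NULL) return NULL;
    while (*script) {
        if (script[0] == '%' && script[1] == 'A') {
            if (vlen == 0) {               /* empty word is written as {} */
                *p++ = '{';
                *p++ = '}';
            } else {
                for (const char *v = value; *v; v++) {
                    if (strchr("\\{}[]$\"; ", *v)) *p++ = '\\';
                    *p++ = *v;
                }
            }
            script += 2;
        } else {
            *p++ = *script++;              /* copy everything else verbatim */
        }
    }
    *p = '\0';
    return out;
}
```

With an empty value, the template if {{%A} != {{}}} {puttext %W "%A"} expands to if {{{}} != {{}}} {puttext %W "{}"}: the test is false, but the quoted variant would have inserted {}.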
> Scipad work fine for chinese characters in TCL8.4+MingW+Scilab4.1 ,
Can you specify which variant of the bind line you used?

    bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}

(given that removing {%A} != {{}} completely does not seem an acceptable solution)
> but fail to work for TCL8.5a+Fedora6+ScilabGTK, the bug seem come
> form TCL or OS difference.
It might be either. Notably, the Chinese input engines might differ between the two systems, and that is outside Tcl. Also, Tcl 8.5 is alpha, and bugs are continuously being sorted out in successive alphas (try to get the latest). The places to check are comp.lang.tcl and the Tk bug tracker http://sourceforge.net/tracker/?group_id=12997&atid=112997&func=add . If you post there, there's a good chance you'll get support, as activity is high. But I leave the posting to you, since you know the details of your input system(s), which I don't.
A bare-bones Tcl test snippet (I owe it to Francois Vogel) might help you better understand what is passed.
On Jun 22, 6:29 pm, Enrico Segre <s...@athena.polito.it> wrote:
> Hi Yung-Jang,
>
> > Scipad work fine for chinese characters in TCL8.4+MingW+Scilab4.1 ,
> can you detail with which variant of the bind line?
> bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
> (given that removing completely {%A} != {{}} does not seem an
> acceptable solution)
With Tcl 8.4 + MinGW + Scilab 4.1 I don't need to modify the Tcl code; the bind line

    bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W %A}}

works properly for Chinese.
> > but fail to work for TCL8.5a+Fedora6+ScilabGTK, the bug seem come
> > form TCL or OS difference.
> might be either. Notably, the input engines for chinese might be
> different in the two systems, and this is out of tcl. Also, Tcl 8.5 is
> alpha, and there are bugs being continuosly sorted out in the
> successive alphas (try to get the latest).
Further tests show that it might be due to the input engines, which on Fedora 6 (or Linux in general) generate locale-encoded text (BIG5, UTF-8, GB) rather than Unicode. On Windows the input engines generate Unicode, so your Tcl code works.
If I change the bind line to

    bind Text <KeyPress> {
        puttext %W %A
        if {{%A} != {{}}} { puttext %W %A }
        puttext %W %A
    }
and input "abc " characters (with a UTF-8 locale), the output should be "aaabbbcccc " but becomes "aaabbbcccc ". The last two "puttext %W %A;" fail for Chinese.
Since on Linux most input engines for Chinese (or other multi-byte languages) follow the same standards (XIM, GTKIM, ...), I think it is better to accept locale-encoded text, not only Unicode, in the "bind Text <KeyPress>" line.
> On Jun 22, 6:29 pm, Enrico Segre <s...@athena.polito.it> wrote:
> and input "abc " characters (with locale UTF8 ), the output shoud
> be "aaabbbcccc " but becomes
> "aaabbbcccc " . The last two "puttext %W %A;" fail for chinese.
Sorry, the Chinese characters could not show up; what I mean is (with X and Y standing for Chinese characters):
> and input "abc XY" characters (with locale UTF8 ), the output shoud
> be "aaabbbcccc XXXYYY " but becomes
> "aaabbbcccXY" . The last two "puttext %W %A;" fail for chinese.
> >I think it is better to accept locale code other then unicode in
> >"bind Text <KeyPress> " bind line.
If you know how... From the little I understand, Tcl works in Unicode, and %A seems to be the only relevant bind substitution.
> > and input "abc XY" characters (with locale UTF8 ), the output shoud
> > be "aaabbbcccc XXXYYY " but becomes
> > "aaabbbcccXY" .
I guess you meant three c's; anyway:
>The last two "puttext %W %A;" fail for chinese.
First, are you sure it is the last two that fail (and not, for instance, the first two)? Second, why do they fail? Do partial bytes mangle the recognition of the character? You could, for example, put

    tk_messageBox -message $text

at the beginning of proc puttext (file inputtext.tcl, around line 320) and see what happens.
On Jun 22, 10:54 pm, Enrico Segre <s...@athena.polito.it> wrote:
> >I think it is better to accept locale code other then unicode in
> >"bind Text <KeyPress> " bind line.
> If you know how... For the little I understand here, tcl works in
> unicode. %A seems to be the only relevant bind substitution variable.
Tcl 8.4 and 8.5a do in fact work with UTF-8, BIG5 and GB codes too. It seems to me that "{%A}" can only be used once; after that, %A becomes empty. So I suggest not using "{%A}" before "puttext %W %A;".
This way, your code can accept Unicode, UTF-8, BIG5 and GB codes.
> First, are you sure that it is the last two those who fail (not the
> first two, for instance?); Second, why do they fail? Partial bytes
> sent mangle the recognition of the character?
I am sure it is the last two that fail; another test proves it. It produces

    a<-1a<-2a<-3<-1<-2<-3X<-1<-2<-3

when I input "aX" (again, X is a Chinese character).
My guess is that "{%A}" empties the input buffer (because the input is not Unicode), so the last two puttext calls fail.
> You could for example put a
> tk_messageBox -message $text
> in the beginning of proc puttext (file inputtext.tcl, around line 320)
> and see what happens.
As suggested, I did that and changed the bind line back to

    bind Text <KeyPress> {
        if {{%A} != {{}}} { puttext %W %A }
    }
Then, if I input 'a', a dialog pops up with the message 'a', but if I input 'X' (a Chinese character), the message in the dialog is empty.
However, putting tk_messageBox -message $text in proc puttext seems to interfere with the input engine: the dialog pops up when I press <Control-space> to enable Chinese input, and the following Chinese characters are not handled by the input engine; they seem to be eaten by the dialog.
> , after that %A become empty. So I suggect not to use "{%A}" before
> "puttext %W %A;"
That looks rather unconvincing.
Anyway, I have to say that I found problems with Hebrew input in Scipad that are reproducible (for me): I could type Hebrew on one Windows system but not on a Linux one. There %A is constantly {}, %s==8192, the keysyms match the unmapped keyboard, and so on (all tested with the snippet Francois forwarded me from the Tcl wiki). IIUC, Hebrew is a matter of keyboard mapping alone, without an input engine. I'm probably missing something basic about i18n input; someday I might try to educate myself.
As a last resort, I could suggest

    bind Text <KeyPress> {puttext %W %A}

in commonbindings.tcl, and

    if {$text == {}} {return}

at the beginning of proc puttext.
IIUC that should be just equivalent to the present code, and it certainly doesn't show my Hebrew, but see whether your mileage varies. Even if it works, I wouldn't consider it a fix.
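The trade-off between testing in the binding and testing in the callee can be modelled outside Tcl. In this toy C model, the editor struct and put_text are hypothetical stand-ins for the Scipad text widget and proc puttext: typed text replaces the current selection, so an unguarded call with an empty string would delete the selection (the Control-key problem described earlier), and the early return in put_text plays the role of if {$text == {}} {return}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the text widget: a buffer plus a selection. */
typedef struct {
    char text[256];
    size_t sel_start, sel_end;   /* selection is [sel_start, sel_end) */
} editor;

/* Replace the selection with s; an empty s is ignored, so a bare
   modifier key (empty %A) leaves the selection intact. */
static void put_text(editor *ed, const char *s) {
    size_t slen = strlen(s), tlen = strlen(ed->text);
    assert(ed->sel_start <= ed->sel_end && ed->sel_end <= tlen);
    if (slen == 0) return;       /* the guard, moved into the callee */
    if (tlen - (ed->sel_end - ed->sel_start) + slen >= sizeof ed->text)
        return;                  /* no room: ignored in this sketch */
    /* shift the text after the selection, including the NUL */
    memmove(ed->text + ed->sel_start + slen, ed->text + ed->sel_end,
            tlen - ed->sel_end + 1);
    memcpy(ed->text + ed->sel_start, s, slen);
    ed->sel_start = ed->sel_end = ed->sel_start + slen;  /* cursor after insert */
}
```

With "hello world" and the first five characters selected, put_text(&ed, "") changes nothing, while put_text(&ed, "HI") leaves "HI world".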
> Anyway, I have to say that I found reproducible (for me) problems with
> hebrew input in scipad; I could in one windows system but not in another
> linux one. There %A is constantly {}, %s==8192, keysims match the
> unmapped keyboard, and so on
OK, for that, setenv LANG he_IL did the trick for me; I don't know whether something similar might help you.
Enrico
> > Anyway, I have to say that I found reproducible (for me) problems with
> > hebrew input in scipad; I could in one windows system but not in another
> > linux one. There %A is constantly {}, %s==8192, keysims match the
> > unmapped keyboard, and so on
>
> ok, for that setenv LANG he_IL did the trick for me - I don't know if
> something similar might help you. Enrico
It works for LANG=zh_TW.UTF-8 or LANG=zh_TW.BIG5, but only if I move the boolean test into proc puttext as you described.
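Since the outcome depends on LANG, it may help to check which encoding a given locale actually selects. A minimal POSIX C sketch (environment_codeset is a hypothetical helper name): setlocale(LC_CTYPE, "") adopts the locale named by the environment (LC_ALL, then LC_CTYPE, then LANG), and nl_langinfo(CODESET) reports its character encoding. With glibc this is typically "UTF-8" for zh_TW.UTF-8 and "BIG5" for zh_TW.BIG5, provided those locales are installed.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Adopt the environment's locale for character handling and return the
   name of its encoding, or NULL if the named locale is not installed. */
static const char *environment_codeset(void) {
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    const char *cs = nl_langinfo(CODESET);
    assert(cs != NULL);   /* nl_langinfo returns a string, never NULL */
    return cs;
}
```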