
Scipad bug: For Chinese characters


yjle...@gmail.com

Jun 19, 2007, 2:38:52 AM
Hi!

In ScilabGTK 4.1 (on Fedora 6), SciPad fails to handle Chinese
characters: they simply disappear.

After tracing the Tcl source in "tcl/scipadsources/commonbindings.tcl",
I changed the following command

bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W %A}}

to

bind Text <KeyPress> { puttext %W %A}

and my Chinese characters show up.

The original code may also fail for other multi-byte languages.

Enrico Segre

Jun 19, 2007, 3:36:21 AM
On Jun 19, 9:38 am, yjlee...@gmail.com wrote:

> bind Text <KeyPress> { puttext %W %A}
> , and my Chinese characters show up.
>
> Original code may fail for other multi-byte language,

Thank you for pointing out the problem. However, your proposed fix is
poor: just select some text and press the Control key (as if, for
example, you wanted to issue a command), and the selection is instantly
removed (because puttext is called with an empty string for %A), which
is certainly not what is desired. A better workaround is needed.
I have no access at the moment to a multibyte input system, so I rely
on your suggestions. Do you have any idea, for instance, why your
Chinese characters do not provide a Unicode %A, and what they provide
instead? Or couldn't the problem rather be that what is needed is
something like

bind Text <KeyPress> {if {{"%A"} != {{}}} {puttext %W "%A"}}

or some other way to escape the first byte?

Enrico

yjle...@gmail.com

Jun 19, 2007, 2:13:44 PM
On Jun 19, 3:36 pm, Enrico Segre <s...@athena.polito.it> wrote:

> Thank you for pointing out the problem. However, your proposed fix is
> poor:

Agreed. I just wanted to point out the problem; there must be a better
way to attack it.
This bug in fact has nothing to do with Chinese characters, but having
to handle Chinese characters makes the problem harder to solve.

> Do you have any idea, for instance, why your
> chinese characters do not provide an unicode %A, and what could be
> instead?

Chinese documents are encoded in UTF8, BIG5 (Taiwan), or GB
(China). A Chinese character needs 2 bytes in BIG5 and GB code,
and usually 3 bytes in UTF8 encoding. Your Tcl code handles a
Chinese character as 2 or 3 separate characters, which may prevent
Chinese characters from showing up properly. I think we need a buffer
(say $B) to collect a Chinese character (or any multi-byte character)
and then call "puttext $W $B" just once.
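As a rough check of the byte counts mentioned above, the following tclsh sketch converts one sample character to each encoding and counts the resulting bytes. The character U+4E2D is an arbitrary sample chosen for illustration, not one taken from this thread, and the sketch assumes the `big5` and `gb2312` encoding files that normally ship with Tcl are installed.

```tcl
# Count the bytes one character occupies in several encodings.
# U+4E2D is an arbitrary sample ideograph (an assumption for illustration).
set ch \u4e2d

foreach enc {utf-8 big5 gb2312} {
    # encoding convertto returns a byte string; its length is the byte count
    set bytes [encoding convertto $enc $ch]
    puts [format "%-7s %d byte(s)" $enc [string length $bytes]]
}

# Inside Tcl the same value is still one character, whatever the encoding
puts "characters in Tcl: [string length $ch]"
```

The last line is the relevant point for the binding: once the input has been decoded, a Tcl script sees one character rather than a run of bytes.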

I have some experience modifying Scilab for BIG5 encoding; the
following is a sample from a C function in wsci/wtext.c,
for your reference:

--------------------Scilab code for Chinese BIG5 encoding--------------------
int twobyteMode = 0;   // set while waiting for the second byte of a two-byte character
EXPORT int WINAPI TextPutCh (LPTW lptw, BYTE ch)
{
    int pos;
#ifdef BIG5ENC   // Chinese BIG5 encoding
    if (twobyteMode == 1) {   // this is the second byte
        pos = lptw->CursorPos.y * lptw->ScreenSize.x + lptw->CursorPos.x;
        lptw->ScreenBuffer[pos+1] = ch;
        lptw->AttrBuffer[pos+1] = lptw->Attr;
        UpdateText (lptw, 2);   // handle both bytes at once
        twobyteMode = 0;
        return ch;
    }
    if (ch > 0x7F) {   // > 0x7F: first byte of a two-byte character
        twobyteMode = 1;
        pos = lptw->CursorPos.y * lptw->ScreenSize.x + lptw->CursorPos.x;
        lptw->ScreenBuffer[pos] = ch;   // save to buffer
        lptw->AttrBuffer[pos] = lptw->Attr;
        return ch;
    }
#endif
    ........................
    ......................
    return ch;
}
-------------------------------------------------------------

I have no hands-on experience with Tcl/Tk coding, but I hope my C
experience helps.

Yung-Jang Lee

Enrico Segre

Jun 19, 2007, 4:47:41 PM
Well, I'd be very wary of hand-implementing a 2-byte buffer (and
why not 3 sometimes? How to decide? A hardcoded flag which then
prevents 1-byte input?), considering that, though I lack direct
experience, Chinese IS supported by Tcl. The proof is that with a
simple hack you say you're able to input Chinese characters. If I
understand %A correctly, Tcl internally converts what it receives at
input to Unicode, and only when a full Unicode character is built does
it pass it along to our proc puttext. So no partial bytes are sent, but
there is definitely the possibility that the resulting Unicode string
contains some offending character which needs to be properly wrapped
or escaped.

A quick search on wiki.tcl.tk turned up scarce, but non-zero, leads:

http://www.tclchina.com/
http://wiki.tcl.tk/_search?S=chinese&_charset_=UTF-8

It is difficult for me to check them, though, as I lack a way of
experimenting with Chinese input (I once had a kanji terminal at
hand, a long time ago, but alas) and of understanding it.

However, as a first step, could you confirm if replacing the line in
commonbindings.tcl with

bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}

(quotes around the second "%A" only) doesn't affect the problem (I
suspect it could)?

Enrico

yjle...@gmail.com

Jun 22, 2007, 5:01:19 AM
On Jun 20, 4:47 am, Enrico Segre <s...@athena.polito.it> wrote:

>
> However, as a first step, could you confirm if replacing the line in
> commonbindings.tcl with
>
> bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
>
> (quotes around the second "%A" only) doesn't affect the problem (I
> suspect it could)?

I tested the suggested code, and the text area shows

{}

after I input several Chinese characters.

Can you explain the meaning of the statement "{%A} != {{}}" to me?
Maybe I can find a way out.

Scipad works fine for Chinese characters with TCL8.4+MinGW+Scilab4.1,
but fails with TCL8.5a+Fedora6+ScilabGTK, so the bug seems to come
from a Tcl or OS difference.

Enrico Segre

Jun 22, 2007, 6:29:02 AM
Hi Yung-Jang,

> Can you explain the meaning of the statement "{%A} != {{}} " to me ?
> May be I can find a way out.

directly from http://www.tcl.tk/man/tcl8.4/TkCmd/bind.htm#M42 :

"%A Substitutes the UNICODE character corresponding to the event, or
the empty string if the event doesn't correspond to a UNICODE
character (e.g. the shift key was pressed)".

The test is precisely for this: "if {{%A} != {{}}}" means that we pass
the keypress to puttext only when it doesn't produce an empty string.
One reason is that if the Scipad buffer contains a selection and puttext
receives an empty string, the selection is erased.
In principle we could remove the test there and trap the empty
argument case in puttext (IIRC there are anyway other Scipad
procedures dealing with erasing selections), but the basic question is
what is really going on with your input.
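To make the odd-looking `{{}}` less mysterious, here is a minimal tclsh sketch of the substitution step. It relies on the documented behaviour that Tk formats each % replacement as a proper Tcl list element, so an empty %A arrives as `{}`. The helper `expand_binding` and the stub `puttext` are hypothetical stand-ins written for this illustration; real Tk performs the substitution internally, and `[list $char]` only approximates its quoting.

```tcl
# Hypothetical helper (not part of Tk or Scipad): mimic how Tk fills in %W
# and %A before it evaluates a binding script. Tk formats each replacement
# as a Tcl list element, so an empty %A is substituted as {}.
proc expand_binding {script window char} {
    return [string map [list %W $window %A [list $char]] $script]
}

# Stub standing in for Scipad's puttext: just record what it receives.
set calls {}
proc puttext {w text} {
    lappend ::calls $text
}

set script {if {{%A} != {{}}} {puttext %W %A}}

foreach char [list a ""] {
    set expanded [expand_binding $script .t $char]
    puts $expanded    ;# show the script as it would actually be evaluated
    eval $expanded
}
puts "puttext was called with: $calls"
```

With the key "a" the condition compares the strings `a` and `{}`, so puttext runs; with an empty %A both sides are the two-character string `{}`, so the call is skipped.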

> Scipad work fine for chinese characters in TCL8.4+MingW+Scilab4.1 ,

can you detail with which variant of the bind line?


bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}

(given that removing completely {%A} != {{}} does not seem an
acceptable solution)

> but fail to work for TCL8.5a+Fedora6+ScilabGTK, the bug seem come
> from TCL or OS difference.

might be either. Notably, the input engines for Chinese might be
different in the two systems, and this is outside Tcl. Also, Tcl 8.5 is
alpha, and there are bugs being continuously sorted out in the
successive alphas (try to get the latest). The places to check are
comp.lang.tcl and the Tk bug tracker, http://sourceforge.net/tracker/?group_id=12997&atid=112997&func=add
. If you post there, there is a good chance you will get support, as
the activity is high. But I leave posting to you, as you know the
details of your input system(s), which I don't have.

A bare-bones Tcl test snippet might help you to better understand what
is passed (I owe this to Francois Vogel):

#------beginning of script----
toplevel .t
bind .t <Key> [list keystate %A %K %N %k %s]

proc keystate {A K N k s} {
    set bits [list]
    foreach {bit id} {
        1024 "1024-bit RMB"
        512 "512-bit MMB"
        256 "256-bit LMB"
        128 "128-bit ??"
        64 "64-bit ??"
        32 SCROLL_LOCK(WIN)
        16 "16-bit ??"
        8 NUM_LOCK
        4 CONTROL
        2 CAPS_LOCK
        1 SHIFT
    } {
        if {$s & $bit} {
            lappend bits $id
        }
    }

    puts [list KEY $A SYM $K $N CODE $k STATE $s ($bits)]
}
#------end of script----

Enrico

Francois Vogel

Jun 22, 2007, 7:01:39 AM
> A bare bone tcl test snippet might help you to better understand what
> is passed (I owe this to Francois Vogel):

I didn't invent it myself: http://wiki.tcl.tk/4238

--
Francois Vogel

yjle...@gmail.com

Jun 22, 2007, 8:17:54 AM
On Jun 22, 6:29 pm, Enrico Segre <s...@athena.polito.it> wrote:
> Hi Yung-Jang,

> > Scipad work fine for chinese characters in TCL8.4+MingW+Scilab4.1 ,
>
> can you detail with which variant of the bind line?
> bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W "%A"}}
> (given that removing completely {%A} != {{}} does not seem an
> acceptable solution)
>

For TCL8.4+MinGW+Scilab4.1, I don't need to modify the Tcl code; the
bind line

bind Text <KeyPress> {if {{%A} != {{}}} {puttext %W %A}}

works properly for Chinese.

> > but fail to work for TCL8.5a+Fedora6+ScilabGTK, the bug seem come
> > from TCL or OS difference.
>
> might be either. Notably, the input engines for chinese might be
> different in the two systems, and this is out of tcl. Also, Tcl 8.5 is
> alpha, and there are bugs being continuously sorted out in the
> successive alphas (try to get the latest).

Further tests show that it might be due to the input engines, which
generate locale-encoded text (BIG5, UTF8, GB) rather than Unicode on
Fedora 6 (or Linux). On Windows the input engines generate Unicode,
so your Tcl code works.

If I change the bind line to

bind Text <KeyPress> {
puttext %W %A;
if {{%A} != {{}}} { puttext %W %A; }
puttext %W %A;
}

and input "abc " characters (with locale UTF8 ), the output should
be "aaabbbcccc " but becomes
"aaabbbcccc " . The last two "puttext %W %A;" fail for Chinese.

Since on Linux most input engines for Chinese (or other multi-byte
languages) follow the same standards (XIM, GTK IM, ...), I think it
is better to accept locale encodings other than Unicode in the
"bind Text <KeyPress>" bind line.


Yung-Jang Lee


yjle...@gmail.com

Jun 22, 2007, 8:30:43 AM
On Jun 22, 8:17 pm, yjlee...@gmail.com wrote:
> On Jun 22, 6:29 pm, Enrico Segre <s...@athena.polito.it> wrote:
>

>
> and input "abc " characters (with locale UTF8 ), the output should
> be "aaabbbcccc " but becomes
> "aaabbbcccc " . The last two "puttext %W %A;" fail for chinese.
>

Sorry, the Chinese characters did not show up in my post; what I meant is

> and input "abc XY" characters (with locale UTF8 ), the output should
> be "aaabbbcccc XXXYYY " but becomes
> "aaabbbcccXY" . The last two "puttext %W %A;" fail for Chinese.

where X and Y stand for any Chinese characters.

> Yung-Jang Lee

Enrico Segre

Jun 22, 2007, 10:54:18 AM
>I think it is better to accept locale code other than unicode in
>"bind Text <KeyPress> " bind line.

If you know how... For the little I understand here, tcl works in
unicode. %A seems to be the only relevant bind substitution variable.


> > and input "abc XY" characters (with locale UTF8 ), the output should
> > be "aaabbbcccc XXXYYY " but becomes
> > "aaabbbcccXY" .

I guess you meant three times c; anyway:

>The last two "puttext %W %A;" fail for chinese.

First, are you sure that it is the last two that fail (and not the
first two, for instance)? Second, why do they fail? Do partial bytes
mangle the recognition of the character?
You could, for example, put a
tk_messageBox -message $text
at the beginning of proc puttext (file inputtext.tcl, around line 320)
and see what happens.

Enrico

yjle...@gmail.com

Jun 22, 2007, 12:01:00 PM
On Jun 22, 10:54 pm, Enrico Segre <s...@athena.polito.it> wrote:
> >I think it is better to accept locale code other than unicode in
> >"bind Text <KeyPress> " bind line.
>
> If you know how... For the little I understand here, tcl works in
> unicode. %A seems to be the only relevant bind substitution variable.
>

Tcl 8.4 and 8.5a in fact also work with UTF8, BIG5 and GB codes. It
seems to me that "{%A}" can only be used once; after that %A becomes
empty. So I suggest not using "{%A}" before "puttext %W %A;".

This way, your code can accept Unicode, UTF8, BIG5 and GB codes.

> First, are you sure that it is the last two those who fail (not the
> first two, for instance?); Second, why do they fail? Partial bytes
> sent mangle the recognition of the character?

I am sure it is the last two that fail; another test proves it:

bind Text <KeyPress> {
puttext %W %A; puttext %W "<-1";


if {{%A} != {{}}} { puttext %W %A; }

puttext %W "<-2";
puttext %W %A;
puttext %W "<-3";
}

produces a<-1a<-2a<-3<-1<-2<-3X<-1<-2<-3 when I input "aX" (again, X is
a Chinese character).

I guess it is "{%A}" that empties the input buffer (because the input
is not in Unicode), so the last two puttext calls fail.


> You could for example put a
> tk_messageBox -message $text
> in the beginning of proc puttext (file inputtext.tcl, around line 320)
> and see what happens.
>

As suggested, I added the message box and changed the bind line back to
bind Text <KeyPress> {
if {{%A} != {{}}} { puttext %W %A; }
}

Then, if I input 'a', it shows a pop-up dialog with the message 'a',
but if I input 'X' (a Chinese character) the pop-up message is empty.

But putting tk_messageBox -message $text in proc puttext seems to
interrupt the input engine: the dialog pops up when I press
<control-space> to enable Chinese input, and the following Chinese
characters are not handled by the input engine; they seem to be eaten
by the dialog.


Yung-Jang Lee


Enrico Segre

Jun 24, 2007, 3:41:44 AM
> , after that %A become empty. So I suggest not to use "{%A}" before
> "puttext %W %A;"

looks rather unconvincing.

Anyway, I have to say that I found reproducible (for me) problems with
Hebrew input in Scipad: I could type Hebrew on one Windows system but
not on another, Linux, one. There %A is constantly {}, %s==8192, the
keysyms match the unmapped keyboard, and so on (all tested with the
snippet Francois forwarded me from the Tcl wiki). IIUC, Hebrew is a
matter of keyboard mapping alone, without an input engine. I'm probably
missing something basic about i18n input; someday I might try to
educate myself.
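One possible reading of `%s==8192`, offered as an assumption rather than a diagnosis of this setup: on X11 with the XKB extension, bits 13 and 14 of the core event state carry the active keyboard group (layout), so 8192 (0x2000) would mean that the second layout is selected and no other modifier is set. A small tclsh sketch, with a hypothetical helper name:

```tcl
# Hypothetical helper: split an X11 event state value (the %s field)
# into the XKB keyboard group and the remaining modifier/button bits.
proc decode_state {s} {
    set group [expr {($s >> 13) & 0x3}]   ;# bits 13-14: keyboard group
    set rest  [expr {$s & 0x1FFF}]        ;# bits 0-12: modifiers and buttons
    return [list group $group other $rest]
}

puts [decode_state 8192]   ;# the value reported for the Hebrew layout
puts [decode_state 4]      ;# CONTROL alone, per the table in the test script
```

If that reading applies, the layout switch itself is reaching Tk, and the empty %A would point at the keysym-to-character lookup rather than at the binding.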

As a last resort, I could suggest:

bind Text <KeyPress> {puttext %W %A} ;# in commonbindings.tcl

and

if {$text == {}} {return} ;# at the beginning of proc puttext

IIUC that should be just equivalent to the present code, and certainly
it doesn't show my Hebrew, but see if ymmv. If it does, I wouldn't
consider that a fix, anyway.
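A minimal sketch of that last-resort arrangement, runnable in plain tclsh. The proc `puttext_sketch` and its array-based buffer are hypothetical stand-ins: the real puttext in inputtext.tcl edits a Tk text widget, which is not reproduced here.

```tcl
# Hypothetical stand-in for Scipad's puttext. The real proc edits a Tk text
# widget; here an array with "text" and "selection" elements plays that role.
proc puttext_sketch {bufName text} {
    upvar 1 $bufName buf
    # Guard moved out of the binding: ignore empty input, so that pressing
    # a modifier key alone leaves the selection untouched.
    if {$text eq ""} {return}
    # Typed text replaces the current selection (modelled by clearing it).
    set buf(selection) ""
    append buf(text) $text
}

array set buf {text abc selection XYZ}

puttext_sketch buf ""    ;# e.g. Control pressed on its own
puts "text=$buf(text) selection=$buf(selection)"

puttext_sketch buf d     ;# a printable key
puts "text=$buf(text) selection=$buf(selection)"
```

The binding then passes %A through without inspecting it, and the empty case is handled only after the value has already become an ordinary proc argument.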

Good luck, Enrico

Enrico Segre

Jun 25, 2007, 4:16:03 AM
> Anyway, I have to say that I found reproducible (for me) problems with
> hebrew input in scipad; I could in one windows system but not in another
> linux one. There %A is constantly {}, %s==8192, keysyms match the
> unmapped keyboard, and so on

OK, for that, setenv LANG he_IL did the trick for me. I don't know if
something similar might help you.

Enrico

yjle...@gmail.com

Jun 25, 2007, 9:10:53 AM

It works with LANG=zh_TW.UTF-8 or LANG=zh_TW.BIG5 only if I move
the boolean test into proc puttext, as described.

YungLee
