Under 8.0.x I used
Tcl_SetObjLength(sObj, strlen(Tcl_GetStringFromObj(sObj,0)));
but this fails under 8.1/8.2 so how do I truncate strings?
I might add that I allocated the buffers with
Tcl_SetByteArrayLength( objPtr, length )
it seems to me that I might need to distinguish two different string
types in the dll caller.
--
Robin Becker
That would truncate the string at the first NUL byte, but not at a
specific length.
> but this fails under 8.1/8.2 so how do I truncate strings?
>
Of course that will fail because strings in 8.1/2 are UTF-8 and as such
do not contain any embedded NULs, hence no truncation can occur.
I am confused as to exactly how you want the truncation to occur, could
you give some more information like a C API and some Tcl code to call
it.
> I might add that I allocated the buffers with
>
> Tcl_SetByteArrayLength( objPtr, length )
>
> it seems to me that I might need to distinguish two different string
> types in the dll caller.
Almost certainly, in fact probably at least three 'character' types.
A buffer of bytes.
A string of UTF-8.
A string of Unicode.
--
Paul Duffin
DT/6000 Development Email: pdu...@hursley.ibm.com
IBM UK Laboratories Ltd., Hursley Park nr. Winchester
Internal: 7-246880 International: +44 1962-816880
*256 for an arg means we pass the name of a Tcl var, which will be
forced to a length of 256 bytes.
Under 8.0.x you could say
*256s, meaning this was supposed to be a string variable. Purely as a
convenience, I truncated the buffer at the first null byte. This was
easy under the old Tcl API. Now that we can have both wide chars and
simple chars, I need to distinguish which kind of length applies. The
simple C string version of my original truncation will be
Allocation
Tcl_SetByteArrayLength( sObj, 256+1);
and the truncation will be
Tcl_SetByteArrayLength(sObj, strlen(Tcl_GetByteArrayFromObj(sObj,0)));
so the wide character version ought to be
Allocation
Tcl_SetByteArrayLength( sObj, 2*256+1);
and the truncation will be
Tcl_SetObjLength(sObj, wcslen(Tcl_GetStringFromObj(sObj,0)));
but I'm not really certain about these last.
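A minimal sketch of the two truncation lengths discussed above, in plain C with no Tcl calls. The helper names (byte_len_bounded, wide_len_bounded, wide_buf_bytes) are made up for illustration, and wchar_t stands in for whatever wide type the DLL expects. Note that a wide buffer for 256 characters plus terminator needs (256+1)*sizeof(wchar_t) bytes, since the terminator is a whole wide character; 2*256+1 is one byte short where wchar_t is two bytes. The bounded scans also avoid running off the end of the buffer if the callee never writes a NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes before the first NUL byte,
 * never reading more than cap bytes. */
size_t byte_len_bounded(const unsigned char *buf, size_t cap)
{
    size_t n = 0;
    assert(buf != NULL);
    while (n < cap && buf[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: number of wide characters before the first
 * wide NUL, never reading more than cap_chars characters. */
size_t wide_len_bounded(const wchar_t *buf, size_t cap_chars)
{
    size_t n = 0;
    assert(buf != NULL);
    while (n < cap_chars && buf[n] != L'\0') {
        n++;
    }
    return n;
}

/* Bytes needed for nchars wide characters plus one wide terminator. */
size_t wide_buf_bytes(size_t nchars)
{
    return (nchars + 1) * sizeof(wchar_t);
}
```

The resulting counts would then be handed to whichever Tcl length-setting call matches the object type, e.g. Tcl_SetByteArrayLength for the byte case.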
--
Robin Becker
You need to use the new Unicode functions to manage the Unicode form
of the string.
> but I'm not really certain about these last.
Windows APIs can take any of the following different 'string' types.
Some of the lengths may be fixed by the API and not require a distinct
length argument.
Buffer of bytes with length (input/output)
A NUL terminated byte string (input)
A NUL terminated UTF-8 string (input) (whatever UTF-8 NUL is)
A UTF-8 string with length (input/output)
A Unicode string with length (input/output)
Your DLL caller needs to be able to differentiate between these types
of string so that it can format the input data correctly, set the
length correctly and format the output data correctly.
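As a rough illustration of what "differentiate between these types" could look like in the DLL caller, here is a plain-C sketch. The enum, its member names and the helper functions are all hypothetical, and the Unicode case assumes 2-byte units (as with wide chars on Windows).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the five kinds of 'string' listed above. */
typedef enum {
    STR_BYTES_LEN,    /* buffer of bytes with length (input/output) */
    STR_BYTES_NUL,    /* NUL terminated byte string (input) */
    STR_UTF8_NUL,     /* NUL terminated UTF-8 string (input) */
    STR_UTF8_LEN,     /* UTF-8 string with length (input/output) */
    STR_UNICODE_LEN   /* Unicode string with length (input/output) */
} StringKind;

/* Size in bytes of one storage unit (assumes 16-bit Unicode units). */
size_t unit_size(StringKind kind)
{
    assert(kind <= STR_UNICODE_LEN);
    return kind == STR_UNICODE_LEN ? 2 : 1;
}

/* Returns 1 if the kind relies on a terminator rather than a length. */
int needs_terminator(StringKind kind)
{
    return kind == STR_BYTES_NUL || kind == STR_UTF8_NUL;
}

/* Bytes to allocate for a value of the given kind holding 'units'
 * storage units, including room for a terminator where one is needed. */
size_t buffer_bytes(StringKind kind, size_t units)
{
    return (units + (needs_terminator(kind) ? 1 : 0)) * unit_size(kind);
}
```

With something like this, the format character in the argument spec would select a StringKind, and both allocation and truncation would follow from it.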
strlen, wcslen, _mbslen, _mbstrlen
    Get the length of a string.

    size_t strlen( const char *string );
    size_t wcslen( const wchar_t *string );
    size_t _mbslen( const unsigned char *string );
    size_t _mbstrlen( const char *string );
    ....
    Parameter
        string    Null-terminated string
i.e. I guess that M$ (at least) assumes all these string types have an
explicit zero byte as a guaranteed terminator. Indeed, parts of the Win32
API are schizoid in that they return lengths in characters, which allows
for the fact that some characters may be ASCII/Unicode mixes or DBCS,
e.g. GetWindowTextLength.
So what is the UTF8 API in Tcl?
>
>Your DLL caller needs to be able to differentiate between these types
>of string so that it can format the input data correctly, set the
>length correctly and format the output data correctly.
>
--
Robin Becker
The ByteArray, String and Unicode thingies. What is used for UTF-8? Plus
a whole mess in the stuff about encodings.
I need to be able to allocate sufficient space for a particular type and
truncate in some sensible way when an external function may not fill the
whole thing. I am very confused by statements like the following from
the ByteArrayObj help.
Obtaining the string representation of a byte-array object (by calling
Tcl_GetStringFromObj) produces a properly formed UTF-8 sequence with a
one-to-one mapping between the bytes in the internal representation and
the UTF-8 characters in the string representation.
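One way to read that sentence: each byte of the array becomes exactly one character (code point 0 to 255) in the string rep, but bytes of 0x80 and above take two bytes in UTF-8, and Tcl 8.1+ also writes a zero byte as the pair 0xC0 0x80 so that the string rep never contains a raw NUL. So the mapping is one-to-one in characters, not in bytes. A rough plain-C sketch of that expansion (bytes_to_utf8 is a hypothetical helper, not a Tcl function):

```c
#include <assert.h>
#include <stddef.h>

/* Expand n raw bytes into the UTF-8 form described above.
 * dst must have room for 2*n bytes. Returns the bytes written. */
size_t bytes_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i, out = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        unsigned char b = src[i];
        if (b != 0 && b < 0x80) {
            /* plain ASCII: one byte in, one byte out */
            dst[out++] = b;
        } else {
            /* 0x00 and 0x80..0xFF: two-byte sequence 110xxxxx 10xxxxxx */
            dst[out++] = (unsigned char)(0xC0 | (b >> 6));
            dst[out++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

This is why strlen on the string rep of a byte array can differ from the byte array's own length.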
Do unicode producing functions have to set a length somehow? I mean is
there no easy way to tell the length of these things. Or is it utf8
that's tough.
Since UTF-8 is Tcl's internal representation, it's unlikely to appear in
external APIs unless these come from Tcl, or am I being really stupid?
--
Robin Becker
: I need to be able to allocate sufficient space for a particular type and
: truncate in some sensible way when an external function may not fill the
: whole thing.
I'd just zero-out the whole buffer before calling the function.
: Do unicode producing functions have to set a length somehow? I mean is
: there no easy way to tell the length of these things. Or is it utf8
: that's tough.
IIRC it's the 16-bit Unicode (wide char) form that is terminated by two
NUL bytes; a UTF-8 string ends with a single NUL byte.
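On the question of how hard it is to tell the length of a UTF-8 string: treated as a C string it ends at a single zero byte, so the byte length is just strlen, and the character count can be had by skipping continuation bytes (those of the form 10xxxxxx). A small plain-C sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of a NUL-terminated UTF-8 string. */
size_t utf8_byte_count(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Character count: every byte that is not a continuation byte
 * (10xxxxxx) starts a new character. Assumes well-formed input. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

Inside Tcl itself, Tcl_GetCharLength gives the character count of an object, so a hand-rolled loop like this would only matter for buffers that live outside Tcl.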
Bye, Heribert (da...@ifk20.mach.uni-karlsruhe.de)
set entrylist {title subtitle date importfile songs imagefile outputfile}
set r 0
set re 0
global r re

# Note: .one is assumed to be a frame created elsewhere (not shown here).
proc cr_ent {name} {
    global r re
    set lb lb
    incr r
    incr re
    # label and entry share a row; [eval] is not needed around [string toupper]
    label .one.$name$lb -text [string toupper $name]
    grid .one.$name$lb -row $r -column 1 -padx 5 -pady 2
    entry .one.$name -background white -width 20 -textvariable $name
    grid .one.$name -row $re -column 2 -padx 5 -pady 2
    # this works....but grid would be better!
    # pack [eval label .one.$name$lb -text [eval string toupper $name] ] -side top -padx 5 -pady 2
    # pack [eval entry .one.$name -background white -width 20 -textvariable $name] -side top -padx 5 -pady 2
}

foreach value $entrylist {
    cr_ent $value
}
???????????????????:-((((((!!!!!!!
Thanks for your help,
Johan B.
Tried the code under 8.0; it works fine (apart from the missing frame
definition of .one): no hangs, no errors.
Maybe insert some puts to see where the program hangs.
Gerhard
There are two types, ByteArray and String (Unicode), and one
pseudo-type - the string rep in the Tcl_Obj structure.
Desired type of data | Function to get data    | Function to get length
---------------------+-------------------------+------------------------
ByteArray            | Tcl_GetByteArrayFromObj | Tcl_GetByteArrayFromObj
Unicode              | Tcl_GetUnicode          | Tcl_GetCharLength
UTF-8                | Tcl_GetStringFromObj    | Tcl_GetStringFromObj
Don't attempt to work out which is the most appropriate to do
according to the data being passed in to you; just go from the
definition of what the functions in question want input to them.
When going the other way, you need to know the length of the data you
want to pass into Tcl (if you don't know it and can't work it out, you
really are SOL), but knowing that, you can quite easily use:
Tcl_NewByteArrayObj (for byte sequences)
Tcl_NewStringObj (for UTF-8 strings)
Tcl_SetUnicodeObj (for Unicode strings)
(All this is working from the 8.2.0 distribution sources.)
HTH!
Donal.
--
Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ fell...@cs.man.ac.uk
-- The small advantage of not having California being part of my country would
be overweighed by having California as a heavily-armed rabid weasel on our
borders. -- David Parsons <o r c @ p e l l . p o r t l a n d . o r . u s>
As it stands, the code should work exactly as advertised. *However*
if you happen to have anything [pack]ed in .one as well as all your
[grid]ded widgets then you will be in trouble (since the two geometry
managers end up fighting over window sizes, which manifests itself as
a hang.) The fact that your code works when you use [pack] instead of
[grid] is also consistent with this diagnosis.
Solutions:
a) Put the widgets to be gridded into a frame that you then pack
into .one - this is probably the best solution for you, and it is
really easy to do.
b) Convert to either using [grid] or [pack] throughout - fine if you
can do this, but it does change the look of your GUI.
c) Turn off propagation for either [grid] or [pack] on .one - almost
certainly not what you want to do, but if it works (i.e. stops
the hang,) you know that my diagnosis is correct...
Also, please keep in mind that Paul has raised valid points about how
unsatisfactory this (8.2) setup is: namely, the fact that
String(Unicode) is a Tcl type per se, which leads to heavy shimmering at
seemingly innocuous points (like string manipulations for debugging).
I know that Jeff has more or less expressed agreement, and that
something better was in the works. All this to say: Robin, once you're
done with 8.2, be prepared to spend yet another major amount of energy
for 8.3 or 8.4.
Personal feeling: that transparent introduction of unicode-awareness in
routines that were traditionally used also for binary (like [string
range]) was a BAD DECISION. Reason: before that anyway, non-latin
characters were not at all usable. So why not have defined an entire set
of explicitly-unicode string-handling routines (like [unicode
range]...) ??? Jeff ?
Now it looks like we're stuck with a mess. Should we stick to it and
spend 10x the energy to work around every gotcha, or should we take the
courageous option of cutting that rotten branch (it may not be too late,
but time is running) ? Jeff ?
-Alex
I guess this is impossible for tcl as one of the primitive types was a
string. What I can't figure out is why an intermediate was chosen which
wasn't in one of the two categories. I guess for performance reasons and
perhaps to allow for stuff which I can't think of like Koreans writing
scripts in Sanskrit.
I guess the main fault I have with the tcl mess is that there remain
some very commonly used API calls which have changed quite subtly. For
me they may work for a while until I accidentally embed a null character
and then all hell breaks loose. Moan whinge whine; mumble mumble etc etc
:)
--
Robin Becker
Donal provided a very good, concise view into the state of 8.2.
While it is an improvement over 8.1 in the cases where Unicode can
bite you, it was just an interim hack (no, not mine) to improve
things.
> done with 8.2, be prepared to spend yet another major amount of energy
> for 8.3 or 8.4.
This could be the case. However, we are looking at the Feather stuff
to improve this to the point that old stuff will magically be more
efficient, although there might also be a newer way to make it even
better.
> Personal feeling: that transparent introduction of unicode-awareness in
> routines that were traditionally used also for binary (like [string
> range]) was a BAD DECISION. Reason: before that anyway, non-latin
> characters were not at all usable. So why not have defined an entire set
> of explicitly-unicode string-handling routines (like [unicode
> range]...) ??? Jeff ?
I argued for making the binary command match the string command, so
there was a [binary range|replace|index|...], but ... This might change
later (upwards compatible). We don't want to separate the Unicode ops,
this would be a step backwards. We need to just improve how things are
handled under the covers.
--
Jeffrey Hobbs The Tcl Guy
jeffrey.hobbs at scriptics.com Scriptics Corp.
Very nice if it is possible. However, at each such "success" the overall
internal tension of Tcl increases. What about next time we face a
similar problem ? Deflating the bubble early takes courage, but is an
excellent investment.
-Alex
Your diagnosis is absolutely correct! Thank you!:-))
Regards,
Johan B.
I think we could keep that tension running until 9, when we have a
better opportunity to fiddle with the APIs and backwards compatibility.
Perhaps principally due to sheer mass, but Java is Unicode incarnate. And
i18n and threads as well, I believe. So it seems to me that if we are
going to continue to communicate and be able to be embedded in future
apps, we need to deal with the situation once and for all - in a compatible
manner.
--
<URL: mailto:lvi...@cas.org> Quote: Save us from the snobs.
<*> O- <URL: http://www.purl.org/NET/lvirden/>
Unless explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.