Signed and Unsigned integers in Tcl

Sektor van Skijlen

Sep 14, 2006, 3:23:19 PM

Hmm...

set value -32768
format %08x [expr {$value >> 4}]

Result: fffff800

set value 0xffff8000
format %08x [expr {$value >> 4}]

Result: 0ffff800

So is the only way to control the signedness of integers (and thereby
the results of shift operations) to choose between hex and decimal notation?

So, for example, should "retyping" an integer value from signed to unsigned
be accomplished with:

set value [format %x $value]

?
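For reference, with Tcl 8.5 or later (where integers no longer wrap around) the same reinterpretation can also be done arithmetically, by masking to the desired width before shifting. A minimal sketch; the helper name `toUnsigned` and its default width of 32 bits are assumptions made for illustration:

```tcl
# Hypothetical helper (not part of Tcl): reinterpret an integer as an
# unsigned value of the given bit width by masking off all higher bits.
# Assumes Tcl 8.5 or later, where integers do not wrap around.
proc toUnsigned {value {bits 32}} {
    expr {$value & ((1 << $bits) - 1)}
}

set value -32768
set u [toUnsigned $value]
puts $u                              ;# 4294934528 (0xffff8000)
puts [format %08x [expr {$u >> 4}]]  ;# 0ffff800, zero-filled like the 0x literal case
```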

I searched the Wiki for the keyword "unsigned", but found nothing.

--
// _ ___ Michal "Sektor" Malecki <sektor(whirl)kis.p.lodz.pl>
\\ L_ |/ `| /^\ ,() <ethouris(O)gmail.com>
// \_ |\ \/ \_/ /\ C++ bez cholesterolu: http://www.intercon.pl/~sektor/cbx
"I am allergic to Java because programming in Java reminds me casting spells"

suchenwi

Sep 15, 2006, 4:54:31 AM

Sektor van Skijlen wrote:

> So, the only way to control signedness of integers (and results of shift
> operations thereby) is to use hex and dec?
>
> So, for example, to "retype" the integer value from signed to unsigned should
> be accomplished with:
>
> set value [format %x $value]

You can get an unsigned decimal "look" as well, with [format %u $value].
Just make sure you know how [expr] will re-parse it: int32? int64? bignum?

> I tried "unsigned" keyword on Wiki, but nothing is present.
>

For full-text search, append an asterisk: http://wiki.tcl.tk/unsigned*
(many hits :^)

stephan...@yahoo.fr

Sep 15, 2006, 4:59:17 AM

Sektor van Skijlen wrote:

> Hmm...
>
> set value -32768
> format %08x [expr {$value >> 4}]
>
> Result: fffff800

That is because our signed integers are stored (at C level)
in a special way when they are negative: two's complement.
For 32-bit integers:
When 0 <= int < 2^31: the internal representation
(hex) is the same number as the decimal value.

When -2^31 <= int < 0:
the internal representation is
2^32 - abs(int)

That is why [format %x -1] gives 'ffffffff'!
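The 2^32 - abs(int) rule can be checked directly from Tcl 8.5 or later, where integers are unbounded. A small sketch; `bitPattern32` is a hypothetical helper used only to illustrate the formula, and it assumes the input fits in a signed 32-bit range:

```tcl
# Hypothetical helper: the unsigned number whose 32-bit pattern is the
# two's-complement encoding of x, for -2**31 <= x < 2**31 (Tcl 8.5+).
proc bitPattern32 {x} {
    if {$x < 0} {
        return [expr {2**32 + $x}]   ;# 2^32 - abs(x)
    }
    return $x
}

puts [bitPattern32 -1]               ;# 4294967295 (0xffffffff)
puts [bitPattern32 -32768]           ;# 4294934528 (0xffff8000)
puts [expr {-32768 & 0xffffffff}]    ;# 4294934528, the same pattern via a mask
```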

One way to make your integers
behave as in ordinary arithmetic is to use bignums.
These are available in tcllib, and natively in Tcl 8.5.
See:
http://wiki.tcl.tk/15167 (math::bignum)

Cheers,
Stéphane

Andreas Leitgeb

Sep 15, 2006, 6:15:41 AM

Sektor van Skijlen <etho...@guess.if.gmail.com.is.valid.or.invalid> wrote:
> Hmm...
> % set value -32768; format %08x [expr {$value >> 4}]
> fffff800
>
> % set value 0xffff8000; format %08x [expr {$value >> 4}]
> 0ffff800

That's indeed strange, especially as it also happens in Tcl8.4 (.9 here)
It appears to be a "feature" more than a bug, though:
% expr 0xfffff800
4294965248
% set value 4294965248; format %08x [expr {$value >> 4}]
0fffff80

For Tcl8.5 this would be the expected behaviour anyway, since
integers no longer wrap around there at all.

> So, for example, to "retype" the integer value from signed to unsigned should
> be accomplished with:
> set value [format %x $value]

You'd also have to prepend the 0x, of course, but yes.
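A sketch of that string round trip, assuming Tcl 8.5 or later so that the resulting hex literal is re-parsed as a positive number. The explicit `& 0xffffffff` is an addition that keeps the result independent of how wide [format %x] treats its argument:

```tcl
set value -32768
# Render the low 32 bits as hex and prepend 0x so that [expr] re-parses
# the string as a (positive) integer literal. Assumes Tcl 8.5 or later.
set hex 0x[format %x [expr {$value & 0xffffffff}]]
puts $hex                  ;# 0xffff8000
puts [expr {$hex >> 4}]    ;# 268433408 (0x0ffff800), no sign extension
```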

Pat Thoyts

Sep 18, 2006, 7:21:57 AM

Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> writes:

Tcl performs sign extension so that negative values stay negative when
shifted right. This has always been more trouble than anything for me,
but the solution is just to mask off the bits you do not need.
So: expr {(-32768 >> 4) & 0x0fffffff}
It's particularly irritating that binary scan reads bytes in as signed
integers, IMO.
e.g.: binary scan \x9e c c ; set c gives -98. It just means a lot of &0xff later on.
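Both workarounds in runnable form, as a small sketch (the expected values follow from the arithmetic: -2048 masked to 28 bits, and -98 masked to 8 bits):

```tcl
# Arithmetic right shift followed by a mask that keeps only 28 bits.
puts [format %08x [expr {(-32768 >> 4) & 0x0fffffff}]]   ;# 0ffff800

# binary scan's "c" type yields a signed byte; mask it back to 0..255.
binary scan \x9e c c
puts $c                    ;# -98
puts [expr {$c & 0xff}]    ;# 158
```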

--
Pat Thoyts http://www.patthoyts.tk/
To reply, rot13 the return address or read the X-Address header.
PGP fingerprint 2C 6E 98 07 2C 59 C8 97 10 CE 11 E6 04 E0 B9 DD

Andreas Leitgeb

Sep 21, 2006, 5:40:51 AM

Pat Thoyts <cngg...@hfref.fbheprsbetr.arg> wrote:
> Tcl performs sign extension so that negative values stay negative when
> shifted right. This has always been more trouble than anything for me
> but the solution is just to mask off the bits you do not need.
> So: expr {(-32768 >> 4) & 0x0fffffff}

Of course, code like this will prevent larger positive integers
(tcl8.5...) from being correctly shifted right. Actually, one should
consider why a negative integer appears in the first
place: If it's a wrap-around of some previous (positive) operation,
then masking it might throw away the advantage of a future
port to 8.5. If it results from using "-1" for "0xffffffff"
(or the like), then the "-1" had better be written out as a positive
number.

> It's particularly irritating that binary scan reads bytes in as signed
> integers IMO
> eg: binary scan \x9e c c ; set c gives -98.
> Just means a lot of &0xff later on.

Yes, I banged my head on this one quite a lot of times.

Doing expr {... & 0xff} is a performance catastrophe
for large chunks of data, because each expr result will
be its own Tcl_Obj.
Instead, create a global array that maps (0..127,-128..-1) to (0..255)
and use the array instead of expr. That way you will only have
references into a pool of 256 constant Tcl_Objs.
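A sketch of the lookup-table approach described above; the array name `ubyte` and the proc `unsignedBytes` are hypothetical names chosen for illustration. The table is built once, and each list element then refers to one of the 256 values stored in it rather than to a fresh [expr] result:

```tcl
# Build the table once: signed byte value (-128..127) -> unsigned (0..255).
for {set i -128} {$i < 128} {incr i} {
    set ubyte($i) [expr {$i & 0xff}]
}

# Hypothetical helper: convert a binary string to a list of unsigned bytes.
# Each list element references one of the 256 values held in ubyte,
# instead of a freshly allocated [expr] result per byte.
proc unsignedBytes {data} {
    global ubyte
    binary scan $data c* signed
    set result {}
    foreach b $signed {
        lappend result $ubyte($b)
    }
    return $result
}

puts [unsignedBytes \x00\x7f\x80\x9e\xff]   ;# 0 127 128 158 255
```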

This was mentioned in this group a while ago. I'd bet it's also
somewhere on the wiki...

Nevertheless, having a format char like "c" that creates a list of
unsigned numbers would be really useful. The same goes for
shorts and (tcl8.5) ints+wides, although I haven't had even
nearly the same need for them as for an unsigned "c".

suchenwi

Sep 21, 2006, 6:22:45 AM

Andreas Leitgeb wrote:

> > eg: binary scan \x9e c c ; set c gives -98.
> > Just means a lot of &0xff later on.

For scanning characters (up to \uFFFF), good old [scan] is best suited:
% scan \x9E %c
158
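A small sketch of that approach; since [scan %c] works on one character at a time, it is applied per character here (the sample string is an arbitrary choice):

```tcl
# [scan ... %c] returns a character's code point, which is never negative.
puts [scan \x9E %c]    ;# 158

# Applied character by character to a short string:
set codes {}
foreach ch [split "A\x9E\u3b1" ""] {
    lappend codes [scan $ch %c]
}
puts $codes            ;# 65 158 945
```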

Andreas Leitgeb

Sep 22, 2006, 2:56:53 AM

suchenwi <richard.suchenw...@siemens.com> wrote:
> Andreas Leitgeb did not write but only quote:

>> > eg: binary scan \x9e c c ; set c gives -98.
>> > Just means a lot of &0xff later on.

> For scanning characters (up to \uFFFF), good old [scan] is best suited:
> % scan \x9E %c
> 158

That's fine as long as you only read byte by byte.
If you need to do this for, e.g., a 10000-byte string,
then there's currently hardly any sane (or at least almost sane)
way other than binary scan combined with an array-lookup
loop.

To get Unicode code points from an international string with binary scan,
you need to play with encodings:

% set x "\u3b1\u3b2\u3b3 \u3ce" ;# these are: alpha beta gamma omega-with-tonos
% binary scan [encoding convertto unicode $x] "s*" c; set c
945 946 947 32 974

Pat Thoyts

Sep 25, 2006, 7:01:36 PM

Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at> writes:

>> It's particularly irritating that binary scan reads bytes in as signed
>> integers IMO
>> eg: binary scan \x9e c c ; set c gives -98.
>> Just means a lot of &0xff later on.
>
>Yes, I banged my head on this one quite a lot of times.
>

[snip]

>Nevertheless, having a format char like "c" that creates a list of
>unsigned numbers would be really useful. The same goes for
>shorts and (tcl8.5) ints+wides, although I haven't had even
>nearly the same need for them as for an unsigned "c".

Turns out the patch for a 'u' as an unsigned char for the binary
command is really pretty small....

Index: generic/tclBinary.c
===================================================================
RCS file: /cvsroot/tcl/tcl/generic/tclBinary.c,v
retrieving revision 1.26
diff -u -r1.26 tclBinary.c
--- generic/tclBinary.c 27 Sep 2005 15:20:35 -0000 1.26
+++ generic/tclBinary.c 30 Sep 2005 01:28:37 -0000
@@ -1204,6 +1204,7 @@
break;
}
case 'c':
+ case 'u':
size = 1;
goto scanNumber;
case 't':
@@ -1749,6 +1750,10 @@
*/

switch (type) {
+ case 'u':
+ value = buffer[0];
+ goto returnNumericObject;
+
case 'c':
/*
* Characters need special handling. We want to produce a signed

Donal K. Fellows

Sep 25, 2006, 7:14:27 PM

Pat Thoyts wrote:

> Andreas Leitgeb writes:
>> Nevertheless, having a format char like "c" that creates a list of
>> unsigned numbers would be really useful. The same goes for
>> shorts and (tcl8.5) ints+wides, although I haven't had even
>> nearly the same need for them as for an unsigned "c".
>
> Turns out the patch for a 'u' as an unsigned char for the binary
> command is really pretty small....

I'd rather have a flag so that you could specify whether an integer
value is to be parsed as signed or unsigned. Now that we have arbitrary
precision integers (probably the real reason why nothing was done
before) that should be practical. (I don't think it is useful when
applied to floating-point numbers.)

Donal.

stephan...@yahoo.fr

Sep 26, 2006, 3:13:44 AM

Sektor van Skijlen wrote:

> Hmm...

>
> set value -32768
> format %08x [expr {$value >> 4}]
>
> Result: fffff800
>
>
> set value 0xffff8000
> format %08x [expr {$value >> 4}]
>
> Result: 0ffff800

When I learned Java, I saw *two* rightshift
operators : ">>" and ">>>".
The first behaves like with unsigned integers,
the second like with signed integers.
Shouldn't it be in the Tcl core?
I admit this is a strange Java feature because it
is not well-documented, but it works.

But (since Java is statically typed), we get control
over the size (32/64 bits) of our integers.
I do not really know how this would translate
to Tcl.

Regards,
Stéphane

Pat Thoyts

Sep 26, 2006, 4:54:32 AM

stephan...@yahoo.fr writes:

>When I learned Java, I saw *two* rightshift
>operators : ">>" and ">>>".
>The first behaves like with unsigned integers,
>the second like with signed integers.
>Shouldn't it be in the Tcl core?
>I admit this is a strange Java feature because it
>is not well-documented, but it works.

In crypto stuff >>> is usually used to mean rotate right, where >> is
shift right, i.e.: 0x00000001 >>> 1 becomes 0x80000000
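For the crypto-style rotation, a sketch in Tcl 8.5 or later, where left shifts do not wrap and the result therefore has to be masked back to 32 bits; `rotr32` is a hypothetical helper name:

```tcl
# Hypothetical helper: rotate a 32-bit value right by n bits (Tcl 8.5+).
# Left shifts never wrap in 8.5+, so the result is masked back to 32 bits.
proc rotr32 {x n} {
    set x [expr {$x & 0xffffffff}]
    set n [expr {$n % 32}]
    expr {(($x >> $n) | ($x << (32 - $n))) & 0xffffffff}
}

puts [format 0x%08x [rotr32 1 1]]   ;# 0x80000000
```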

Donal K. Fellows

Sep 27, 2006, 6:18:28 AM

Pat Thoyts wrote:
> In crypto stuff >>> is usually used to mean rotate right, where >> is
> shift right, i.e.: 0x00000001 >>> 1 becomes 0x80000000

Surely that should be <>> (to indicate that some bits go back the other
way, of course.)

Donal.

Andreas Leitgeb

Sep 27, 2006, 8:19:21 AM

But then ">><", because it's the rightmost bit that goes left :-)

*scnr*

Sektor van Skijlen

Sep 27, 2006, 4:18:54 PM

On 27 Sep 2006 12:19:21 GMT, Andreas Leitgeb wrote:

Is that the only way to solve the problem with signedness? Or maybe it would
be possible to add a 'u' letter to the end of a number to make it unsigned?
Consider something like this:

set x -1u
>>> -1u
format %x $x
>>> ffffffff

In this case you would be able to add this 'u' immediately to the value of the
variable:

set a [expr ${x}u >> 2]

Either any number of trailing 'u's would have to be accepted, or the number
would first have to be normalized using, for example, [set x [expr $x+0]].

Stéphane A.

Sep 29, 2006, 5:16:24 AM

stephan...@yahoo.fr wrote:

> When I learned Java, I saw *two* rightshift
> operators : ">>" and ">>>".
> The first behaves like with unsigned integers,
> the second like with signed integers.

<QUOTE from Thinking in Java by Bruce Eckel>
The signed right shift >> uses sign extension: if the value is
positive, zeroes are inserted at the higher-order bits; if the value is
negative, ones are inserted at the higher-order bits. Java has also
added the unsigned right shift >>>, which uses zero extension:
regardless of the sign, zeroes are inserted at the higher-order bits.
</QUOTE>

1. As Java uses this feature for its integers,
and does not provide unsigned integers, why
couldn't we adopt these semantics in Tcl?

2. Casting from int/wide to int/wide is easy.
But the bignum part of Tcl 8.5 could
introduce some strange behavior.

3. Should we apply bitwise operators
to bignums?
math::bigfloat (in tcllib) already relies on
left and right shifts of such numbers.
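As an illustration of the zero-extending >>> semantics quoted above, a sketch of a Java-style 32-bit unsigned right shift in Tcl 8.5 or later. `ushr32` is a hypothetical helper; the `& 31` on the shift count mirrors Java's rule for int shifts, and the result is returned as the unsigned reading of the 32-bit pattern (so it only differs in sign from Java for a shift count of 0 on a negative value):

```tcl
# Hypothetical helper emulating Java's int ">>>" (Tcl 8.5+): take the low
# 32 bits as an unsigned value, then shift, so zeroes come in from the left.
# "& 31" mirrors Java's rule that only the low 5 bits of an int shift count count.
proc ushr32 {x n} {
    expr {($x & 0xffffffff) >> ($n & 31)}
}

puts [expr {-32768 >> 4}]   ;# -2048 (Tcl's >> sign-extends, like Java's >>)
puts [ushr32 -32768 4]      ;# 268433408 (0x0ffff800)
puts [ushr32 -1 28]         ;# 15, as for Java's -1 >>> 28
```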

Cheers,
Stéphane

Donal K. Fellows

Sep 29, 2006, 6:43:48 AM

Stéphane A. wrote:
> 1. As Java uses this feature for its integers,
> and does not provide unsigned integers, why
> couldn't we adopt these semantics in Tcl?

That's a good argument. Care to TIP it up? :-)

> 2. Casting from int/wide to int/wide is easy.
> But the bignum part of Tcl 8.5 could
> introduce some strange behavior.

It has been in the past, but I think the bugs are (mostly) fixed now.

> 3. Should we apply bitwise operators
> to bignums?
> math::bigfloat (in tcllib) already relies on
> left and right shifts of such numbers.

The answer is "yes" and the only place where it matters is in how the
code treats _left_ shifts. However, since the width of Tcl's numbers was
never clearly defined in the first place (trust me on that) anyone using
left shifts and relying on the absence of bignums actually had a bug in
their code anyway. Deal with the problem through explicit masks stating
how many bits are to be retained.
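A short sketch of such an explicit mask in Tcl 8.5 or later, where a left shift simply grows the number instead of overflowing (the 32-bit width is an arbitrary choice for the example):

```tcl
# Tcl 8.5+: left shifts grow beyond 32 bits (and eventually into bignums)
# instead of overflowing, so an explicit mask states the bits to retain.
set x 0x12345678
puts [expr {$x << 8}]                  ;# 78187493376 (0x1234567800)
puts [expr {($x << 8) & 0xffffffff}]   ;# 878082048 (0x34567800), low 32 bits only
```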

Donal.

Donal K. Fellows

Sep 29, 2006, 10:44:12 AM

Pat Thoyts wrote:
> Turns out the patch for a 'u' as an unsigned char for the binary
> command is really pretty small....

For everyone's reference, this issue is being addressed by TIP#275 which
is currently the subject of a vote. Details available at:
http://tip.tcl.tk/275.html
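For reference, in Tcl 8.5 and later the binary command documents `u` as a flag written after the type character (for example `cu`), rather than as a separate type letter as in the patch above. A short sketch of the difference:

```tcl
# The "u" flag after a type character requests unsigned results (Tcl 8.5+).
binary scan \x9e c signed
binary scan \x9e cu unsigned
puts "$signed $unsigned"   ;# -98 158

# It also works for wider types, e.g. a little-endian 16-bit value:
binary scan \x00\x90 s s16
binary scan \x00\x90 su u16
puts "$s16 $u16"           ;# -28672 36864
```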

Donal.

Andreas Leitgeb

Sep 29, 2006, 12:27:39 PM

Donal K. Fellows <donal.k...@manchester.ac.uk> wrote:

> For everyone's reference, this issue is being addressed by TIP#275 which
> is currently the subject of a vote. Details available at:
> http://tip.tcl.tk/275.html

I would have preferred a symbol to a letter for flags.
Just replace "u" with "+" and I find it already better,
then let the plus precede the affected format-char and
I'd be ecstatic :-)

For large ints (>64 bits) there should also be a way to
extract them from a (byte-aligned) bitstream, although
the count would then mean the length of the data for one
item, just like with "a".
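Without such a format char, one way to do it is to scan unsigned bytes and assemble them with bignum arithmetic. A sketch assuming Tcl 8.5 or later and big-endian byte order; `bytesToUnsigned` is a hypothetical helper:

```tcl
# Hypothetical helper: read a byte string of any length as a single
# big-endian unsigned integer, relying on Tcl 8.5+ bignums and "cu".
proc bytesToUnsigned {bytes} {
    binary scan $bytes cu* octets
    set n 0
    foreach b $octets {
        set n [expr {($n << 8) | $b}]
    }
    return $n
}

# 0x01 followed by ten zero bytes, i.e. 2**80 (wider than 64 bits):
set data [binary format H* 01[string repeat 00 10]]
puts [expr {[bytesToUnsigned $data] == 2**80}]   ;# 1
```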

Andreas Leitgeb

Sep 29, 2006, 12:33:05 PM

Sektor van Skijlen <etho...@guess.if.gmail.com.is.valid.or.invalid> wrote:

> On 27 Sep 2006 12:19:21 GMT, Andreas Leitgeb wrote:
>> Donal K. Fellows <donal.k...@manchester.ac.uk> wrote:
>> > Pat Thoyts wrote:
>> >> In crypto stuff >>> is usually used to mean rotate right, where >> is
>> >> shift right, i.e.: 0x00000001 >>> 1 becomes 0x80000000
>> > Surely that should be <>> (to indicate that some bits go back the other
>> > way, of course.)
>> But then ">><", because it's the rightmost bit that goes left :-)
>
> Is that the only way to solve the problem with signedness?

Actually, it doesn't even have anything to do with
signedness (the topic drifted off a bit in this subthread)
and my comment about ">><" was a joking answer to Donal's
(also joking) "<>>"

Because tcl8.5 is "wrap-around free", we fortunately no
longer need such crutches in tcl8.5. And it quite
surely won't make it into 8.4.*, either.

Donal K. Fellows

Sep 29, 2006, 4:35:28 PM

Andreas Leitgeb wrote:
> I would have preferred a symbol to a letter for flags.
> Just replace "u" with "+" and I find it already better,
> then let the plus precede the affected format-char and
> I'd be ecstatic :-)

We discussed that (well, mocked it up quickly in code) and it wasn't
really better. The 'u' was Pat's choice, and since he's actually doing
the work, I'm not very inclined to argue. ;-)

Donal.

Andreas Leitgeb

Oct 5, 2006, 7:00:30 AM

Just to remove any doubt, "u" for unsigned is still much better than
no unsigned-marker at all.

Anyway I wonder in what way a leading "+" would have been
inferior to the trailing "u".

suchenwi

Oct 5, 2006, 7:04:47 AM

Andreas Leitgeb wrote:

> Anyway I wonder in what way a leading "+" would have been
> inferior to the trailing "u".

"+" means "positive", or "one or more" in regexps. Signed numbers can
also be positive :)
As "u" also means "unsigned" in [format] (though stand-alone, not as a
modifier), I think using "u" follows the principle of least
surprise.