set value -32768
format %08x [expr {$value >> 4}]
Result: fffff800
set value 0xffff8000
format %08x [expr {$value >> 4}]
Result: 0ffff800
So, the only way to control signedness of integers (and results of shift
operations thereby) is to use hex and dec?
So, for example, to "retype" the integer value from signed to unsigned should
be accomplished with:
set value [format %x $value]
?
I tried "unsigned" keyword on Wiki, but nothing is present.
--
// _ ___ Michal "Sektor" Malecki <sektor(whirl)kis.p.lodz.pl>
\\ L_ |/ `| /^\ ,() <ethouris(O)gmail.com>
// \_ |\ \/ \_/ /\ C++ bez cholesterolu: http://www.intercon.pl/~sektor/cbx
"I am allergic to Java because programming in Java reminds me casting spells"
> So, the only way to control signedness of integers (and results of shift
> operations thereby) is to use hex and dec?
>
> So, for example, to "retype" the integer value from signed to unsigned should
> be accomplished with:
>
> set value [format %x $value]
You can have unsigned decimal "looks" as well, with [format %u $value].
Just check how this is re-parsed by [expr]: int32? int64? bigint?
> I tried "unsigned" keyword on Wiki, but nothing is present.
>
For full-text search, append an asterisk: http://wiki.tcl.tk/unsigned*
(many hits :^)
> Hmm...
>
> set value -32768
> format %08x [expr {$value >> 4}]
>
> Result: fffff800
That is because our signed integers are stored (at the C level)
in a special way when they are negative: two's complement.
For a 32-bit int, INT_MAX = 2^31 - 1.
When 0 <= int <= INT_MAX: the internal representation
(hex) is the same number as the decimal value.
When -(2^31) <= int < 0:
the internal representation is
2^32 - abs(int)
That is why [format %x -1] gives 'ffffffff'!
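To make the arithmetic concrete: in 32-bit two's complement a negative n has the same bit pattern as the unsigned number 2^32 - abs(n). A minimal sketch, assuming Tcl 8.5 or later (so that 1 << 32 does not overflow); twos32 is a made-up helper name, not a Tcl built-in:

```tcl
# Hypothetical helper: return the unsigned value whose 32-bit pattern
# matches the two's complement encoding of n.
# Assumes Tcl 8.5+, where [expr] integers do not wrap around.
proc twos32 {n} {
    # Negative values map to 2^32 - abs(n); non-negative values are unchanged.
    expr {$n < 0 ? (1 << 32) + $n : $n}
}

puts [format %08x [twos32 -1]]       ;# ffffffff
puts [format %08x [twos32 -32768]]   ;# ffff8000
puts [format %08x [twos32 32767]]    ;# 00007fff
```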
One way to make your integers
behave as they do in ordinary algebra is to use bignums.
These are available in tcllib, and natively in Tcl 8.5.
See:
http://wiki.tcl.tk/15167 (math::bignum)
Cheers,
Stéphane
That's indeed strange, especially as it also happens in Tcl8.4 (.9 here)
It appears to be a "feature" more than a bug, though:
% expr 0xfffff800
4294965248
% set value 4294965248; format %08x [expr {$value >> 4}]
0fffff80
For Tcl8.5 this would be the expected behaviour anyway, since
integers no longer wrap around there at all.
> So, for example, to "retype" the integer value from signed to unsigned should
> be accomplished with:
> set value [format %x $value]
You'd also have to prepend the 0x, of course, but yes.
Tcl performs sign extension so that negative values stay negative when
shifted right. This has always been more trouble than anything for me
but the solution is just to mask off the bits you do not need.
So: expr {(-32768 >> 4) & 0x0fffffff}
It's particularly irritating that binary scan reads bytes in as signed
integers IMO
eg: binary scan \x9e c c ; set c gives -98. Just means a lot of &0xff later on.
--
Pat Thoyts http://www.patthoyts.tk/
Of course, code like this will prevent larger positive integers
(tcl8.5...) from being shifted right correctly. Actually, it is
worth asking why a negative integer appears in the first
place: if it's a wrap-around from some previous (positive) operation,
then masking it might undo the advantage of a future port to
8.5. If it results from using "-1" for "0xffffffff"
(or the like), then the "-1" had better be written out as a positive
number.
> It's particularly irritating that binary scan reads bytes in as signed
> integers IMO
> eg: binary scan \x9e c c ; set c gives -98.
> Just means a lot of &0xff later on.
Yes, I banged my head on this one quite a lot of times.
Doing expr {... & 0xff} is a performance catastrophe
for large chunks of data, because each expr result will
be its own Tcl_Obj.
Instead, create a global array that maps (0..127,-128..-1) to (0..255)
and use the array instead of expr. That way you will only have
references into a constant pool of 256 Tcl_Objs.
This was mentioned in this group a while ago. I'd bet it's also
somewhere on wiki...
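The lookup-table idea above might look roughly like this. It is only a sketch: the array name ::u8 and the proc name unsigned_bytes are invented for illustration, and whether it actually beats expr {... & 0xff} on a given Tcl version would have to be measured.

```tcl
# Hypothetical names: ::u8 (lookup array) and unsigned_bytes (helper proc).
# Build the table once: keys are the signed values -128..127 as produced
# by [binary scan ... c], values are the matching unsigned bytes 0..255.
for {set i 0} {$i < 256} {incr i} {
    set ::u8([expr {$i < 128 ? $i : $i - 256}]) $i
}

proc unsigned_bytes {data} {
    set signedList {}
    # c* yields one signed 8-bit integer per byte of data.
    binary scan $data c* signedList
    set result {}
    foreach b $signedList {
        # Array lookup instead of expr {$b & 0xff}.
        lappend result $::u8($b)
    }
    return $result
}

puts [unsigned_bytes [binary format H* 9e01ff]]   ;# 158 1 255
```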
Nevertheless, having a char like "c" but to create a list of
unsigned numbers would be really useful. So it would be for
shorts and (tcl8.5) ints+wides, although I haven't had even
nearly the same need for them as for an unsigned "c".
> > eg: binary scan \x9e c c ; set c gives -98.
> > Just means a lot of &0xff later on.
For scanning characters (up to \uFFFF), good old [scan] is best suited:
% scan \x9E %c
158
> For scanning characters (up to \uFFFF), good old [scan] is best suited:
> % scan \x9E %c
> 158
That's fine as long as you only read byte by byte.
If you need to do this for, e.g., a 10000-byte string,
then there's currently hardly any sane (or at least almost sane)
way other than binary scan combined with an
array-lookup loop.
To get unicode from an international string with binary scan,
you need to play with encoding:
% set x "\u3b1\u3b2\u3b3 \u3ce" ;# these are: alpha beta gamma tonated-omega
% binary scan [encoding convertto unicode $x] "s*" c; set c
945 946 947 32 974
>> It's particularly irritating that binary scan reads bytes in as signed
>> integers IMO
>> eg: binary scan \x9e c c ; set c gives -98.
>> Just means a lot of &0xff later on.
>
>Yes, I banged my head on this one quite a lot of times.
>
[snip]
>Nevertheless, having a char like "c" but to create a list of
>unsigned numbers would be really useful. So it would be for
>shorts and (tcl8.5) ints+wides, although I haven't had even
>nearly the same need for them as for an unsigned "c".
Turns out the patch for a 'u' as an unsigned char for the binary
command is really pretty small....
Index: generic/tclBinary.c
===================================================================
RCS file: /cvsroot/tcl/tcl/generic/tclBinary.c,v
retrieving revision 1.26
diff -u -r1.26 tclBinary.c
--- generic/tclBinary.c 27 Sep 2005 15:20:35 -0000 1.26
+++ generic/tclBinary.c 30 Sep 2005 01:28:37 -0000
@@ -1204,6 +1204,7 @@
break;
}
case 'c':
+ case 'u':
size = 1;
goto scanNumber;
case 't':
@@ -1749,6 +1750,10 @@
*/
switch (type) {
+ case 'u':
+ value = buffer[0];
+ goto returnNumericObject;
+
case 'c':
/*
* Characters need special handling. We want to produce a signed
I'd rather have a flag so that you could specify whether an integer
value is to be parsed as signed or unsigned. Now that we have arbitrary
precision integers (probably the real reason why nothing was done
before) that should be practical. (I don't think it is useful when
applied to floating-point numbers.)
Donal.
> Hmm...
>
> set value -32768
> format %08x [expr {$value >> 4}]
>
> Result: fffff800
>
>
> set value 0xffff8000
> format %08x [expr {$value >> 4}]
>
> Result: 0ffff800
When I learned Java, I saw *two* right-shift
operators: ">>" and ">>>".
The first behaves like with signed integers,
the second like with unsigned integers.
Shouldn't it be in the Tcl core?
I admit this is a strange Java feature because it
is not well-documented, but it works.
But since Java is statically typed, we get control
over the size (32/64 bits) of our integers.
I do not really know how this would be handled
in Tcl.
Regards,
Stéphane
>When I learned Java, I saw *two* rightshift
>operators : ">>" and ">>>".
>The first behaves like with signed integers,
>the second like with unsigned integers.
>Shouldn't it be in the Tcl core?
>I admit this is a strange Java feature because it
>is not well-documented, but it works.
In crypto stuff >>> is usually used to mean rotate right, where >> is
shift right, i.e. 0x00000001 >>> 1 becomes 0x80000000
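The example given, 0x00000001 becoming 0x80000000, is what a 32-bit rotate right by one bit produces. Tcl has no rotate operator, so here is a rough sketch of one built from shifts and a mask, assuming Tcl 8.5+ integers; rotr32 is a hypothetical name:

```tcl
# Hypothetical helper: rotate a 32-bit word right by count bits.
# Assumes Tcl 8.5+, where << never overflows, so the result must be
# masked back to 32 bits explicitly.
proc rotr32 {value count} {
    set value [expr {$value & 0xffffffff}]   ;# work on the 32-bit pattern only
    set count [expr {$count % 32}]           ;# rotating by 32 is a no-op
    # Bits shifted out on the right re-enter on the left.
    expr {(($value >> $count) | ($value << (32 - $count))) & 0xffffffff}
}

puts [format %08x [rotr32 0x00000001 1]]   ;# 80000000
```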
Surely that should be <>> (to indicate that some bits go back the other
way, of course.)
Donal.
But then ">><", because it's the rightmost bit that goes left :-)
*scnr*
Is that the only way to solve the problem with signedness? Or maybe it would
be possible to add a 'u' letter to the end of number to make it unsigned?
Consider something like that:
set x -1u
>>> -1u
format %x $x
>>> ffffffff
In this case you would be able to add this 'u' immediately to the value of the
variable:
set a [expr ${x}u >> 2]
Either an unlimited number of 'u' would be added or the number should be first
normalized using, for example, [set x [expr $x+0]].
> When I learned Java, I saw *two* rightshift
> operators : ">>" and ">>>".
> The first behaves like with signed integers,
> the second like with unsigned integers.
<QUOTE from Thinking in Java by Bruce Eckel>
The signed right shift >> uses sign extension: if the value is
positive, zeroes are inserted at the higher-order bits; if the value is
negative, ones are inserted at the higher-order bits. Java has also
added the unsigned right shift >>>, which uses zero extension:
regardless of the sign, zeroes are inserted at the higher-order bits.
</QUOTE>
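For comparison, both behaviours from the quote can be imitated in today's Tcl for a fixed 32-bit word. This is only a sketch under the assumption of Tcl 8.5+ integer semantics; sar32 and ushr32 are invented names, and unlike Java the shift count is not reduced modulo 32:

```tcl
# Hypothetical helpers imitating Java's >> and >>> on a 32-bit int.
proc sar32 {value count} {
    # Tcl's own >> already sign-extends, like Java's >>.
    expr {$value >> $count}
}
proc ushr32 {value count} {
    # Reduce to the 32-bit pattern first, then shift: zeroes enter
    # from the left, like Java's >>>.
    expr {($value & 0xffffffff) >> $count}
}

puts [sar32 -32768 4]                  ;# -2048
puts [format %08x [ushr32 -32768 4]]   ;# 0ffff800
```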
1. As Java uses this feature for its integers,
and does not provide unsigned integers, why
couldn't we apply this semantic in Tcl?
2. Casting from int/wide to int/wide is easy.
But the bignum part of Tcl 8.5 could
introduce some strange behavior.
3. Think about whether we should apply
bitwise operators on bignums?
math::bigfloat(in tcllib) already relies on
left- and right-shifts with such numbers.
Cheers,
Stéphane
That's a good argument. Care to TIP it up? :-)
> 2. Casting from int/wide to int/wide is easy.
> But the bignum part of Tcl 8.5 could
> introduce some strange behavior.
It has been in the past, but I think the bugs are (mostly) fixed now.
> 3. Think about whether we should apply
> bitwise operators on bignums?
> math::bigfloat(in tcllib) already relies on
> left- and right-shifts with such numbers.
The answer is "yes" and the only place where it matters is in how the
code treats _left_ shifts. However, since the width of Tcl's numbers was
never clearly defined in the first place (trust me on that) anyone using
left shifts and relying on the absence of bignums actually had a bug in
their code anyway. Deal with the problem through explicit masks stating
how many bits are to be retained.
Donal.
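The "explicit masks" advice above could be written as a small helper that takes the word width as a parameter. A sketch only, assuming Tcl 8.5+; the name shl and its argument order are made up for illustration:

```tcl
# Hypothetical helper: left shift that keeps only the low `bits` bits,
# so the result does not depend on how wide Tcl's integers happen to be.
proc shl {value count bits} {
    expr {($value << $count) & ((1 << $bits) - 1)}
}

puts [format %x [shl 0x8001 1 16]]   ;# 2      (top bit shifted out of a 16-bit word)
puts [format %x [shl 0x8001 1 32]]   ;# 10002  (same shift, 32 bits retained)
```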
For everyone's reference, this issue is being addressed by TIP#275 which
is currently the subject of a vote. Details available at:
http://tip.tcl.tk/275.html
Donal.
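As far as I know, the form that eventually shipped with Tcl 8.5 is a "u" flag placed after the type character of [binary scan]. A short illustration, assuming a Tcl version where that flag is available:

```tcl
# Assumes a Tcl with the TIP#275 "u" flag for [binary scan] (8.5 or later).
set data [binary format H* 9eff]

binary scan $data c*  signed     ;# plain "c" sign-extends each byte
binary scan $data cu* unsigned   ;# "cu" reads the same bytes as unsigned

puts $signed     ;# -98 -1
puts $unsigned   ;# 158 255
```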
I would have preferred a symbol to a letter for flags.
Just replace "u" with "+" and I find it already better,
then let the plus precede the affected format-char and
I'd be ecstatic :-)
For large ints (>64 bit) there should also be a way to
extract them from a (byte-aligned) bitstream, although
the number would then mean the length of the data for one
item, just like with "a".
Actually, it doesn't even have anything to do with
signedness (the topic drifted off a bit in this subthread)
and my comment about ">><" was a joking answer to Donal's
(also joking) "<>>"
Because tcl8.5 is "wrap-around free", we fortunately no
longer need such crutches in tcl8.5. And it quite
surely won't make it into 8.4.*, either.
We discussed that (well, mocked it up quickly in code) and it wasn't
really better. The 'u' was Pat's choice, and since he's actually doing
the work, I'm not very inclined to argue. ;-)
Donal.
Just to remove any doubt, "u" for unsigned is still much better than
no unsigned-marker at all.
Anyway I wonder in what way a leading "+" would have been
inferior to the trailing "u".
"+" means "positive", or "one or more" in regexps. Signed numbers can
also be positive :)
As "u" also means "unsigned" in [format] (though stand-alone, not as a
modifier), I think using "u" is following the principle of least
surprise.