
tcl 9: integer representation with "_" - shimmering effects?


Ralf Fassel

Jul 21, 2023, 12:00:08 PM
During the very informative EuroTCL online session about the future of
Tcl, namely Tcl 9, one item caught my attention:

- in Tcl 9 it will be possible to write numbers with "_" for better
readability:

set x 1_234_567
string is integer $x
=> 1

Since we use that "_"-notation in some context for separating various
parts of eg. filenames, and it can't be guaranteed that all the
separated parts are NOT triples of 0-9, I wondered whether the "_" in $x
will have any chance of disappearing by some shimmering?

Note that I'm not talking about [incr x] in the above example, but if eg
a simple [string is integer $x] will make $x = 1234567 from 1_234_567,
this would be Not Good, since obviously [split $x "_"] would no longer
work then.

R'

saitology9

Jul 21, 2023, 12:17:45 PM
On 7/21/2023 12:00 PM, Ralf Fassel wrote:
>
> Note that I'm not talking about [incr x] in the above example, but if eg
> a simple [string is integer $x] will make $x = 1234567 from 1_234_567,
> this would be Not Good, since obviously [split $x "_"] would no longer
> work then.
>

I agree. IMO, if true, a completely unnecessary feature with limited
benefit and huge risk potential to break things as you pointed out.

Perhaps the "-strict" version will work as before and reject it?


stefan

Jul 21, 2023, 12:57:26 PM
This was discussed on site, NO, right now -strict mode does not catch this:

% package req Tcl
9.0a4
% string is integer -strict 1_234_567
1

This should be fixed. (Also, -strict mode for "string is" should become default in Tcl 9, anyways.)

Stefan

Ashok

Jul 21, 2023, 1:38:01 PM
With respect to your first question, no the string representation will
not shimmer away. This is no different than "set x 0x10" and worrying
about the string shimmering to "16" (assuming no arithmetic operations
are applied of course).
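
A minimal sketch of that point, assuming any Tcl 8.5 or later interpreter. It uses the 0x10 case rather than the underscore form so that it does not depend on TIP 551 being available; the variable names are only for illustration.

```tcl
# Reading a value as a number does not rewrite its string form.
set x 0x10
set ok   [string is integer -strict $x]   ;# 1: Tcl accepts the hex form
set sum  [expr {$x + 1}]                  ;# 17: x is read, not modified
set kept $x                               ;# still the text 0x10

# A command that stores a new value does replace the string form.
set y 0x10
incr y                                    ;# y now holds 17
```

Only the second case loses the original spelling, because incr writes a fresh integer back into the variable.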

With respect to the second, the "string is integer" identifies what
strings are accepted by *Tcl* as integer, not humans. Since 1_234 is
always acceptable the command will return 1 irrespective of -strict.

As to -strict being the default in Tcl 9, I think that ship has sailed.
Iirc, it would have too much impact on Tk.

/Ashok

Ted Nolan <tednolan>

Jul 21, 2023, 1:59:13 PM
In article <u9efpk$3brt2$2...@dont-email.me>,
Ashok <apnmbx...@yahoo.com> wrote:
>With respect to your first question, no the string representation will
>not shimmer away. This is no different than "set x 0x10" and worrying
>about the string shimmering to "16" (assuming no arithmetic operations
>are applied of course).
>
>With respect to the second, the "string is integer" identifies what
>strings are accepted by *Tcl* as integer, not humans. Since 1_234 is
>always acceptable the command will return 1 irrespective of -strict.
>
>As to -strict being the default in Tcl 9, I think that ship has sailed.
>Iirc, it would have too much impact on Tk.
>
>/Ashok
>

I'm curious how this feature was motivated. Does any other language use
this notation? Perhaps I'm missing something, but to me it looks like
a solution in search of a problem, and something likely to have unintended
consequences.

Or maybe I'm missing something.

Rich

Jul 21, 2023, 2:50:16 PM
Ted Nolan <tednolan> <t...@loft.tnolan.com> wrote:
> In article <u9efpk$3brt2$2...@dont-email.me>,
> Ashok <apnmbx...@yahoo.com> wrote:
>>With respect to your first question, no the string representation will
>>not shimmer away. This is no different than "set x 0x10" and worrying
>>about the string shimmering to "16" (assuming no arithmetic operations
>>are applied of course).
>>
>>With respect to the second, the "string is integer" identifies what
>>strings are accepted by *Tcl* as integer, not humans. Since 1_234 is
>>always acceptable the command will return 1 irrespective of -strict.
>>
>>As to -strict being the default in Tcl 9, I think that ship has sailed.
>>Iirc, it would have too much impact on Tk.
>>
>>/Ashok
>>
>
> I'm curious how this feature was motivated. Does any other language use
> this notation?

At least one other language is Python:

https://peps.python.org/pep-0515/

Whether Python's usage motivated this change for Tcl I cannot say.

> Perhaps I'm missing something, but to me it looks like a solution in
> search of a problem, and something likely to have unintended
> consequences.
>
> Or maybe I'm missing something.

The intent appears to be to allow for a locale independent "thousands
separator" for long integer constants.

Instead of 17283747283748234

One can write 17_283_747_283_748_234

Also, according to the Python PEP, the following other languages have
similar allowances for "thousands separators":


Ada: single, only between digits [8]
C# (open proposal for 7.0): multiple, only between digits [6]
C++14: single, between digits (different separator chosen) [1]
D: multiple, anywhere, including trailing [2]
Java: multiple, only between digits [7]
Julia: single, only between digits (but not in float exponent parts) [9]
Perl 5: multiple, basically anywhere, although docs say it’s
restricted to one underscore between digits [3]
Ruby: single, only between digits (although docs say “anywhere”) [10]
Rust: multiple, anywhere, except for between exponent “e” and digits [4]
Swift: multiple, between digits and trailing (although textual
description says only “between digits”) [5]

stefan

Jul 21, 2023, 2:55:38 PM
But in those languages, this syntactic sugar is thrown away early during processing, in Tcl, it keeps sitting in the stringrep. This makes a difference.

Stefan

Ted Nolan <tednolan>

Jul 21, 2023, 2:56:52 PM
Interesting, thanks!

et99

Jul 21, 2023, 3:32:05 PM
The _ can be used freely between digits in any way desired to make large literal numbers more readable. In particular, it's quite useful with hex and binary. With 64 bit ints, it is even more helpful.

100_000_000
0xffff_ffff
0b1111_1111_1111_1110
0b1111_1111___1111_1111___1111_1111___1111_1111

Once it is shimmered into an integer, the string representation changes:

% info patch
8.7a6

% set mask 0b1111_0000_1010
0b1111_0000_1010

% expr $mask
3850

% puts $mask
0b1111_0000_1010

% tcl::unsupported::representation $mask
value is a exprcode with a refcount of 4, object pointer at 0xffffffffac327ae0, internal representation 0xffffffffae0a38b0:0x0, string representation "0b1111_0000_1010"

% incr mask
3851

% tcl::unsupported::representation $mask
value is a wideInt with a refcount of 2, object pointer at 0xffffffffac3272d0, internal representation 0xf0b:0x3b59bd8, string representation "3851"


Do you think that the above expr $mask should do a shimmer? I don't know, perhaps. But clearly it should with the incr.

Please write a ticket if you think this is wrong.



et99

Jul 21, 2023, 3:56:20 PM
Forgot to address the OP's example:

% set a 123_456_789
123_456_789

% string is integer $a
1

% tcl::unsupported::representation $a
value is a wideInt with a refcount of 4, object pointer at 0xfffffffff62a0190, internal representation 0x75bcd15:0x0, string representation "123_456_789"

% split $a _
123 456 789

saitology9

Jul 21, 2023, 5:23:07 PM
On 7/21/2023 3:32 PM, et99 wrote:
>
> The _ can be used freely between digits in any way desired to make large
> literal numbers more readable. In particular, it's quite useful with hex
> and binary. With 64 bit ints, it is even more helpful.
>
>  100_000_000
>  0xffff_ffff
>  0b1111_1111_1111_1110
>  0b1111_1111___1111_1111___1111_1111___1111_1111
>

I don't have access to Tcl 9 to test. So, it looks like "_" can be placed
anywhere as a non-enforcing separator. So earlier statements about this
being a thousand separator do not apply, I guess.

In any case, whatever the reasoning may be, I think Tcl already has a
way to address this:

% set what_number_is_this 123_4_56_789

% set ok_I_guess [string cat 123 4 56 789]



et99

Jul 22, 2023, 2:13:28 AM
On 7/21/2023 2:23 PM, saitology9 wrote:

>
> I don't have access to Tcl 9 to test. So, it looks like "_" can be placed anywhere as a non-enforcing separator. So earlier statements about this being a thousand separator do not apply, I guess.

That is correct. The _ can appear anywhere, and any number of times, between numeric or hex digits, but not between the radix prefix (e.g. 0x) and the first digit, nor after the last digit.

TIP 551 was approved for 8.7 and that's the version I tested, as I too don't have a Tcl 9 to test. Somewhere (exactly where escapes my memory) there are single-file 8.7 binaries for Windows and Linux. It might have been on GitHub that I found one.

The motivation for the TIP began with some discussions here on clt 5-6 years ago. The list of languages that support this was lifted from a page in the perl universe. I first learned of its use in Ada.

Brian Griffin did the actual implementation.





et99

Jul 22, 2023, 2:26:37 AM
Actually, it appears that we can have an _ after the radix, since this works:

% expr 0x_123
291
% expr 0x123
291

But this is not permitted:

% expr 0x_123_
invalid bareword "0x_123_"
in expression "0x_123_";
should be "$0x_123_" or "{0x_123_}" or "0x_123_(...)" or ...

briang

Jul 22, 2023, 2:36:51 AM
The motivation stems from Hardware Description Languages, in particular VHDL (an HDL language derived from Ada).
Given that Tcl was conceived in a research lab working on Hardware Design related software, it seems fitting that Tcl share some similarities with the tools that use it. In this case, simulators for HDL's (primarily VHDL and Verilog), which use Tcl for command/control, and which are essentially debuggers (a la gdb), it can be helpful to, say, copy literal values out of the source window to the command line...

Ralf Fassel

Jul 22, 2023, 4:56:42 AM
* et99 <et...@rocketship1.me>
| Forgot to address the OP's example:
>
| % set a 123_456_789
| 123_456_789
>
| % string is integer $a
| 1
>
| % tcl::unsupported::representation $a
| value is a wideInt with a refcount of 4, object pointer at 0xfffffffff62a0190, internal representation 0x75bcd15:0x0, string representation "123_456_789"
>
| % split $a _
| 123 456 789

Thanks, that last part is what I was hoping for.

R'

Ralf Fassel

Jul 22, 2023, 5:01:02 AM
* t...@loft.tnolan.com (Ted Nolan <tednolan>)
| ["_" as thousands separator in numbers]
| I'm curious how this feature was motivated. Does any other language use
| this notation?

C++ has something similar since c++14 with the

int i = 18'446'744;

notation, but here this is not a problem, since the variable is never
interpreted as a string.

R'

et99

Jul 22, 2023, 7:02:46 AM
As someone wrote in the tcl 9 new features list, there's a 1_000_001 uses. I was very pleased that Brian took on this task; if there's anyone who can do it right with respect to the tcl object system, it's Brian.

I am eagerly awaiting a video replay of the recent tcl conference so I can watch his presentation on Tcl Abstract Lists that was sadly given during my sleep time. I'm looking forward to seeing some more magic with tcl objects.


Ted Nolan <tednolan>

Jul 22, 2023, 12:24:18 PM
Yikes! I really don't like that. Makes me glad I don't c++

Kenny McCormack

Jul 22, 2023, 1:13:36 PM
In article <ygav8ec...@akutech.de>, Ralf Fassel <ral...@gmx.de> wrote:
>* et99 <et...@rocketship1.me>
>| Forgot to address the OP's example:
>>
>| % set a 123_456_789
>| 123_456_789
>>
>| % string is integer $a
>| 1
>>
>| % tcl::unsupported::representation $a
>| value is a wideInt with a refcount of 4, object pointer at 0xfffffffff62a0190,
>internal representation 0x75bcd15:0x0, string representation "123_456_789"
>>
>| % split $a _
>| 123 456 789

(Somewhat related...)

What about going the other way? Suppose I have a large integer number in
TCL, and I want to print it out with commas (or any other character) so
that it is more legible? I.e., :

I have: 123456789
I want: 123,456,789


Harald Oehlmann

Jul 22, 2023, 2:16:11 PM
Am 22.07.2023 um 18:24 schrieb Ted Nolan <tednolan>:
> In article <ygar0p0...@akutech.de>, Ralf Fassel <ral...@gmx.de> wrote:
>> * t...@loft.tnolan.com (Ted Nolan <tednolan>)
>> | ["_" as thousands separator in numbers]
>> | I'm curious how this feature was motivated. Does any other language use
>> | this notation?
>>
>> C++ has something similar since c++14 with the
>>
>> int i = 18'446'744;
>>
>> notation, but here this is not a problem, since the variable is never
>> interpreted as a string.
>>
>> R'
>
> Yikes! I really don't like that. Makes me glad I don't c++

I want to say that I am also very unhappy with the change.
Rolf has given details in his porting talk in Vienna.

I would love to have something like "string is
integernounderscoreandnoprefix" to check, if a string is ok for external
processing, so those are not accepted: "0xab", "0d12", "12_34".

I have a deja-vu about all the "scan $d %d" necessary due to the octal
interpretation of tcl8.x which will not be necessary for 9, but other
new issues come...
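
For readers unfamiliar with the idiom: in Tcl 8.x, expr reads a string such as 010 as octal, while scan with %d always reads decimal digits. A minimal sketch of the workaround follows; from_decimal is a hypothetical helper name, not something from the Tcl library.

```tcl
# Hypothetical helper: read externally supplied text as a decimal number,
# ignoring leading zeros instead of treating them as an octal prefix.
proc from_decimal {text} {
    # scan returns the number of conversions performed; anything other
    # than 1 means no decimal number was found at the start of the text.
    if {[scan $text %d value] != 1} {
        error "not a decimal number: $text"
    }
    return $value
}

set a [from_decimal 010]   ;# 10
set b [from_decimal 08]    ;# 8
```

Note that scan stops at the first character it cannot use, so trailing garbage would need a separate check.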

Anyway, TCL 9 is great !
Harald

briang

Jul 22, 2023, 5:53:56 PM
On Saturday, July 22, 2023 at 8:16:11 PM UTC+2, Harald Oehlmann wrote:
> Am 22.07.2023 um 18:24 schrieb Ted Nolan <tednolan>:
> > In article <ygar0p0...@akutech.de>, Ralf Fassel <ral...@gmx.de> wrote:
> >> * t...@loft.tnolan.com (Ted Nolan <tednolan>)
> >> | ["_" as thousands separator in numbers]
> >> | I'm curious how this feature was motivated. Does any other language use
> >> | this notation?
> >>
> >> C++ has something similar since c++14 with the
> >>
> >> int i = 18'446'744;
> >>
> >> notation, but here this is not a problem, since the variable is never
> >> interpreted as a string.
> >>
> >> R'
> >
> > Yikes! I really don't like that. Makes me glad I don't c++
> I want to say that I am also very unhappy with the change.
> Rolf has given details in his porting talk in Vienna.
>
> I would love to have something like "string is
> integernounderscoreandnoprefix" to check, if a string is ok for external
> processing, so those are not accepted: "0xab", "0d12", "12_34".

I disagree with this idea. [string is integer] can only be useful as an input filter. It is inappropriate for output filtering. The only way to "output" filter is by using [format]. I cannot think of any other way to guarantee an output of the exactly needed string value.

>
> I have a deja-vu about all the "scan $d %d" necessary due to the octal
> interpretation of tcl8.x which will not be necessary for 9, but other
> new issues come...

Not familiar with this octal/%d issue, but I can see how it could be trouble. I would not expect %d to parse anything but decimal digits. This is a good example of why output operations must use [format], assuming $d was obtained externally.
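
A minimal sketch of output filtering with [format], assuming the value is already known to be a valid Tcl integer that fits in a machine word. The variable names are only for illustration.

```tcl
# Whatever spelling came in, [format %d] emits plain decimal digits.
set raw 0x1F
set out    [format %d $raw]     ;# 31
set padded [format %05d $raw]   ;# 00031
```

The external consumer then sees only the form that the format string dictates, regardless of how the value was written on input.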

>
> Anyway, TCL 9 is great !
> Harald


-Brian

heinrichmartin

Jul 23, 2023, 7:52:00 AM
-strict has a unique (and confusing) interpretation that stumped me at least once: "If -strict is specified, then an empty string returns 0, otherwise an empty string will return 1 on any class."
I.e. -strict should not be used to restrict allowed representations. It is really just about the empty string.
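
A small sketch of that behaviour; is_nonempty_integer is a hypothetical wrapper name. Per the documentation quoted above, dropping -strict would make the empty string pass.

```tcl
# Hypothetical wrapper: -strict only adds the "empty string is rejected"
# rule; it does not narrow which spellings count as an integer.
proc is_nonempty_integer {s} {
    return [string is integer -strict $s]
}

set empty_ok  [is_nonempty_integer ""]    ;# 0
set number_ok [is_nonempty_integer 42]    ;# 1
set hex_ok    [is_nonempty_integer 0x10]  ;# 1: prefixes are still accepted
```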

Ralf Fassel

Jul 24, 2023, 4:50:41 AM
* gaz...@shell.xmission.com (Kenny McCormack)
| What about going the other way? Suppose I have a large integer number in
| TCL, and I want to print it out with commas (or any other character) so
| that it is more legible? I.e., :
>
| I have: 123456789
| I want: 123,456,789

https://wiki.tcl-lang.org/page/Delimiting+Numbers

has some examples for that, though maybe TCL 9 has something built-in?

HTH
R'

Kenny McCormack

Jul 24, 2023, 5:52:07 AM
In article <ygamszl...@akutech.de>, Ralf Fassel <ral...@gmx.de> wrote:
>* gaz...@shell.xmission.com (Kenny McCormack)
>| What about going the other way? Suppose I have a large integer number in
>| TCL, and I want to print it out with commas (or any other character) so
>| that it is more legible? I.e., :
>>
>| I have: 123456789
>| I want: 123,456,789
>
>https://wiki.tcl-lang.org/page/Delimiting+Numbers

Yuck. So, there's no builtin way to do it (yet). You have to do it in
code. The examples at the URL seem awfully complex.

For what it is worth, here's how I do it in AWK; it seems something similar
ought to be do-able in Tcl:

# insert() (not shown) was also written by me.
function comma(num, i) {
    for (i=length(num)-2; i>1; i-=3)
        num = insert(num,i,0,",")
    return num
}

>has some examples for that, though maybe TCL 9 has something built-in?

That'd be nice.


saitology9

Jul 24, 2023, 10:37:45 AM
On 7/24/2023 5:52 AM, Kenny McCormack wrote:
> In article <ygamszl...@akutech.de>, Ralf Fassel <ral...@gmx.de> wrote:
>> * gaz...@shell.xmission.com (Kenny McCormack)
>> | What about going the other way? Suppose I have a large integer number in
>> | TCL, and I want to print it out with commas (or any other character) so
>> | that it is more legible? I.e., :
>>>
>> | I have: 123456789
>> | I want: 123,456,789
>>
>> https://wiki.tcl-lang.org/page/Delimiting+Numbers
>
> Yuck. So, there's no builtin way to do it (yet). You have to do it in
> code. The examples at the URL seem awfully complex.
>
> For what it is worth, here's how I do it in AWK; it seems something similar
> ought to be do-able in Tcl:
>


The other direction is easier as it is under your control.

This is what I use and it should replace your awk script. You can use
a period instead of commas by supplying it as the second argument:

proc format_nums {num {sep ,}} {
    while {[regsub -- {^([-+]?\d+)(\d\d\d)} $num "\\1$sep\\2" num]} {}
    return $num
}
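
A short usage sketch, repeating the proc so the snippet runs on its own. The sample values are made up for illustration. Each pass of the loop peels the last three digits off the leading digit run, so the separators are inserted from right to left.

```tcl
proc format_nums {num {sep ,}} {
    # Repeat until the leading run of digits is at most three long.
    while {[regsub -- {^([-+]?\d+)(\d\d\d)} $num "\\1$sep\\2" num]} {}
    return $num
}

set a [format_nums 123456789]     ;# 123,456,789
set b [format_nums -1234567 .]    ;# -1.234.567
set c [format_nums 12]            ;# 12 (too short, left unchanged)
```

One caveat: the separator is spliced into the regsub replacement string, so a separator containing & or a backslash would need escaping first.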



Kenny McCormack

Jul 24, 2023, 11:57:29 AM
In article <u9m2bj$mtj3$1...@dont-email.me>,
saitology9 <saito...@gmail.com> wrote:
...
>The other direction is easier as it is under your control.
>
>This is what I use and it should replace your awk script. You can use
>a period instead of commas by supplying it as the second argument:
>
>proc format_nums {num {sep ,}} {
> while {[regsub -- {^([-+]?\d+)(\d\d\d)} $num "\\1$sep\\2" num]} {}
> return $num
>}

Thanks. That looks useful and good.


Andreas Leitgeb

Jul 24, 2023, 5:51:45 PM
Harald Oehlmann <wort...@yahoo.com> wrote:
> I would love to have something like "string is
> integernounderscoreandnoprefix" to check, if a string is ok for external
> processing, so those are not accepted: "0xab", "0d12", "12_34".

Hello Harald!

regexp {^-?[0-9]+$} $candidate

or a bit more advanced, without leading zeros except for 0 itself:

regexp {^(0|-?[1-9][0-9]*)$} $candidate

This can also be wrapped in a proc, of course ;-)

For any particular external processing, you'd also need to check
for the acceptable value range.
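
Wrapped in a proc, the second pattern plus a range check might look like the sketch below. The name is_plain_decimal and its optional min and max arguments are invented for illustration; they are not part of Tcl.

```tcl
# Hypothetical helper: accept only plain decimal text (no prefix, no
# underscore, no leading zeros), optionally within an inclusive range.
proc is_plain_decimal {candidate {min ""} {max ""}} {
    if {![regexp {^(0|-?[1-9][0-9]*)$} $candidate]} {
        return 0
    }
    # The text is known to be plain digits here, so numeric comparison
    # is safe.
    if {$min ne "" && $candidate < $min} { return 0 }
    if {$max ne "" && $candidate > $max} { return 0 }
    return 1
}

set r1 [is_plain_decimal 42]          ;# 1
set r2 [is_plain_decimal 0xab]        ;# 0
set r3 [is_plain_decimal 12_34]       ;# 0
set r4 [is_plain_decimal 300 0 255]   ;# 0: outside the allowed range
```

Because the check is purely textual, the result is the same regardless of which Tcl version decides what counts as an integer.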

Andreas Leitgeb

Jul 24, 2023, 5:56:27 PM
stefan <stefan....@wu.ac.at> wrote:
> % package req Tcl
> 9.0a4
> % string is integer -strict 1_234_567
> 1
>
> This should be fixed.

Yes! It should be fixed! In the particular script. ;-)

Namely, by not using "string is integer" when a regexp check is desired.

Harald Oehlmann

Aug 14, 2023, 8:34:08 AM
Dear Stefan, dear Andreas,

My reaction to the question was not appropriate; I apologize. It is OK as
it is.
Make TCL 9 great !
Harald