Why does shimmering occur, when ...

MartinLemburg@Siemens-PLM

Jan 14, 2010, 9:50:39 AM

Hi,

I know ... it has been said 1000s of times ... when scripting in Tcl,
don't care about the internal type of the data you are working with.

But ... I do care, because when handling bigger structured data (dicts
with nested dicts, lists containing dicts, ...) it is not that nice if
shimmering occurs.
And what if, while producing e.g. a report of a complex, high-precision
calculation and its parameters, all the numerical data suddenly lost
its numerical representation? For me that's an "Ooh NOO".

So what about this:

% info patchlevel
8.6b1.1
% proc objtype {arg} {return [string map {"pure" "string"} [lindex [split [::tcl::unsupported::representation $arg]] 3]];}
% set d [dict create a 1 b 2 c 3]
a 1 b 2 c 3
% set d2 $d
a 1 b 2 c 3
% set l [lrepeat 3 $d]
{a 1 b 2 c 3} {a 1 b 2 c 3} {a 1 b 2 c 3}
% objtype $d
dict
% objtype $d2
dict
% objtype $l
list
% objtype [lindex $l 0]
dict
% format {%30s} $d; # expecting that the string representation of d will be formatted, ...
a 1 b 2 c 3
% objtype $d; # ... but the data itself is converted to a string for formatting
string
% objtype $d2; # even the connection from d to d2 was not cut while converting to a string
string
% objtype [lindex $l 0]; # even the values in the list are not "duplicated" to save their internal type
string
% set d [dict create {*}$d]
a 1 b 2 c 3
% objtype $d
dict
% objtype $d2; # wow, the connection from d to d2 still existed while expanding d
list
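A minimal sketch of the identity question raised by this session, assuming Tcl 8.6 or later. It uses Martin's `objtype` helper (lightly reformatted) plus a hypothetical `objaddr` helper that extracts the Tcl_Obj address from the output of `::tcl::unsupported::representation`, an unsupported command whose wording may change between releases. It checks that `d`, `d2` and the elements of `l` are one shared object until `d` is rebuilt:

```tcl
# Martin's helper: type name of a value, "pure string" reported as "string".
proc objtype {v} {
    string map {pure string} [lindex [split [::tcl::unsupported::representation $v]] 3]
}
# Hypothetical helper: the Tcl_Obj address, i.e. the identity of the value.
proc objaddr {v} {
    regexp {object pointer at ([^,]+),} [::tcl::unsupported::representation $v] -> addr
    return $addr
}

set d [dict create a 1 b 2 c 3]
set d2 $d                  ;# copies the reference, not the value
set l [lrepeat 3 $d]       ;# every element refers to that same object

puts [expr {[objaddr $d] eq [objaddr $d2]}]            ;# 1
puts [expr {[objaddr $d] eq [objaddr [lindex $l 0]]}]  ;# 1

# Rebuilding d creates a new object; d2 and l still share the old one.
set d [dict create {*}$d]
puts [expr {[objaddr $d] eq [objaddr $d2]}]            ;# 0
puts [objtype $d]                                      ;# dict
```

So the "connection" Martin observes is simply one Tcl_Obj being referenced from several places: whatever happens to its internal representation is visible through every reference.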

So why does "format" convert the data to be formatted instead of
using its string representation?

When using "format" I never expected that the data I want to format
would shimmer!
I expected the result to be a string, not the format argument to be
converted to a string.
So I really recommend not applying format directly, for output
purposes, to numbers that shouldn't lose their internal representation
and "exactness":

% set r [expr {double(1)}]
1.0
% objtype $r
double
% format %s $r; # expecting the number not to be converted to a string, but to be formatted into one
1.0
% objtype $r
string
% set r [expr {double(1)}]
1.0
% format %s [expr {$r}]
1.0
% objtype $r
double

And I wouldn't like to lose the dict representation while doing
something like the following:

% set fontSpec [font configure TkDefaultFont]; # returns a list of options
-family Tahoma -size 8 -weight normal -slant roman -underline 0 -overstrike 0
% dict set fontSpec -size 12; # converting the options list to a dict
-family Tahoma -size 12 -weight normal -slant roman -underline 0 -overstrike 0
% font create TkFont12Pt {*}$fontSpec; # implicitly converting the dict to a list again
TkFont12Pt
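The font round trip needs Tk. A plain-Tcl sketch of the same pattern, assuming Tcl 8.6 or later, where the option list and the `takeOptions` proc are hypothetical stand-ins for `font configure` and `font create`: `dict set` works directly on the option list, and `{*}` expansion hands the same words to the command whichever internal form the variable happens to hold at that moment.

```tcl
# Hypothetical stand-in for the result of font configure TkDefaultFont.
set fontSpec [list -family Tahoma -size 8 -weight normal]

# dict set accepts the list and stores a dict (lists and dicts convert cheaply).
dict set fontSpec -size 12
puts $fontSpec                              ;# -family Tahoma -size 12 -weight normal

# Hypothetical stand-in for: font create name {*}$fontSpec
proc takeOptions {name args} {
    return "$name: size [dict get $args -size]"
}
puts [takeOptions TkFont12Pt {*}$fontSpec]  ;# TkFont12Pt: size 12
```

Which internal form `fontSpec` ends up with after the expansion is an implementation detail that may differ between Tcl versions; the words passed to the command are the same either way.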

What kinds of pitfalls hidden in Tcl cause shimmering in
unexpected cases?

And ... some good news:

% set l [list 1 2 3]
1 2 3
% set arr($l) [list A B C]; # using a list as the name of an array element (a variable name), ...
A B C
% objtype $l; # ... does not convert the list to a "parsedVarName"
list
% set $l $arr($l); # while using the list as the name of a scalar variable, ...
A B C
% objtype $l; # ... shimmers the list to a "parsedVarName"
parsedVarName

The only conclusion from this confusion for me is: really try to
forget about the internal representation, concentrate on the EIAS rule,
and hope that my Tcl application does not spend too much time
restructuring "structured" data or destroy too much calculation
precision.

Greetings,

Martin

Andreas Leitgeb

Jan 14, 2010, 9:55:13 AM

MartinLemburg@Siemens-PLM <martin.lembur...@gmx.net> wrote:
> % proc objtype {arg} {return [string map {"pure" "string"} [lindex [split [::tcl::unsupported::representation $arg]] 3]];}

Why that string map??

Andreas Leitgeb

Jan 14, 2010, 10:12:18 AM

MartinLemburg@Siemens-PLM <martin.lembur...@gmx.net> wrote:
> % format {%30s} $d; # expecting that the string representation of d will be formatted, ...
> a 1 b 2 c 3
> % objtype $d; # ... but the data itself is converted to a string for formatting
> string

The problem with format lies in the %s "conversion". It's not really a
conversion *to string*, but rather formatting "of a string". That is, it
expects a string argument and will make whatever it gets into one.

Unfortunately, I don't know any way to tell tcl to generate a string
for some (arbitrary) object without adding the string rep to the obj.

Well, the problem would not show up with numbers formatted directly
into a string with, e.g., a "%f" conversion:
% set d [expr {atan(1)*4}] ;# -> 3.141592653589793
% objtype $d ;# -> double
% format %f $d ;# -> 3.141593
% objtype $d ;# (still) double

But then again, there must be some way, because printing out the results
of the assignments to d, d2 and l in your original example didn't
immediately turn them into strings.

PS: my question about the mapping in objtype wasn't really relevant to
the problem, but I'm still curious.
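Andreas's `%f` observation as a self-contained sketch, assuming Tcl 8.6 or later and Martin's `objtype` helper (built on the unsupported `::tcl::unsupported::representation`). A numeric conversion reads the double itself, and any field width is applied to the newly produced string, so the variable keeps its double representation:

```tcl
# Martin's helper: type name of a value, "pure string" reported as "string".
proc objtype {v} {
    string map {pure string} [lindex [split [::tcl::unsupported::representation $v]] 3]
}

set d [expr {atan(1)*4}]
puts [objtype $d]           ;# double
puts [format %f $d]         ;# 3.141593
puts [format %10.4f $d]     ;# "    3.1416" (padded to 10 characters)
puts [objtype $d]           ;# still double
```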

MartinLemburg@Siemens-PLM

Jan 14, 2010, 11:04:11 AM

Hi Andreas,

the mapping is only "useful" because for strings
"::tcl::unsupported::representation" returns "pure string", so the
word at index 3 of the result is "pure", not "string". The other
internal object type names seem to be one word.

That's all!

Greetings,

Martin


MartinLemburg@Siemens-PLM

Jan 14, 2010, 11:14:01 AM

Hi Andreas,

so "format {%s} ..." causes the generation of a string representation
where none is found, and replaces the original internal object
representation with the newly generated string representation?

% interp alias {} rep {} ::tcl::unsupported::representation
% set n [expr {3.0}]
3.0
% rep $n
value is a double with a refcount of 2, object pointer at
00DCCAE8, internal representation 00000000:40080000, string
representation "3.0".
% format {%s} $n
3.0
% rep $n
value is a string with a refcount of 2, object pointer at
00DCCAE8, internal representation 00D8C0B8:40080000, string
representation "3.0".

Even if "format {%s} ..." formats "of a string", it may use the
(possibly newly generated) string representation, but it should not
need to destroy the object's internal representation!

If every piece of data in Tcl, no matter how it is internally
represented, has a string representation (even if it has to be built
on demand), then no Tcl command should need to throw away the internal
representation just because it expects only strings as input. Every
Tcl value's string representation can serve as input for every Tcl
command expecting strings.

I always thought this was one aspect of EIAS.

Greetings,

Martin


Alexandre Ferrieux

Jan 14, 2010, 11:57:12 AM

On Jan 14, 3:50 pm, "MartinLemburg@Siemens-PLM"

The reason is that there are two kinds of "string": the "string rep",
which coexists with any internal rep and is basically a modified-UTF-8
string with a terminating \0, and the String internal rep, which is a
Unicode string. As it is an internal rep, the String obviously erases
whatever previous intrep was there.

So it all boils down to Tcl_Format requesting that its arguments facing
'%s' first be converted to a String object. This happens here:
tclStringObj.c, line 1875:

numChars = Tcl_GetCharLength(segment);

As you can guess, the reason is that the whole [format] concatenation
is done on such objects (in unicode). This in turn, I guess, is due to
field width specifiers which are expressed in characters, not bytes;
Unicode is the realm of character-counting...
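To illustrate the point that `%s` field widths are counted in characters rather than bytes, a small sketch using `\u00e9` (one character that takes two bytes in UTF-8). Counting characters like this is the kind of work Alex attributes to Tcl_GetCharLength:

```tcl
set s [format %5s \u00e9]                           ;# pad to 5 characters
puts [string length $s]                             ;# 5 (characters)
puts [string length [encoding convertto utf-8 $s]]  ;# 6 (bytes: 4 spaces + 2)
```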

As a workaround, just don't use [format %s]; use EIAS "naked":

set d {1 2 3 4}
=> 1 2 3 4
dict get $d 1
=> 2
puts [format %d 1]$d[format %d 2]
=> 11 2 3 42
::tcl::unsupported::representation $d
=> value is a dict with a refcount of 4, object pointer at
0x97236f8, internal representation 0x9733738:0x97236e0, string
representation "1 2 3 4".

-Alex
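Alex's workaround as a runnable sketch, assuming Tcl 8.6 or later and Martin's `objtype` helper (built on the unsupported `::tcl::unsupported::representation`). The output is built by plain substitution, which only reads the string representation of `$d`; on the versions discussed here the dict representation survives, though that is observed behaviour rather than a documented guarantee:

```tcl
# Martin's helper: type name of a value, "pure string" reported as "string".
proc objtype {v} {
    string map {pure string} [lindex [split [::tcl::unsupported::representation $v]] 3]
}

set d {1 2 3 4}
puts [dict get $d 1]                     ;# 2 (d now holds a dict)
set out "[format %d 1]$d[format %d 2]"   ;# concatenation by substitution
puts $out                                ;# 11 2 3 42
puts [objtype $d]                        ;# dict
```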

Andreas Leitgeb

Jan 14, 2010, 1:09:12 PM

MartinLemburg@Siemens-PLM <martin.lembur...@gmx.net> wrote:
> so "format {%s} ..." causes the generation of a string representation,
> where none is found, and replaces the original internal object
> representation by the new generated string representation?

Now, I think I understand, and I agree with you, that just querying
a string-rep of an object should not "zap" the original rep. But
does it really? Why is there a difference between "pure string"
and "string" in the output of [rep $d]?

% set d [expr double(1)] ;# -> 1.0
% rep $d
value is a double with a refcount of 4, object pointer at 0x9c07f08,
internal representation (nil):0x3ff00000, string representation "1.0".

% set d [expr double(1)]; rep $d
value is a double with a refcount of 3, object pointer at 0x9c2b238,
internal representation (nil):0x3ff00000, no string representation.

% set d [expr double(1)]; format %s $d; rep $d
value is a string with a refcount of 3, object pointer at 0x9c2b928,
internal representation 0x9c253c8:0x3ff00000, string representation "1.0".

This was strange, as you (imho) rightly complained: why make string the
new primary rep? But the double is probably still there as well:

% set d [expr double(1)]; set d "$d "; rep $d
value is a pure string with a refcount of 3, object pointer at 0x9c2b760,
string representation "1.0 ".

Now, this time it was zapped, but that's fine here.

% set d [expr double(1)]; append d "x"; rep $d
value is a string with a refcount of 3, object pointer at 0x9c24870,
internal representation 0x9c11cc8:0x3ff00000, string representation "1.0x".

I'd really like to know what the old internal rep turned into here...;
why doesn't rep call it a "pure string" now, and (nil)ify the second rep?
One really cannot use $d in a numeric expr-operation now (tried it).
Glitch in "::tcl::unsupported::representation" or elsewhere?

% set d [expr double(1)]; lappend d x; rep $d
value is a list with a refcount of 3, object pointer at 0x9c24690,
internal representation 0x9c2a9b0:(nil), no string representation.

Here, the double rep is really gone.

PS: Don't mind the refcounts. I've got Alex's original lastresult patch
compiled in (which accounts for the "4" in my first example) and I don't
know why it's 3 rather than 2 in the other cases.

PPS: info patchlevel == 8.6b1.1 (last updated last week, or so).

Alexandre Ferrieux

Jan 14, 2010, 5:42:41 PM

On Jan 14, 7:09 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:
>
> [...] that just querying
> a string-rep of an object should not "zap" the original rep. But
> does it really? Why is there a difference between "pure string"
> and "string" in the output of [rep $d]?

[rep] just describes the truth, he's innocent :}
See my other post: when rep says "string" it means
typePtr==&tclStringType (no invention, the "name" field of
tclStringType is "string"). When it says "pure string" it means
typePtr==NULL.

So, the only thing that "zaps" a previous internal rep is _forcing_
it to tclStringType.
And it happens within [format %s] as explained in my post.

Now after digging a bit in the code it appears that:

- forcing to tclStringType is indeed useful to count chars
- the actual concatenation is done by Tcl_AppendObjToObj, which for
all types except two (String and ByteArray-without-string-rep) works
at the string-rep level, hence is not responsible for shimmering.

As a conclusion I'd say that the String-forcing in [format] is (1)
useless in the absence of width specifiers, and (2) might be avoided
in all cases, by using (slightly slower) UTF-8-char-counting
functions.

I'll open a low-prio bug for this. Thanks Martin for unearthing it.

> This was strange, as you (imho) rightly complained: why make string the
> new primary rep? But the double is probably still there, as well

Nope. Just one internal rep at any given time. When it's string, the
double is gone.
(The same holds for "pure string", since pure string means a null type
pointer.)

> % set d [expr double(1)]; append d "x"; rep $d
> value is a string with a refcount of 3, object pointer at 0x9c24870,
> internal representation 0x9c11cc8:0x3ff00000, string representation "1.0x".
>
> I'd really want to know, what the old internal rep turned to here...;
> why doesn't rep name it a "pure string" now, and (nil)ify the secend rep?

Apparently [append] suffers from the same suboptimality. Will track it
in the same bug report, thanks.

> One really cannot use $d in a numeric expr-operation now (tried it).
> Glitch in "::tcl::unsupported::representation" or elsewhere?

No. 1.0x is hard on anybody's math ;-)
And I don't see why it should be rep's fault that "1.0x" doesn't
cooperate with [expr]...

> PS: Don't mind the refcounts. I've got Alex's original lastresult patch
> compiled in (which accounts for the "4" in my first example) and I don't
> know why it's 3 rather than 2 in the other cases.

Flattered :)

3 == 1 in global var d, 2 in proc unknown's handling of "rep".
If you want to see a "1" as a refcount, I'd suggest:
- avoiding shortcuts that call [unknown]
- avoiding aliases
- avoiding variables
- avoiding [history lastresult] by appending ";set foo 1" to all
lines

-Alex
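The point that a value carries just one internal representation at a time can be watched directly. A sketch assuming Tcl 8.6 or later and Martin's `objtype` helper (built on the unsupported `::tcl::unsupported::representation`): the string value stays "1.0" throughout, while each kind of use swaps the cached internal form, much as in Martin's later `llength` / `string is double` session:

```tcl
# Martin's helper: type name of a value, "pure string" reported as "string".
proc objtype {v} {
    string map {pure string} [lindex [split [::tcl::unsupported::representation $v]] 3]
}

set d [expr {double(1)}]
puts "[objtype $d] $d"      ;# double 1.0
llength $d                  ;# a list operation re-parses the value as a list
puts "[objtype $d] $d"      ;# list 1.0 (the double representation is gone)
set x [expr {$d * 2}]       ;# arithmetic parses it back into a double
puts "[objtype $d] $d"      ;# double 1.0
```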

Alexandre Ferrieux

Jan 14, 2010, 6:03:16 PM

On Jan 14, 11:42 pm, I wrote:
>
> I'll open a low-prio bug for this. Thanks Martin for unearthing it.
>

Done as https://sourceforge.net/tracker/?func=detail&aid=2932421&group_id=10894&atid=110894

-Alex

Andreas Leitgeb

Jan 15, 2010, 6:14:40 AM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
> See my other post: when rep says "string" it means
> typePtr==&tclStringType

Yes, sorry, I saw that only after writing mine.

Most of my comments were kind of voided then, but the one about
append is still strange to my understanding.

>> % set d [expr double(1)]; append d "x"; rep $d
>> value is a string with a refcount of 3, object pointer at 0x9c24870,
>> internal representation 0x9c11cc8:0x3ff00000, string representation "1.0x".
>>
>> I'd really want to know, what the old internal rep turned to here...;
>> why doesn't rep name it a "pure string" now, and (nil)ify the secend rep?
>
> Apparently [append] suffers from the same suboptimaliy. Will track it
> in the same bug report, thanks.

But it's still all different from the format-case.
I'd have expected "append" to destroy everything except the string rep,
so, in any case, I'd have expected the ":0x3ff00000" to disappear.
That this ":0x3ff00000" was still there was the reason why I even made
that goofy expr-test on it. Why is it *not* replaced by ":(nil)", as
happens with other operations that eventually thwart the previous type,
like "lappend"?

% set d [expr {atan(1)*4}]; lappend d x; rep $d
value is a list with a refcount of 3, object pointer at 0x86528f8,
internal representation 0x8651848:(nil), no string representation.

>> PS: Don't mind the refcounts. I've got Alex's original lastresult patch
>> compiled in (which accounts for the "4" in my first example) and I don't
>> know why it's 3 rather than 2 in the other cases.
> Flattered :)

And I didn't even see that you were participating in this thread
when I wrote it :)

> If you want to see a "1" as a refcount, I'd suggest:
> - ...
> - avoiding aliases
Wasn't aware that aliases kept another ref, so now it's clear to me.

Thanks! (also for the bugreport)

MartinLemburg@Siemens-PLM

Jan 15, 2010, 7:26:03 AM

Hi Andreas,

your "lappend" example made me curious, so I did:

% set d [expr {1.0}]
1.0
% rep $d
value is a double with a refcount of 4, object pointer at
00D2AE60, internal representation 00000000:3FF00000, string
representation "1.0".
% lappend d x; rep $d
value is a list with a refcount of 4, object pointer at 00DCC2D8,
internal representation 00DC8070:00000000, no string representation.
% rep [lindex $d 0]
value is a pure string with a refcount of 4, object pointer at
00DCC428, string representation "1.0".
% rep [lindex $d 1]
value is a pure string with a refcount of 8, object pointer at
00D2B550, string representation "x".

Why the hell is the first element of the new list "d" not a double
anymore?

Yes ... if a value given to a list command is not a list, it is
converted into a list. Currently:

1. d contains a double
2. the contents of d are converted to list (with one atom)
3. the one and only atom of the list in d is converted to a string

I expected:

1. d contains a double
2. the contents of d are converted to list (with one atom)
3. the previous content of d (double) is used as the one and only
atom of the new list in d

There is NO need for shimmering! If "d" had no string representation,
it could be created from the internal object representation!

Some days ago, we considered creating "complex" object types for our
simulation application, hiding complex C++ structures behind a simple
string.
I'm glad we didn't go this way, because not everybody writes clean
(non-shimmering) code, and I'm afraid that nearly nobody is able to
write non-shimmering code!

One example:

% set d [expr {1.0}]
1.0
% rep $d
value is a double with a refcount of 4, object pointer at
00D2AE60, internal representation 00000000:3FF00000, string
representation "1.0".
% llength $d
1
% rep $d
value is a list with a refcount of 4, object pointer at 00D2AE60,
internal representation 00D59720:00000000, string representation
"1.0".
% rep [lindex $d 0]
value is a pure string with a refcount of 4, object pointer at
00D2B130, string representation "1.0".
% set d [expr {1.0}]
1.0
% string is list $d
1
% rep $d
value is a list with a refcount of 4, object pointer at 00DCC4A0,
internal representation 00D63D98:00000000, string representation
"1.0".
% rep [lindex $d 0]
value is a pure string with a refcount of 4, object pointer at
00DCC428, string representation "1.0".

Even trying to detect whether a value is usable as a list (string is
list $d) causes the value in d to shimmer to a list with one pure
string element. I always thought that none of the "string is ..."
checks would alter the data passed to them, but even the following
converts the internal object type:

% set d [expr {1.0}]
1.0
% string is list $d
1
% rep $d
value is a list with a refcount of 4, object pointer at 00DCC4A0,
internal representation 00D63D98:00000000, string representation
"1.0".
% string is double $d
1
% rep $d
value is a double with a refcount of 4, object pointer at
00DCC4A0, internal representation 00000000:3FF00000, string
representation "1.0".

Not that nice, is it?

Using variables containing values of "complex" application-specific
Tcl object types (e.g. faceted 3D geometric representations based on
numerically exact 3D geometry, with a string representation describing
the tessellation parameters) is therefore a bit risky, because this
shimmering chaos (IMO) would cause a lot of building up and tearing
down of internal C++ structures.
And the more I dive into it, the less I expect to be able to prevent
shimmering!

So a last question: might it be useful to have a "string is dict"
capability, to allow a quick conversion from a list to a dict?

Best regards,

Martin


Kevin Kenny

Jan 15, 2010, 8:24:30 AM

That might be all right, but it's horribly complicated. The code in
[lappend] tries to be fairly general; if it's confronted with a pure
object, it has to stringify it in case it's not a single list element.
(This can happen with the "pure String" internal rep that comes back
from some of the operations, for instance.) To avoid the string
conversion, we'd have to have special cases, "a double converts to
a one-element list", "an integer converts to a one-element list",
and so on. These would bloat the code and invite coding errors,
to very little benefit. Observe that in any case, this code
is really requesting shimmering (it constructs a double, and then
treats it as a list!); avoiding it is as simple as:

set d [list [expr {1.0}]]; # don't treat the double as a list,
# but instead make a list containing it.
lappend d x

Here you construct the list explicitly from the double, and append a
second element to it. The only shimmering occurs in the construction
of the pure '1.0' result, and of course with that being a constant,
you could save it off.

--
73 de ke9tv/2, Kevin
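Kevin's suggestion as a runnable sketch, assuming Tcl 8.6 or later and Martin's `objtype` helper (built on the unsupported `::tcl::unsupported::representation`). Because the list is built around the existing double object, the element is never re-parsed from its string:

```tcl
# Martin's helper: type name of a value, "pure string" reported as "string".
proc objtype {v} {
    string map {pure string} [lindex [split [::tcl::unsupported::representation $v]] 3]
}

set d [list [expr {1.0}]]    ;# a one-element list holding the double itself
lappend d x
puts [objtype $d]            ;# list
puts [objtype [lindex $d 0]] ;# double
puts $d                      ;# 1.0 x
```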

MartinLemburg@Siemens-PLM

Jan 15, 2010, 9:00:11 AM

Hi Kenny,

you are right: correct coding can avoid shimmering, but looking at all
the developers I know who write Tcl code ... everyone tries to stay
"on the line" of good coding, but everyone has their own habits and
style, and even if everyone tries to fulfill a style or development
guide, it never really happens.

And if some rapidly developed code is introduced to be replaced later
on, then this RAD code is often not that good either!

Greetings,

Martin


Andreas Leitgeb

Jan 15, 2010, 9:18:43 AM

MartinLemburg@Siemens-PLM <martin.lembur...@gmx.net> wrote:
> your "lappend" example made curious, so I did:

> % set d [expr {1.0}]; lappend d x; rep [lindex $d 0]
> value is a pure string with a refcount of 4, object pointer at
> 00DCC428, string representation "1.0".

(somewhat contracted quote)
I had this example in mind, too, but removed it from my post before
sending it off, for two reasons:

It distracted from the actual question: (why was that second
(out-of-date) representation still visible in the result of
[rep $d] *after* [append] had "destroyed" the old value?)

And, converting a double (or int) directly into a list is not the
common case, but rather a sign that one forgot to initially create
a singleton list of that number.
It's rather a coincidence that numbers can be "safely" reinterpreted
as a list of that number, and this coincidence does not extend to e.g.
dicts. It's better to get used to distinguishing numbers and lists
than to have Tcl actively support that mixup, which fails almost
everywhere except with numbers.

Alexandre Ferrieux

Jan 15, 2010, 10:34:42 AM

On Jan 15, 3:00 pm, "MartinLemburg@Siemens-PLM"

<martin.lemburg.siemens-...@gmx.net> wrote:
> Hi Kenny,
>
> you are right - right coding can avoid shimmering, but taking a look
> at all the developers I know writing tcl code ... everyone is trying
> to be "on the line" of good coding, but everyone has its habits and
> style and even if everyone tries to full fill a style or development
> guide, it never really happens.
>
> And if some rapid developed code is introduced to be replaced later
> on, than this RaD code is often not that good either!

Not sure exactly what you're arguing about here (please avoid top
posting when the quoted message is rich), but note this:

[append]'s shimmering to a string object or to a pure string is
perfectly justified.

(you'll also note I didn't mention [append] in the bug report's title
for this reason)

Why ? Because for all other types (Dicts, Lists, varied Numbers,
Channels, etc), the result of appending is almost certainly not
convertible to that type.
And when it is, [append]ing is just the most stupid way of doing it...

So, it's not at all a matter of style or educated optimization, it's
one of logic. If you want to be both fast and maintainable, stay in
the definition domain of the type. To extend a dict use dict (or even
list [*]) operations. To extend a list use [lappend]. But [append] is
a string tool, period. Don't mix oil and water.

-Alex

[*] it turns out Lists and Dicts are friends, and have conversion
shortcuts.
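Alex's rule of thumb, "stay in the definition domain of the type", as a short sketch assuming Tcl 8.6 or later and Martin's `objtype` helper (built on the unsupported `::tcl::unsupported::representation`). Each value is extended with a tool of its own kind, and the last lines show the list/dict friendship from the footnote:

```tcl
# Martin's helper: type name of a value, "pure string" reported as "string".
proc objtype {v} {
    string map {pure string} [lindex [split [::tcl::unsupported::representation $v]] 3]
}

set lst [list a b]
lappend lst c                  ;# list tool on a list
set dct [dict create k1 v1]
dict set dct k2 v2             ;# dict tool on a dict
set str abc
append str def                 ;# string tool on a string
puts "[objtype $lst] [objtype $dct] $str"   ;# list dict abcdef

# A flat key/value list can be used as a dict directly.
set pairs [list a 1 b 2]
puts [dict get $pairs b]       ;# 2
```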

Kevin Kenny

Jan 15, 2010, 11:02:33 PM

Andreas Leitgeb wrote:
> It distracted from the actual question: (why was that second
> (out-of-date) representation still visible in the result of
> [rep $d] *after* [append] had "destroyed" the old value?)

I'd say it's a bug in the 'rep' command. The internal rep is
meaningless if the type pointer is NULL, and I think we just
never bother to overwrite the pointers.

Alexandre Ferrieux

Jan 16, 2010, 1:13:06 PM

Sorry, I got lost in the various cases mentioned here. Can one of you
gentlemen describe again that case of outdated representation ? It'll
be a matter of honour for me to fix [rep] accordingly if it is proved
guilty ;-)

-Alex

Kevin Kenny

Jan 16, 2010, 1:32:17 PM

Andreas Leitgeb wrote:
>>> It distracted from the actual question: (why was that second
>>> (out-of-date) representation still visible in the result of
>>> [rep $d] *after* [append] had "destroyed" the old value?)

I replied:

>> I'd say it's a bug in the 'rep' command. The internal rep is
>> meaningless if the type pointer is NULL, and I think we just
>> never bother to overwrite the pointers.

Alexandre Ferrieux wrote:
> Sorry, I got lost in the various cases mentioned here. Can one of you
> gentlemen describe again that case of outdated representation ? It'll
> be a matter of honour for me to fix [rep] accordingly if it is proved
> guilty ;-)

The case in question was from Martin Lemburg:

> > % set d [expr {1.0}]; lappend d x; rep [lindex $d 0]
> > value is a pure string with a refcount of 4, object pointer at
> > 00DCC428, string representation "1.0".

The object pointer is meaningless if the value is a pure string.

Andreas Leitgeb

Jan 16, 2010, 3:30:12 PM

After something was [append]ed, it doesn't talk of a pure string (obviously
it has a full string-rep then), but it still mentions the old pointer (or
inline value) under "internal representation".

Don't know if it is the same thing, but at least to me the object pointer
appears to be a slightly different thing from the "internal representation",
and for each of them there is a (different) situation where it shows a
value, but rather shouldn't.

% set d [expr {double(1)}]; append d x; rep $d
value is a string with a refcount of 3, object pointer at 0x8e778b8,
internal representation 0x8e6a9e8:0x3ff00000, string representation "1.0x".

It's the :0x3ff00000 that's no longer relevant. (Don't know exactly what
the other part 0x8e6a9e8 is, but at least it changed from before the append.)

Alexandre Ferrieux

Jan 17, 2010, 4:40:00 AM

to Kevin Kenny

No, the "object pointer" is the Tcl_Obj's address. So it does have a
meaning even for pure strings: it is the value's identity !

[rep] pleads Not Guilty ;-)

-Alex

Alexandre Ferrieux

Jan 17, 2010, 4:46:41 AM

On Jan 16, 9:30 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:
>

> % set d [expr {double(1)}]; append d x; rep $d
> value is a string with a refcount of 3, object pointer at 0x8e778b8,
> internal representation 0x8e6a9e8:0x3ff00000, string representation "1.0x".
>
> It's the :0x3ff00000 that's no longer relevant. (Don't know exactly what
> the other part 0x8e6a9e8 is, but at least it changed from before the append.)

Ah, I see. That's not a bug in [rep] either, it's just a consequence
of code genericity. A Tcl_Obj always holds two words for the internal
value, and it is up to the type-specific code (hooked to through the
typePtr) to interpret them. See the internalRep union in tcl.h:

union {                      /* The internal representation: */
    long longValue;          /*   - an long integer value. */
    double doubleValue;      /*   - a double-precision floating value. */
    VOID *otherValuePtr;     /*   - another, type-specific value. */
    Tcl_WideInt wideValue;   /*   - a long long value. */
    struct {                 /*   - internal rep as two pointers. */
        VOID *ptr1;
        VOID *ptr2;
    } twoPtrValue;
    struct {                 /*   - internal rep as a wide int, tightly
                              *     packed fields. */
        VOID *ptr;           /* Pointer to digits. */
        unsigned long value; /* Alloc, used, and signum packed into a
                              * single word. */
    } ptrAndLongRep;
} internalRep;

For some types, both words are useful (e.g. doubles); for others,
only the first, like for the "string" type where it is a pointer to
the type-specific block of data.

Then when writing [rep], the only way to be generic, is to display
both words in hex (there's no introspection info in types, saying "hey
I use only the first word").

-Alex

Andreas Leitgeb

Jan 17, 2010, 6:42:28 AM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
>> It's the :0x3ff00000 that's no longer relevant. (Don't know exactly what
>> the other part 0x8e6a9e8 is, but at least it changed from before the append.)

> Ah, I see. That's not a bug in [rep] either, it's just a consequence
> of code genericity. A Tcl_Obj always holds two words for the internal
> value, and it is up to the type-specific code (hooked to through the
> typePtr) to interpret them. [...]

The defendant did a good job: the accuser retracts this case :-)

Finally:

Is there any idiom to obtain a string-rep of some arbitrary object
without even adding a "string representation" to the original obj
(i.e. have the generated string-rep *not* stored in that object at
all, but only created and kept in a separate new object) ?

Even if a string-rep is never supposed to make a logical difference
in Tcl, it obviously can make a difference in performance (see concat)
and memory footprint (storing the string for longer than needed).

Kevin Kenny

Jan 17, 2010, 10:24:13 AM

Alexandre Ferrieux wrote:

> No, the "object pointer" is the Tcl_Obj's address. So it does have a
> meaning even for pure strings: it is the value's identity !
>
> [rep] pleads Not Guilty ;-)

Complainant withdraws the charge. :}

Donal K. Fellows

Jan 17, 2010, 11:24:25 AM

On 17 Jan, 11:42, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

> Is there any idiom to obtain a string-rep of some arbitrary object
> without even adding a "string representation" to the original obj
> (i.e. have the generated string-rep *not* stored in that object at
> all, but only created and kept in a separate new object) ?

You mean by duplicating the object? Apart from that, no. Why would we
want such a thing when caching the rep in case it is needed is almost
always the right thing?

> Even if a string-rep is never supposed to make a logical difference
> in Tcl, it obviously can make a difference in performance(see concat)
> and memory-footprint(storing the string for longer than needed).

That issue with [concat] has been fixed for a while. The memory
footprint issue appears real, until you remember that a representation
is only present if it was needed for some reason. Caching it until
displaced usually saves and doesn't lose except for total peak memory
consumption; there's an overall speed/space trade-off which is
acceptable for virtually all code on even vaguely modern hardware.

Donal.

Andreas Leitgeb

Jan 17, 2010, 2:04:34 PM

Donal K. Fellows <donal.k...@manchester.ac.uk> wrote:
>> Is there any idiom to obtain a string-rep of some arbitrary object
>> without even adding a "string representation" to the original obj?

> You mean by duplicating the object? Apart from that, no.

Is there even a way to duplicate an arbitrary obj?

> Why would we want such a thing when caching the rep in case it is needed
> is almost always the right thing?

You already answered that question yourself by using the word "almost".

And yes, I admit it's a very narrow "almost". Even more so with the
concat-issue already fixed (which I wasn't aware of) and the issue
with format %s perhaps soon fixed, and no other such problem currently
on radar.

Donal K. Fellows

Jan 17, 2010, 3:18:23 PM

On 17/01/2010 19:04, Andreas Leitgeb wrote:
> You already answered that question yourself by using the word "almost".
> And yes, I admit it's a very narrow "almost".

That's the other key point: when the fraction of things that are served
by a mechanism are very high, pushing a bit of cost onto the rest is
entirely justifiable overall. I care much more for the 99.9% case than
the 0.1% case.

Donal (warning: arbitrary fractions above).

tom.rmadilo

Jan 17, 2010, 5:06:32 PM

On Jan 17, 11:04 am, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

> Donal K. Fellows <donal.k.fell...@manchester.ac.uk> wrote:
>
> >> Is there any idiom to obtain a string-rep of some arbitrary object
> >> without even adding a "string representation" to the original obj?
> > You mean by duplicating the object? Apart from that, no.
>
> Is there even a way to duplicate an arbitrary obj?

It seems to me the problem here is not duplicating an object, but
making a copy and leaving the original object as-is. I'm guessing you
do this via some interface to the object, like [dict get].

Alexandre Ferrieux

Jan 17, 2010, 5:24:18 PM

On Jan 17, 8:04 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

> Donal K. Fellows <donal.k.fell...@manchester.ac.uk> wrote:
>
> >> Is there any idiom to obtain a string-rep of some arbitrary object
> >> without even adding a "string representation" to the original obj?
> > You mean by duplicating the object? Apart from that, no.
>
> Is there even a way to duplicate an arbitrary obj?

Yes, though a bit contorted:

# first make a second reference
set obj2 $obj
# then do something to the 2nd
append obj2 ""

If you play with [rep] on this example, you'll see that the first line
did just share the reference, but the 2nd applied copy-on-write, and
hence split off $obj2 just before smashing it into a string object.
But fortunately at this point $obj is untouched, and now its life is
well separated from $obj2's.
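A minimal script to try this, assuming Tcl 8.6 or later, where `::tcl::unsupported::representation` exists (it is unsupported, so its output format may change between releases). The `objtype` helper is adapted from Martin's earlier one. Whether `$obj2` ends up as a pure string depends on the Tcl version, since a later release might skip the conversion for an empty append; what this checks is that `$obj` keeps its dict rep either way.

```tcl
# Report the internal type of a value (adapted from Martin's objtype helper).
# ::tcl::unsupported::representation is unsupported; its format may change.
proc objtype {value} {
    set rep [::tcl::unsupported::representation $value]
    # Word 3 is the type name; pure strings are reported as "pure string".
    return [string map {pure string} [lindex [split $rep] 3]]
}

set obj [dict create a 1 b 2 c 3]
puts "before: obj is [objtype $obj]"

# First make a second reference to the same Tcl_Obj ...
set obj2 $obj
# ... then modify it. Because the object is shared, append copies it first
# (copy-on-write), so only obj2's new object can be affected.
append obj2 ""

puts "after:  obj is [objtype $obj], obj2 is [objtype $obj2]"
```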

Of course some time has been lost smashing $obj2; in cases where this
matters, you can be smarter by replacing [append] with something
type-specific: for a list, [lappend]; for a dict, [dict set] (but you'll
also need to revert that modification, whereas [append ""] needs none).

-Alex

Andreas Leitgeb

Jan 17, 2010, 5:52:17 PM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
>> Is there even a way to duplicate an arbitrary obj?
> Yes, though a bit contorted:
> # first make a second reference
> set obj2 $obj
> # then do something to the 2nd
> append obj2 ""

Ah, appending an empty string ... I had tried just [append obj2], and it
was a no-op (i.e. it didn't even create a string rep). I vaguely remember
the discussion about [append var] and "$var" skipping stringification, so
I'm not very confident that [append var "" "" "" ""] won't someday be
optimized away as well.

Andreas Leitgeb

Jan 17, 2010, 6:33:55 PM

Donal K. Fellows <donal.k...@manchester.ac.uk> wrote:

> On 17/01/2010 19:04, Andreas Leitgeb wrote:
>> You already answered that question yourself by using the word "almost".
>> And yes, I admit it's a very narrow "almost".
> That's the other key point: when the fraction of things that are served
> by a mechanism are very high, pushing a bit of cost onto the rest is
> entirely justifiable overall. I care much more for the 99.9% case than
> the 0.1% case.

[format %d $num] does not add a string rep to an integer $num

some new [format %O $obj] could do the same for any object.

In (much) later versions, there could even be optimizations for
[format %.*O $len $obj] that avoid generating the complete string and
create only up to $len chars of it. It could be used for stack dumps
such as $::errorInfo, and probably lots of other places where only a
prefix of some string rep is of interest and that string rep is not
expected to be re-obtained from the same obj value anytime soon.

I mean, it wouldn't need to have any extra cost for the 99.9% of cases.

PS: %O is perhaps not best choice, given the similarity of O and 0 and
the frequency of 0's in that context. %P ? %S ? whatever.
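To see the [format %d] behavior Andreas describes, a sketch like the following could be used (Tcl 8.6+, relying on the unsupported `::tcl::unsupported::representation`). The integer is computed from a variable rather than written as a literal, so that it should start out without a string rep; the second representation line is expected to still report no string representation, though that is worth confirming on your own build.

```tcl
proc objtype {value} {
    set rep [::tcl::unsupported::representation $value]
    return [string map {pure string} [lindex [split $rep] 3]]
}

# Compute the integer from a variable so the result is a fresh int object
# (a literal such as 42 would already carry a string rep).
set a 6
set num [expr {$a * 7}]
puts [::tcl::unsupported::representation $num]

# format builds a new string; num is only read as an integer.
set text [format %d $num]
puts "text is $text, num is still [objtype $num]"
puts [::tcl::unsupported::representation $num]
```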

Alexandre Ferrieux

Jan 17, 2010, 6:47:56 PM

On Jan 18, 12:33 am, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

> [format %d $num] does not add a string rep to an integer $num
> some new [format %O $obj] could do the same for any object.

Instead of a new % specifier, a [dup] could do. Unless Donal threatens
to beat me, even if I hide it in ::tcl::unsupported::duplicate :-/

-Alex

Alexandre Ferrieux

Jan 17, 2010, 7:36:50 PM

On Jan 17, 11:52 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

Indeed, no guarantee :/
However, as of today, yet another (and somewhat simpler) hack that
works is:

set void ""
set obj2 $void$obj

What it does is:

(1) obtain $obj's string rep if not already computed
(2) set obj2 to a pure string with (a copy of) that string rep.

Note that $void must be on the left. As Kevin notes, the $void on the
right is optimized away in support of constructs like the K-free K
combinator: $x[unset x].

-Alex

Andreas Leitgeb

Jan 18, 2010, 5:29:32 AM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
> On Jan 17, 11:52 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
>> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
>> >> Is there even a way to duplicate an arbitrary obj?
>> > Yes, though a bit contorted:
>> > # first make a second reference
>> > set obj2 $obj
>> > # then do something to the 2nd
>> > append obj2 ""

>> I'm not very confident, that [append var "" "" "" ""] wouldn't someday be
>> optimized away, as well.

> However, as of today, yet another (and somewhat simpler) hack that
> works is: set void ""; set obj2 $void$obj

Or even without a dummy var: set obj2 []$obj

But that isn't the same as the append thing. [append] does a copy-on-write
first, so the original object doesn't get a string rep. *That* was the
intention.
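A small comparison script, assuming Tcl 8.6+ and the unsupported `representation` command. It illustrates Andreas's point: after the concatenation hacks, `$obj` keeps its list type but may now also cache a string rep, which the `append` approach quoted above is meant to avoid. Newer Tcl releases might optimise concatenation with an empty string so that `$obj2` simply shares `$obj`; the script therefore prints the representations rather than assuming either outcome.

```tcl
proc objtype {value} {
    set rep [::tcl::unsupported::representation $value]
    return [string map {pure string} [lindex [split $rep] 3]]
}

set obj [list alpha beta gamma]
puts [::tcl::unsupported::representation $obj]

# Alex's hack: an empty string on the left of the concatenation.
set void ""
set obj2 $void$obj
# Andreas's variant without a dummy variable.
set obj3 []$obj

# obj keeps its list type either way; the last line shows whether it now
# also carries a cached string rep.
puts "obj is [objtype $obj], obj2 is [objtype $obj2], obj3 is [objtype $obj3]"
puts [::tcl::unsupported::representation $obj]
```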

Andreas Leitgeb

Jan 18, 2010, 6:02:20 AM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
>> [format %d $num] does not add a string rep to an integer $num
>> some new [format %O $obj] could do the same for any object.
> Instead of a new % specifier, a [dup] could do. Unless Donal threatens
> to beat me, even if I hide it in ::tcl::unsupported::duplicate :-/

My primary intention here was not so much cloning objects, but rather
extracting a generated string rep. I wouldn't want to waste CPU and
memory making a copy of the original object if that could be avoided.

format seems like a good metaphor to me, in that it doesn't really force
one to think about Tcl object internals in the first place, but rather
conveys, at a high level, the purpose of creating a text description of
something that isn't meant to be *normally used* as a string.

The implementation (once a format specifier is agreed on) should be
trivial: check whether the object already has a string rep. If so, just
share that. Otherwise, request a string rep of the object, share it with
a new object and then remove it from the original object.

As I wrote, the real potential of this approach will show up later, if
a maxlen field-modifier could further save the effort of creating the
complete string-rep in the first place.

Even later, the "#" modifier could be used to format objects in
different ways that are tailored more for human readability than for
reconstructability. (That's already [format]'s job, but so far only for
a few types.)

It's just dreams and brainstorming for now, but I like that vision.

Alexandre Ferrieux

Jan 18, 2010, 7:32:25 AM

On Jan 17, 8:04 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:
> [...]

> and the issue
> with format %s perhaps soon fixed, and no other such problem currently
> on radar.

[format %s] fixed in HEAD 8.[56].

-Alex

Donal K. Fellows

Jan 18, 2010, 9:31:59 AM

On 18 Jan, 12:32, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

> [format %s] fixed in HEAD 8.[56].

Thanks for making the patch. (I made a test for 8.6 so it shouldn't
break in the future without warning.)

Donal.

Andreas Leitgeb

Jan 18, 2010, 10:16:46 AM

Alexandre Ferrieux <alexandre...@gmail.com> wrote:
>> and the issue with format %s perhaps soon fixed ...

> [format %s] fixed in HEAD 8.[56].

Thanks!

PS: (after looking at the changes)

If a "precision" is given, the argument is still stringified. That's
not much of a problem, but it's good to know: when one uses a
"precision" on a big list, as in [format %.30s $biglist], the list will
still shimmer to a string even after this change.

And a tidbit:
Out of curiosity, on line 1869 of tclStringObj.c, why is 'i' mapped
to 'd' *before* the switch, and not just another "case 'i':" added
to the bunch of cases already accompanying "case 'd':" (and, if
at all necessary, the char mapped only in that switch-arm)?

Alexandre Ferrieux

Jan 18, 2010, 10:28:45 AM

On Jan 18, 4:16 pm, Andreas Leitgeb <a...@gamma.logic.tuwien.ac.at>
wrote:

> Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:
> >> and the issue with format %s perhaps soon fixed ...
> > [format %s] fixed in HEAD 8.[56].
>
> Thanks!
>
> PS: (after looking at the changes)
>
> If a "precision" is given, then the argument is still stringified.
> That's not much of a problem, but good to know, when one uses the
> "precision" for a big list [format %.30s $biglist], that here the
> list will still shimmer to string even after this change.

Yes. This patch was the quick and easy part: it still does Unicode
conversion for character counting, but only when counting is needed
(as for precision or width).
A more ambitious thing I have in mind is to remove the use of the
String object in [format] entirely, but I have little time for this
right now.

> And a tidbid:
> Out of curiosity, on line 1869 of tclStringObj.c, why is 'i' mapped
> to 'd' *before* the switch, and not just another "case 'i':" added
> to the bunch of cases already accompanying "case 'd':" (and, if
> at all necessary, the char mapped only in that switch-arm)?

No idea. Try 'cvs blame' (I mean, annotate), I'm not the father, just
the surgeon ;-)

-Alex

MartinLemburg@Siemens-PLM

Jan 19, 2010, 7:55:54 AM

Hi Alex,

first - thanks for fixing these bugs!

Second - one thought of mine is that, under the EIAS principle, there
should never be a need to convert a value into the internal string
type. If it doesn't exist yet, the string representation must be built
and used, but that's all.
And if the string representation is e.g. encoded in UTF-8 and the
command using the string needs another encoding, then a conversion is
needed, but the source of the string representation shouldn't change,
not even its string representation!

So ...

% set d [expr {double(1.0)}]
1.0
% set s1 [format {%s} [expr {int($d)}]]
1
% set s2 [string range $d 0 [string first "." $d]-1]
1
% set s3 [string range $d 0 [string length $d]-3]
1
% set s4 [format {%.0f} $d]
1

... none of the statements above should change the internal object
type of the contents of "d", since they work strictly on the string
representation.

If this were possible, I would stop thinking about shimmering while
developing/writing Tcl code.

Best regards,

Martin
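Martin's four statements can be checked one at a time with a harness like this, assuming Tcl 8.6+ and the unsupported `representation` command; `objtype` is adapted from his helper, and the statements are written in a form where each yields "1". Each one runs against a freshly computed double, and the type of `d` afterwards is printed. Which statements shimmer `d` is version-dependent (exactly what this thread is about), so the script reports the types rather than assuming them.

```tcl
proc objtype {value} {
    set rep [::tcl::unsupported::representation $value]
    return [string map {pure string} [lindex [split $rep] 3]]
}

# Martin's four statements, each run on a fresh double held in d.
set one 1
set results {}
foreach {label script} {
    s1 {format {%s} [expr {int($d)}]}
    s2 {string range $d 0 [string first "." $d]-1}
    s3 {string range $d 0 [string length $d]-3}
    s4 {format {%.0f} $d}
} {
    # double($one) yields a new double object, so each check starts clean.
    set d [expr {double($one)}]
    set result [eval $script]
    dict set results $label $result
    puts "$label -> $result, d is now [objtype $d]"
}
```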

Alexandre Ferrieux

Jan 19, 2010, 8:47:59 AM

On Jan 19, 1:55 pm, "MartinLemburg@Siemens-PLM"
<martin.lemburg.siemens-...@gmx.net> wrote:
> Hi Alex,
>
> first - thanks for fixing these bugs!
>
> Second - one thought of mine is, that using the EIAS principle, there
> should never be a need to convert a value into the internal string
> type. If not existent, the string representation must be built and
> used, but that's all.

Yes, you're in line with Kevin who regrets the creation of the String
type :}
I haven't dived long enough in the unicode-related parts of the code
to have an educated opinion on this, so what I can offer is just a
bird's-eye view: in general, the internal rep is a cache meant to speed up
subsequent similar accesses to the object. When two callers compete by
asking for different types, shimmering occurs, and the alternative
becomes:

(1) - either reconcile the callers so that they use the same type
(hard)
(2) - or have a two-slots internal rep with a flush-oldest policy
(horrendous)
(3) - or let shimmering happen, accepting that a conversion was
unavoidable at this spot anyway

Usually, we live with (3) and get (1) with an effort (which can be a
healthy move).
And (2) makes me sick.

Now, while String is certainly parasitic in the inputs to [format]
(further patch soon), there *are* situations where it is the wanted
type, to be stored for a long time in the cache: [puts] to a non-
binary-encoded channel for example. Of course the statistical
importance of this is hard to measure; that's why extreme care is due
when planning such evolutions for our beloved "untyped" language's ...
type system ;-)

-Alex

Kevin Kenny

Jan 22, 2010, 5:51:15 PM

Alexandre Ferrieux wrote:
> Yes, you're in line with Kevin who regrets the creation of the String
> type :}

Oh, String is useful, I grant that. But

(1) It's more heavyweight than it needs to be.
(2) We're too eager to use it when we don't need it.

If I had it to do over, I'd have a data structure that would index
into the string, giving the byte position of every Nth character for
some small N (16? 64? Would have to measure performance...), with
perhaps an optimisation to handle long strings of ASCII.
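A script-level model of the index Kevin describes, purely as a sketch: the real structure would live in C inside the String internal rep. `utf8index` and `byteoffset` are hypothetical names, N defaults to 16 (one of his suggested values), and byte offsets are measured in standard UTF-8 via `encoding convertto utf-8`, which may differ from Tcl's internal encoding for some characters (NUL, for one). The "ascii" case stands in for his long-ASCII optimisation: if every character is one byte, no marks are stored at all.

```tcl
# Hypothetical helper: record the UTF-8 byte offset of every Nth character.
proc utf8index {s {n 16}} {
    set nbytes [string length [encoding convertto utf-8 $s]]
    if {$nbytes == [string length $s]} {
        # One byte per character: character index == byte offset.
        return [list ascii $n {}]
    }
    set marks {}
    set offset 0
    set i 0
    foreach ch [split $s ""] {
        if {$i % $n == 0} {
            lappend marks $offset
        }
        incr offset [string length [encoding convertto utf-8 $ch]]
        incr i
    }
    return [list utf8 $n $marks]
}

# Hypothetical helper: byte offset of character charIdx (0 <= charIdx < length).
proc byteoffset {s index charIdx} {
    lassign $index kind n marks
    if {$kind eq "ascii"} {
        return $charIdx
    }
    # Jump to the nearest mark at or before charIdx ...
    set block [expr {$charIdx / $n}]
    set offset [lindex $marks $block]
    # ... then measure at most n-1 characters from there.
    set chunk [string range $s [expr {$block * $n}] [expr {$charIdx - 1}]]
    return [expr {$offset + [string length [encoding convertto utf-8 $chunk]]}]
}

set s "h\u00e9llo w\u00f6rld"
set idx [utf8index $s 4]
puts [lindex $idx 2]          ;# marks: 0 5 10
puts [byteoffset $s $idx 9]   ;# 11
```

A lookup costs one list index plus measuring at most N-1 characters, while the index costs one integer per N characters; picking N is the speed/space trade-off Kevin says would need measuring.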

A general String overhaul would also be A Good Idea. We really need
to consider UTF-8 normalisation (and perhaps even fix character
counting for combining forms). Laying the infrastructure for things
like bidi rendering would also be helpful.

So delving once again into our Unicode handling might be a useful
project.

tom.rmadilo

Jan 22, 2010, 7:22:02 PM

On Jan 22, 2:51 pm, Kevin Kenny <kenn...@acm.org> wrote:
> Alexandre Ferrieux wrote:
> > Yes, you're in line with Kevin who regrets the creation of the String
> > type :}
>
> Oh, String is useful, I grant that. But
>
> (1) It's more heavyweight than it needs to be.
> (2) We're too eager to use it when we don't need it.

I might be confused here, so smack me down if necessary. The problem
with any string in Tcl is that there is no index into the string.
Forget characters. The first thing needed is the ability to iterate
over the string one byte at a time. Then, you could easily create a
proc which transforms a generic string into a character string. For
binary data, this is a noop. Maybe we need a parallel [octets] command
to support [string]. You could also generalize with [bitstring] with an
option -charbits to handle different char sets, in the case of fixed
bit-length character sets. Anyway this has the scent of a
continuation, if you want to avoid the extra storage required.

> If I had it to do over, I'd have a data structure that would index
> into the string, giving the byte position of every Nth character for
> some small N (16? 64? Would have to measure performance...), with
> perhaps an optimisation to handle long strings of ASCII.

Personally I think all the cost should go to the non-default users.
Optimization in this case already exists, so anything added is a long
route back to the current situation. Just create a separate string
command for the non-default cases. For ASCII, this might be faster,
also UTF-16.

The main point is that you need a separate non-blob-like structure
that can efficiently index into the blob (a string really is a blob,
sometimes dropping the l). The default structure is just a binary
string with fixed bit-length chars. The option is to transform this
into variable length chars, which would require a memory consuming
index. Still, the index could be sparse, only including segments
already found.

So first you need a metric, probably bits, to measure a string. Fixed
length character sets are mapped by addition, subtraction and
multiplication. Some character sets can be mapped from the beginning
or the end (UTF-8), some only from the beginning. None of these
mappings will be easy without the fixed metric and an efficient index
into the metric.
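The remark that UTF-8 can be mapped from the end rests on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a character boundary can be found by stepping backwards without rescanning from the start. A small sketch, with `utf8prevstart` as a hypothetical helper and valid UTF-8 input assumed:

```tcl
# Hypothetical helper: given UTF-8 bytes and a byte position, return the
# byte offset where the preceding character starts.
proc utf8prevstart {bytes pos} {
    incr pos -1
    while {$pos > 0} {
        # Read the byte at pos as an unsigned 8-bit value.
        binary scan $bytes @${pos}cu byte
        # Stop at any byte that is not a continuation byte (10xxxxxx).
        if {($byte & 0xC0) != 0x80} {
            break
        }
        incr pos -1
    }
    return $pos
}

# Walk a string backwards one character at a time.
set bytes [encoding convertto utf-8 "a\u00e9\u20ac"]
set pos [string length $bytes]
set starts {}
while {$pos > 0} {
    set pos [utf8prevstart $bytes $pos]
    lappend starts $pos
}
puts $starts   ;# 3 1 0
```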