% set s "\u005C\u0065\u0020\u0066\u005C\u0067"
\e f\g
% set j [join $s]
e fg
% set k $s
\e f\g
Why does join strip backslashes? This isn't mentioned in the
documentation and isn't what I expected. Is it a bug or am I missing
something?
Isn't this because [join] treats its argument as a list? Which
means that each element may contain backslashes to escape
special characters? So I think you're simply falling foul - in
a somewhat unexpected way - of the "not all strings are sensible
lists" problem. [string map {{ } {}} $s] may be closer to
your intent, perhaps?
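
A minimal sketch of the difference, assuming the same string as in
the original post (the variable name s is taken from there; nothing
else is assumed). It contrasts a list operation with a pure string
operation on the same value:

```tcl
# The raw string \e f\g, written with explicit backslash escapes.
set s "\\e f\\g"

# List operation: $s is first parsed as a list, so \e and \g are
# backslash-substituted before the elements are joined.
puts [join $s ""]                 ;# prints: efg

# String operation: only the space is removed; backslashes survive.
puts [string map {{ } {}} $s]     ;# prints: \ef\g
```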
--
Jonathan Bromley, Consultant
Ah, that makes sense. That must be the explanation.
[join] is a list operation. This means that given a random string, it
must first convert it to a list -- a "best effort" being the only
guarantee.
Some strings get reversibly translated (i.e. the string rep of the
converted list equals the initial string): these are canonical lists.
Some are noncanonical and get interpreted (like the one you found).
Some are not convertible like "{" and raise an error on conversion.
The rule for canonicity can most conveniently be tried out by using
the [list] operation on "chemically active" individual items (i.e.
things containing \$[] or unbalanced "{}). Basically any dangerous
compound is protected by {} or \ when [list]ified and converted back
to string.
Most get simply protected by a pair of {} around the item, but out of
necessity the single "}" character, listified by [list \}], gives the
canonical form
\}
In reverse, this means that unprotected backslashes in the toplevel of
the string rep of a would-be list, must be *interpreted*, i.e.
backslash-substituted. It turns out that \e and \g have no special
meaning, hence the operation yields e and g. Hence your result.
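
The three cases above can be tried out with a small sketch. The helper
name canon is hypothetical (it is not a Tcl built-in); it simply
rebuilds a list element by element so that the result carries the
string rep Tcl itself generates:

```tcl
# Hypothetical helper: parse $s as a list and rebuild it with lappend,
# so the returned value has a freshly generated string rep.
proc canon {s} {
    set out {}
    foreach e $s { lappend out $e }
    return $out
}

# Canonical: survives the round trip unchanged.
set s1 {a {b c}}
puts [string equal [canon $s1] $s1]     ;# prints: 1

# Noncanonical: the backslashes get interpreted on the way in.
set s2 {\e f\g}
puts [canon $s2]                        ;# prints: e fg

# Not convertible: a lone open brace raises an error.
puts [catch {canon "\{"} msg]           ;# prints: 1
puts $msg

# The single close brace is protected by a backslash, not by braces.
puts [list "\}"]                        ;# prints: \}
```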
HTH,
-Alex
That's expected behavior.
Join expects its argument to be a list. Since you give it a string, you
are relying on the implicit conversion of the string to a list. And note
that {\e f\g} is decidedly *not* the same as the string form of the list
with a first element of \e and a second element of f\g.
What you are doing is no different than:
% join {\e f\g}
e fg
Notice how that is different than
% set l [list {\e} {f\g}]
{\e} {f\g}
% join $l
\e f\g
If you are trying to build a list with the first element of the two byte
sequence "\e" and a second element of the three byte sequence "f\g",
you're going about it the wrong way.
My advice is to always stick to the rule of thumb "never use list
commands on strings".
--
Bryan Oakley
http://www.tclscripting.com
Yes, I know not to do that. I ran into this when I applied join to
the result of a previous operation, which was intended to produce a
list but did not. Unfortunately, that didn't prompt me to think about
what join would do in its attempt to treat the non-list as a list.
Oooh! I like that description.
Donal.
<a nice discussion of why strings aren't necessarily lists>
Is the *precise* behaviour of [list], in encapsulating
non-list-friendly strings, documented anywhere? Or is
it necessary to resort to the source code to find out?
Obviously I'm aware that, in many cases, [list] has several
choices for what to do to a string in order to render it
safe as a list...
While one would have thought that looking at the list man page would
provide that information, unfortunately, it does not. It just uses the
phrase "as necessary", without detailing what is necessary.
Perhaps someone could post a Feature Request at http://tcl.sf.net/
asking that the list man page be updated, using Alex's explanation as
fodder for the proposed expansion.
Not as far as I know. But you shouldn't care.
set l [ list $a $b $c $d $e $f]
Now, no matter what those variables contain, they will be safely
handled in the list - that may mean nothing has changed, it may
mean it was wrapped in "{" and "}" or it may mean it has some "\"
added at one or more places. But if you do
lindex $l 3
you will get the *exact* value of $d - if it had been mucked with
internal to the list, it will be *unmucked* with when pulled back
out. If you do
puts $l
you will see the quoting chars, but if you use
puts [join $l]
you won't.
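
A small sketch of that round trip, with made-up values for the
variables (the contents of a through d are assumptions chosen to
provoke quoting):

```tcl
set a plain
set b "two words"
set c {$dollar [bracket]}
set d "unbalanced \{ brace"

set l [list $a $b $c $d]

# The element comes back exactly as it went in.
puts [string equal [lindex $l 3] $d]    ;# prints: 1

# The string rep of the list shows whatever quoting Tcl chose...
puts $l

# ...while [join] shows only the element values.
puts [join $l]
```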
Bruce
>Jonathan Bromley wrote:
>> Is the *precise* behaviour of [list], in encapsulating
>> non-list-friendly strings, documented anywhere?
>Not as far as I know. But you shouldn't care.
I don't care, if I'm being well-behaved and hygienic
so that I never allow my lists to shimmer into strings.
But there are at least two reasons why I would rather
like to know, and why I think I'm at least partly
justified in asking:
(1) purely for (self?) tutorial purposes;
(2) in order to get better understanding of exactly
what would make a string be illegal as a list.
In any case, given that it's a behaviour that can be
exposed by legal operations in the language, it would
be nice for it to be robustly documented.
It ain't a priority though, that's for sure.
> (2) in order to get better understanding of exactly
> what would make a string be illegal as a list.
I understand the curiosity angle. The best bet, however, is to always
make certain that you use "split" on strings input from external
sources (files, users) and that you use list when constructing values.
That way, if sometime in the future a new wrinkle arises and something
new needs to be done for some special character or sequence, you would
less likely have application failures.
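
As a rough sketch of that advice (the variable line stands in for
text read from a file or typed by a user; it is an assumption, not
part of the original problem):

```tcl
# A raw string, as if it came from outside: \e f\g
set line "\\e f\\g"

# [split] treats the input as a plain string and returns a proper list.
set words [split $line " "]

puts [lindex $words 0]      ;# prints: \e
puts [join $words " "]      ;# prints: \e f\g
```

Because the list was made by [split], the backslashes are ordinary
characters inside the elements and [join] hands them back untouched.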
One minor problem is that [list] escapes things that don't really need
to be escaped to make a proper list, and other routines (like [eval])
rely on this escaping.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
I didn't mean to imply you shouldn't ask, or that it's not informative,
or useful to know in general. But more along the line of don't rely
on that knowledge to do something special or tricky in code.
Doing that leads you down a broken path as those internal details
are *not* specified and theoretically subject to change and if you
rely on them things can go bad. (Very much true for anyone who writes
interfaces that rely on *guessing* types or wanting type info inherent
in a value)
Bruce
Consider that different implementations of Tcl are free to quote the
string in different ways as long as it generates a valid list. Also
consider that future and past versions of the "official" Tcl are also
free to quote the string differently from the current 8.4/8.5. This
means that the most "robust" way to document the behavior of [list] is
that it will quote "as necessary".
In light of this, IF it is documented, then it should not be
documented in the actual man page but in the implementation's README/
changelog/errata instead.
> On 2 May 2007 13:34:27 -0700, Alexandre Ferrieux
> <alexandre...@gmail.com> wrote:
>
> <a nice discussion of why strings aren't necessarily lists>
>
> Is the *precise* behaviour of [list], in encapsulating
> non-list-friendly strings, documented anywhere? Or is
> it necessary to resort to the source code to find out?
> Obviously I'm aware that, in many cases, [list] has several
> choices for what to do to a string in order to render it
> safe as a list...
I'm sure it has changed in the past. Might there be
cause to change it in the future? Who knows, but if it
is *specified* it would require more approval to change.
I think what you are really asking about, and what the man page talks
about, is the reverse: how a list gets converted to a string. That's
when "braces and backslashes get added as necessary". The
man page has a tone rooted in the pretense that "everything
is a string" and that is all; a list is just a special class
of string.
--
Donald Arseneau as...@triumf.ca
Just being curious... do you have some example code for that?
R'
Here you go - together with an example of why relying on this stuff is
"dangerous": we sometimes try to "improve" it.
mig@ice:~$ tclsh8.4
% set a [list #boo foo]
#boo foo
% proc #boo x {puts $x}
% eval $a
%
mig@ice:~$ tclsh
% set a [list #boo foo]
{#boo} foo
% proc #boo x {puts $x}
% eval $a
foo
% info pa
8.5a5
In my opinion, this is an argument FOR it to be documented in the man
page. The README, etc. are not always available. And so details on how
a command in a language works should always appear in the reference
documentation for that language.
> mig@ice:~$ tclsh
> % set a [list #boo foo]
> {#boo} foo
> % proc #boo x {puts $x}
> % eval $a
> foo
> % info pa
> 8.5a5
If this change in behavior has caused someone to have to change code,
it needs to be at least documented, if not reported as a bug.
https://sourceforge.net/tracker/index.php?func=detail&aid=489537&group_id=10894&atid=110894
http://www.tcl.tk/cgi-bin/tct/tip/148
The following entry is in the 'changes' file for Tcl8.5a1:
* [TIP #148] correct [list]-quoting of the '#' character
*** POTENTIAL INCOMPATIBILITY ***
For scripts that assume a particular (buggy) string rep for lists.
The Changelog has the entry
2003-09-04 Don Porter <d...@users.sourceforge.net>
* doc/SplitList.3: Implementation of TIP 148. Fixes [Bug 489537].
* generic/tcl.h: Updated Tcl_ConvertCountedElement() to quote
* generic/tclUtil.c: the leading "#" character of all list elements
unless the TCL_DONT_QUOTE_HASH flag is passed in.
* generic/tclDictObj.c: Updated Tcl_ConvertCountedElement() callers
* generic/tclListObj.c: to pass in the TCL_DONT_QUOTE_HASH flags
* generic/tclResult.c: when appropriate.
Where else should these things be documented?
Maybe we need to document the fact that Larry needs a telegram about
each added feature when it is added? ;-)
Donal.
Or in this case bug fix.
--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+
No, it is an internal detail that should not be relied on in any way by
end users. Use list commands to build and access lists, and the exact
form of quoting is irrelevant. It is the same reason that individual man
pages for Tcl script-level commands have no details about byte codes used,
or internal object reps - these are implementation details that can change
and are (and should be) opaque to the end users (script writers). The only
place it needs to be worried about is by the maintainer of the commands
that handle it, so the documentation should be good comments and/or change
logs if it is modified. (Or other maintainer-used documentation.)
Bruce
PS - this is just *my* opinion, I am not a member of the TCT or a
maintainer of any part of tcl - just a long time user of the language.
I'd agree if that were the case. But the details of how the [list]
command works are accurately documented. The details of how it is
*implemented* are not, however. That's why it shouldn't be in the
official manuals. Implementation details should not be relied on.
[list] works by converting a "list" of words (in the parser sense)
into a valid tcl "list" (in lindex sense). That's all you need to
know. Relying on the string representation of the generated list is a
bug since there are many possible strings that may represent the same
list and the Tcl interpreter is free to choose any which may or may
not be the same representation next year.
set a "hello"
set b ";"
set c "there"
puts [lindex "hello ; there" 1]
puts [list $a $b $c]
[list] winds up quoting the ; even though the unquoted semicolon is a
valid list representation.
Well, yes, this is what I'm talking about. Saying "[list] is for making
lists and thus you don't need to know about the internals" is a bit
disingenuous when you discover that [list] is really not just quoting
list processing but 11-rules processing as well.
> Where else should these things be documented?
Good question - how best to document a language? Should every detail
be in the man pages, in a reference document of some other sort, or
should people writing in the language expect to pore over 8 years of
changelogs, hoping that the descriptions there will be worded in a way
they could recognize it?
Between the changelogs, and the wiki summary of changes (and the
appendices in the Tcl and Tk book), perhaps that is sufficient. Time,
and continued productive use of Tcl, will tell.
That might be helpful - then I could type the telegram contents into
the Wiki, for the rest of the community's benefit.
but how is this a problem for users of a list?
does not [lindex $l 1] give the correct result in either case?
does not [join $l] give the same result in either case?
Bruce
Yes: Tcl is not Lisp nor Scheme (thank $god): the list and code
subgrammars are distinct (The language of code is vastly larger than
that of lists).
Indeed, "puts {};puts {}" is valid for eval and invalid as a list
(extra characters after close-brace).
Also, "puts yes\nputs yo" is valid but noncanonical as a list, and
valid as code, and its canonical list form has a different code
semantics.
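
Both examples can be checked with a short sketch (the variable names
are made up for illustration; the canonical form is rebuilt with
lappend so as not to depend on any particular quoting):

```tcl
# Valid as a script, invalid as a list.
set s1 "puts {};puts {}"
puts [catch {llength $s1}]          ;# prints: 1

# Valid as a script (two commands) and as a list (four elements).
set s2 "puts yes\nputs yo"
puts [llength $s2]                  ;# prints: 4

# Its canonical list form is a *single* command with three arguments,
# which [puts] rejects.
set canonical {}
foreach e $s2 { lappend canonical $e }
puts $canonical                     ;# prints: puts yes puts yo
puts [catch {eval $canonical}]      ;# prints: 1
```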
I guess that's pretty normal for different automata to parse different
sub-languages and yield different semantics.
So what?
-Alex
It's a problem for the user of the list if they pass it to a non-list
routine that relies on the string representation for accurate
processing. Like, say, [eval].
Since the "appropriate" way of quoting arguments for [eval] is to use
[list], and [eval] isn't a list routine, it's useful to know how [list]
quotes its arguments.
Now, if [eval] was defined to take a string, or if [eval]'s relationship
to [list] was documented, that would be sufficient.
But to have [list] start quoting words that start with a # to be
considered a "bug fix", that means something other than "the list
operators" rely on the quoting that [list] does.
If there was a separate operator (say, [quote]) that quoted arguments
appropriately for [eval], I'd be 100% in agreement saying that [list]
shouldn't be documented.
[eval] *is* defined to take a string. Specifically a script. And
what a script is is well defined, and the definition is not the same as
the definition of a list. What is a valid "script" is defined in man
tcl a.k.a. the dodekalogue.
Now, it just so happens that list quoting conforms to exactly 10 of
the 12 rules of Tcl. Specifically, it doesn't honor rule 1 (; is not a
list separator but \n is; space and tab are list separators but
not command separators), and it also appears that while Tcl8.4
honors rule 9 (comments), Tcl8.5 doesn't, which in my opinion is correct
for 8.5 and a bug for 8.4. (A command that starts with # is perfectly
valid in Tcl but is very hard to call; making [list] quote the command
correctly should be considered correct behavior.)
Because of this, seasoned tclers often recommend to [list] quote
arguments to [eval] and -command. But it is always with the caveat
that only a single command may be [list] quoted. If your operation
requires you to invoke multiple commands then the recommendation is
always to wrap them up in a proc.
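
A minimal sketch of that recommendation, using a made-up proc greet
and made-up argument values (none of these names come from the
thread):

```tcl
# Hypothetical command taking exactly two arguments.
proc greet {who msg} { return "$who says: $msg" }

set who "Mr. X"
set msg {$5 for [one] item}

# [list] keeps each value as exactly one word of the command.
set cmd [list greet $who $msg]
puts [eval $cmd]                        ;# prints: Mr. X says: $5 for [one] item

# Plain string interpolation lets the values be re-parsed as script
# text, so the same call fails.
puts [catch {eval "greet $who $msg"}]   ;# prints: 1
```

If several commands are needed, putting them in a proc and
[list]-quoting the single call to that proc keeps the same guarantee.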
> But to have [list] start quoting words that start with a # to be
> considered a "bug fix", that means something other than "the list
> operators" rely on the quoting that [list] does.
>
> If there was a separate operator (say, [quote]) that quoted arguments
> appropriately for [eval], I'd be 100% in agreement saying that [list]
> shouldn't be documented.
>
In fact there are numerous ways to do this as seen in the recent
thread:
http://groups.google.com/group/comp.lang.tcl/browse_frm/thread/8450625bbb478250/#
My personal favorite is to use [string map] but perhaps a more
"regular" method is to use format:
eval [format {
set x [doSomething %s %s]
thenDoSomethingElse $x %s
} [list [getValue]] [list $someValue] [list $anotherValue]]
Note that we're still using list quoting to protect word boundaries but
this should be OK since for this purpose we only need to conform to
the four word boundary rules which are the same for the dodekalogue and
[list].
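
A runnable variant of the format technique might look like the sketch
below. The proc double and the literal values are placeholders
invented for illustration, standing in for doSomething and friends:

```tcl
# Hypothetical stand-in for doSomething.
proc double {a b} { expr {2 * ($a + $b)} }

# Each %s is filled with a [list]-quoted value, so every substituted
# value stays a single word inside the multi-command script.
set script [format {
    set x [double %s %s]
    set y [list $x %s]
} [list 3] [list 4] [list "two words"]]

eval $script
puts $y     ;# prints: 14 {two words}
```

One caveat with this approach: a literal % in the script template has
to be written as %% so that [format] leaves it alone.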
The code subgrammar is basically that of lists, interspersed with
human-friendly sugar like the end-of-line character, semicolons,
backslash-at-end-of-line, and comments. In particular, any single list
is valid for eval, with [lindex $l 0] being used as the command name
and [lrange $l 1 end] as args. So [list] *is* the proper, unequivocal
quoting tool for eval.
> or if [eval]'s relationship
> to [list] was documented, that would be sufficient.
Yup. It is.
-Alex
More specifically, the grammar of a single command of tcl (note the
wording "single command" not "single line") is generally the same as
the description of lists.
At the Tcl level, that's true. Or it's *supposed* to be.
If a script can tell two values apart when the values have
identical string representations, that is a bug. The contract
offered by the [list] command is that its result is a string
that:
(a) is syntactically correct as a list.
(b) when interpreted as a list, contains elements that
are precisely the arguments proffered to [list].
(c) is also syntactically correct as a single Tcl command
if [list] had at least one argument, and is the
empty script otherwise.
(d) when interpreted as a Tcl command, consists of words
that are precisely the arguments to [list].
The precise nature of that string is left unspecified
(intentionally, but I'd be willing to entertain a proposal
to expand the specification). Thus, for
list a {b c} {d {e f}}
the command could generate
a {b c} {d {e f}}
but would be equally correct (if less convenient) if it
generated
a b\ c d\ \{e\ f\}
or any of several other inconvenient representations.
If you parse lists generated by list only by feeding them to
the list-oriented commands such as [lindex] and [foreach],
or by [eval]ing them as commands, you'll never see the difference.
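
To illustrate, here is a small sketch that feeds both representations
shown above to list commands and to [eval]. The proc a is a
hypothetical command defined only so that the lists can be evaluated:

```tcl
# Two different string representations of the same three-element list.
set r1 {a {b c} {d {e f}}}
set r2 {a b\ c d\ \{e\ f\}}

# Parsed as lists, they agree element by element.
foreach i {0 1 2} {
    puts [string equal [lindex $r1 $i] [lindex $r2 $i]]   ;# prints: 1
}

# Evaluated as commands, they call the same (hypothetical) command "a"
# with the same words.
proc a {x y} { return "<$x> <$y>" }
puts [string equal [eval $r1] [eval $r2]]                 ;# prints: 1
```

Only a direct string comparison of r1 and r2 would notice the
difference, which is exactly the kind of code the contract does not
cover.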
We've left the details undocumented, in general, because we
sometimes encounter an opportunity to improve them.
--
73 de ke9tv/2, Kevin