How can I avoid evaluation in "lindex"

Alexandru

Apr 4, 2019, 11:16:34 AM

Hi,

I have the following lines of code (simplified for better understanding):

set elem {"Geh\X2\00E4\X0\use" ""}
lindex $elem 0

The result of the last command is
GehX2 E4X0use

but some of the characters in the middle are not UTF-8 and I cannot copy the true result to this forum.

The problem is that using lindex evaluates the result.

I guess I need to use something like "split", but then I have a problem if the list elements also contain spaces.

How can I convert the "elem" value to a list without evaluation of the content?

Many thanks.
Alexandru

Rich

Apr 4, 2019, 11:39:28 AM

Alexandru <alexandr...@meshparts.de> wrote:
> Hi,
>
> I have following line of code (simplified for better understanding):
>
> set elem {"Geh\X2\00E4\X0\use" ""}
> lindex $elem 0
>
> The result of last command is
> GehX2 E4X0use
>
> but some of the characters in the middle are not UTF-8 and I cannot
> copy the true result to this forum.
>
> The problem is that using index evaluates the result.

It does not "evaluate" (where "evaluate" means interpret as Tcl code).
But because you passed a string, Tcl has to parse the string into a
list.

> I guess I need to use something like "split" but then I have a
> problem if the list elements also contain empty spaces.
>
> How can I convert the "elem" value to a list without evaluation of
> the content?

Your example is not "evaluating" the content. It is attempting to
parse a string into a list (because you started with a string). If you
want a list, use the [list] command to build the list, then you won't
reparse a string into a list:

$ rlwrap tclsh
% set elem [list "Geh\X2\00E4\X0\use" ""]
GehX2E4X0use {}
% lindex $elem 0
GehX2E4X0use
%

Mixing up strings and lists is a common mistake, and it often
results in weird, input-dependent errors. If you use only list
commands to create and manipulate lists, you avoid those
re-parsing problems.
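
A minimal sketch of that advice, assuming Tcl 8.5 or later; the variable names are only illustrative. The first element is written in braces here, so its backslashes reach [list] unchanged, and an element containing a space survives as one element.

```tcl
# Braces keep the backslashes intact at script level.
set name {Geh\X2\00E4\X0\use}

# Build the list with list commands instead of writing a list literal.
set elem [list $name ""]
lappend elem "two words"

puts [lindex $elem 0]   ;# Geh\X2\00E4\X0\use
puts [llength $elem]    ;# 3
```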

Andreas Leitgeb

Apr 4, 2019, 12:31:21 PM

Alexandru <alexandr...@meshparts.de> wrote:
> I have following line of code (simplified for better understanding):
> set elem {"Geh\X2\00E4\X0\use" ""}
> lindex $elem 0

$elem still has the backslashes in it, because they were protected by
braces.

When you treat it as a list afterwards, Tcl performs some (but not all)
of the substitutions defined in Tcl.n.

The rules it applies are those for nested braced strings and "-quoted
strings, of which only the latter allow backslash substitution,
and neither performs $ or [] substitution during list parsing.

If you need the backslashes preserved in the element, change the list
literal as follows (and pray that a backslash never occurs at the end
of an item):

set elem {{Geh\X2\00E4\X0\use} ""}
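
The list-parsing rules described above can be checked in a few lines. This is only an illustrative sketch; the sample values (a\tb, $HOME, [pwd]) are made up for the demonstration.

```tcl
# Same content, once as a "-quoted element and once as a braced element.
set quoted {"a\tb" "x"}
set braced {{a\tb} "x"}

# Quoted element: \t becomes a TAB during list parsing.
puts [string length [lindex $quoted 0]]   ;# 3
# Braced element: the backslash and the t stay as two characters.
puts [string length [lindex $braced 0]]   ;# 4

# Neither form triggers variable or command substitution.
set raw {"$HOME" {[pwd]}}
puts [lindex $raw 0]   ;# $HOME
puts [lindex $raw 1]   ;# [pwd]
```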

> How can I convert the "elem" value to a list without evaluation of the content?

If you get the value of $elem from an external source, then I'm
afraid it's not going to be easy in general. It might boil down to
having to write your own parser.
Maybe you're lucky, though, and doubling all backslashes
with [string map {\\ \\\\} $elem] already solves it for you.
That's not a general solution: if the list literal mixes quoting
styles, as in {"\{" {\{}}, this simple approach breaks.
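
A small sketch of both cases, using the values from this thread. It assumes nothing beyond core Tcl; the variable names doubled, mixed and broken are illustrative.

```tcl
set elem {"Geh\X2\00E4\X0\use" ""}

# Double every backslash so that list parsing turns each pair back into one.
set doubled [string map {\\ \\\\} $elem]
puts [lindex $doubled 0]   ;# Geh\X2\00E4\X0\use

# Mixed quoting: a valid two-element list before doubling ...
set mixed {"\{" {\{}}
puts [llength $mixed]      ;# 2

# ... but after doubling, the braced element no longer has balanced braces.
set broken [string map {\\ \\\\} $mixed]
puts [catch {llength $broken} msg]   ;# 1, list parsing fails
puts $msg
```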

Alexandru

Apr 4, 2019, 12:35:58 PM

I read the string from a file just as it is. Yes, I want to convert the string to a list, but all attempts have failed.

Alexandru

Apr 4, 2019, 12:39:58 PM

Yes, I read the string from a file. That's my problem: I need to convert the string to a list. I tried the CSV package. For some reason I don't understand, this package packs the string into a nested list, which is not compatible with the rest of the code. I'm unsure whether it is safe to take lindex 0 from that list. A solution without the CSV package would be better...

Rich

Apr 4, 2019, 1:14:31 PM

Alexandru <alexandr...@meshparts.de> wrote:
> Am Donnerstag, 4. April 2019 17:39:28 UTC+2 schrieb Rich:
>> Alexandru <alexandr...@meshparts.de> wrote:
>> > Hi,
>> >
>> > I have following line of code (simplified for better understanding):
>> >
>> > set elem {"Geh\X2\00E4\X0\use" ""}
>> > lindex $elem 0
>> >
>> > The result of last command is
>> > GehX2 E4X0use
>> >
>> > but some of the characters in the middle are not UTF-8 and I cannot
>> > copy the true result to this forum.
>> >
>> > The problem is that using index evaluates the result.
>

> I read the string from a file just like it is.

A detail that should have been in the initial posting.

Did you read the file in utf-8 input mode?

> Yes, I want to convert the string to a list but any atempts failed.

If you have a string, with a delimiter character, and you want a list,
broken on the delimiter character, then 'split' is the command to use.

If you don't have a delimiter character, but want to break it some
other way, you'll have to provide us more detail of what you really
want to occur.

Alexandru

Apr 4, 2019, 1:29:23 PM

Am Donnerstag, 4. April 2019 19:14:31 UTC+2 schrieb Rich:
> Alexandru <alexandr...@meshparts.de> wrote:
> > Am Donnerstag, 4. April 2019 17:39:28 UTC+2 schrieb Rich:
> >> Alexandru <alexandr...@meshparts.de> wrote:
> >> > Hi,
> >> >
> >> > I have following line of code (simplified for better understanding):
> >> >
> >> > set elem {"Geh\X2\00E4\X0\use" ""}
> >> > lindex $elem 0
> >> >
> >> > The result of last command is
> >> > GehX2 E4X0use
> >> >
> >> > but some of the characters in the middle are not UTF-8 and I cannot
> >> > copy the true result to this forum.
> >> >
> >> > The problem is that using index evaluates the result.
> >
> > I read the string from a file just like it is.
>
> A detail that should have been in the initial posting.

Yes, sorry, missed that.

>
> Did you read the file in utf-8 input mode?

Yes I read in utf-8

>
> > Yes, I want to convert the string to a list but any atempts failed.
>
> If you have a string, with a delimiter character, and you want a list,
> broken on the delimiter character, then 'split' is the command to use.
>
> If you don't have a delimiter character, but want to break it some
> other way, you'll have to provide us more detail of what you really
> want to occur.

Well, I have a delimiter: the comma ",".
For a moment I thought "Yes, that could be the solution". But then I realized I also have things like this:

{'',(1.0,0.0,0.0)}

BTW: The format I'm parsing is STEP (universal CAD format).

Robert Heller

Apr 4, 2019, 2:08:34 PM

It sounds like you need to read up on the -encoding option to fconfigure. If
your data is that "different", you might also want to select binary for
-translation, in which case you will *have* to parse the data "properly" and
essentially process the file byte by byte. If it is only "text", but something
beyond Latin1 / UTF-8 / ASCII (as it appears to be), setting the right value
for -encoding should make things easier. You might still need to apply your
own parsing process, but with the proper encoding, it will be a little easier.
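
As a rough illustration of -encoding, the sketch below writes a scratch file and reads it back. It assumes Tcl 8.6 (for [file tempfile]); iso8859-1 is only an example encoding chosen for the demo, not a statement about what a STEP file actually uses.

```tcl
# Create a scratch file holding the single byte 0xE4 (Latin-1 a-umlaut).
set out [file tempfile path]
fconfigure $out -translation binary
puts -nonewline $out "Geh\xE4use"
close $out

# Read it back, telling Tcl how the bytes are encoded.
set in [open $path r]
fconfigure $in -encoding iso8859-1
set text [read $in]
close $in
file delete $path

puts [string length $text]                 ;# 7
puts [string equal $text "Geh\u00E4use"]   ;# 1
```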

--
Robert Heller -- 978-544-6933
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

Rich

Apr 4, 2019, 2:40:33 PM

Alexandru <alexandr...@meshparts.de> wrote:
>> If you don't have a delimiter character, but want to break it some
>> other way, you'll have to provide us more detail of what you really
>> want to occur.
>
> Well, I have a delimiter: It is comma ",".
> For a moment I thought "Yes, that could be the solution". But then I
> realized I also have things like this:
>
> {'',(1.0,0.0,0.0)}

And breaking that above like this:

$ rlwrap tclsh
% set string {{'',(1.0,0.0,0.0)}}
{'',(1.0,0.0,0.0)}
% set list [split $string ,]
\{'' (1.0 0.0 0.0)\}
% foreach item $list { puts $item }
{''
(1.0
0.0
0.0)}
%

does not work, I presume?

> BTW: The format I'm parsing is STEP (universal CAD format).

In that case, you'd likely be much more successful creating an actual
parser for the portions of the format you want to consume. The parser
tools modules in Tcllib can help with that.
https://core.tcl.tk/tcllib/doc/trunk/embedded/md/tcllib/files/modules/page/page_intro.md

Robert Heller

Apr 4, 2019, 3:56:11 PM

At Thu, 4 Apr 2019 18:40:31 -0000 (UTC) Rich <ri...@example.invalid> wrote:

>
> Alexandru <alexandr...@meshparts.de> wrote:
> >> If you don't have a delimiter character, but want to break it some
> >> other way, you'll have to provide us more detail of what you really
> >> want to occur.
> >
> > Well, I have a delimiter: It is comma ",".
> > For a moment I thought "Yes, that could be the solution". But then I
> > realized I also have things like this:
> >
> > {'',(1.0,0.0,0.0)}
>
> And breaking that above like this:
>
> $ rlwrap tclsh
> % set string {{'',(1.0,0.0,0.0)}}
> {'',(1.0,0.0,0.0)}
> % set list [split $string ,]
> \{'' (1.0 0.0 0.0)\}
> % foreach item $list { puts $item }
> {''
> (1.0
> 0.0
> 0.0)}
> %
>
> does not work, I presume?

I'm guessing the OP wants {{'',(1.0,0.0,0.0)}} to be converted to the
effective equivalent of [list {''} {(1.0,0.0,0.0)}]
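
One way to get that result is to split only on commas that sit outside parentheses and outside '...' strings. The proc below is a hypothetical helper (splitTopLevel is not an existing command) and a sketch only: it assumes parentheses are balanced and that an apostrophe inside a string is written doubled, and it is far from a full STEP parser.

```tcl
# Hypothetical helper: split on commas at nesting depth 0, outside quotes.
proc splitTopLevel {text} {
    set items {}
    set current ""
    set depth 0
    set inQuote 0
    foreach ch [split $text ""] {
        if {$inQuote} {
            # Inside '...': copy everything until the closing apostrophe.
            append current $ch
            if {$ch eq "'"} { set inQuote 0 }
        } elseif {$ch eq "'"} {
            set inQuote 1
            append current $ch
        } elseif {$ch eq "("} {
            incr depth
            append current $ch
        } elseif {$ch eq ")"} {
            incr depth -1
            append current $ch
        } elseif {$ch eq "," && $depth == 0} {
            # Top-level comma: finish the current item.
            lappend items $current
            set current ""
        } else {
            append current $ch
        }
    }
    lappend items $current
    return $items
}

set fields [splitTopLevel {'',(1.0,0.0,0.0)}]
puts [llength $fields]    ;# 2
puts [lindex $fields 1]   ;# (1.0,0.0,0.0)
```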

>
> > BTW: The format I'm parsing is STEP (universal CAD format).
>
> In that case, you'd likely be much more successful creating an actual
> parser for the portions of the format you want to consume. The parser
> tools modules in Tcllib can help with that.
> https://core.tcl.tk/tcllib/doc/trunk/embedded/md/tcllib/files/modules/page/page_intro.md
>

Donal K. Fellows

Apr 4, 2019, 7:37:47 PM

On 04/04/2019 16:16, Alexandru wrote:
> I guess I need to use something like "split" but then I have a
> problem if the list elements also contain empty spaces.

In that case, you need to write a parser that actually gets the list of
content words that are there. Often, but not always, this is done with a
bit of [regexp], possibly with [lmap] to filter the result:

set elem {"Geh\X2\00E4\X0\use" ""}

set elems [lmap {quoted value} \
    [regexp -inline -all {"([^""]*)"} $elem] {set value}]

That produces the list “{Geh\X2\00E4\X0\use} {}” (excluding quote
marks), in which the first element is what I believe to be the first
word in your input, and the second element is the empty string.

I've not necessarily got it right! But further decoding (which would be
nice to put inside the [lmap] body) depends on knowing more about the
format than I actually do. Perhaps this does it?

set elems [lmap {- value} [regexp -inline -all {"([^""]*)"} $elem] {
    subst -nocommands -novariables [
        regsub -all {\\X2\\(\w{4})\\X0\\} $value {\\u\1}]
}]

Decoding this sort of thing can be a total black art, and doing it well
requires knowing what is really going on with quoting rules. (In this
case, Tcl 8.7's [regsub -command] would be very helpful.)
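
Without [regsub -command], the decoding step can also be written as an explicit loop. The sketch below uses a hypothetical helper, decodeX2, and assumes that one \X2\ ... \X0\ run may hold several 4-hex-digit groups; it does not handle surrogate pairs or other escape forms, so treat it as a starting point rather than a complete decoder.

```tcl
# Hypothetical helper: decode every \X2\hhhh...\X0\ run in a string.
proc decodeX2 {value} {
    set pattern {\\X2\\((?:[0-9A-Fa-f]{4})+)\\X0\\}
    set result ""
    set pos 0
    while {[regexp -indices -start $pos $pattern $value whole hex]} {
        lassign $whole first last
        # Copy the literal text in front of the escape run.
        append result [string range $value $pos [expr {$first - 1}]]
        # Turn each group of four hex digits into one character.
        set digits [string range $value {*}$hex]
        foreach {a b c d} [split $digits ""] {
            append result [format %c [scan $a$b$c$d %x]]
        }
        set pos [expr {$last + 1}]
    }
    append result [string range $value $pos end]
    return $result
}

puts [string equal [decodeX2 {Geh\X2\00E4\X0\use}] "Geh\u00E4use"]   ;# 1
```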

Donal.
--
Donal Fellows — Tcl user, Tcl maintainer, TIP editor.

Alexandru

Apr 4, 2019, 7:46:51 PM

Thanks, guys, for the help. I think that makes it totally clear that I need true parsing of the content. Until now simple Tcl tricks did the job, but if I really want 100% fail-proof code, much more parsing effort will be needed.

Emiliano

Apr 16, 2019, 8:09:36 PM

TIP 407 about string representation of lists is worth reading. The problem that started this thread is explained there.

https://core.tcl.tk/tips/doc/trunk/tip/407.md

Regards
Emiliano