TCL question

JSLE...@stanfordmed.org

Aug 6, 2007, 6:50:12 PM

I am trying to split a string into lines. The string contains embedded
newlines and I just want to break it into a list of separate lines,
using the "split" command (page 45 in "Exploring Expect"). But it actually splits the string
into separate words, apparently using the space " " character instead
of the "\n". I also tried the "\r" but the effect was no different.

The command I used was:
split $zzz "\n"

Thank you,
John.

Bryan Oakley

Aug 6, 2007, 7:03:34 PM

What you describe sounds unusual. If you have a string in "zzz" that is
a bunch of characters split by newlines, [split $zzz "\n"] _will_ split
on newlines. To wit:

tclsh % set zzz "this is\nthree lines\nof text"
this is
three lines
of text
tclsh % split $zzz "\n"
{this is} {three lines} {of text}

One explanation would be that you've loaded up your own version of a
command named "split" which differs from the standard Tcl "split"
command. Could that be possible?

Can you show us an example of your data and the actual output you are
seeing? Also, tell us what you get if you do "info body split". You
should expect to see the error ""split" isn't a procedure".

--
Bryan Oakley
http://www.tclscripting.com

EKB

Aug 7, 2007, 7:02:32 AM

On Aug 6, 7:03 pm, Bryan Oakley <oak...@bardo.clearlight.com> wrote:

Also, just a thought: if you take a file from, say, a Unix environment
to a Windows environment without converting the newline characters, is
it possible that Tcl would fail to recognize them?

Mark Janssen

Aug 7, 2007, 7:17:16 AM

Not if you don't touch the translation mode on the file handle using
fconfigure. In that case it will be 'auto' and Tcl will recognize line
separators in all varieties (i.e. win, mac and nix)

Mark

Larry W. Virden

Aug 7, 2007, 10:56:36 AM

How did the values get into $zzz? How do you know that the string
contains embedded newlines? How do you know that the string is being
split at spaces?

The reason I ask is that, for me, when I think I know what a problem
is, and head off to solve it, I discover, eventually, that the problem
was something else altogether.

For instance, sometimes I think lines are going to be separated by
newlines, when in fact, the way I got the lines in converted the
newlines into spaces.

However, in that case, the result would not be a string split at
spaces.

Try this:
% set zzz "abc\nxyz\n123\n"
abc
xyz
123

% string length $zzz
12
% set l [split $zzz "\n"]
abc xyz 123 {}
% llength $l
4

This is what you should see. If you do not see this, then there's
something not Tcl in your application. If this is what you see,
however, then try the string length, split, and llength with your
version of zzz, and then tell us what results you are getting...

JSLE...@stanfordmed.org

Aug 7, 2007, 2:59:22 PM

Thanks for the info so far! More details as follows.
I captured several lines of output in expect_out(buffer) - said "set
zzz $expect_out(buffer)". I did "puts $dbug "$expect_out(buffer)""
where dbug was a file to capture debug output. It was several lines as
expected.
Trying to split a bunch of lines into a list seems like a very common,
ordinary task. But is there some better way? Maybe an array?
The basic problem is that I get a number of lines output in response
to a command, and I need to select one of them. They each start with a
number. I need to select the right line and echo back the number to
the program.
thanks, JS.

Bryan Oakley

Aug 7, 2007, 4:26:00 PM

JSLE...@stanfordmed.org wrote:
> On Aug 6, 3:50 pm, JSLEE...@stanfordmed.org wrote:
>> I am trying to split a string into lines. The string contains
>> embedded new lines and I just want to break it into a list of separate lines,
>> using the "split" command - page 45 in "Exploring Expect". But it actually splits the string
>> into separate words, apparently using the space " " character instead
>> of the "\n". I also tried the "\r" but the effect was no different.
>> The command I used was:
>> split $zzz "\n"
>> Thank you,
>> John.
>
> Thanks for the info so far! More details as follows.
> I captured several lines of output in expect_out(buffer) - said "set
> zzz $expect_out(buffer)". I did "puts $dbug "$expect_out(buffer)""
> where dbug was a file to capture debug output. It was several lines as
> expected.
> Trying to split a bunch of lines into a list seems like a very common,
> ordinary task. But is there some better way? Maybe an array?

Yes, it is a very common, ordinary task. No, there is not a better way.
[split $zzz \n] is exactly the tool that you should use. It absolutely
does what it is documented to do.

You said the command you used was:

split $zzz "\n"

Are you aware that split does not change zzz, but returns the results as
a list? If you truly ran the above command exactly as printed, that is
the problem. You need to do this:

set list [split $zzz \n]
puts "zzz as a list is $list"
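
Once you have the list, picking out the line you need is an ordinary loop. A rough sketch, assuming each line looks like "<number> <text>"; the sample buffer and the wanted text below are made up for illustration and are not taken from your session:

```tcl
# Hypothetical sample standing in for $expect_out(buffer)
set zzz "1 apples\r\n2 oranges\r\n3 pears\r\n"
set wanted "oranges"   ;# hypothetical text identifying the line to pick

set choice ""
foreach line [split $zzz \n] {
    # string trim removes surrounding whitespace, including a trailing \r
    set line [string trim $line]
    # Capture the leading number and the rest of the line
    if {[regexp {^(\d+)\s+(.*)$} $line -> number text] && $text eq $wanted} {
        set choice $number
        break
    }
}
puts "choice is $choice"
```

The value left in choice is what you would then send back to the program.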

sleb...@gmail.com

Aug 7, 2007, 10:48:56 PM

Indeed, the default -translation auto setting of file channels is a
little gem that I treasure. Not only does it automatically recognise
and convert newlines to tcl's internal "\n" but it recognizes all of
them all the time. So you can even take a mixed newline file like:

unix\n
mac\r
windows\r\n
end

and tcl will still recognise it as four lines. And simply writing out
the file will fix the newline error to the current platform's newline.

On the other hand, forgetting to set -translation binary is how I
usually corrupt binary files (realizing too late only after I
overwrite the original ;-)
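
A quick way to check this yourself, as a sketch: write the mixed-newline bytes in binary mode so nothing is translated on the way out, then read them back with the default translation. The file name mixed-newlines.txt is just a placeholder:

```tcl
# Placeholder path for a scratch file
set path [file join [pwd] mixed-newlines.txt]

# Write the raw bytes untouched, one line ending of each kind
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out "unix\nmac\rwindows\r\nend"
close $out

# Read back with the default (auto) input translation
set in [open $path r]
set lines [split [read $in] \n]
close $in
file delete $path

puts "number of lines: [llength $lines]"
```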

pn8830

Aug 8, 2007, 9:02:27 AM

On Aug 7, 2:59 pm, JSLEE...@stanfordmed.org wrote:
> Thanks for the info so far! More details as follows.
> I captured several lines of output in expect_out(buffer) - said "set
> zzz $expect_out(buffer)". I did "puts $dbug "$expect_out(buffer)""
> where dbug was a file to capture debug output. It was several lines as
> expected.
> Trying to split a bunch of lines into a list seems like a very common,
> ordinary task. But is there some better way? Maybe an array?
> The basic problem is that I get a number of lines output in response
> to a command, and I need to select one of them. They each start with a
> number. I need to select the right line and echo back the number to
> the program.
> thanks, JS.

I have noticed that Expect always returns lines terminated by \r. It
took me some time to split Expect output on new lines. Here is how I
did it:

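proc expToList i {
    # Drop every carriage return that Expect left in the buffer
    regsub -all \r $i {} i
    # Remove a single leading newline so the list has no empty first element
    regsub ^\n $i {} i
    return [split $i \n]
}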

sleb...@gmail.com

Aug 8, 2007, 10:28:47 AM

Isn't that doing it the hard way? Split already does a good job:

# split on either \r or \n:
set i [split $i "\r\n"]

Or do you mean that the lines terminate in CRLF (\r\n)? In which case
you can use the venerable [string map] to normalise newlines:

set i [split [string map [list "\r\n" "\n"] $i] \n]

but IMHO this really isn't necessary. [split $i \r\n] will simply
generate extra empty lines which can easily be skipped.
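
Skipping them can be as simple as a continue at the top of the loop. A minimal sketch with a made-up sample string:

```tcl
# Hypothetical sample with CRLF line endings and one blank line
set i "text1\r\n\r\ntext2\r\n"

set kept {}
foreach line [split $i "\r\n"] {
    # Every \r and every \n is a split point, so empty elements appear; skip them
    if {$line eq ""} continue
    lappend kept $line
}
puts $kept
```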

pn8830

Aug 8, 2007, 11:09:52 AM

On Aug 8, 10:28 am, "slebet...@yahoo.com" <slebet...@gmail.com> wrote:
> On Aug 8, 9:02 pm, pn8830 <pnovozhi...@gmail.com> wrote:
> > I have noticed that Expect always returns lines terminated by \r. It
> > took me some time to split Expect output on new lines. Here is how I
> > did it:
>
> > proc expToList i {
> > regsub -all \r $i {} i
> > regsub ^\n $i {} i
> > set i [split $i \n]
> > return $i
> > }
>
> Isn't that doing it the hard way? Split already does a good job:
>
> # split on either \r or \n:
> set i [split $i "\r\n"]

Yeah, probably yes but if you split on "\r\n" you will be getting too
many empty elements in the resulting list as you mentioned. In my case
I had empty lines too so things like:

text1\r\n
\r\n
text2\r\n

were turning into:

text1 {} {} {} text2 {} {}

> set i [split [string map [list "\r\n" "\n"] $i] \n]
>
> but IMHO this really isn't necessary. [split $i \r\n] will simply
> generate extra empty lines which can easily be skipped.

I will test this out. But there was another example when Expect was
sending a grep command to parse a file on the remote system which
already had \r\n. I was getting two \r characters at the end of each
line plus \n. So I used a straightforward approach - no matter what
you receive from Expect, remove all \r, split on \n

Cheers,
Pavel.

JSLE...@stanfordmed.org

Aug 8, 2007, 2:10:25 PM

Thanks for all the very educational material! As this progresses I
found that there are also embedded backspaces (invisible initially)
which showed up in debug output using "od". If I can replace them with
blanks, I think the problem may be nearly solved. I assume that \b
represents a backspace to TCL.

spoo...@cox.net

Aug 8, 2007, 3:59:08 PM

In article <1186585792....@o61g2000hsh.googlegroups.com>,

pn8830 <pnovo...@gmail.com> wrote:
>On Aug 8, 10:28 am, "slebet...@yahoo.com" <slebet...@gmail.com> wrote:

>> # split on either \r or \n:
>> set i [split $i "\r\n"]
>
>Yeah, probably yes but if you split on "\r\n" you will be getting too
>many empty elements in the resulting list as you mentioned. In my case
>I had empty lines too so things like:

Ok, here's a strange new addition to the question.... I've always split
files into a list containing lines using something like

set f [open foo.bar r]
set flist [split [read $f] \n\r]
close $f

No extra blank lines...just the file's content.

If, on the other hand, I set a variable, e.g.,

set foo "this\n\ris\n\ra test"
set flist [split $foo \n\r]

I get empty list elements where each \n\r (or \r\n) is in the string.

Why?

Later,
--jim

--
73 DE N5IAL (/4) | Peter da Silva: No, try "rm -rf /"
spoo...@cox.net | Dave Aronson: As your life flashes before
ICBM / Hurricane: | your eyes, in the unit of time known as an
30.39735N 86.60439W | ohnosecond.... (alt.sysadmin.recovery)

Cameron Laird

Aug 8, 2007, 3:39:35 PM

In article <1186596625.5...@g12g2000prg.googlegroups.com>,
<JSLE...@stanfordmed.org> wrote:
.
.
.

>Thanks for all the very educational material! As this progresses I
>found that there are also embedded backspaces (invisible initially)
>which showed up in debug output using "od". If I can replace them with
>blanks, I think the problem may be nearly solved. I assume that \b
>represents a backspace to TCL.
>

It does.

\b often arises in Expect discussions, because command-line shells
tend to put all sorts of funny characters in the vicinity of the
prompt.
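
Replacing them is a one-liner with [string map]. A minimal sketch; the sample buffer below is invented for illustration and stands in for $expect_out(buffer):

```tcl
# Hypothetical buffer with embedded backspaces and CRLF line endings
set zzz "1 fo\bo\r\n2 bar\b\r\n"

# Turn each backspace into a blank and drop carriage returns in one pass
set cleaned [string map [list \b " " \r ""] $zzz]

# Now the usual split on newline applies
set lines [split $cleaned \n]
puts [llength $lines]
```

The last element of lines is empty because the buffer ends with a newline.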

Donal K. Fellows

Aug 8, 2007, 5:30:34 PM

spoo...@cox.net wrote:
> Ok, here's a strange new addition to the question.... I've always split
> files into a list containing lines using something like
>
> set f [open foo.bar r]
> set flist [split [read $f] \n\r]
> close $f
>
> No extra blank lines...just the file's content.

You know, you should be able to just split on \n as that's what Tcl uses
as a line separator internally (showing our UNIX origins).

> If, on the other hand, I set a variable, e.g.,
>
> set foo "this\n\ris\n\ra test"
> set flist [split $foo \n\r]
>
> I get empty list elements where each \n\r (or \r\n) is in the string.
>
> Why?

Firstly, this isn't something you'd get out of [read] under normal
circumstances (see [fconfigure]'s -translation option for why).

Secondly, the second argument to [split] isn't really a string, but
rather a set of characters, all of which are to be split upon equally.
In that situation, when it sees a '\n\r' sequence, it splits on the
first of the two and then on the second. Between them, an empty string,
so you get an empty string in the resulting list.

When I want to split on something more sophisticated, I like to use the
[regexp] command, something like this:

set flist [regexp -all -inline {[^\r\n]+} $foo]
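
To see the two side by side, here is a small sketch using the same string as in the question (the variable names bySplit and byRegexp are only for illustration):

```tcl
set foo "this\n\ris\n\ra test"

# split treats "\n\r" as a set of two separator characters,
# so each \n\r pair produces an empty element between the two splits
set bySplit [split $foo \n\r]

# regexp collects runs of characters that are neither \r nor \n,
# so no empty elements can appear
set byRegexp [regexp -all -inline {[^\r\n]+} $foo]

puts "split:  [llength $bySplit] elements"
puts "regexp: [llength $byRegexp] elements"
```

Note that the regexp form also discards genuinely blank lines, which may or may not be what you want.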

Donal.

Bryan Oakley

Aug 8, 2007, 5:32:45 PM

spoo...@cox.net wrote:
> Ok, here's a strange new addition to the question.... I've always split
> files into a list containing lines using something like
>
> set f [open foo.bar r]
> set flist [split [read $f] \n\r]
> close $f
>
> No extra blank lines...just the file's content.
>
> If, on the other hand, I set a variable, e.g.,
>
> set foo "this\n\ris\n\ra test"
> set flist [split $foo \n\r]
>
> I get empty list elements where each \n\r (or \r\n) is in the string.
>
> Why?

Probably because your files have lines that end in a simple \n. I'm not
sure there's any better explanation.

Bruce Hartweg

Aug 8, 2007, 5:33:29 PM

spoo...@cox.net wrote:
> In article <1186585792....@o61g2000hsh.googlegroups.com>,
> pn8830 <pnovo...@gmail.com> wrote:
>> On Aug 8, 10:28 am, "slebet...@yahoo.com" <slebet...@gmail.com> wrote:
>
>>> # split on either \r or \n:
>>> set i [split $i "\r\n"]
>> Yeah, probably yes but if you split on "\r\n" you will be getting too
>> many empty elements in the resulting list as you mentioned. In my case
>> I had empty lines too so things like:
>
> Ok, here's a strange new addition to the question.... I've always split
> files into a list containing lines using something like
>
> set f [open foo.bar r]
> set flist [split [read $f] \n\r]
> close $f
>
> No extra blank lines...just the file's content.
>
> If, on the other hand, I set a variable, e.g.,
>
> set foo "this\n\ris\n\ra test"
> set flist [split $foo \n\r]
>
> I get empty list elements where each \n\r (or \r\n) is in the string.
>
> Why?
>

because Tcl is nice to you ;)
When reading from a file, the I/O in the Tcl core by default translates
all the different external line endings ("\n", "\r" or "\r\n") into a
single \n internally. And if you write a file, all the internal line endings
of \n are written out as the correct platform-specific line ending.

So in general, if you are working on data read from a file, [split $data \n] is
sufficient.

bruce

sleb...@gmail.com

Aug 8, 2007, 9:57:01 PM

On Aug 8, 11:09 pm, pn8830 <pnovozhi...@gmail.com> wrote:
> On Aug 8, 10:28 am, "slebet...@yahoo.com" <slebet...@gmail.com> wrote:
>
>
>
> > On Aug 8, 9:02 pm, pn8830 <pnovozhi...@gmail.com> wrote:
> > > I have noticed that Expect always returns lines terminated by \r. It
> > > took me some time to split Expect output on new lines. Here is how I
> > > did it:
>
> > > proc expToList i {
> > > regsub -all \r $i {} i
> > > regsub ^\n $i {} i
> > > set i [split $i \n]
> > > > return $i
> > > > }
>
> > Isn't that doing it the hard way? Split already does a good job:
>
> > # split on either \r or \n:
> > set i [split $i "\r\n"]
>
> Yeah, probably yes but if you split on "\r\n" you will be getting too
> many empty elements in the resulting list as you mentioned.

Why are extra empty elements an issue at all?

> In my case I had empty lines too so things like:
>
> text1\r\n
> \r\n
> text2\r\n
>
> were turning into:
>
> text1 {} {} {} text2 {} {}

You're already getting too many empty elements using your convoluted
method anyway and need to skip them. Skipping 1 empty element or 10
makes little difference. Unless of course, empty lines mean something.
If you're talking about a file I'd agree but you're talking about
unpredictable program output gathered from Expect, in which case empty
lines should not have meaning.

Jim

Aug 8, 2007, 10:31:20 PM

In article <_Bqui.42230$7c....@fe2.news.blueyonder.co.uk>,
Donal K. Fellows <donal.k...@manchester.ac.uk> wrote:
>spoo...@cox.net wrote:

>> set flist [split [read $f] \n\r]

>You know, you should be able to just split on \n as that's what Tcl uses

>as a line separator internally (showing our UNIX origins).

Yeah.... :-) I actually didn't know that Tcl handled \r on its own
just by using \n until I read it in this thread today ... I'd always
assumed that 'dog and 'doze files would need special handling since
they're different from the norm (that, of course, being your favorite
Unix variant <grin> ... mine being pretty much any BSD variant...from
there, I'm not as picky...BSD on the old DEC VAXes---can't remember
what it was called, SunOS 4.x, my old Linux machine---custom build from
bits and pieces before there even were any Linux distributions, and
now FreeBSD...that's been the progression for me to date).

>Firstly, this isn't something you'd get out of [read] under normal
>circumstances (see [fconfigure]'s -translation option for why).

No need...the word "translation" says enough. :-) Should have
considered the insanely obvious....

>Secondly, the second argument to [split] isn't really a string, but
>rather a set of characters, all of which are to be split upon equally.

Now THAT I knew---I'd made another bad assumption about \n\r which
more or less CANXd my first assumption. Don't ask...I'm too busy
feeling stupid right now to answer. :-)

Thanks....

Later,
--jim

--
73 DE N5IAL (/4) | DMR: So fsck was originally called
spoo...@cox.net | something else.
< Running FreeBSD 6.1 > | Q: What was it called?
ICBM / Hurricane: | DMR: Well, the second letter was different.
30.39735N 86.60439W | -- Dennis M. Ritchie, Usenix, June 1998.

Pavel Novozhilov

Aug 9, 2007, 9:38:26 AM

Hi,
What do you mean by skipping empty lines? Does it mean that anywhere I
want to use the resulting list I should implement a check if element
is empty like if {$listElement != ""} ? I consider that a little
inconvenient since I have to care about it each time. I think it's
easier to get rid of empty elements once and for all.

The other question is does it matter from the performance point of
view?

The other thing I learned about Expect is that the more your script can
predict, the easier your life is. You're right that sometimes Expect
returns junk just because the remote system responded not as expected.
So in many cases I had to analyze expect_out(buffer) to get data
needed for the next step and instead of trying to skip empty lines
with _if_ I thought it's easier to create a proc that would take care
of it.
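
A sketch of such a proc, for illustration only. The name nonEmptyLines and the sample input are hypothetical, and it assumes blank lines carry no meaning:

```tcl
# Hypothetical helper: clean up Expect output once, return only non-blank lines
proc nonEmptyLines {text} {
    set result {}
    # Remove every \r first, then split on the remaining \n
    foreach line [split [string map [list \r ""] $text] \n] {
        # Keep the line only if something other than whitespace is left
        if {[string trim $line] ne ""} {
            lappend result $line
        }
    }
    return $result
}

# Made-up sample with a doubled \r and a blank line
set lines [nonEmptyLines "text1\r\r\n\r\ntext2\r\n"]
puts $lines
```

Callers can then loop over the result without checking each element for emptiness.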

Thanks,
Pavel.

Cameron Laird

Aug 9, 2007, 9:20:24 AM

In article <1186585792....@o61g2000hsh.googlegroups.com>,
pn8830 <pnovo...@gmail.com> wrote:

.
.

.
>I will test this out. But there was another example when Expect was
>sending a grep command to parse a file on the remote system which
>already had \r\n. I was getting two \r characters at the end of each
>line plus \n. So I used a straight forward approach - no matter what
>you receive from Expect, remove all \r, split on \n

.
.
.
Pavel, <URL: http://wiki.tcl.tk/2958 > might interest you.