Any help would be appreciated.
How about something like
proc getn {channel args} {
    if {[llength $args]} {
        upvar 1 [lindex $args 0] line
        set count 1
    } else {
        set count 0
    }
    if {[llength $args] >= 2} {
        set term [lindex $args 1]
    } else {
        set term \x00
    }
    set line ""
    while 1 {
        set character [read $channel 1]
        if {[string length $character] == 0} {
            if {$count} {
                if {[string length $line] == 0} {
                    return -1
                } else {
                    return [string length $line]
                }
            } else {
                return $line
            }
        } elseif {$character eq $term} {
            if {$count} {
                return [string length $line]
            } else {
                return $line
            }
        } else {
            append line $character
        }
    }
}
--
SM Ryan http://www.rawbw.com/~wyrmwif/
No pleasure, no rapture, no exquisite sin greater than central air.
I took the proc above (with one correction) and did this:
foreach n {0 1 10 100} {
    puts "line size : $n"
    set s [string repeat "X" $n]
    set f [open "test" w]
    puts "puts : [time {puts $f $s} 100000]"
    close $f
    set f [open "test" r]
    puts "gets : [time {gets $f line} 100000]"
    close $f
    set f [open "test" r]
    puts "getn : [time {getn $f line "\n"} 100000]"
    close $f
}
and here is the output on my machine:
line size : 0
puts : 2 microseconds per iteration
gets : 3 microseconds per iteration
getn : 6 microseconds per iteration
line size : 1
puts : 2 microseconds per iteration
gets : 3 microseconds per iteration
getn : 8 microseconds per iteration
line size : 10
puts : 2 microseconds per iteration
gets : 3 microseconds per iteration
getn : 24 microseconds per iteration
line size : 100
puts : 3 microseconds per iteration
gets : 4 microseconds per iteration
getn : 175 microseconds per iteration
getn looks like about O(n) compared to about O(1) for gets (for these
line sizes). Too slow.
Maybe there is some kind of extension to help out? Or some other trick
I don't know about?
If I were to try to attack this (I haven't), I'd start by using "read"
to get a block of text in from the I/O channel into a buffer. Then I'd
use split to split the buffered text on \0, then rebuild the pieces
from the blocks I read in and send along strings as delimited by \0.
I'd wrap all that logic up into a proc, so I wouldn't have to think
about it anymore, something like "gets0".
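A rough sketch of that idea (the proc name gets0, the 4096-byte block
size, and the global buffer array are all just placeholders, and I
haven't benchmarked it):

array set gets0buf {}

proc gets0 {channel} {
    global gets0buf
    if {![info exists gets0buf($channel)]} {
        set gets0buf($channel) ""
    }
    # keep pulling blocks until the buffer holds a \0 or we hit EOF
    while {[set idx [string first \x00 $gets0buf($channel)]] < 0} {
        set block [read $channel 4096]
        if {[string length $block] == 0} {
            # EOF: hand back whatever is buffered ("" means no data,
            # assuming the protocol never sends blank records)
            set record $gets0buf($channel)
            set gets0buf($channel) ""
            return $record
        }
        append gets0buf($channel) $block
    }
    # peel off everything up to (but not including) the \0
    set record [string range $gets0buf($channel) 0 [expr {$idx - 1}]]
    set gets0buf($channel) \
        [string range $gets0buf($channel) [expr {$idx + 1}] end]
    return $record
}

The buffer is keyed by channel, so several pipes can be read at once,
and anything left over after a short read stays buffered for the next
call.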
I hope that helps, and isn't too vague. It also may not be preferable
to string map.
> Specifically, I'm wanting to use the null character (from a pipe) in a
> protocol to delimit records.
Something like this may work for you--if you use \0 to delimit a record
(and allow \n to mean newlines within any given record). Then you can use
[fconfigure channel -eofchar \0] to set the null character to be the eof.
Then you can read in an entire record at once, clear the EOF flag, and
read another one. (Assuming your protocol never sends completely blank
records you'll know you were really at the EOF on your pipe when a read
returns 0 bytes.)
Here is an example. Two separate Tcl scripts involved, "client.tcl" and
"pipe.tcl":
##### contents of "pipe.tcl"
#!/bin/sh
#\
exec tclsh "$0" ${1+"$@"}
fconfigure stdout -translation binary -buffering none
for {set i 1} {$i <= 3} {incr i} {
    puts -nonewline [format "Record %d\nData %d\nMore Data%d%s" \
        $i $i $i \0]
}
##### contents of "client.tcl"
#!/bin/sh
#\
exec tclsh "$0" ${1+"$@"}
set fp [open "|./pipe.tcl"]
fconfigure $fp -eofchar \0 -buffering full
proc resetEOF {fp} {
    # Changing the eof character to something different resets
    # the EOF flag on the channel
    #
    # Note: if this were a regular file that we [seek] in then
    # a mere [seek $fp 1 current] would clear the EOF flag and
    # move us beyond the \0, but we can't [seek] on a pipe ...
    fconfigure $fp -eofchar {}
    # we read the null byte (to get past it) then reset \0 to
    # be the eofchar
    read $fp 1
    fconfigure $fp -eofchar \0
}
set counter 0
while 1 {
    set record [read $fp]
    if {[string length $record] == 0} then break else {
        # process the record ...
        puts "Read #[incr counter] returned:\n$record\n"
        resetEOF $fp
    }
}
Michael
This is exactly the kind of thing I'm looking for - but it doesn't seem
to work for me. I ran the above and got:
Read #1 returned:
Record 1
Data 1
More Data1
It only got the first record. It didn't seem able to reset the eof
status. I also tried the above client.tcl with fp set to stdin and
using some arbitrary eof character like "~". After it reached the
first "end of file", resetEOF only allowed reading what was left in the
input buffer.
This seems really close. Just need to completely reset the eof status,
so that it can keep reading.
I get:
Read #1 returned:
Record 1
Data 1
More Data1
Read #2 returned:
Record 2
Data 2
More Data2
Read #3 returned:
Record 3
Data 3
More Data3
I'm running 8.4 on OS X. What platform are you on?
Michael
> getn looks like about O(n) compared to about O(1) for gets (for these
> line sizes). Too slow.
Sometimes you can speed up append with something like
proc K varname {
    upvar 1 $varname var
    set result $var
    set var ""
    set result
}
...
set string "[K string]$character"
...
--
SM Ryan http://www.rawbw.com/~wyrmwif/
I love the smell of commerce in the morning.
> proc K varname {
> upvar 1 $varname var
> set result $var
> set var ""
> }
This name is misleading - K is of course the basic functional
combinator defined as
proc K {a b} {set a}
I'd rather call the above "destructive-read" or so :^)
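For reference, the combinator form supports the same destructive read;
argument substitution is left to right, so the old value reaches K
before the variable is emptied:

proc K {a b} {set a}

# $line is substituted first, then [set line ""] clears the variable,
# so K returns the old value while line ends up empty
set line "some partial data"
set copy [K $line [set line ""]]
# copy now holds "some partial data"; line is ""

Because the variable no longer references the value, the returned
string is unshared, which is what lets later in-place operations on it
avoid a copy.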
While we don't support arbitrary record separator characters, you can
instead use the -eofchar option to get the same effect:
fconfigure $pipe -eofchar \u0000
while {1} {
    # Read up to the next eof char, as configured
    set record [read $pipe]
    if {[string length $record]} {
        # process $record here
    } else {
        break
    }
    # skip over the eof char
    seek $pipe 1 current
}
Equivalently:
for {fconfigure $pipe -eofchar \u0000} {
    [string length [set record [read $pipe]]]
} {seek $pipe 1 current} {
    # process $record here
}
If you don't like using [seek] to skip the char, [fconfigure] the
channel to clear the -eofchar temporarily and [read] the char instead.
proc foreachRecord {var channel separator body} {
    upvar 1 $var v
    while {1} {
        fconfigure $channel -eofchar $separator
        set v [read $channel]
        uplevel 1 $body
        fconfigure $channel -eofchar {}
        read $channel 1
        if {[eof $channel]} {
            break
        }
    }
}
foreachRecord record $pipe \u0000 {
# process $record here
}
I've no idea which option is fastest.
(If you can, the fastest option might be to load the whole contents of
the stream into memory and then [split] on the separator character, but
that might not work with the protocol you're using.)
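That whole-stream variant might look like this (assuming the data fits
in memory and, as noted earlier in the thread, that the protocol never
sends completely blank records):

set data [read $pipe]
close $pipe
# a trailing separator produces one empty element at the end of the
# list, which the blank-record check skips
foreach record [split $data \0] {
    if {[string length $record] == 0} continue
    # process $record here
}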
Donal.
Here's what I'm on:
# uname -a
Linux localhost 2.6.8.1 #10 Tue Sep 21 12:10:29 CDT 2004 i686 Intel(R)
Pentium(R) M processor 1.70GHz unknown GNU/Linux
# rpm -q tcl
tcl-8.4.5-6mdk
I was able to get the above results if instead of using -eofchar {}, I
used -eofchar <any-non-null-char>. But, I still could use stdin (which
has big pauses in the stream) with a typable -eofchar. Also, if I add
a "after 1000" in the pipe.tcl loop, I'm back to one record again. It
is almost as though hitting an eofchar makes it go into non-blocking
mode. I tried adding -blocking 1 to the fconfigure commands, but it
didn't seem to help.
Does it work for you if you add some delay to your loop in pipe.tcl?
If I don't hear a better solution, I think I'll go with using \n to be
my record separator and \0 to mean newline within the record (my
original solution). I'll just use something like this: [string map
[list \0 \n] [gets $fp]]. It sounds like anything else will be slower and
likely have compatibility issues.
Sorry, I meant "could not".
But I should have thought a bit harder and noted that the [seek] version
doesn't work with pipes. D'oh! Use the other one.
Donal.
fconfigure stdout -buffering none
fconfigure stdin -buffering none -translation cr
while {1} {
    puts -nonewline stdout \0[catch [gets stdin] result]$result\0
}
I also made a slight optimization to the loop above assuming that the
majority of commands won't have exceptions. This gave a little speedup
with the same functionality:
while {1} {
    puts -nonewline stdout \0[catch {
        while {1} {
            puts -nonewline stdout \0[eval [gets stdin]]\0
        }
    } result]$result\0
}
When I initially implemented the server in TCL, I put an extra \n in
the stdout pipe to make it easier/faster. I now have the server in C++
(since it needs to talk with C++ code anyways) where there is no reason
for the extra character. Although I felt like my hands were tied with
TCL, I was surprised to see that it wasn't much slower implemented in
pure TCL.
BTW, if you are wondering how the client ever terminates, it dies when
the stdout pipe is closed on the other end and puts fails. I'm
allowing this instead of having the client check stdin and
gracefully exit upon EOF.
I've experimented with these sorts of things, and I find I prefer to
use counted strings instead. By that, I mean that I send the number of
chars in the string as a fixed-width binary value, followed by the
chars themselves. This turns out to admit a very fast implementation in
multiple languages while still allowing arbitrary binary data in the
payloads, which can be useful.
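In Tcl that can be done with [binary format] and [binary scan]; the
proc names here are just illustrative, and the channel must be in
binary mode so the 4-byte header (and any binary payload) passes
through intact:

# writer: 4-byte big-endian length, then the payload itself
proc putCounted {channel data} {
    puts -nonewline $channel \
        [binary format Ia* [string length $data] $data]
}

# reader: returns the payload, or "" on EOF
proc getCounted {channel} {
    set header [read $channel 4]
    if {[string length $header] < 4} {
        return ""
    }
    binary scan $header I len
    return [read $channel $len]
}

On a binary channel [string length] counts bytes, so the length in the
header matches what the reader must consume.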
> When I initially implemented the server in TCL, I put an extra \n in
> the stdout pipe to make it easier/faster. I now have the server in C++
> (since it needs to talk with C++ code anyways) where there is no reason
> for the extra character. Although I felt like my hands were tied with
> TCL, I was surprised to see that it wasn't much slower implemented in
> pure TCL.
The bottleneck is probably the pipe handling and context switching, and
not the data marshalling on either side. Given that, Tcl will hold its
own just fine against C++.
Donal.
Yep, I was thinking of doing this initially, but decided not because
for the stdout coming from the command, I didn't know what its length
would be up front. I wanted to just let the command put what it wanted
on stdout and then terminate it. For the stdin pipe, what you suggest
would be perfectly reasonable. With a protocol of <length><eol><data>,
this would work
read stdin [gets stdin]
although it wouldn't be as space efficient as sending a fixed-width
binary value as you suggest (although probably faster in tcl).
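Spelled out, that ASCII-length framing is just a pair like this (the
names are illustrative):

# writer: the length on its own line, then the data
proc putRecord {channel data} {
    puts $channel [string length $data]
    puts -nonewline $channel $data
}

# reader: the [read stdin [gets stdin]] one-liner, unrolled
proc getRecord {channel} {
    set len [gets $channel]
    if {![string is integer -strict $len]} {
        return ""
    }
    return [read $channel $len]
}

Both sides do need to agree on the channel encoding, since [string
length] counts characters rather than bytes on a non-binary channel.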