IO.popen usage

Robert Citek

Nov 5, 2009, 12:55:00 PM

to stl...@googlegroups.com

Hello all,

I'm trying to use IO.popen to send data to and read data from an
external process. I've created a sample case in the shell like this:

$ { echo hello ; sleep 2 ; echo world; } | cat
hello
world

I've written the same in ruby like so, which works:

$ cat foo.rb
#!/usr/bin/env ruby
if $0 == __FILE__
  cat = IO.popen("cat", "w+") ;
  cat.puts("hello") ;
  puts(cat.gets) ;
  sleep 2 ;
  cat.puts("world") ;
  puts(cat.gets) ;
end

$ ./foo.rb
hello
world

However, if I change the cat command to a sed command, the ruby
version no longer works. The command-line equivalent does work, but
the ruby version waits forever and has to be interrupted:

$ { echo hello ; sleep 2 ; echo world; } | sed -ne p
hello
world

$ cat foo.rb
#!/usr/bin/env ruby
if $0 == __FILE__
  cat = IO.popen("sed -ne p", "w+") ;
  cat.puts("hello") ;
  puts(cat.gets) ;
  sleep 2 ;
  cat.puts("world") ;
  puts(cat.gets) ;
end

$ ./foo.rb
./foo.rb:6:in `gets': Interrupt
from ./foo.rb:6

Why does ruby work in the first case but wait forever in the second?

Regards,
- Robert

Craig Buchek

Nov 5, 2009, 4:47:29 PM

to Saint Louis Ruby Users Group

It took me a long time to find the answer, but you have to close the
write end of the pipe before you can read from it. This is alluded to
in the docs for IO#pipe, but not very clearly. I found the answer at
http://www.rubycentral.com/pickaxe/tut_threads.html (search for the
popen example). The odd thing is that 'cat -n' exhibits the same
hanging behavior, but 'cat' does not. It seems to have something to do
with buffering, but setting IO#sync = true on the object didn't help.

Anyway, the following works. I'm not sure there's any simple way to
process the stream in real-time though. I also converted it to more
idiomatic Ruby, which is much shorter.

#!/usr/bin/env ruby
IO.popen("sed -ne p", "w+") do |cat|
  cat.puts("hello, ")

  sleep 2
  cat.puts("world")

  cat.close_write
  puts(cat.gets)
  puts(cat.gets)
end

Cheers,
Craig

Robert Citek

Nov 6, 2009, 10:25:06 AM

to stl...@googlegroups.com

On Thu, Nov 5, 2009 at 4:47 PM, Craig Buchek <craig....@gmail.com> wrote:
> It took me a long time to find the answer, but you have to close the
> write end of the pipe before you can read from it.

That's unfortunate. What if I don't want to close the pipe? That is, I would
like to keep the pipe open so that I can send some data, read some
data and work on it, send some more data, read some more data and work
on it, etc., much as if the process were a service, e.g. a database. I am
trying to code the equivalent of a call and response. My examples
using cat and sed are just stand-ins for the real program.

> ...but calling IO#sync=true on the object didn't help.

Tried that, also, and noticed it didn't help.

> Anyway, the following works. I'm not sure there's any simple way to
> process the stream in real-time though. I also converted it to more
> idiomatic Ruby, which is much shorter.

And that does work, sort of. The output "hello" and "world" do
appear, but only after the 2 second pause. I also tried 'cat -n' at
the shell and it also pauses before the output:

$ { echo hello ; sleep 2 ; echo world; } | cat -n

Makes me wonder if buffering the output is a feature of the shell or
program or system and if that feature can be temporarily overridden.

It would be nice to do something like this:

open_pipe
set_of_data.each do |foo|
  send_foo_to_pipe
  flush_pipe
  read_foo_output_from_pipe
  process_foo_output
end
close_pipe
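
For illustration, the loop above could be sketched in Ruby roughly like this. It is only a minimal sketch under a few assumptions: plain cat is used as the child because it passes each line through without holding it back, the two-item list stands in for real data, and each request is assumed to produce exactly one line of reply. A child that buffers its stdout when writing to a pipe would leave the gets call blocked.

```ruby
# Call-and-response over a single long-lived pipe.
# "cat" is only a stand-in child that does not hold back its output.
responses = []
IO.popen("cat", "r+") do |pipe|   # "r+" opens the pipe for reading and writing
  %w[hello world].each do |item|  # placeholder data set
    pipe.puts(item)               # send one request
    pipe.flush                    # push it out of Ruby's own write buffer
    reply = pipe.gets             # block until the child answers with one line
    responses << reply.chomp      # "process" the reply (here: just collect it)
  end
  pipe.close_write                # signal EOF so the child can exit
end
p responses
```

The pipe is opened once and closed once; only the send/flush/read steps repeat per item.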

Of course, it's entirely possible that IO.popen is not the "right" way
to tackle this and I have not discovered the Ruby way, yet.

Again, any pointers in the right direction are greatly appreciated.

Regards,
- Robert

Craig Buchek

Nov 6, 2009, 12:33:14 PM

to Saint Louis Ruby Users Group

> I also tried 'cat -n' at
> the shell and it also pauses before the output:
>
> $ { echo hello ; sleep 2 ; echo world; } | cat -n
>
> Makes me wonder if buffering the output is a feature of the shell or
> program or system and if that feature can be temporarily overridden.

I suspect that it may be the cat/sed program that's doing the caching,
since it works with a plain cat. I don't see how it could be ruby's
fault -- how would it know the difference between cat and cat -n?
Unless there's some bug in Ruby, like looking for a dash (which it
does) incorrectly.

Craig

Christopher M

Nov 6, 2009, 2:14:09 PM

to stl...@googlegroups.com

On Fri, Nov 6, 2009 at 11:33 AM, Craig Buchek <craig....@gmail.com> wrote:

> > I also tried 'cat -n' at
> > the shell and it also pauses before the output:
> >
> > $ { echo hello ; sleep 2 ; echo world; } | cat -n
> >
> > Makes me wonder if buffering the output is a feature of the shell or
> > program or system and if that feature can be temporarily overridden.
>
> I suspect that it may be the cat/sed program that's doing the caching,
> since it works with a plain cat.

This is correct: in follow mode, "cat -n", "tail -f", and sed will buffer stdout. You can reproduce this by piping to a pager: until you close the pipe with ^C, less will not continue following.
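
If the child program is the one holding the output back, one possible workaround is to ask it not to buffer. The sketch below assumes GNU sed, whose -u (--unbuffered) option flushes output after each line; other sed implementations may lack the option or spell it differently, so treat this as an illustration rather than a portable recipe. The sleep is shortened from the 2 seconds used earlier in the thread.

```ruby
# Assumes GNU sed: -u (--unbuffered) makes sed flush after every line,
# so each gets returns without closing the write end first.
lines = []
IO.popen(["sed", "-u", "-ne", "p"], "r+") do |sed|
  sed.puts("hello")
  lines << sed.gets.chomp   # reply arrives before the pause
  sleep 0.5
  sed.puts("world")
  lines << sed.gets.chomp
  sed.close_write           # EOF lets sed exit cleanly
end
puts lines
```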

Robert Citek

Nov 6, 2009, 2:21:56 PM

to stl...@googlegroups.com

Yes, it appears that the external program is controlling the buffering
and not Ruby. When I tried the same process with the program I really
wanted to use, IO.popen worked pretty much the way I wanted it to.
The pattern was this:

foo = IO.popen("external_program", "w+")
while data = gets
  prepare data
  foo.puts(data)
  newdata = ""
  until end of record
    newdata << foo.gets
  end
  process newdata
end
foo.close

Turns out that the program I used has a signal to signify the end of a
chunk of data. So the program knows when I am finished sending data
and it can start crunching away. And I know when I can stop reading
data from the pipe and begin processing it. This saves the time of
repeatedly having to open and close the pipe.
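
A runnable sketch of that pattern might look like the following. The real external program and its end-of-record signal are not named in this thread, so everything specific here is an assumption made for illustration: the child is a tiny Ruby script (CHILD), the marker is the literal line "END", and each_response is a hypothetical helper name.

```ruby
require "rbconfig"

# Hypothetical stand-in for the external program: for each input line it
# prints every word upper-cased on its own line, then an "END" marker.
CHILD = <<~'RUBY'
  $stdout.sync = true
  while (line = $stdin.gets)
    line.split.each { |word| puts word.upcase }
    puts "END"
  end
RUBY

END_OF_RECORD = "END" # assumed marker; the real program's signal will differ

# Sends each input over one long-lived pipe and collects the reply lines
# up to (not including) the end-of-record marker.
def each_response(inputs)
  results = []
  IO.popen([RbConfig.ruby, "-e", CHILD], "r+") do |pipe|
    inputs.each do |data|
      pipe.puts(data)
      pipe.flush
      record = []
      while (line = pipe.gets)
        line = line.chomp
        break if line == END_OF_RECORD
        record << line
      end
      results << record
    end
    pipe.close_write # let the child see EOF and exit
  end
  results
end

p each_response(["hello world", "foo"])
```

Reading line by line until the marker avoids calls that wait for EOF, which would block for as long as the pipe stays open.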