> I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
> on 1.8.2 or better. In code with several processes each with several
> threads, I see about a 12%-17% speed boost, because of not having to use
> the polling hack.
Ruby does the polling. You may call it a hack.
--
Tanaka Akira
>> Blocking IO can occasionally cause unexpected problems.
>> For example, in some cases a blocking read *can* block even
>> though select said that the file descriptor was readable.
>> This problem may be rare (it can happen, for instance, when
>> the checksum of a piece of data fails to match the payload),
>> but the bottom line is that non-blocking IO is safer.
>
> Well, at least in recent linux 2.4 and 2.6 kernels, this
> particular problem is fixed (see the recent ruby-talk
> discussion entitled "event driven framework for ruby",
> particularly comments by Akira Tanaka and Ralf Horstmann).
Hmm, okay, then I guess I shall have to remove or annotate
that paragraph to avoid spreading FUD.
>> Perhaps most importantly, while Ruby's threads are green,
>> they are still effectively preemptively scheduled, with all
>> the implications thereof — in a word, synchronization hell.
>> By contrast, event handlers are executed in a strictly
>> sequential manner; an event loop will never run two event
>> handlers simultaneously. (Though, of course, all bets are
>> off if you run multiple event loops in separate threads.)
>
> But in some cases you *want* preemptive scheduling.
Sure. If I want preemptive scheduling, I use threads.
Sometimes, I don't even have a choice. (Ruby's GNU Readline
wrapper only supports blocking calls, for instance.)
> One handler's execution shouldn't block the others, if it
> might take a significant time to finish.
Event handlers should not take a significant amount of time
to finish. If they do, you have coded them wrong. :-)
> Take care of synchronization with concurrent data
> structures like queues, or if that isn't sufficient, lower
> level mechanisms like mutexes.
Or use a deterministic event loop and avoid the problem of
synchronization altogether.
In a callback-based system, you have to deal with callbacks.
In a preemptively multithreaded system, you have to deal
with synchronization. It's a tradeoff, and largely a matter
of taste, preference and familiarity.
You might also ask yourself, do you really *need* to have
the scheduler arbitrarily switch contexts back and forth?
Do your event handlers really take that much time to run?
If so, fine. Otherwise, why not have determinism instead?
--
Daniel Brockman <dan...@brockman.se>
From: "Tanaka Akira" <a...@m17n.org>
>
> So I think it is good to have both blocking methods and nonblocking
> methods. The nonblocking methods should make event loop style
> programs happy. However it is not accepted by matz because good names
> for nonblocking methods are not found yet. Recently I proposed
> connect_nonblock, nonblock_connect, nbconnect for nonblocking connect
> but they are rejected.
Wow - the main thing holding back progress on this front is
method names? I could embrace connect_nonblock or nbconnect,
there.
The blocking I/O issues are the thorniest problem for me
writing applications in ruby. (Of course, it's 1000 times
worse on Windows, ... where nonblocking I/O is apparently not
supported at all yet. That is just a nightmare.)
But regarding the method names - I'm wondering - are separate
methods really needed? Are there any cases where Ruby can't
just inspect the fcntl() flags of the socket, and if
O_NONBLOCK is set, provide nonblocking behavior? You mentioned
connect(), which is an instance method. Couldn't connect()
just check for O_NONBLOCK? Why would a separate method be
needed? (Sorry if this is a FAQ. :)
Regards,
Bill
> I'm about to head to the airport so I'll just say that
> I'm still experimenting. I sometimes develop a system
> with threads, because I think select() was a pain in the
> last system I did. Then I get annoyed with the new
> threaded system, and go back to select() on the next one.
>
> I keep encountering trade-offs, and I'm not sure which
> way I like best.
The trade-off should not be big, apart from programming style, since
Ruby's thread mechanism uses select(). You use select() anyway, directly
or indirectly. The thread mechanism can do what IO.select can and vice
versa, in principle.
However Ruby doesn't support non-blocking I/O well, yet. It tends to
be a problem for event driven programs which use IO.select.
> Well, what if we were to add a #nonblocking= method to IO,
> or at least to Socket?
I heard sock.fcntl(Fcntl::F_SETFL, File::NONBLOCK) puts sock into
nonblocking mode on Windows. However sock.fcntl(Fcntl::F_GETFL)
doesn't work.
There is io/nonblock, which adds IO#nonblock and IO#nonblock=. I don't
know whether it works on Windows.
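For reference, a minimal sketch of both approaches on a Unix-like
system, using a pipe as a stand-in for a socket. It assumes the
io/nonblock extension is present in the Ruby build; it says nothing
about how either call behaves on Windows.

```ruby
require "fcntl"
require "io/nonblock"

r, w = IO.pipe

# Approach 1: set O_NONBLOCK by hand, preserving the other status flags.
flags = r.fcntl(Fcntl::F_GETFL)
r.fcntl(Fcntl::F_SETFL, flags | File::NONBLOCK)
fcntl_says_nonblock = (r.fcntl(Fcntl::F_GETFL) & File::NONBLOCK) != 0

# Approach 2: io/nonblock wraps the same flag in an accessor pair.
w.nonblock = true
accessor_says_nonblock = w.nonblock?

# The accessor can also switch the descriptor back to blocking mode.
w.nonblock = false
back_to_blocking = !w.nonblock?
```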
--
Tanaka Akira
The problem is that it doesn't work exactly the way it does on Linux
and BSD, and so "direct port" software like Ruby tends to not have
particularly good support for it. I believe the ActiveState Perl and
Python distributions give you some good hooks into it, but I've never
really gone to that level with Ruby, so I won't spread any bad
information.
--Wilson.
I got the impression from that thread that it pertained only to 2.6... ?
It's really fixed in 2.4 too?
Thanks,
Bill
> First of all, you may consider the event loop API more
> pleasant than Ruby's threads and not-quite-blocking IO.
> Otherwise, don't listen to me; go on using the latter. :-)
I *love* ruby threads. Still, I wish ruby's thread scheduler would
handle more types of blocking than select can handle, such as waiting
for a file lock.
> Blocking IO can occasionally cause unexpected problems.
> For example, in some cases a blocking read *can* block even
> though select said that the file descriptor was readable.
> This problem may be rare (it can happen, for instance, when
> the checksum of a piece of data fails to match the payload),
> but the bottom line is that non-blocking IO is safer.
Well, at least in recent linux 2.4 and 2.6 kernels, this particular
problem is fixed (see the recent ruby-talk discussion entitled "event
driven framework for ruby", particularly comments by Akira Tanaka and
Ralf Horstmann).
> Perhaps most importantly, while Ruby's threads are green,
> they are still effectively preemptively scheduled, with all
> the implications thereof — in a word, synchronization hell.
> By contrast, event handlers are executed in a strictly
> sequential manner; an event loop will never run two event
> handlers simultaneously. (Though, of course, all bets are
> off if you run multiple event loops in separate threads.)
But in some cases you *want* preemptive scheduling. One handler's
execution shouldn't block the others, if it might take a significant
time to finish. Take care of synchronization with concurrent data
structures like queues, or if that isn't sufficient, lower level
mechanisms like mutexes.
--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
> Please give us a sane version number. :)
It just dawned on me that you were probably talking about
EventLoop 0.0.20050825.1600
That version number _is_ a bit ambiguous. It really should have "UTC" in
it somewhere to make clear that the 1600 is not in the poster's local
time zone. ;)
Due to the somewhat popular demand for an event loop for Ruby,
I've recently been working on packaging the one I've written
for a network application of mine (Refusde, an NMDC client).
With the help of Tilman Sauerbeck, I've now managed to put
together some documentation and a gem/tarball of what I've
decided is going to be the first publicly announced version.
Here's the canonical short package overview:
EventLoop is a simple IO::select-based main event loop
featuring IO event notification and timeout callbacks.
It comes with a signal system inspired by that of GLib.
The code is licensed under the GPL and can be found at
<http://www.brockman.se/software/ruby-event-loop/>.
At this point, some of you will probably want to see an
example of how it works --- a kind of screenshot.
For this purpose, I chose to implement a simple asynchronous
buffered IO reader:
require "event-loop"
class BufferedReader
  include SignalEmitter
  define_signals :line, :done
  def initialize(io, eol="\n")
    yield self if block_given?
    io = File.new(io) if io.kind_of? String
    buffer = String.new
    io.on_readable do
      begin
        buffer << io.readpartial(1024)
        while i = buffer.index(eol)
          signal :line, buffer.slice!(0, i)
          buffer.slice!(0, eol.size)
        end
      rescue EOFError
        signal :done, buffer
        io.close
      end
    end
  end
end
reader = BufferedReader.new("/etc/passwd") do |r|
  r.on_line { |content| puts "Line: #{content}" }
  r.on_done { |leftover| puts "Done: #{leftover}" }
  r.on_done { EventLoop.quit }
end
EventLoop.run
See how easy the event loop is to use, and how nicely it
blends into the rest of Ruby?
For good measure, maybe I should also attach a section of
the manual (i.e., the README file) that describes how event
loops fit into the rest of the world:
The Event Loop
==============
This section explains how IO multiplexing works in general
(albeit briefly and not very in-depth), and specifically the
issues relevant for Ruby applications. You may safely skip
it if you (a) already know this subject, or (b) don't care.
Plain ol' blocking IO works well when you're reading from
just a single file descriptor. But when you're interested
in a whole bunch of FDs, you can't wait for any single one
of them to become readable or writable, because then you'll
inevitably miss that happening to the other ones. Instead,
you need a multiplexer that can wait for them *all at once*.
There are a handful of low-level multiplexing primitives:
‘select’, ‘poll’, ‘epoll’, ‘/dev/poll’, and ‘kqueue’.
In addition, there are portable low-level wrapper libraries
such as libevent, which can use any of those primitives.
The event loop in this package uses the standard ‘select’
wrapper shipped with Ruby, ‘IO::select’. But in the future,
I'd like to use libevent instead, because that'd be cooler.
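To make the multiplexing idea concrete, here is a minimal sketch
that waits on two descriptors at once using only the standard
library. It is not the EventLoop package's implementation; the two
pipes merely stand in for sockets, and the variable names are
illustrative.

```ruby
r1, w1 = IO.pipe
r2, w2 = IO.pipe

w1.write "from first"
w2.write "from second"
w1.close
w2.close

received = {}
pending = [r1, r2]

until pending.empty?
  # Block until at least one of the pending descriptors is readable.
  readable, = IO.select(pending)
  readable.each do |io|
    begin
      (received[io] ||= String.new) << io.readpartial(1024)
    rescue EOFError
      # The writer closed its end: stop watching this descriptor.
      pending.delete(io)
      io.close
    end
  end
end
```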
Most applications use a higher-level abstraction built on
top of the low-level multiplexer, usually called a ‘main
loop’, an ‘event loop’, or an ‘event source’. There are
also libraries such as liboop, which generalizes the event
source and event sink concepts, so that components (event
sinks) written against liboop become event-source-agnostic.
Actually, the combination of blocking IO and Ruby's green
threads works well in most cases where you would normally
use an event loop. When you call ‘IO#read’ on an empty file
descriptor, for instance, Ruby suspends that thread until
its internal event loop, known as the scheduler (currently
based on ‘select’), determines that the file descriptor has
become readable. In particular, Ruby never calls the
low-level ‘read’ function unless it knows that it will not
block (because ‘select’ said it wouldn't, but see below).
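A small sketch of that behaviour: one thread blocks in a read while
the main thread keeps running. Note that current Ruby versions use
native threads rather than green threads, so the mechanism differs
from what is described above, but the observable effect in this
example is the same.

```ruby
r, w = IO.pipe
ticks = 0

# This read blocks only the reader thread, not the whole process.
reader = Thread.new { r.gets }

# The main thread continues to make progress in the meantime.
3.times do
  ticks += 1
  sleep 0.05
end

w.puts "hello"
line = reader.value   # joins the thread and returns what it read
```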
There are several reasons why you would use an event loop
such as the one implemented by this library instead of
not-so-plain ol' blocking IO with Ruby's green threads.
First of all, you may consider the event loop API more
pleasant than Ruby's threads and not-quite-blocking IO.
Otherwise, don't listen to me; go on using the latter. :-)
Blocking IO can occasionally cause unexpected problems.
For example, in some cases a blocking read *can* block even
though select said that the file descriptor was readable.
This problem may be rare (it can happen, for instance, when
the checksum of a piece of data fails to match the payload),
but the bottom line is that non-blocking IO is safer.
Perhaps most importantly, while Ruby's threads are green,
they are still effectively preemptively scheduled, with all
the implications thereof — in a word, synchronization hell.
By contrast, event handlers are executed in a strictly
sequential manner; an event loop will never run two event
handlers simultaneously. (Though, of course, all bets are
off if you run multiple event loops in separate threads.)
--
Daniel Brockman <dan...@brockman.se>
> I *love* ruby threads. Still, I wish ruby's thread scheduler would
> handle more types of blocking than select can handle, such as waiting
> for a file lock.
File#flock works well since Ruby 1.8.2. It blocks only the calling
thread. It doesn't block other threads.
% ruby-1.8.2 -ve '
f1 = open("z", "w")
f1.flock(File::LOCK_EX)
t = Thread.new {
  f2 = open("z", "w")
  p :f2_lock_start
  f2.flock(File::LOCK_EX)
  p :f2_lock_end
}
3.times {|i| p i; sleep 1 }
f1.flock(File::LOCK_UN)
t.join
'
ruby 1.8.2 (2004-12-25) [i686-linux]
:f2_lock_start
0
1
2
:f2_lock_end
--
Tanaka Akira
Oh, well as long as ruby does it, it's more efficient than me doing it,
so less of a hack.
> From: "Wilson Bilkovich" <wil...@gmail.com>
> On 8/26/05, Tanaka Akira <a...@m17n.org> wrote:
> > In article <033601c5a9fa$26ba7840$6442a8c0@musicbox>,
> > "Bill Kelly" <bi...@cts.com> writes:
> >
> > > The blocking I/O issues are the thorniest problem for me
> > > writing applications in ruby. (Of course, it's 1000 times
> > > worse on Windows, ... where nonblocking I/O is apparently not
> > > supported at all yet. That is just a nightmare.)
> >
> > I heard Windows has nonblocking I/O for sockets.
>
> Windows actually has plenty of support for nonblocking operations on
> sockets and files.
> Here's an example hit from MSDN:
> http://msdn.microsoft.com/library/en-us/ipc/base/named_pipe_type_read_and_wait_modes.asp
Thanks; I should have been more clear... I'd posted earlier
this year, in
http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/138533
about a way to put windows sockets into nonblocking mode.
What I meant by "nightmare" is that few (if any) nonblocking
operations in windows are supported in ruby.
One thing I've wondered, is if a win32 socket were put into
nonblocking mode via a C extension (I notice ruby's win32.c
already defines a rb_w32_ioctlsocket() ... but nothing seems
to use it) ... Would ruby's scheduler on win32 work correctly
with windows sockets in nonblocking mode? I haven't tried
this yet.
Regards,
Bill
In other news, 1989 called. They want their version numbering system back.
Please give us a sane version number. :)
Dan
From: "Tanaka Akira" <a...@m17n.org>
> In article <033601c5a9fa$26ba7840$6442a8c0@musicbox>,
> "Bill Kelly" <bi...@cts.com> writes:
>
> > Wow - the main thing holding back progress on this front is
> > method names? I could embrace connect_nonblock or nbconnect,
> > there.
>
> Do you have a problem with threads?
>
> If you use threads, nonblocking methods are not required in general.
>
> I'd like to know why people don't use threads.
I'm about to head to the airport so I'll just say that
I'm still experimenting. I sometimes develop a system
with threads, because I think select() was a pain in the
last system I did. Then I get annoyed with the new
threaded system, and go back to select() on the next one.
I keep encountering trade-offs, and I'm not sure which
way I like best.
> 2. There is no F_GETFL on Windows.
> Ruby cannot test O_NONBLOCK is set/clear on a fd. So connect
> method cannot check O_NONBLOCK.
:( OK thanks.
Well, what if we were to add a #nonblocking= method to IO,
or at least to Socket?
So the programmer could say: socket.nonblocking = true
And have Ruby perform the appropriate action behind the
scenes?
Regards,
Bill
> You might also ask yourself, do you really *need* to have
> the scheduler arbitrarily switch contexts back and forth?
> Do your event handlers really take that much time to run?
> If so, fine. Otherwise, why not have determinism instead?
To nitpick, neither pre-emptive threading nor cooperative threading
(of which an explicit event handling loop is a form) has anything to
do with determinism.
It is what is being executed in that thread that determines whether it
is deterministic or not.
YS.
Performance and resource consumption is it for me. I have a
client/server transaction processing application written in twisted
that can do about 300 tps and use less than 5% of the cpu on a 3ghz
pentium. Each transaction does 2-3 database queries, an http post to
a remote server, and some text formatting. It does use a thread
pool for database connections but that's it. Since the http post
takes 2-3 seconds to complete, there are anywhere from 900 to 2000
active client connections at any one time. The server uses a steady
30mb of ram the whole time.
Chris
> Wow - the main thing holding back progress on this front is
> method names? I could embrace connect_nonblock or nbconnect,
> there.
Do you have a problem with threads?
If you use threads, nonblocking methods are not required in general.
I'd like to know why people don't use threads.
> The blocking I/O issues are the thorniest problem for me
> writing applications in ruby. (Of course, it's 1000 times
> worse on Windows, ... where nonblocking I/O is apparently not
> supported at all yet. That is just a nightmare.)
I heard Windows has nonblocking I/O for sockets.
> But regarding the method names - I'm wondering - are separate
> methods really needed? Are there any cases where Ruby can't
> just inspect the fcntl() flags of the socket, and if
> O_NONBLOCK is set, provide nonblocking behavior? You mentioned
> connect(), which is an instance method. Couldn't connect()
> just check for O_NONBLOCK? Why would a separate method be
> needed? (Sorry if this is a FAQ. :)
1. Threaded programs need blocking methods even for an IO object with
O_NONBLOCK set. O_NONBLOCK is required to keep write operations from
blocking the entire process. But threaded programs still need
blocking behavior, because most of them don't expect EAGAIN. I think
separate nonblocking methods are better than implementing an EAGAIN
retry loop in every threaded program.
2. There is no F_GETFL on Windows.
Ruby cannot test whether O_NONBLOCK is set or clear on an fd. So the
connect method cannot check O_NONBLOCK.
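The "EAGAIN retry loop" from point 1 can be sketched roughly as
follows. This relies on IO#read_nonblock and IO::WaitReadable, which
come from Ruby versions later than the 1.8.2 discussed here, and
blocking_read is a hypothetical helper name used only for this
illustration.

```ruby
# Gives the caller blocking semantics on top of a nonblocking read:
# when the read would block (EAGAIN), wait for readability and retry.
def blocking_read(io, maxlen)
  io.read_nonblock(maxlen)
rescue IO::WaitReadable
  IO.select([io])
  retry
end

r, w = IO.pipe
writer = Thread.new { sleep 0.1; w.write "payload" }
data = blocking_read(r, 1024)
writer.join
```

Every threaded program would need a wrapper like this around each IO
call if the ordinary methods raised EAGAIN, which is the burden the
separate nonblocking methods are meant to avoid.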
--
Tanaka Akira
> In a callback-based system, you have to deal with callbacks.
> In a preemptively multithreaded system, you have to deal
> with synchronization. It's a tradeoff, and largely a matter
> of taste, preference and familiarity.
It seems that a giant lock (like Python's GIL) can be a compromise
between the two.
Apart from that, Ruby's IO methods are not well suited to an event
loop. You may be frustrated to find that some methods block even if
O_NONBLOCK is set.
The blocking behavior is good for threaded programs. The context
switch behind the blocking call is enough to let other work proceed,
because that work is held by other threads. So the blocking behavior
makes threaded programs happy even if O_NONBLOCK is set. O_NONBLOCK is
required anyway, to avoid blocking the entire process on a write
operation.
However, the behavior is bad for event-loop-style programs, because
there the pending work is held by the event loop in the caller's thread.
So I think it is good to have both blocking methods and nonblocking
methods. The nonblocking methods should make event-loop-style
programs happy. However, this has not been accepted by matz, because
good names for the nonblocking methods have not been found yet.
Recently I proposed connect_nonblock, nonblock_connect, and nbconnect
for a nonblocking connect, but they were rejected.
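For illustration, this is roughly what a nonblocking connect looks
like from an event-loop-style caller. It uses Socket#connect_nonblock
as it exists in later Ruby releases, not anything available in 1.8.2,
and connects to a throwaway server on a local port picked by the OS.

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS choose
port = server.addr[1]

sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
addr = Socket.sockaddr_in(port, "127.0.0.1")

begin
  sock.connect_nonblock(addr)
rescue IO::WaitWritable
  # Connection in progress: wait until the socket becomes writable,
  # then call again to learn whether the connect succeeded.
  IO.select(nil, [sock])
  begin
    sock.connect_nonblock(addr)
  rescue Errno::EISCONN
    # Already connected, which is the success case here.
  end
end

peer = server.accept
sock.write "ping"
reply = peer.readpartial(16)
[sock, peer, server].each(&:close)
```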
--
Tanaka Akira
> Performance and resource consumption is it for me. I have a
> client/server transaction processing application written in twisted
> that can do about 300 tps and use less than 5% of the cpu on a 3ghz
> pentium. Each transaction does 2-3 database queries, an http post to
> a remote server, and some text formatting. It does use a thread
> pool for database connections but that's it. Since the http post
> takes 2-3 seconds to complete, there are anywhere from 900 to 2000
> active client connections at any one time. The server uses a steady
> 30mb of ram the whole time.
I see. I have never worked with such applications, though.
--
Tanaka Akira
Wow! Thanks.
I updated FSDB(*) to 0.5 to take advantage of this, in case it's running
on 1.8.2 or better. In code with several processes each with several
threads, I see about a 12%-17% speed boost, because of not having to use
the polling hack.
(*) http://redshift.sourceforge.net/fsdb/
--
> Daniel Brockman <dan...@brockman.se> writes:
>
>> You might also ask yourself, do you really *need* to have
>> the scheduler arbitrarily switch contexts back and forth?
>> Do your event handlers really take that much time to run?
>> If so, fine. Otherwise, why not have determinism instead?
>
> To nitpick, neither pre-emptive threading nor cooperative
> threading (of which explicit event handling loop is a form
> of) has anything to do with determinism.
To nitpick back, I think you overstated that claim a bit.
Cooperatively threaded systems are deterministic by default;
pre-emptively scheduled ones are probabilistic by default.
If you write a multithreaded program without keeping
synchronization in mind, it is likely to still end up
essentially deterministic under cooperative threading.
If you are using pre-emptive threading, however, you are
very likely to introduce race conditions.
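A sketch of the kind of read-modify-write sequence at stake, with
the lock that pre-emptive threading makes necessary. The Thread.pass
call is there only to invite a context switch in the middle of the
update; without the Mutex, updates could be lost at that point.

```ruby
counter = 0
lock = Mutex.new

workers = 4.times.map do
  Thread.new do
    1_000.times do
      lock.synchronize do
        current = counter
        Thread.pass            # invite a context switch mid-update
        counter = current + 1
      end
    end
  end
end
workers.each(&:join)
```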
So what I'm saying here is that while I agree that the
determinism of a correctly written program does not depend
fundamentally on the kind of threading in use, I must object
to the claim that ``[neither threading model] has anything
to do with determinism.''
In a cooperatively multithreaded program, control progresses
linearly through the source --- every line of code will be
executed immediately after the previous one has finished.
In a pre-emptively scheduled one, on the other hand, control
jumps around probabilistically. Determinism is clearly
relevant here, IMHO.
But I see your point. I did sort of imply that pre-emptive
threading leads to non-determinism, which might not be the
fairest way of putting it. Sorry about that.
> It is what is being executed in that thread that
> determines whether it is deterministic or not.
I agree. It's just you don't have to put anything fancy in
cooperative threads to make them deterministic, because they
already are by default. Unless you put `rand' everywhere.
--
Daniel Brockman <dan...@brockman.se>
I'm still living in 1989 in many ways....
It's not quite a three-year-old project yet, so I don't think it
deserves 1.0 status ;)
Or do you mean more digits? (Internally, it is 0.5.5, but, for a minor
project like this, I only release the last in each 0.x series.)
That's a good point.
I do like what using threads does to the architecture of my program.
It's very easy to separate all the functionality out into components,
each of which performs a specific task, has a ThreadGroup to manage its
own threads, and communicates with other components by queues. The
components can be tested independently and even executed in other
processes/hosts, if you replace Queue with something based on Sockets
and Marshal, or DRb.
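A minimal sketch of that architecture, with two components that
share nothing except queues. The component roles and the :done
sentinel are illustrative choices, not taken from any particular
program.

```ruby
require "thread"   # harmless on current Rubies, needed on old ones

lines = Queue.new
results = Queue.new

# Component 1: produces work items and signals completion.
producer = Thread.new do
  %w[alpha beta gamma].each { |word| lines << word }
  lines << :done
end

# Component 2: consumes items until it sees the sentinel.
consumer = Thread.new do
  while (item = lines.pop) != :done
    results << item.upcase
  end
end

[producer, consumer].each(&:join)

collected = []
collected << results.pop until results.empty?
```

Because each component only touches its queues, either one can be
tested alone by feeding or draining the queue from the test itself.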
So I guess another consideration in making this tradeoff is the degree
to which the system as a whole can be decoupled.
If, for example, the handlers are making atomic updates to some
monolithic data structure, or to a GUI, then decoupling doesn't make
sense: the overhead to make the updates atomic would be too high.
Right... For the program I'm writing right now, if I were
using threads, what I'd like to have is multiple
pairs of threads--a read thread doing
    sock.gets()
and a write thread doing
    sock.puts(line)
... but I'm afraid the #puts will block my whole process,
potentially. . . . So I can break that down into select()
and send() with NONBLOCK ... But then I'm afraid to use
puts() on the same socket, because I fear mixing high-level
gets/puts with low-level send/recv... So I presume I need
to break both threads into select() and send/recv...
And so it turns into a bigger chore than it ought to be
in Ruby... :( And so at that point I think why not just
have one thread with a central select() ...
Well, ... come to think of it - another reason I'd decided
to try a central select() again, was my previous program
that massively used threads in ruby would pause occasionally,
and I could never track down what the heck the process was
doing when it was paused. (This wasn't the UDP checksum
thing - these were pauses from maybe a fraction of a second
to 2 or 3 seconds.) It just happened frequently enough to
be annoying, but infrequently enough that it was difficult
to trace.... And so I remember thinking, if this were
single-threaded, it would be--in theory--easier to determine
where the program was paused in these situations.
However - I deduced later, it may have been doing garbage
collection and causing page swaps. (I had a large in-memory
hash table.) If that were the case, the mystery would have
been the same in a single-threaded ruby app. :)
So - I don't know. One thing I'm sure of is that if Ruby
handled nonblocking better behind the scenes, network
programming in ruby could be as much of a joy as most other
ruby programming is.
> I heard sock.fcntl(Fcntl::F_SETFL, File::NONBLOCK) makes sock
> nonblocking mode on Windows. However sock.fcntl(Fcntl::F_GETFL)
> doesn't work.
Is this new? I don't seem to have File::NONBLOCK in my
ruby 1.8.2 (2004-12-25) [i386-mswin32]
Regards,
Bill
> Right... For the program I'm writing right now, if I were
> using threads, what I'd like is to have is multiple
> pairs of threads--a read thread doing
>
> sock.gets()
>
> and a write thread doing
>
> sock.puts(line)
>
> ... but I'm afraid the #puts will block my whole process,
> potentially. . . . So I can break that down into select()
> and send() with NONBLOCK ... But then I'm afraid to use
> puts() on the same socket, because I fear mixing high-level
> gets/puts with low-level send/recv... So I presume I need
> to break both threads into select() and send/recv...
I can understand the fear. However I'm not sure how it can be
eliminated. Some documentation might help.
> So - I don't know. One thing I'm sure of is that if Ruby
> handled nonblocking better behind the scenes, network
> programming in ruby could be as much of a joy as most other
> ruby programming is.
I'm trying.
> Is this new? I don't seem to have File::NONBLOCK in my
> ruby 1.8.2 (2004-12-25) [i386-mswin32]
I'm not sure. Maybe after that.
--
Tanaka Akira
Wow, I guess I did say "afraid" a lot of times there.
As I mentioned earlier, I'm still experimenting with
multi-threaded and single-threaded-select() based
implementations.
I wrote a new single-threaded-select() implementation, but
it *was* a lot more code than a threaded version using Queue
would have been. So this afternoon I put the single-threaded
version aside, and coded up a multi-threaded version, using
gets / puts as I described above.
The multi-threaded version looks nice, but here's what
I'm getting on Windows:
>> require 'socket'
=> true
>> require 'thread'
=> true
>> sv = TCPServer.new(12345)
=> #<TCPServer:0x2dbbc90>
>> cl = TCPSocket.new("localhost", 12345)
=> #<TCPSocket:0x2db8e28>
>> th1 = Thread.new { sleep(1.0) until cl.eof? }
=> #<Thread:0x2db4c70 sleep>
>> th2 = Thread.new { sleep(1.0) until cl.eof? }
=> #<Thread:0x2db0a58 sleep>
>> cl.eof?
^^^^^^^^ This call never returns, the whole process hangs,
apparently.
This is ruby 1.8.2 (2004-12-25) [i386-mswin32]
I must admit I didn't expect it to block the process in this
situation. Am I doing something stupid?
... Hmmm, I'm getting this same hanging behavior in Linux,
regardless of whether the socket is in O_NONBLOCK mode.
Hrm.. It appears this is hanging due to the TCPServer being
in the same Ruby process as the client.
Or at least it seemed to make a difference on Linux... But
on Windows, even with the server socket being created in
a separate process, I'm still getting:
>> require 'socket'
=> true
>> require 'thread'
=> true
>> cl = TCPSocket.new("localhost", 12345)
=> #<TCPSocket:0x2dbaf88>
>> th1 = Thread.new { sleep(1.0) until cl.eof? }
=> #<Thread:0x2db6dd0 sleep>
>> th2 = Thread.new { sleep(1.0) until cl.eof? }
=> #<Thread:0x2db2bb8 sleep>
>> cl.eof?
^^^^^^^^ hang
Am I doing something dumb here? It seems this should be
legal.
Thanks,
Regards,
Bill
> >> require 'socket'
> => true
> >> require 'thread'
> => true
> >> cl = TCPSocket.new("localhost", 12345)
> => #<TCPSocket:0x2dbaf88>
> >> th1 = Thread.new { sleep(1.0) until cl.eof? }
> => #<Thread:0x2db6dd0 sleep>
> >> th2 = Thread.new { sleep(1.0) until cl.eof? }
> => #<Thread:0x2db2bb8 sleep>
> >> cl.eof?
>
> ^^^^^^^^ hang
I guess it must be something very basic I'm not aware of
about #eof? ... Because even:
> >> require 'socket'
> => true
> >> require 'thread'
> => true
> >> cl = TCPSocket.new("localhost", 12345)
> => #<TCPSocket:0x2dbaf88>
> >> cl.eof?
hangs.
Is it not possible to check for EOF without blocking?
Regards,
Bill
sometimes yes, sometimes no, or so it seems:
[ahoward@localhost ~]$ cat a.rb
require 'socket'
s = TCPSocket::new '127.0.0.1', 80
s.eof?
[ahoward@localhost ~]$ strace ruby a.rb 2>&1
...
...
...
read(3
and, from io.c:
/*
 * call-seq:
 *     ios.eof => true or false
 *     ios.eof? => true or false
 *
 * Returns true if <em>ios</em> is at end of file. The stream must be
 * opened for reading or an <code>IOError</code> will be raised.
 *
 *     f = File.new("testfile")
 *     dummy = f.readlines
 *     f.eof #=> true
 */
VALUE
rb_io_eof(io)
    VALUE io;
{
    OpenFile *fptr;
    int ch;
    GetOpenFile(io, fptr);
    rb_io_check_readable(fptr);
    if (feof(fptr->f)) return Qtrue;
    if (READ_DATA_PENDING(fptr->f)) return Qfalse;
    READ_CHECK(fptr->f);
    clearerr(fptr->f);
    TRAP_BEG;
    ch = getc(fptr->f); // look here !!!
    TRAP_END;
    if (ch != EOF) {
        ungetc(ch, fptr->f);
        return Qfalse;
    }
    rb_io_check_closed(fptr);
    clearerr(fptr->f);
    return Qtrue;
}
i'm no systems/network guy - but it seems like maybe one shouldn't try to read
a char from anything like a socket, pipe, etc. that might hang for a read in
this case... i dunno how to check for this kind of thing though... maybe:
if (fseek(fptr->f, 0L, SEEK_CUR) == -1 && errno == ESPIPE) {
    return Qnil; // not seekable (pipe, socket): we can't tell for this stream!
}
else {
    TRAP_BEG;
    ch = getc(fptr->f);
    TRAP_END;
}
or maybe an error could be thrown up front for this case (Errno::EWOULDBLOCK
for example)... but this doesn't seem right since you may want to check eof
for a socket at times... nil could mean "don't know."
anyhow... it sure makes sense that it'd block though...
hth.
-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| Your life dwells amoung the causes of death
| Like a lamp standing in a strong breeze. --Nagarjuna
===============================================================================
> [ahoward@localhost ~]$ strace ruby a.rb 2>&1
> ...
> ...
> ...
> read(3
>
>
> and, from io.c:
[...]
> ch = getc(fptr->f); // look here !!!
[...]
> anyhow... it sure makes sense that it'd block though...
Thanks, Ara !
Guess I would have expected it just to do the
if (feof(fptr->f)) return Qtrue;
part...
I've changed my program to infer EOF when gets() returns
nil. I no longer call #eof? because although it admittedly
looks pretty darned thorough, it's no good for me to block.
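A minimal sketch of that pattern, with a pipe standing in for the
socket: the loop ends when gets returns nil, so #eof? is never
called.

```ruby
r, w = IO.pipe
w.puts "first"
w.puts "second"
w.close            # the peer closing its end is what produces EOF

lines = []
# gets returns nil at end of stream, which terminates the loop.
while (line = r.gets)
  lines << line.chomp
end
r.close
```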
Thanks again,
Regards,
Bill
> Is it not possible to check for EOF without blocking?
I documented the blocking behavior of IO#eof?. I hope it reduces
confusion about IO#eof?.
Index: io.c
===================================================================
RCS file: /src/ruby/io.c,v
retrieving revision 1.376
diff -u -p -r1.376 io.c
--- io.c 30 Aug 2005 14:49:51 -0000 1.376
+++ io.c 5 Sep 2005 14:52:58 -0000
@@ -908,12 +908,31 @@ io_getc(OpenFile *fptr)
  *     ios.eof => true or false
  *     ios.eof? => true or false
  *
- * Returns true if <em>ios</em> is at end of file. The stream must be
- * opened for reading or an <code>IOError</code> will be raised.
+ * Returns true if <em>ios</em> is at end of file that means
+ * there are no more data to read.
+ * The stream must be opened for reading or an <code>IOError</code> will be
+ * raised.
  *
  *     f = File.new("testfile")
  *     dummy = f.readlines
  *     f.eof #=> true
+ *
+ * If <em>ios</em> is a stream such as pipe or socket, <code>IO#eof?</code>
+ * blocks until the other end sends some data or closes it.
+ *
+ *     r, w = IO.pipe
+ *     Thread.new { sleep 1; w.close }
+ *     r.eof? #=> true after 1 second blocking
+ *
+ *     r, w = IO.pipe
+ *     Thread.new { sleep 1; w.puts "a" }
+ *     r.eof? #=> false after 1 second blocking
+ *
+ *     r, w = IO.pipe
+ *     r.eof? # blocks forever
+ *
+ * Note that <code>IO#eof?</code> reads data to a input buffer.
+ * So <code>IO#sysread</code> doesn't work with <code>IO#eof?</code>.
  */
 
 VALUE
--
Tanaka Akira
>In article <0ca201c5b19d$6368edd0$6442a8c0@musicbox>,
> "Bill Kelly" <bi...@cts.com> writes:
>
> > Is it not possible to check for EOF without blocking?
>
>I documented the blocking behavior of IO#eof?. I hope it reduces
>confusion about IO#eof?.
Hmmmm...
Correct me if I'm wrong, but using eof? on a socket would be rather
useless, wouldn't it?
I'd *really* like to find something that will just tell me if there's data
waiting on a socket. Something I can use like
someSocket.incoming_data_waiting?
And if there's something I can read, I'll get true, and otherwise I'll get
false. But nothing I've tried seems to work that way.
-Morgan, wonders why such a simple seeming thing is so difficult?
Doesn't select work, with a timeout of 0? I haven't tried that, but it
seems like what you are asking for.
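
A minimal sketch of that idea, assuming a pipe in place of a socket so the example is self-contained. The method name incoming_data_waiting? is the hypothetical one from the question, not an existing Ruby API. IO.select with a timeout of 0 returns immediately: nil if nothing is ready, otherwise the arrays of ready objects.

```ruby
# Hypothetical helper: true if a read on io would not block right now.
# "Readable" also covers EOF, so the following read may still return no data.
def incoming_data_waiting?(io)
  !IO.select([io], nil, nil, 0).nil?
end

r, w = IO.pipe
before = incoming_data_waiting?(r)   # nothing has been written yet
w.syswrite("hello")
after = incoming_data_waiting?(r)    # data is now sitting in the pipe
```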
>>
>>I'd *really* like to find something that will just tell me if there's data
>>waiting on a socket. Something I can use like
>>
>>someSocket.incoming_data_waiting?
>>
>>And if there's something I can read, I'll get true, and otherwise I'll get
>>false. But nothing I've tried seems to work that way.
This whole issue always confuses me. But isn't that the purpose
of readpartial? Or am I mistaken?
Hal
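
For reference, a small sketch of what readpartial does, again assuming a pipe in place of a socket. It returns whatever is available, up to the requested length, without waiting for the full length. It still blocks when nothing at all is available, so it is not a "is data waiting?" test by itself.

```ruby
r, w = IO.pipe
w.syswrite("abc")

# Only 3 bytes are available; readpartial returns them right away
# instead of waiting for 4096 bytes.
chunk = r.readpartial(4096)

w.close
# At EOF readpartial raises EOFError rather than returning nil.
at_eof = begin
  r.readpartial(4096)
  false
rescue EOFError
  true
end
```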
> Correct me if I'm wrong, but using eof? on a socket would be rather
> useless, wouldn't it?
>
> I'd *really* like to find something that will just tell me if there's data
> waiting on a socket. Something I can use like
>
> someSocket.incoming_data_waiting?
I'm not sure why people want such a method.
In what situation do you need it?
> And if there's something I can read, I'll get true, and otherwise I'll get
> false. But nothing I've tried seems to work that way.
If it returns false, your program has no data from the socket, so it
cannot do anything with that data. If your program has something to do
in that case, it must be work that doesn't depend on the data.
If that work is done by other threads, the blocking behavior is
appropriate because the other threads run while this one is blocked.
If the program uses an event-driven framework, the readability test
should be done by the framework's main event loop, so the
program doesn't need the method.
So I guess your program is in another situation that I haven't imagined.
However, it is possible to implement the method with IO.select.
--
Tanaka Akira
I suppose because it's part of the only way I've found to get around the
lack of a someSocket.readallavailable - something that will give me *all*
the data that can be read from the socket at the time I call it. (I could
use recv with a number larger than anything I should ever have waiting for
an argument, but that's crude. Admittedly, the workaround is also crude.)
Sometimes the only way I have of being certain that I've gotten all of an
incoming line is that I haven't been sent anything more. So, in place of
a readallavailable function, I have this procedure (which I have yet to code,
but I think should work):
1. Get a chunk of data with recv. At this point, I don't care if it blocks.
2. Add received data to a buffer.
3. Check to see if there's any data waiting.
4. If there is, go back to step 1. If there isn't, go on.
5. Now being sure that the buffer contains all data currently
available, the program can process it.
I'd also love to see a method that would tell me how many bytes are
waiting to be read at the moment. But I wouldn't be surprised if that's
not feasible for some reason or another.
> > And if there's something I can read, I'll get true, and otherwise I'll get
> > false. But nothing I've tried seems to work that way.
>
>If it returns false, your program has no data from the socket, so it
>cannot do anything with that data. If your program has something to do
>in that case, it must be work that doesn't depend on the data.
>
>If that work is done by other threads, the blocking behavior is
>appropriate because the other threads run while this one is blocked.
>
>If the program uses an event-driven framework, the readability test
>should be done by the framework's main event loop, so the
>program doesn't need the method.
I tried doing something like this (or at least what I think you're
referring to) in the program I'm working on now. (Using Fox toolkit
via FXRuby.) However, I couldn't get it to work right on my own, and
my message here asking for help got no replies...
In any case, the method called by the framework to handle the
event will still have to deal with the "do I have all the data?" issue.
Threads, though powerful, are rather annoying to deal with sometimes.
In the program I'm working on now, they're probably the best way
of handling things anyway, but I'd like to have a way of working
with sockets that doesn't force me to use them when they don't
suit my needs.
>So I guess your program is in another situation that I haven't imagined.
>
>However, it is possible to implement the method with IO.select.
It can do the job, yes. But it also seems like overkill for someone
who's only interested in one object. And it'd be a lot easier to read
something that simply calls a method on the socket of concern.
-Morgan
> (I could use recv with
> a number larger than anything I should ever have waiting for an argument,
> but that's crude.
Just curious why that would seem crude. It's my understanding
that's why recv() behaves as it does.
The following is untested, but is similar to what I've written
many times:
def TCPSocket.read_all_available
  buf = ""
  while select([self], nil, nil, 0)
    dat = self.recv(65536) rescue nil
    break unless dat && ! dat.empty?
    buf << dat
  end
  buf
end
Usually instead of just a "break", I'll set an "eof" flag
when I get a nil (or an exception) back from recv().
(Note that select() will indicate that a socket is
'readable' if it's at EOF, but then recv() returns nothing
and/or an exception... as I recall... Thus the above
'rescue nil' followed by checking for no data returned.)
But anyway . . .
Hope this helps,
Regards,
Bill
> def TCPSocket.read_all_available
Oops, I meant
class TCPSocket
  def read_all_available
    ...
  end
end
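
Putting the correction and the "eof flag" remark together, a runnable sketch might look like the following. It is written as a plain method taking the socket rather than as a TCPSocket patch, and the [data, eof] return value is an assumption for illustration, not something from the original code. The nil check also covers Ruby versions where recv returns nil instead of an empty string on a closed connection.

```ruby
require 'socket'

# Returns [data, eof]: everything readable right now without blocking,
# plus a flag saying whether the peer has closed the connection.
def read_all_available(sock, chunk_size = 65536)
  buf = String.new
  eof = false
  while IO.select([sock], nil, nil, 0)
    dat = begin
      sock.recv(chunk_size)
    rescue SystemCallError, IOError
      nil
    end
    # select reports "readable" at EOF too, but then recv yields nothing.
    if dat.nil? || dat.empty?
      eof = true
      break
    end
    buf << dat
  end
  [buf, eof]
end

# Usage over a loopback connection (port 0 lets the OS pick a free port).
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
peer = server.accept
peer.write("hello")
peer.close

data = String.new
eof = false
until eof
  IO.select([client])                  # wait until data or EOF is ready
  chunk, eof = read_all_available(client)
  data << chunk
end
client.close
server.close
```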
Regards,
Bill
> I suppose because it's part of the only way I've found to get around the
> lack of
> a someSocket.readallavailable - something that will give me *all* the data
> that can be read from the socket at the time I call it. (I could use recv with
> a number larger than anything I should ever have waiting for an argument,
> but that's crude. Admittedly, the workaround is also crude.)
Why do you need all the available data?
Even if you got all the data available at that time, more data may arrive
just after that.
> Sometimes the only way I have of being certain that I've gotten all of an
> incoming line is that I haven't been sent anything more. So, in place of
> a readallavailable function, I have this procedure (which I have yet to code,
> but I think should work):
>
> 1. Get a chunk of data with recv. At this point, I don't care if it blocks.
> 2. Add received data to a buffer.
> 3. Check to see if there's any data waiting.
> 4. If there is, go back to step 1. If there isn't, go on.
> 5. Now being sure that the buffer contains all data currently
> available, the program can process it.
You can't assume that all the data is in the buffer because
more data may arrive between steps 4 and 5.
Is that acceptable?
If it is acceptable, the program can process the data incrementally:
4K bytes at a time, a line at a time, or in some other unit.
If it is not acceptable, I think it's a problem with the protocol.
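
As an illustration of processing a line at a time, here is a small sketch. The class name LineBuffer and its methods are hypothetical. Chunks are appended as they arrive, only complete lines are handed back, and a partial line simply waits for the next chunk, so it does not matter where the chunk boundaries fall.

```ruby
# Accumulates raw chunks and hands back only the complete lines.
class LineBuffer
  def initialize
    @pending = String.new
  end

  # Append a chunk; return the lines it completed (newline removed).
  def feed(chunk)
    @pending << chunk
    lines = []
    while (idx = @pending.index("\n"))
      lines << @pending.slice!(0..idx).chomp
    end
    lines
  end

  # Whatever has arrived after the last newline.
  def leftover
    @pending.dup
  end
end

lb = LineBuffer.new
first  = lb.feed("hel")        # no newline yet, nothing to process
second = lb.feed("lo\nwor")    # completes one line, keeps the partial one
third  = lb.feed("ld\n")       # completes the second line
```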
> I tried doing something like this (or at least what I think you're
> referring to) in the program I'm working on now. (Using Fox toolkit
> via FXRuby.) However, I couldn't get it to work right on my own, and
> my message here asking for help got no replies...
Hm. A GUI library. I understand why your program can't block.
> In any case, the method called by the framework to handle the
> event will still have to deal with the "do I have all the data?" issue.
Yes. The framework notifies you that some data (or EOF) is available.
It doesn't tell you how much data is available.
However, the framework should keep notifying readability until the kernel
has no more data. So it is no problem if the callback doesn't process all
the data. As long as the callback processes 1 byte or more each time, all
the data will be processed.
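
A toy sketch of that argument, with a hand-written loop standing in for the framework (run_event_loop is a made-up name, and a pipe replaces the socket). The callback deliberately consumes only one byte per notification, yet everything is eventually processed because select keeps reporting readability while data remains.

```ruby
# Stand-in for a framework's main loop: wait for readability, then
# invoke the callback; stop when the callback reports EOF.
def run_event_loop(io)
  loop do
    IO.select([io])            # blocks until data or EOF is available
    break if yield(io) == :eof
  end
end

r, w = IO.pipe
w.syswrite("abcdefgh")
w.close

received = String.new
run_event_loop(r) do |io|
  begin
    received << io.readpartial(1)   # process just one byte per notification
    :more
  rescue EOFError
    :eof
  end
end
r.close
```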
> It can do the job, yes. But it also seems like overkill for someone
> who's only interested in one object. And it'd be a lot easier to read
> something that simply calls a method on the socket of concern.
I'm not sure that Ruby should encourage reading all available data.
--
Tanaka Akira