[MacRuby-devel] Thread safety in apply example?

24 views
Skip to first unread message

Charles Oliver Nutter

unread,
Jan 22, 2011, 12:57:25 AM1/22/11
to MacRuby development discussions.
I'm curious about this example in Queue#apply's rdoc:

* gcdq = Dispatch::Queue.new('doc')
* @result = []
* gcdq.apply(5) {|i| @result[i] = i*i }
* p @result #=> [0, 1, 4, 9, 16, 25]

apply is said to issue the jobs in parallel, so this would be making
concurrent updates to the @result array. Are simple arrays in MacRuby
thread-safe?

- Charlie
_______________________________________________
MacRuby-devel mailing list
MacRub...@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel

Ernest N. Prabhakar, Ph.D.

unread,
Jan 22, 2011, 9:52:04 AM1/22/11
to MacRuby development discussions.
Hi Charles,

On Jan 21, 2011, at 9:57 PM, Charles Oliver Nutter wrote:

> I'm curious about this example in Queue#apply's rdoc:
>
> * gcdq = Dispatch::Queue.new('doc')
> * @result = []
> * gcdq.apply(5) {|i| @result[i] = i*i }
> * p @result #=> [0, 1, 4, 9, 16, 25]
>
> apply is said to issue the jobs in parallel, so this would be making
> concurrent updates to the @result array. Are simple arrays in MacRuby
> thread-safe?

My assumption was that doing a parallel assign:

result[i] = i*i

would be safe, since it always accessed a unique portion of memory, but doing a serial append:

result << i*i

would not. But that may have been a mistake on my part, since the size (at least) needs to be updated. Anyone know better?

-- Ernie P.

Charles Oliver Nutter

unread,
Jan 22, 2011, 1:20:39 PM1/22/11
to MacRuby development discussions.
On Sat, Jan 22, 2011 at 8:52 AM, Ernest N. Prabhakar, Ph.D.
<prab...@apple.com> wrote:
> My assumption was that doing a parallel assign:
>
> result[i] = i*i
>
> would be safe, since it always accessed a unique portion of memory, but doing a serial append:
>
> result << i*i
>
> would not.  But that may have been a mistake on my part, since the size (at least) needs to be updated. Anyone know better?

I believe you're correct about the size update being the problem.
Unless the result array has all the space it needs initially, the
example could easily trigger it to resize, and with concurrent threads
all writing at the same time it's very likely some results will get
lost.

It could be improved slightly if you initialize @result with a
guaranteed-sized backing store, like this:

@result = Array.new(5)

This would ensure the array already has a 5 element backing store
prepared, and so concurrent writes would be writing different memory
locations. But it still seems like a bad precedent, since the
implementation details of doing concurrent writes to Array are hidden.
It could be maintaining an immutable linked list structure instead of
an array, in which case every random-access update would need to
modify all nodes in one direction or the other, and you're back to
concurrent mutation.

mac...@djc.net

unread,
Jan 22, 2011, 3:00:18 PM1/22/11
to MacRuby development discussions., Ernest N. Prabhakar, Ph.D.

Taking
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
as a sign, the answer appears to be:
neither of these can be safe, the latter even more so since the result is
unordered.

If one is paranoid, not even sizing the array beforehand (a very good idea) is
safe, unless someone can definitively state that no one (inc. MacRuby) will
be modifying/moving/otherwise manipulating this mutable object while you
are....

a bit of a disappointment, since a) we know resizing occurs far less
frequently then access (in general) b) making this all thread safe can easily
be done at the Framework level c) most of the time, the behavior is exactly
what we see here w.r.t. sizing (e.g. it grows UP, never down, and not up and
down ... until GC kicks in and removes it all)

-Daniel

On Sat, 22 Jan 2011 06:52:04 -0800


"Ernest N. Prabhakar, Ph.D." <prab...@apple.com> wrote:

Charles Oliver Nutter

unread,
Jan 23, 2011, 5:48:49 PM1/23/11
to MacRuby development discussions., Ernest N. Prabhakar, Ph.D.
On Sat, Jan 22, 2011 at 2:00 PM, <mac...@djc.net> wrote:
> Taking
> http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html
> as a sign, the answer appears to be:
> neither of these can be safe, the latter even more so since the result is
> unordered.

Yes, that's not too surprising I guess

> If one is paranoid, not even sizing the array beforehand (a very good idea)
> is safe, unless someone can definitively state that no one (inc. MacRuby)
> will
> be modifying/moving/otherwise manipulating this mutable object while you
> are....

That's certainly not specified behavior, although I *think* it would
always be safe in JRuby and MRI (don't know about MacRuby).

> a bit of a disappointment, since a) we know resizing occurs far less
> frequently then access (in general) b) making this all thread safe can
> easily be done at the Framework level c) most of the time, the behavior is
> exactly what we see here w.r.t. sizing (e.g. it grows UP, never down, and
> not up and down ... until GC kicks in and removes it all)

In JRuby we ship a nonstandard extension JRuby::Synchronized you can
include into any class (or object) to force all methods to be
synchronized. The example in that case would look like:

@result = []
@result.extend JRuby::Synchronized

Or even better, create a SyncArray subclass that includes it.

I'd like for there to be a standard way to do this across Ruby impls
(in particular, I'd like a standard set of thread-safe collections in
Ruby proper), but perhaps the JRuby solution will provide a starting
point for everyone else.

Alan Skipp

unread,
Jan 24, 2011, 6:40:09 AM1/24/11
to MacRuby development discussions.
Warning, horrible hack alert!
I'm sure there must be a far better implementation, but the idea is to have an array subclass which diverts all method calls through a serial dispatch queue, which should ensure thread safe access.

Anyway, here's the horrible hack. By the way, it manages to break Enumerable, other than that it's perfect!

class ThreadSafeArray < Array
instance_methods.each do |meth|
eval "
def #{meth}(*args, &block)
val = []
@serial_queue.async(@group) do
val[0] = super
end
@group.wait
val[0]
end"
end

def initialize(size=0, obj=nil)
@serial_queue = Dispatch::Queue.new("array.serial_queue")
@group = Dispatch::Group.new
super
end
end

gcdq = Dispatch::Queue.new('doc')

@result = ThreadSafeArray.new


gcdq.apply(5) {|i| @result[i] = i*i }

p @result #=> [0, 1, 4, 9, 16, 25]


Having to create an array for every method call to copy the return value into is terrible, is there a better way of getting a reference to the return value from a dispatch queue?

Matt Massicotte

unread,
Jan 24, 2011, 10:22:44 AM1/24/11
to MacRuby development discussions.
What's the advantage of this over just iterating over the array serially using each?

Matt

Alan Skipp

unread,
Jan 24, 2011, 11:00:41 AM1/24/11
to MacRuby development discussions.
The post was admittedly vague, but was more of an idea about a general purpose thread safe array class (in principle, this could apply to any mutable class). Perhaps implemented as a module that would redirect all method calls through a serial dispatch queue. This would mean that the array (or other mutable collection) could be mutated from multiple threads/queues without fear of bad things happening.

Matt Massicotte

unread,
Jan 24, 2011, 11:24:04 AM1/24/11
to MacRuby development discussions.
My guess is that most problems need a thread-safe mechanism, where manipulating an array is just one part of a more complex system. Pushing the details of the thread-safety into the array just hides the problem, and will likely trade more locking for possibly simpler code.

Which could be a very good trade, don't get me wrong. I'm not trying to knock your implementation at all.

I'm just saying that achieving parallelism actually is hard. I've been much better served (though, admittedly, not in Ruby) by trying to get really familiar with the queue paradigm and understanding how to use them in place of traditional locks and threads.

I think the actual underlying problem here is Queue#apply's rdoc entry. Manipulating a non-thread-safe data structure (which is all of Foundation's collection classes) in an apply is a bug and even worse a conceptual error. Where do you open bugs on the dispatch gem?

Matt

Ernest N. Prabhakar, Ph.D.

unread,
Jan 24, 2011, 12:08:30 PM1/24/11
to MacRuby development discussions.
Hi all,

On Jan 24, 2011, at 8:24 AM, Matt Massicotte wrote:

> I'm just saying that achieving parallelism actually is hard. I've been much better served (though, admittedly, not in Ruby) by trying to get really familiar with the queue paradigm and understanding how to use them in place of traditional locks and threads.

Right, that is why I created the extended Dispatch module (now a gem), to try to handle all that bookkeeping safely.

I think those examples don't suffer the same problem.

-- Ernie P.

Charles Oliver Nutter

unread,
Jan 24, 2011, 12:15:55 PM1/24/11
to MacRuby development discussions.
On Mon, Jan 24, 2011 at 10:24 AM, Matt Massicotte <massi...@apple.com> wrote:
> My guess is that most problems need a thread-safe mechanism, where manipulating an array is just one part of a more complex system.  Pushing the details of the thread-safety into the array just hides the problem, and will likely trade more locking for possibly simpler code.
>
> Which could be a very good trade, don't get me wrong.  I'm not trying to knock your implementation at all.

In some cases, it's a great trade. It's just not a trade-off you want
to make for *all* access, which is why concurrent-threaded Ruby impls
like JRuby and MacRuby don't make that guarantee for Array, Hash, etc.
Of course, regular C Ruby isn't concurrent-threaded, so they have had
little motivation to do *anything* to improve the state of concurrent
programming on Ruby, like adding thread-safe collections or simpler
synchronization mechanisms.

> I think the actual underlying problem here is Queue#apply's rdoc entry.  Manipulating a non-thread-safe data structure (which is all of Foundation's collection classes) in an apply is a bug and even worse a conceptual error.  Where do you open bugs on the dispatch gem?

Queue#apply is actually defined in gcd.c in MacRuby's code, so this
would be a MacRuby bug. I'd have filed it, but I wasn't sure if I was
missing something about MacRuby's Array and thread-safety.

Joshua Ballanco

unread,
Jan 24, 2011, 12:23:34 PM1/24/11
to MacRuby development discussions.
On Sat, Jan 22, 2011 at 12:57 AM, Charles Oliver Nutter <hea...@headius.com> wrote:
I'm curious about this example in Queue#apply's rdoc:

 *     gcdq = Dispatch::Queue.new('doc')
 *     @result = []
 *     gcdq.apply(5) {|i| @result[i] = i*i }
 *     p @result  #=> [0, 1, 4, 9, 16, 25]

apply is said to issue the jobs in parallel, so this would be making
concurrent updates to the @result array. Are simple arrays in MacRuby
thread-safe?

Arrays in MacRuby are not inherently thread safe. However, "Dispatch::Queue.new" creates a serial queue. So, there is an error in that each invocation of the block will execute serially. If, instead, the example had used "Dispatch::Queue.concurrent", then there would be a potential bug, but the easiest way to work around that is:

gcdq = Dispatch::Queue.concurrent
@result = []
@result_q = Dispatch::Queue.new('@result_assignment_queue')
gcdq.apply(5) { |i| res = i * i; @result_q.async { @result[i] = res } }
p @result

There's also a mixin available in the Dispatch gem which abstracts away the need for a serial queue for assignments by monkey-patching assignments to all happen on a serial queue.

For those wondering about the advantages to this technique, it's worth noting that GCD has a much lower overhead than more traditional methods of parallelism. That said, if the calculation for each iteration is small, this is not a useful technique. However, if the calculation for each iteration is long or (more likely) non-deterministic -- as in waiting for an external call or a remote connection -- then this can be a useful technique to continue doing useful work while letting iteration complete in the background.

Cheers,

Josh

Joshua Ballanco

unread,
Jan 24, 2011, 12:26:23 PM1/24/11
to MacRuby development discussions.
On Mon, Jan 24, 2011 at 12:15 PM, Charles Oliver Nutter <hea...@headius.com> wrote:
Queue#apply is actually defined in gcd.c in MacRuby's code, so this
would be a MacRuby bug. I'd have filed it, but I wasn't sure if I was
missing something about MacRuby's Array and thread-safety.

Yeah, probably a documentation bug...but only sorta. Parallel/Not-Parallel will depend on the queue. Apply just queues the block asynchronously the number of times indicated by the argument. How the block executes depends on the queue.

Jordan K. Hubbard

unread,
Jan 24, 2011, 12:39:33 PM1/24/11
to MacRuby development discussions.

On Jan 24, 2011, at 8:00 AM, Alan Skipp wrote:

> The post was admittedly vague, but was more of an idea about a general purpose thread safe array class (in principle, this could apply to any mutable class). Perhaps implemented as a module that would redirect all method calls through a serial dispatch queue. This would mean that the array (or other mutable collection) could be mutated from multiple threads/queues without fear of bad things happening.

FWIW, we've tried this same approach experimentally (use a per-object serial queue behind the scenes to implement an async setter / sync getter model) over in CoreFoundation land, and it seemed to work reasonably well for the most part. We ran into some compatibility issues with a few of the methods which just didn't lend themselves to a thread-safe object model, enough that we couldn't simply do this as a completely transparent solution, but it mostly Just Worked and I think MacRuby (or, for that matter, JRuby) could do something very similar and probably get away with it since Ruby arrays/hashes/strings aren't quite so inclined to hand pointers to their internal state around to clients of the API. Laurent doesn't exactly agree with me since he's still focused on performance, but I think it should definitely be possible to declare a Ruby object as "contended" in some way such that everything then goes through a serial queue implicitly to access it - I'm not saying this sh
ould be true for every object, just the ones for which you need thread safety.

- Jordan

Charles Oliver Nutter

unread,
Jan 24, 2011, 8:09:13 PM1/24/11
to MacRuby development discussions.

Ok, that makes sense. The documentation implied that apply always runs
the jobs in parallel, so I assumed that meant they could potentially
run concurrently. If instead they're serial, the example as written
should be safe.

It seems like a poor example in any case, however, since someone might
assume using a concurrent queue is safe as well.

Charles Oliver Nutter

unread,
Jan 24, 2011, 8:12:13 PM1/24/11
to MacRuby development discussions.
On Mon, Jan 24, 2011 at 11:39 AM, Jordan K. Hubbard <j...@apple.com> wrote:
> FWIW, we've tried this same approach experimentally (use a per-object serial queue behind the scenes to implement an async setter / sync getter model) over in CoreFoundation land, and it seemed to work reasonably well for the most part.  We ran into some compatibility issues with a few of the methods which just didn't lend themselves to a thread-safe object model, enough that we couldn't simply do this as a completely transparent solution, but it mostly Just Worked and I think MacRuby (or, for that matter, JRuby) could do something very similar and probably get away with it since Ruby arrays/hashes/strings aren't quite so inclined to hand pointers to their internal state around to clients of the API.   Laurent doesn't exactly agree with me since he's still focused on performance, but I think it should definitely be possible to declare a Ruby object as "contended" in some way such that everything then goes through a serial queue implicitly to access it - I'm not saying this sh
>  ould be true for every object, just the ones for which you need thread safety.

I'd be extremely surprised if the overhead frrom making an object
thread safe with a queue wasn't much higher than simply synchronizing
all access with normal low-level locking primitives. It might be
useful to transparently proxy all calls targeting a specific object
through a queue-based delegator (as I believe you or someone has
rigged up in the Dispatch gem), but I'd be very surprised if it were a
good generally-applicable thread-safety mechanism for mutable state.

Sometimes you just need to use locks.

- Charlie

Charles Oliver Nutter

unread,
Jan 24, 2011, 8:20:29 PM1/24/11
to MacRuby development discussions.
On Mon, Jan 24, 2011 at 11:23 AM, Joshua Ballanco <jbal...@gmail.com> wrote:
> Arrays in MacRuby are not inherently thread safe. However,
> "Dispatch::Queue.new" creates a serial queue. So, there is an error in that
> each invocation of the block will execute serially. If, instead, the example
> had used "Dispatch::Queue.concurrent", then there would be a potential bug,
> but the easiest way to work around that is:
> gcdq = Dispatch::Queue.concurrent
> @result = []
> @result_q = Dispatch::Queue.new('@result_assignment_queue')
> gcdq.apply(5) { |i| res = i * i; @result_q.async { @result[i] = res } }
> p @result

Yes, forcing the updates to run serially rather than in parallel. The
poor-man's fork/join.

> There's also a mixin available in the Dispatch gem which abstracts away the
> need for a serial queue for assignments by monkey-patching assignments to
> all happen on a serial queue.

I'm curious what you mean by this. Can you point out the code? Is it
actually attempting to rewrite local variable or instance variable or
array modifications in code?

> For those wondering about the advantages to this technique, it's worth
> noting that GCD has a much lower overhead than more traditional methods of
> parallelism. That said, if the calculation for each iteration is small, this
> is not a useful technique. However, if the calculation for each iteration is
> long or (more likely) non-deterministic -- as in waiting for an external
> call or a remote connection -- then this can be a useful technique to
> continue doing useful work while letting iteration complete in the
> background.

I suppose it's time to explain why I'm looking into this stuff at all.

I'm building a JRuby-based mimic API of the MacRuby GCD API:
https://github.com/headius/jcd

I started out using java.util.concurrent Executors directly, and later
moved to using HawtDispatch, a libdispatch-like API. The latter has
done a reasonably good job of matching the necessary dispatch_*
functions in an OO API, and for the pieces that exist the wrapping job
has been trivial.

What I have now is a mostly-complete Queue and Object, but I'm still
figuring out how to map the other libdispatch primitives to what
HawtDispatch provides. The library currently is complete enough to run
a fork of ControlTower I made called ChaosBazaar:
http://github.com/headius/ChaosBazaar

There's no real reason I shouldn't be able to support most of the
MacRuby GCD APIs using just what's available on the JVM (i.e. no
native or OS dependencies), which would allow those APIs to be used on
any platform (and GCD client code that's not otherwise bound to
MacRuby to work on JRuby without modification). I think that would be
a good thing for concurrency in the Ruby world.

Joshua Ballanco

unread,
Jan 25, 2011, 2:00:50 AM1/25/11
to MacRuby development discussions.
On Mon, Jan 24, 2011 at 8:20 PM, Charles Oliver Nutter <hea...@headius.com> wrote:
Yes, forcing the updates to run serially rather than in parallel. The
poor-man's fork/join.

More like a map-reduce, I would think. In fact, at some point in the past the "dispatch extras" (i.e. everything in MacRuby dispatch related that wasn't part of the "base" libdispatch functionality) contained a #map_reduce function. Actually, I must admin that it's been some time since I've had a look at the dispatch extras/dispatch gem...and there seems to be less of them than I recall...hmmm...
 
> There's also a mixin available in the Dispatch gem which abstracts away the
> need for a serial queue for assignments by monkey-patching assignments to
> all happen on a serial queue.

I'm curious what you mean by this. Can you point out the code? Is it
actually attempting to rewrite local variable or instance variable or
array modifications in code?

Sorry, I mis-remembered. Not a mix-in, but Dispatch::Proxy from the dispatch gem is what I was thinking of. It uses the delegator pattern to funnel all access through a serial queue. Admittedly, this technique is a bit heavy handed (and, I'm just now realizing, somewhat inefficiently implemented...). Still, it gets the job done...
 
I'm building a JRuby-based mimic API of the MacRuby GCD API:
https://github.com/headius/jcd

That sounds like an excellent idea! One issue that we've come across repeatedly in MacRuby is how to teach Ruby programmers to write "grown-up" thread-safe code (and, from my half-hearted attempts to follow what's going on in JRuby, it seems you've been dealing with this as well). That is, something like "@my_array[i] += 1" is safe enough when you have a GIL, but is not when you have true low-level reentrancy. (In fact, even @my_array[i] = foo is not thread-safe!)

Instead of attempting to bring the world of spin-locks and mutexes (and when to use one instead of the other) into Ruby, I think it's probably more useful to introduce a transactional programming model. This LtU post comes to mind: http://lambda-the-ultimate.org/node/4070

Anyway, I'd be glad to lend a hand (and whatever slice of my limited time I can spare) to the effort. In particular, test suites/specs are difficult in this space, as the MacRuby team has learned first-hand.
 
I started out using java.util.concurrent Executors directly, and later
moved to using HawtDispatch, a libdispatch-like API. The latter has
done a reasonably good job of matching the necessary dispatch_*
functions in an OO API, and for the pieces that exist the wrapping job
has been trivial.

What I have now is a mostly-complete Queue and Object, but I'm still
figuring out how to map the other libdispatch primitives to what
HawtDispatch provides. The library currently is complete enough to run
a fork of ControlTower I made called ChaosBazaar:
http://github.com/headius/ChaosBazaar

So, now I'm curious: have you gotten to the dispatch_source_t part of libdispatch? I wonder, because this is where I would predict the most difficulty. In OS X, the dispatch sources are made possible through the use of a kqueue, which is kinda like a select but at the kernel level. Also, there might be some difficulty in mapping the semantics directly due to the high degree of asynchrony achieved by libdispatch. For example: the following is perfectly legal:

myq = Dispatch::Queue.new
myq.async do
  myq.cancel!
  *keep doing stuff on 'myq'*
end

...because cancelation is dispatched, asynchronously, to be executed by the same queue (or, well, with the same queueing semantics) as the queue you are canceling.

Also, I'm really excited by the clone of ControlTower. I should warn you that ControlTower only includes ~40% of the GCD-ishness that was originally planned for it. I originally intended that reading from/writing to the socket should occur using dispatch_sources, since that is more efficient than waiting on a blocking read. In fact, I have (somewhere) a branch with this implemented, but the concurrency bugs that developed were so cryptic and difficult to chase down, that I eventually gave up...resulting in what you have now. I think you'll find the vestiges of these attempts in the "gcd-ify" branch...but I can't vouch at all for their functional state.

Anyway, I think this could be the start of a great thing for Ruby in general...

Cheers,

Josh

Charles Oliver Nutter

unread,
Jan 25, 2011, 3:11:04 AM1/25/11
to MacRuby development discussions.
On Tue, Jan 25, 2011 at 1:00 AM, Joshua Ballanco <jbal...@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 8:20 PM, Charles Oliver Nutter <hea...@headius.com>
> wrote:
>> I'm curious what you mean by this. Can you point out the code? Is it
>> actually attempting to rewrite local variable or instance variable or
>> array modifications in code?
>
> Sorry, I mis-remembered. Not a mix-in, but Dispatch::Proxy from the dispatch
> gem is what I was thinking of. It uses the delegator pattern to funnel all
> access through a serial queue. Admittedly, this technique is a bit heavy
> handed (and, I'm just now realizing, somewhat inefficiently implemented...).
> Still, it gets the job done...

Ok, got it. Unsurprisingly HawtDispatch also provides a built-in
queue-backed delegation proxy using Java reflection. Of course, we can
do a lot better than that in Ruby.

>> I'm building a JRuby-based mimic API of the MacRuby GCD API:
>> https://github.com/headius/jcd
>
> That sounds like an excellent idea! One issue that we've come across
> repeatedly in MacRuby is how to teach Ruby programmers to write "grown-up"
> thread-safe code (and, from my half-hearted attempts to follow what's going
> on in JRuby, it seems you've been dealing with this as well). That is,
> something like "@my_array[i] += 1" is safe enough when you have a GIL, but
> is not when you have true low-level reentrancy. (In fact, even @my_array[i]
> = foo is not thread-safe!)

Yes, I think JRuby has helped force the issue for years, and now
MacRuby gets to help :)

Things used to be a lot worse. I remember spending long hours writing
emails and IM responses to people that were almost angry that I
decided JRuby's Array, Hash, and String would not be thread-safe. They
really didn't get why we couldn't just "do what MRI does". It took a
lot of effort and a few favors from key greybeards in the Ruby
community for people to accept that JRuby was making the pragmatic
decision. And hopefully MacRuby has had an easier time selling such
decisions as a result.

The Ruby community has come a long way, but there's obviously more work to do.

> Instead of attempting to bring the world of spin-locks and mutexes (and when
> to use one instead of the other) into Ruby, I think it's probably more
> useful to introduce a transactional programming model. This LtU post comes
> to mind: http://lambda-the-ultimate.org/node/4070

Yes, you're right for most scenarios. There are of course many places
where queues or transactions are far too heavy. What we need to do is
ensure that as better concurrency models become available for
Rubyists, we do what we can to ensure they work across Ruby
implementations and share the load of educating users. That's what I
hope to do with JCD, and I hope someone will be able to do the same
for MRI in a way that works on all its supported platforms.

I have other concurrency-related projects you may be interested in:

The "ruby-atomic" gem, providing explicitly atomic references (and
CAS-like operations) on JRuby and MRI:
https://github.com/headius/ruby-atomic. I collaborated with MenTaLguY
on this one.

The "cloby" gem, which wraps Clojure's STM and Ref to allow creating
(J)Ruby objects with all transactional instance variables:
https://github.com/headius/cloby

I have also worked with MRI and Rubinius folks on various forms of
Multi-VM (MVM) API, providing a standard interface for launching
isolated ruby environments in the same process. Of course JRuby has
been able to do this for years (since org.jruby.Ruby is just another
object), but MVM and similar APIs may be the only way MRI ever gets
access to true in-process concurrency.

> Anyway, I'd be glad to lend a hand (and whatever slice of my limited time I
> can spare) to the effort. In particular, test suites/specs are difficult in
> this space, as the MacRuby team has learned first-hand.

I don't have a lot to offer in this area. We benefited years ago from
the big threadsafety push in Rails 2.2, by providing rails core a
32-way box and letting them beat the hell out of Rails and JRuby under
heavy concurrent load. I know of no similar efforts since, for any
library.

>> What I have now is a mostly-complete Queue and Object, but I'm still
>> figuring out how to map the other libdispatch primitives to what
>> HawtDispatch provides. The library currently is complete enough to run
>> a fork of ControlTower I made called ChaosBazaar:
>> http://github.com/headius/ChaosBazaar
>
> So, now I'm curious: have you gotten to the dispatch_source_t part of
> libdispatch? I wonder, because this is where I would predict the most
> difficulty. In OS X, the dispatch sources are made possible through the use
> of a kqueue, which is kinda like a select but at the kernel level. Also,
> there might be some difficulty in mapping the semantics directly due to the
> high degree of asynchrony achieved by libdispatch. For example: the
> following is perfectly legal:
> myq = Dispatch::Queue.new
> myq.async do
>   myq.cancel!
>   *keep doing stuff on 'myq'*
> end
> ...because cancelation is dispatched, asynchronously, to be executed by the
> same queue (or, well, with the same queueing semantics) as the queue you are
> canceling.

Currently, I have punted on the source logic, mostly because
HawtDispatch only includes out-of-the-box Source support for
selectable IO channels (in which case it would use kqueue on OS X,
epoll on Linux, etc). For the cases I've been trying to support in my
early attempt, source support has not been necessary. I know I'll have
to deal with this soon, however, and that will probably mean working
with HawtDispatch folks to provide a more complete set of source
types.

FWIW, the JVM does have fairly solid support for asynchronous queuing
systems, and the asynchronous IO support is at least passable (so long
as you're dealing with sockets).

> Also, I'm really excited by the clone of ControlTower. I should warn you
> that ControlTower only includes ~40% of the GCD-ishness that was originally
> planned for it. I originally intended that reading from/writing to the
> socket should occur using dispatch_sources, since that is more efficient
> than waiting on a blocking read. In fact, I have (somewhere) a branch with
> this implemented, but the concurrency bugs that developed were so cryptic
> and difficult to chase down, that I eventually gave up...resulting in what
> you have now. I think you'll find the vestiges of these attempts in the
> "gcd-ify" branch...but I can't vouch at all for their functional state.
> Anyway, I think this could be the start of a great thing for Ruby in
> general...

Yes, I was wondering about that. HawtDispatch supports sources against
NIO (Java's "New IO" API) channels, which provides at least one leg of
the libdispatch "source" support. As it stands now I'm really just
using Queue and largely ignoring the Group use that doesn't seem to be
fired unless you do a clean shutdown of the server. My impl is still
mostly at proof-of-concept stage.

I did have to hack around the parser logic, since native extensions
largely mean death for concurrency on JRuby (and by native I mean C
extensions using MRI's API). Instead, I lifted code from Mongrel and
Rack to use Mongrel's parser to populate a Rack environment, and
managed to make the result function well enough to benchmark.

Charles Oliver Nutter

unread,
Jan 25, 2011, 3:13:23 AM1/25/11
to MacRuby development discussions.

I should clarify this... Mongrel has shipped a Java version of its
Ragel-based parser for many years. JCD and ChaosBazaar have no
native/C code whatsoever, and will run on any platform with a
reasonably current JVM.

I'd be happy to roll my CT/CB parser hackage back into mainline if we
can find a clean way to isolate impl-specific backends. It would be
easier to maintain a single CT atop multiple GCD lookalike APIs than
to collaborate on largely-independent forks.

Joshua Ballanco

unread,
Jan 25, 2011, 9:31:20 AM1/25/11
to MacRuby development discussions.
On Tue, Jan 25, 2011 at 3:13 AM, Charles Oliver Nutter <hea...@headius.com> wrote:
On Tue, Jan 25, 2011 at 2:11 AM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> I did have to hack around the parser logic, since native extensions
> largely mean death for concurrency on JRuby (and by native I mean C
> extensions using MRI's API). Instead, I lifted code from Mongrel and
> Rack to use Mongrel's parser to populate a Rack environment, and
> managed to make the result function well enough to benchmark.

I should clarify this... Mongrel has shipped a Java version of its
Ragel-based parser for many years. JCD and ChaosBazaar have no
native/C code whatsoever, and will run on any platform with a
reasonably current JVM.

I'd be happy to roll my CT/CB parser hackage back into mainline if we
can find a clean way to isolate impl-specific backends. It would be
easier to maintain a single CT atop multiple GCD lookalike APIs than
to collaborate on largely-independent forks.

I agree. I'll think more about this later this evening, but for now I wanted to quickly reply to mention Manfred Stienstra's beginning attempts at a net/http rewrite (https://github.com/Manfred/Net-http2) including a Ragel-based HTTP parser similar to that used by Mongrel. I had originally played with the idea of using this instead of hooking up the parser on my own, but in the end time-constraints won out. I should also mention that the particular way in which ControlTower incorporates the parser via an Obj-C class (instead of using the more vanilla Ruby C API) is partly a consequence of the state of MacRuby when ControlTower was written, but mostly a result of my own inexperience with the C API at the time.

So, yes, I think a more unified interface with a fast, reentrant, standards-compliant HTTP parser would be a great thing for future server developers and probably even the Ruby community as a whole.

- Josh

Laurent Sansonetti

unread,
Jan 26, 2011, 5:45:56 PM1/26/11
to MacRuby development discussions.
Hi Charles!

Sorry for the late response.

As others have noted, in this snippet, #apply is called on a sequential queue (queues created by Queue.new are always sequential), therefore there shouldn't be any problem here. If the queue was concurrent, however, there would be a thread safety issue.

I think this rdoc snippet should be rewritten to avoid confusion.

I see that you're wrapping a GCD-like interface in JRuby, it's very cool! I assume you want your interface to be cross platform, but in the case of JRuby running on Mac OS X, maybe we can extract our code as a C extension, this way JRuby would use the "system" GCD. Maybe we can also work together on creating a good test/spec suite for the GCD interface, because it's currently lacking.

Laurent

On Jan 21, 2011, at 9:57 PM, Charles Oliver Nutter wrote:

Charles Oliver Nutter

unread,
Jan 28, 2011, 8:45:41 PM1/28/11
to MacRuby development discussions.
On Wed, Jan 26, 2011 at 4:45 PM, Laurent Sansonetti
<lsans...@apple.com> wrote:
> Hi Charles!
> Sorry for the late response.
> As others have noted, in this snippet, #apply is called on a sequential
> queue (queues created by Queue.new are always sequential), therefore there
> shouldn't be any problem here. If the queue was concurrent, however, there
> would be a thread safety issue.
> I think this rdoc snippet should be rewritten to avoid confusion.

Yup, I understand now. Thank you!

> I see that you're wrapping a GCD-like interface in JRuby, it's very cool! I
> assume you want your interface to be cross platform, but in the case of
> JRuby running on Mac OS X, maybe we can extract our code as a C extension,
> this way JRuby would use the "system" GCD. Maybe we can also work together
> on creating a good test/spec suite for the GCD interface, because it's
> currently lacking.

To be honest I'm more and more downbeat about C extensions. On JRuby,
we have opted to disallow C extensions from running concurrently,
since there's just too many extensions that mutate global state. They
also make it impossible to spin up multiple JRuby instances in the
same JVM, since ALL C extensions hold references to a specific JRuby
runtime or classes it contains in global state (i.e. no extensions out
there are MVM-safe).

So there are certainly other possiblities:

1. FFI. The libdispatch APIs are all C-based and seem to be pretty
consistent. A single FFI binding might work for everyone including
MRI.
2. Attempt to build off Java/Cocoa integration, as seen in this
attempt: http://pixel.recoil.org/code/rococoa/index.html. I don't know
how reliable that integration is though, or if Apple is going to
continue maintaining it.

A C extension would probably be a "last resort" since it has many limitations.

re: testing: Yes! Are there many tests for this yet? I have not looked
for them, so I have no idea. I'd love to see a complete suite (or at
least complete for the highest-level surface area of functionality)
that we could all try to meet.

Reply all
Reply to author
Forward
0 new messages