Hi John,
I know what's going on. You're running afoul of a performance optimization we added in 1.7.1 (this is the first time I've seen it bite someone). The feature involved is what we call the "spawn cache" (as in --disable-spawn-cache).

Fundamentally, forks ("spawns") are accumulated in a thread-specific buffer and are only made visible to the scheduler when a scheduling operation occurs (e.g. a sleep, a yield, or a blocking operation). This provides a rather significant speedup in some circumstances by reducing contention for the scheduler's queue (we can talk about this in greater detail, if you're interested). In your case, though, no scheduling operation ever occurs after the forks, so those threads *never* become visible to the scheduler.

The reason usleep(1) changes things is, most likely, that it's being intercepted by qthreads and turned into a scheduler event, which flushes the spawn cache. Another way to do it is to call qthread_yield(), which is a scheduling operation. Alternatively, you can simply disable the spawn cache in qthreads (--disable-spawn-cache). At some point (relatively soon, I think), we'll add a function to manually flush the spawn cache. I'm open to other ideas too, if you have any.
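
To make the mechanism concrete, here is a toy, single-threaded model of a spawn cache in plain C. It is only a sketch of the idea described above, not the qthreads implementation: every name in it (shared_queue, worker, toy_spawn, toy_yield, toy_visible) and both capacities are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names and sizes here are hypothetical,
 * not qthreads internals. */
#define CACHE_CAP 16
#define QUEUE_CAP 64

/* The queue the (imaginary) scheduler pulls work from. */
typedef struct {
    int tasks[QUEUE_CAP];
    size_t count;
} shared_queue;

/* One worker with its private spawn cache. */
typedef struct {
    int tasks[CACHE_CAP];
    size_t count;
    shared_queue *queue;
} worker;

/* Move every cached spawn into the shared queue, in spawn order. */
void toy_flush(worker *w) {
    for (size_t i = 0; i < w->count; i++) {
        assert(w->queue->count < QUEUE_CAP);
        w->queue->tasks[w->queue->count++] = w->tasks[i];
    }
    w->count = 0;
}

/* A spawn only touches the private cache, so it needs no access
 * to the shared queue (that is where the speedup would come from). */
void toy_spawn(worker *w, int task_id) {
    assert(w->count < CACHE_CAP);
    w->tasks[w->count++] = task_id;
}

/* A scheduling operation is the point where the cache is flushed. */
void toy_yield(worker *w) {
    toy_flush(w);
}

/* Number of tasks the scheduler can currently see. */
size_t toy_visible(const worker *w) {
    return w->queue->count;
}
```

In this model, calling toy_spawn() twice leaves toy_visible() at 0; only after toy_yield() does it become 2. That is the same shape as the problem above: a caller that spawns and then spins without ever reaching a scheduling operation never publishes its spawns.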
--
Kyle B. Wheeler
Dept. 1423: Scalable System Software
Sandia National Laboratories
505-844-0394