
Parallelism a la make -j <n> / GNU parallel


Colin McEwan

May 3, 2012, 2:49:37 PM
to bug-...@gnu.org
Hi there,

I don't know if this has ever been discussed or
considered, but I would be interested in any thoughts.

These days I frequently find myself writing shell scripts, to run on
multi-core machines, that could easily exploit lots of parallelism (e.g. a
batch of a hundred independent simulations).

The basic parallelism construct of '&' for async execution is highly
expressive, but it's not useful for this sort of use-case: starting up 100
jobs at once will leave them competing, and lead to excessive context
switching and paging.

So for practical purposes, I find myself reaching for 'make -j<n>' or GNU
parallel, both of which destroy the expressiveness of the shell script as I
have to redirect commands and parameters to Makefiles or stdout, and
wrestle with appropriate levels of quoting.

What I would really *like* is an extension to the shell which
implements the same sort of parallelism-limiting / 'process pooling' found
in make or 'parallel', via an operator in the shell language similar to '&'.
It would have the semantics of *possibly* continuing asynchronously (like '&')
if system resources allow, or otherwise waiting for the process to complete (like ';').
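For comparison, those semantics can be approximated in plain bash with a small helper. This is a minimal sketch, not an existing shell feature: `pool_bg`, `POOL_MAX` and `pool_running` are hypothetical names, it assumes the script starts no other background jobs, and it relies on `wait -n`, which only arrived later, in bash 4.3.

```bash
#!/usr/bin/env bash
# Hypothetical helper approximating the proposed operator: run "$@" in the
# background if fewer than POOL_MAX of its jobs are outstanding, otherwise
# first block until any one background job exits. Needs bash >= 4.3.
POOL_MAX=${POOL_MAX:-4}   # assumed pool size, e.g. the number of cores
pool_running=0            # jobs started here and not yet reaped

pool_bg() {
  if (( pool_running >= POOL_MAX )); then
    wait -n                     # returns when any background job exits
    : $(( pool_running -= 1 ))
  fi
  "$@" &
  : $(( pool_running += 1 ))
}
```

A loop would then read `for f in *.cfg; do pool_bg simulate "$f"; done; wait`, where `simulate` stands in for the real job. The counter can only over-estimate the number of live jobs, so the limit is never exceeded; at worst the helper waits a little longer than necessary.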

Any thoughts, anyone?

Thanks!

--
C.

https://plus.google.com/109211294311109803299
https://www.facebook.com/mcewanca

Elliott Forney

May 3, 2012, 3:21:24 PM
to Colin McEwan, bug-...@gnu.org
Here is a construct that I use sometimes... although you might wind up
waiting for the slowest job in each iteration of the loop:


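maxiter=100
ncore=8

# startjob stands for whatever launches one job
for iter in $(seq 1 $maxiter)
do
    startjob $iter &

    # after every $ncore launches, wait for the whole batch
    if (( iter % ncore == 0 ))
    then
        wait
    fi
done
wait    # catch the final batch when maxiter isn't a multiple of ncore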
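Elliott's caveat can be made concrete with a bit of arithmetic. The sketch below computes the wall-clock time for some hypothetical job durations, first in fixed batches of N as in the loop above, then for a pool that refills a slot as soon as one frees up (greedy assignment in submission order, which is an assumption about how such a pool would schedule). With two slots and durations 5 1 5 1, batches cost max(5,1) + max(5,1) = 10 while the pool finishes in 6.

```bash
#!/usr/bin/env bash
# Wall-clock time for hypothetical job durations (in seconds) on N slots.

# Fixed batches of N, as in the loop above: each batch lasts as long as
# its slowest job.
batch_makespan() {        # usage: batch_makespan N d1 d2 ...
  local n=$1 total=0 max=0 i=0 d
  shift
  for d in "$@"; do
    (( d > max )) && max=$d
    i=$(( i + 1 ))
    if (( i % n == 0 )); then    # batch complete
      total=$(( total + max ))
      max=0
    fi
  done
  echo $(( total + max ))        # plus the last, possibly partial, batch
}

# A pool that hands each job, in submission order, to whichever slot
# becomes free first (simple greedy list scheduling).
pool_makespan() {         # usage: pool_makespan N d1 d2 ...
  local n=$1 i d best max=0
  shift
  local -a free_at=()
  for (( i = 0; i < n; i++ )); do free_at[i]=0; done
  for d in "$@"; do
    best=0
    for (( i = 1; i < n; i++ )); do
      (( free_at[i] < free_at[best] )) && best=$i
    done
    free_at[best]=$(( free_at[best] + d ))
  done
  for d in "${free_at[@]}"; do (( d > max )) && max=$d; done
  echo "$max"
}

batch_makespan 2 5 1 5 1   # prints 10
pool_makespan 2 5 1 5 1    # prints 6
```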

Colin McEwan

May 3, 2012, 3:45:57 PM
to Elliott Forney, bug-...@gnu.org
Indeed, I've used variations of most of these in the past. :)

My contention is that this is the sort of thing that more people will want to do more frequently, and that this is a reasonable argument in favour of including the functionality *correctly* in the core language for maximum expressiveness without external dependencies.

I just don't know if that fits with the maintenance/extension philosophy applied to bash ;)

--
iC.

Greg Wooledge

May 3, 2012, 4:01:20 PM
to Colin McEwan, bug-...@gnu.org
On Thu, May 03, 2012 at 08:45:57PM +0100, Colin McEwan wrote:
> My contention is that this is the sort of thing that more people will want to
> do more frequently, and that this is a reasonable argument in favour of
> including the functionality *correctly* in the core language for maximum
> expressiveness without external dependencies.

It seems like a rather complex and esoteric feature to be adding to a
shell, but that's ultimately Chet's decision to make, not ours.

John Kearney

May 3, 2012, 4:12:17 PM
to bug-...@gnu.org
I tend to do something more like this


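function runJobParrell {
  local mjobCnt=${1} && shift
  jcnt=0
  # Runs in a background subshell, where $$ is still the parent's PID,
  # so each finished job sends USR2 to the parent.
  function WrapJob {
    "${@}"
    kill -s USR2 $$
  }
  # USR2 handler in the parent: one slot has become free.
  function JobFinised {
    jcnt=$((${jcnt}-1))
  }
  trap JobFinised USR2
  while [ $# -gt 0 ] ; do
    # Fill free slots, but stop as soon as the arguments run out.
    while [ $# -gt 0 ] && [ ${jcnt} -lt ${mjobCnt} ]; do
      jcnt=$((${jcnt}+1))
      echo WrapJob "${1}" "${2}"
      WrapJob "${1}" "${2}" &
      shift 2   # assumes every job is exactly one command plus one argument
    done
    sleep 1     # poll; a pending USR2 trap runs once sleep returns
  done
  trap '' USR2  # ignore late USR2s so they can't cut the final wait short
  wait          # let the remaining jobs finish
  trap - USR2
}
function testProcess {
  echo "${*}"
  sleep 1
}
runJobParrell 2 testProcess "jiji#" testProcess "jiji#" testProcess "jiji#"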

Tends to work well enough.
It gets a bit more complex if you want to recover the output, but not by much.
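One way to recover the output, sketched here as an assumption rather than John's actual code: give each job its own temporary file and print the files in submission order once everything has finished, so the outputs cannot interleave. `run_collect` is a hypothetical name, each argument is a command string run through `eval` (so trusted input only), and `wait -n` needs bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run command strings with at most $1 in flight and
# print each job's output in submission order. Needs bash >= 4.3.
run_collect() {           # usage: run_collect MAXJOBS 'cmd1' 'cmd2' ...
  local max=$1; shift
  local dir i=0 running=0 cmd
  dir=$(mktemp -d) || return
  for cmd in "$@"; do
    if (( running >= max )); then
      wait -n                       # a slot frees up when any job exits
      running=$(( running - 1 ))
    fi
    eval "$cmd" > "$dir/$i" 2>&1 &  # one output file per job
    running=$(( running + 1 ))
    i=$(( i + 1 ))
  done
  wait
  for (( i = 0; i < $#; i++ )); do cat "$dir/$i"; done
  rm -rf "$dir"
}

run_collect 2 'sleep 1; echo first' 'echo second' 'echo third'
```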

Greg Wooledge

May 3, 2012, 4:30:57 PM
to John Kearney, bug-...@gnu.org
The real issue here is that there is no generalizable way to store an
arbitrary command for later execution. Your example assumes that each
pair of arguments constitutes one simple command, which is fine if that's
all you need it to do. But the next guy asking for this will want to
schedule arbitrarily complex shell pipelines and complex commands with
here documents and brace expansions and ....
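A partial answer, as a sketch: a single simple command can be stored losslessly as its own array, and anything more elaborate (pipelines, here documents) can be wrapped in a function and stored as the function name plus arguments. The names `job0`, `job1`, `count_words` and `run_queue` are hypothetical, and the `local -n` nameref needs bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Store each queued command as its own array, so words containing spaces or
# glob characters survive untouched; wrap complex commands in a function.
count_words() {           # hypothetical stand-in for a "complex" job
  printf '%s\n' "$@" | wc -w
}

declare -a job0=(printf '[%s]\n' 'two words' '*')
declare -a job1=(count_words 'a b' c)

run_queue() {             # usage: run_queue ARRAYNAME...
  local name
  for name in "$@"; do
    local -n cmd=$name    # nameref to the stored array (bash >= 4.3)
    "${cmd[@]}"           # run the stored words exactly; no re-splitting
    unset -n cmd
  done
}

run_queue job0 job1
```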

John Kearney

May 3, 2012, 5:23:11 PM
to bug-...@gnu.org
:)
A more complex/flexible example. More like what I actually use.




CNiceLevel=$(nice)
declare -a JobArray
function PushAdvancedCmd {
local IFS=$'\v'
JobArray+=("${*}")
}
function PushSimpleCmd {
PushAdvancedCmd WrapJob ${CNiceLevel} "${@}"
}
function PushNiceCmd {
PushAdvancedCmd WrapJob "${@}"
}
function UnpackCmd {
local IFS=$'\v'
set -o noglob
_RETURN=( .${1}. )
set +o noglob
_RETURN[0]="${_RETURN[0]#.}"
local -i le=${#_RETURN[@]}-1
_RETURN[${le}]="${_RETURN[${le}]%.}"
}
function runJobParrell {
local mjobCnt=${1} && shift
jcnt=0
function WrapJob {
[ ${1} -le ${CNiceLevel} ] || renice -n ${1}
local Buffer=$("${@:2}")
echo "${Buffer}"
kill -s USR2 $$
}
function JobFinised {
jcnt=$((${jcnt}-1))
}
trap JobFinised USR2
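while [ $# -gt 0 ] ; do
# stop filling slots once the queue is empty
while [ $# -gt 0 ] && [ ${jcnt} -lt ${mjobCnt} ]; do
jcnt=$((${jcnt}+1))
UnpackCmd "${1}"
"${_RETURN[@]}" &
shift
done
sleep 1
done
trap '' USR2   # ignore late USR2s so they can't cut the final wait short
wait
trap - USR2
}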



function testProcess {
echo "${*}"
sleep 1
}
# Commands can be queued in two ways. 1) Plain arguments
# can be encoded directly:
PushSimpleCmd testProcess "jiji#" dfds dfds dsfsd
PushSimpleCmd testProcess "jiji#" dfds dfds
PushNiceCmd 20 testProcess "jiji#" dfds
PushSimpleCmd testProcess "jiji#"
PushSimpleCmd testProcess "jiji#" "*" s
# 2) For anything more complex, wrap it in a function and queue that.
function DoComplexMagicStuff1 {
echo "${@}" >&2
}
# Or more normally just do a hybrid of both.
PushSimpleCmd DoComplexMagicStuff1 "jiji#"

#

runJobParrell 1 "${JobArray[@]}"



Note there is another level of complexity, where I start a JobQueue
process and issue commands to it using a fifo.
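A job-queue process of that kind might look roughly like the following. This is a hypothetical sketch, not John's code: `queue_server` reads one command per line from a FIFO, runs each with `eval` (so only trusted writers should be able to reach the FIFO), keeps at most N running, and uses `wait -n` from bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Hypothetical job-queue server: read one command per line from a FIFO and
# run at most $2 of them at once. Each line goes through eval, so only
# trusted writers should be able to reach the FIFO. Needs bash >= 4.3.
queue_server() {          # usage: queue_server FIFO MAXJOBS
  local fifo=$1 max=$2 running=0 line
  # Opening the FIFO blocks until a writer appears; the loop ends at EOF,
  # i.e. when the last writer closes it.
  while IFS= read -r line; do
    if (( running >= max )); then
      wait -n                        # a slot frees up when any job exits
      running=$(( running - 1 ))
    fi
    eval "$line" &
    running=$(( running + 1 ))
  done < "$fifo"
  wait                               # drain whatever is still running
}

# Example: start the server, then feed it three commands.
dir=$(mktemp -d)
mkfifo "$dir/queue"
queue_server "$dir/queue" 2 &
printf '%s\n' 'sleep 1; echo a' 'echo b' 'echo c' > "$dir/queue"
wait                                 # wait for the server itself to finish
rm -rf "$dir"
```

As written the server stops at end-of-file, i.e. when the last writer closes the FIFO; to accept commands from several separate writers over time, something would need to hold a write end open.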



John Kearney

May 3, 2012, 5:59:34 PM
to bug-...@gnu.org
This version might be easier to follow. The last version was more for
being able to issue commands via a fifo to a job queue server.

function check_valid_var_name {
case "${1:?Missing Variable Name}" in
[!a-zA-Z_]* | *[!a-zA-Z_0-9]* ) return 3;;
esac
}


CNiceLevel=$(nice)
declare -a JobArray
function PushAdvancedCmd {
local le="tmp_array${#JobArray[@]}"
JobArray+=("${le}")
eval "${le}"'=("${@}")'
}
function PushSimpleCmd {
PushAdvancedCmd WrapJob ${CNiceLevel} "${@}"
}
function PushNiceCmd {
PushAdvancedCmd WrapJob "${@}"
}
function UnpackCmd {
check_valid_var_name ${1} || return $?
eval _RETURN=('"${'"${1}"'[@]}"')
unset "${1}[@]"
}
function runJobParrell {
local mjobCnt=${1} && shift
jcnt=0
function WrapJob {
[ ${1} -le ${CNiceLevel} ] || renice -n ${1}
local Buffer=$("${@:2}")
echo "${Buffer}"
kill -s USR2 $$
}
function JobFinised {
jcnt=$((${jcnt}-1))
}
trap JobFinised USR2
while [ $# -gt 0 ] ; do
while [ ${jcnt} -lt ${mjobCnt} ]; do
jcnt=$((${jcnt}+1))
if UnpackCmd "${1}" ; then
"${_RETURN[@]}" &
else
continue
fi

Mike Frysinger

May 3, 2012, 6:01:04 PM
to bug-...@gnu.org
On Thursday 03 May 2012 16:12:17 John Kearney wrote:
> I tend to do something more like this
>
> function runJobParrell {
> local mjobCnt=${1} && shift
> jcnt=0
> function WrapJob {
> "${@}"
> kill -s USR2 $$
> }

neat trick. all my parallel loops tend to have a fifo of depth N where i push
on pids and when it gets full, wait for the first one. it works moderately
well, except for when a slow job in the pipe chokes and the parent doesn't
push anymore in until that clears.

> function JobFinised {
> jcnt=$((${jcnt}-1))

: $(( --jcnt ))

or a portable version:

: $(( jcnt -= 1 ))

> while [ $# -gt 0 ] ; do
> while [ ${jcnt} -lt ${mjobCnt} ]; do
> jcnt=$((${jcnt}+1))

same math suggestion as above
-mike

Chet Ramey

May 4, 2012, 8:55:42 AM
to Colin McEwan, bug-...@gnu.org, chet....@case.edu
On 5/3/12 2:49 PM, Colin McEwan wrote:

> What I would really *like* would be an extension to the shell which
> implements the same sort of parallelism-limiting / 'process pooling' found
> in make or 'parallel' via an operator in the shell language, similar to '&'
> which has semantics of *possibly* continuing asynchronously (like '&') if
> system resources allow, or waiting for the process to complete (';').

I think the combination of asynchronous jobs and `wait' provides most of
what you need. The already-posted alternatives look like a good start to a
general solution.

If those aren't general enough, how would you specify the behavior of a
shell primitive -- operator or builtin -- that does what you want?

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/

Mike Frysinger

May 4, 2012, 12:41:03 PM
to bug-...@gnu.org, chet....@case.edu, Colin McEwan
On Friday 04 May 2012 08:55:42 Chet Ramey wrote:
> On 5/3/12 2:49 PM, Colin McEwan wrote:
> > What I would really *like* would be an extension to the shell which
> > implements the same sort of parallelism-limiting / 'process pooling'
> > found in make or 'parallel' via an operator in the shell language,
> > similar to '&' which has semantics of *possibly* continuing
> > asynchronously (like '&') if system resources allow, or waiting for the
> > process to complete (';').
>
> I think the combination of asynchronous jobs and `wait' provides most of
> what you need. The already-posted alternatives look like a good start to a
> general solution.
>
> If those aren't general enough, how would you specify the behavior of a
> shell primitive -- operator or builtin -- that does what you want?

i wish there was a way to use `wait` that didn't block until all the pids
returned. maybe a dedicated option, or a shopt to enable this, or a new
command.

for example, if i launched 10 jobs in the background, i usually want to wait
for the first one to exit so i can queue up another one, not wait for all of
them.
-mike

Greg Wooledge

May 4, 2012, 12:44:32 PM
to Mike Frysinger, bug-...@gnu.org
On Fri, May 04, 2012 at 12:41:03PM -0400, Mike Frysinger wrote:
> i wish there was a way to use `wait` that didn't block until all the pids
> returned. maybe a dedicated option, or a shopt to enable this, or a new
> command.

wait takes arguments.

> for example, if i launched 10 jobs in the background, i usually want to wait
> for the first one to exit so i can queue up another one, not wait for all of
> them.

Do you mean "for *any* one of them", or literally "for the first one"?
The latter, you can do right now -- just pass the PID of the first one.
The former would require that you set up a SIGCHLD trap and do some work.

By the way, "help wait" is misleading; it says you can only pass a single
job ID argument. The man page indicates that you can pass multiple
arguments.
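Greg's point about arguments, illustrated with placeholder jobs: given several IDs, `wait` waits for all of them and returns the status of the last ID listed (POSIX specifies this), while a single PID waits for just that job. Neither form returns as soon as *any* job exits.

```bash
#!/usr/bin/env bash
# Placeholder jobs: one slow, one that exits immediately.
(sleep 1; exit 3) & a=$!
(exit 5) & b=$!

# Several IDs: waits for *all* of them, returns the last one's status.
wait "$a" "$b" || echo "wait a b -> $?"   # prints: wait a b -> 5

# A single PID: waits for just that job.
(sleep 1; exit 3) & c=$!
wait "$c" || echo "wait c -> $?"          # prints: wait c -> 3
```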

Mike Frysinger

May 4, 2012, 12:56:19 PM
to Greg Wooledge, bug-...@gnu.org
On Friday 04 May 2012 12:44:32 Greg Wooledge wrote:
> On Fri, May 04, 2012 at 12:41:03PM -0400, Mike Frysinger wrote:
> > i wish there was a way to use `wait` that didn't block until all the pids
> > returned. maybe a dedicated option, or a shopt to enable this, or a new
> > command.
>
> wait takes arguments.

yes, and it'll wait for all the ones i specified before returning

> > for example, if i launched 10 jobs in the background, i usually want to
> > wait for the first one to exit so i can queue up another one, not wait
> > for all of them.
>
> Do you mean "for *any* one of them", or literally "for the first one"?

any. maybe `wait -1` will translate into waitpid(-1, ...).
-mike

Andreas Schwab

May 4, 2012, 1:46:32 PM
to Mike Frysinger, Colin McEwan, bug-...@gnu.org, chet....@case.edu
Mike Frysinger <vap...@gentoo.org> writes:

> i wish there was a way to use `wait` that didn't block until all the pids
> returned. maybe a dedicated option, or a shopt to enable this, or a new
> command.
>
> for example, if i launched 10 jobs in the background, i usually want to wait
> for the first one to exit so i can queue up another one, not wait for all of
> them.

If you set -m you can trap on SIGCHLD while waiting.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Mike Frysinger

May 4, 2012, 2:53:31 PM
to Andreas Schwab, Colin McEwan, bug-...@gnu.org, chet....@case.edu
On Friday 04 May 2012 13:46:32 Andreas Schwab wrote:
> Mike Frysinger <vap...@gentoo.org> writes:
> > i wish there was a way to use `wait` that didn't block until all the pids
> > returned. maybe a dedicated option, or a shopt to enable this, or a new
> > command.
> >
> > for example, if i launched 10 jobs in the background, i usually want to
> > wait for the first one to exit so i can queue up another one, not wait
> > for all of them.
>
> If you set -m you can trap on SIGCHLD while waiting.

awesome, that's a good mitigation

#!/bin/bash
set -m
cnt=0
trap ': $(( --cnt ))' SIGCHLD
for n in {0..20} ; do
(
d=$(( RANDOM % 10 ))
echo $n sleeping $d
sleep $d
) &
: $(( ++cnt ))
if [[ ${cnt} -ge 10 ]] ; then
echo going to wait
wait
fi
done
trap - SIGCHLD
wait

it might be a little racy (wrt checking cnt >= 10 and then doing a wait), but
this is good enough for some things. it does lose visibility into which pids
are live vs reaped, and their exit status, but i more often don't care about
that ...
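If the visibility does matter, here is a sketch (placeholder jobs, hypothetical names) that remembers each PID in an associative array and collects every exit status with `wait PID` afterwards; bash remembers the status of an already-exited background child, so `wait PID` still reports it.

```bash
#!/usr/bin/env bash
# Remember which PID belongs to which job, then collect every exit status.
# The job bodies are placeholders that exit with status 0, 1 and 2.
declare -A label=()
for n in 0 1 2; do
  ( sleep 1; exit "$n" ) &
  label[$!]="job $n"
done

for pid in "${!label[@]}"; do
  status=0
  wait "$pid" || status=$?       # works even if the child already exited
  echo "${label[$pid]} (pid $pid) exited with $status"
done
```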
-mike

John Kearney

May 4, 2012, 3:02:27 PM
to bug-...@gnu.org
That won't work I don't think.
I think you meant something more like this?

set -m
cnt=0
trap ': $(( --cnt ))' SIGCHLD
set -- {0..20}
while [ $# -gt 0 ]; do
if [[ ${cnt} -lt 10 ]] ; then

(
d=$(( RANDOM % 10 ))
echo $n sleeping $d
sleep $d
) &
: $(( ++cnt ))
shift
fi
echo going to wait
sleep 1
done


which is basically what I did in my earlier example except I used USR2
instead of SIGCHLD and put it in a function to make it easier to use.



Greg Wooledge

May 4, 2012, 3:11:41 PM
to John Kearney, bug-...@gnu.org
On Fri, May 04, 2012 at 09:02:27PM +0200, John Kearney wrote:
> set -m
> cnt=0
> trap ': $(( --cnt ))' SIGCHLD
> set -- {0..20}
> while [ $# -gt 0 ]; do
> if [[ ${cnt} -lt 10 ]] ; then
>
> (
> d=$(( RANDOM % 10 ))
> echo $n sleeping $d
> sleep $d
> ) &
> : $(( ++cnt ))
> shift
> fi
> echo going to wait
> sleep 1
> done

You're busy-looping with a 1-second sleep instead of using wait and the
signal handler, which was the whole purpose of the previous example (and
of the set -m that you kept in yours). And $n should probably be $1 there.

Mike Frysinger

May 4, 2012, 3:13:55 PM
to bug-...@gnu.org
On Friday 04 May 2012 15:02:27 John Kearney wrote:
> That won't work I don't think.

seemed to work fine for me

> I think you meant something more like this?

no. i want to sleep the parent indefinitely and fork a child asap (hence the
`wait`), not busy wait with a one second delay. the `set -m` + SIGCHLD
interrupted the `wait` and allowed it to return.
-mike

John Kearney

May 4, 2012, 3:25:25 PM
to bug-...@gnu.org
The code doesn't actually need SIGCHLD to function; it still waits until
all 10 processes have finished before starting the next lot.

The trap only interrupts the wait to decrement the counter.

To do what you're talking about, you would have to start the new subprocess
in the SIGCHLD trap.


Try this out; it might make it clearer what I mean:

set -m
cnt=0
trap ': $(( --cnt )); echo SIGCHLD' SIGCHLD
for n in {0..20} ; do
(
d=$(( RANDOM % 10 ))
echo $n sleeping $d
sleep $d
echo $n exiting $d

John Kearney

May 4, 2012, 3:41:49 PM
to bug-...@gnu.org
On 04.05.2012 21:11, Greg Wooledge wrote:
> You're busy-looping with a 1-second sleep instead of using wait and the
> signal handler, which was the whole purpose of the previous example (and
> of the set -m that you kept in yours). And $n should probably be $1 there.
>
see my response to mike.


what you are thinking about is either what I suggested or something like
this

function TestProcess_22 {
local d=$(( RANDOM % 10 ))
echo $1 sleeping $d
sleep $d
echo $1 exiting $d
}
function trap_SIGCHLD {
echo "SIGCHLD";
if [ $cnt -gt 0 ]; then
: $(( --cnt ))
TestProcess_22 $cnt &
fi
}
set -m
cnt=20
maxJobCnt=10
trap 'trap_SIGCHLD' SIGCHLD
for (( x=0; x<maxJobCnt ; x++ )); do
: $(( --cnt ))
TestProcess_22 $cnt &
done
wait
trap - SIGCHLD




Chet Ramey

May 4, 2012, 4:17:02 PM
to Mike Frysinger, Colin McEwan, Andreas Schwab, bug-...@gnu.org, chet....@case.edu
On 5/4/12 2:53 PM, Mike Frysinger wrote:

> it might be a little racy (wrt checking cnt >= 10 and then doing a wait), but
> this is good enough for some things. it does lose visibility into which pids
> are live vs reaped, and their exit status, but i more often don't care about
> that ...

What version of bash did you test this on? Bash-4.0 is a little different
in how it treats the SIGCHLD trap.

Would it be useful for bash to set a shell variable to the PID of the just-
reaped process that caused the SIGCHLD trap? That way you could keep an
array of PIDs and, if you wanted, use that variable to keep track of live
and dead children.

Chet

Mike Frysinger

May 5, 2012, 12:28:32 AM
to chet....@case.edu, Andreas Schwab, bug-...@gnu.org, Colin McEwan
On Friday 04 May 2012 16:17:02 Chet Ramey wrote:
> On 5/4/12 2:53 PM, Mike Frysinger wrote:
> > it might be a little racy (wrt checking cnt >= 10 and then doing a wait),
> > but this is good enough for some things. it does lose visibility into
> > which pids are live vs reaped, and their exit status, but i more often
> > don't care about that ...
>
> What version of bash did you test this on? Bash-4.0 is a little different
> in how it treats the SIGCHLD trap.

bash-4.2_p28. wait returns 145 (which is SIGCHLD).

> Would it be useful for bash to set a shell variable to the PID of the just-
> reaped process that caused the SIGCHLD trap? That way you could keep an
> array of PIDs and, if you wanted, use that variable to keep track of live
> and dead children.

we've got associative arrays now ... we could have one which contains all the
relevant info:
declare -A BASH_CHILD_STATUS=(
["pid"]=1234
["status"]=1 # WEXITSTATUS()
["signal"]=13 # WTERMSIG()
)
makes it easy to add any other fields people might care about ...
-mike

Mike Frysinger

May 5, 2012, 12:35:48 AM
to bug-...@gnu.org
On Friday 04 May 2012 15:25:25 John Kearney wrote:
> The functionality of the code doesn't need SIGCHLD, it still waits till
> all the 10 processes are finished before starting the next lot.

not on my system it doesn't. maybe a difference in bash versions. as soon as
one process quits, the `wait` is interrupted, a new one is forked, and the
parent goes back to sleep until another child exits. if i don't `set -m`,
then i see what you describe -- the wait doesn't return until all 10 children
exit.
-mike

Andreas Schwab

unread,
May 5, 2012, 2:37:45 AM5/5/12
to Mike Frysinger, bug-...@gnu.org
Mike Frysinger <vap...@gentoo.org> writes:

> not on my system it doesn't. maybe a difference in bash versions. as soon as
> one process quits, the `wait` is interrupted, a new one is forked, and the
> parent goes back to sleep until another child exits. if i don't `set -m`,
> then i see what you describe -- the wait doesn't return until all 10 children
> exit.

(bash) Bash POSIX Mode::

46. The arrival of `SIGCHLD' when a trap is set on `SIGCHLD' does not
interrupt the `wait' builtin and cause it to return immediately.
The trap command is run once for each child that exits.

(I think the description is backwards.)

John Kearney

May 5, 2012, 4:28:50 AM
to bug-...@gnu.org
> not on my system it doesn't. maybe a difference in bash versions. as soon as
> one process quits, the `wait` is interrupted, a new one is forked, and the
> parent goes back to sleep until another child exits. if i don't `set -m`,
> then i see what you describe -- the wait doesn't return until all 10 children
> exit.
> -mike
Just to clarify what I see with your code (with some extra echoes from me,
and fewer jobs so it's shorter):
set -m
cnt=0
trap ': $(( --cnt )); echo "SIGCHLD"' SIGCHLD
for n in {0..10} ; do
(
d=$(( RANDOM % 10 ))
echo $n sleeping $d
sleep $d
echo $n exiting $d
) &
: $(( ++cnt ))
if [[ ${cnt} -ge 5 ]] ; then
echo going to wait
wait
echo Back from wait
fi
done
trap - SIGCHLD
wait

gives
0 sleeping 9
2 sleeping 4
going to wait
4 sleeping 7
3 sleeping 4
1 sleeping 6
2 exiting 4
SIGCHLD
3 exiting 4
SIGCHLD
1 exiting 6
SIGCHLD
4 exiting 7
SIGCHLD
0 exiting 9
SIGCHLD
Back from wait
5 sleeping 5
6 sleeping 5
going to wait
8 sleeping 1
9 sleeping 1
7 sleeping 3
9 exiting 1
8 exiting 1
SIGCHLD
SIGCHLD
7 exiting 3
SIGCHLD
6 exiting 5
SIGCHLD
5 exiting 5




Now, this code:
function TestProcess_22 {
local d=$(( RANDOM % 10 ))
echo $1 sleeping $d
sleep $d
echo $1 exiting $d
}
function trap_SIGCHLD {
echo "SIGCHLD";
if [ $cnt -gt 0 ]; then
: $(( --cnt ))
TestProcess_22 $cnt &
fi
}
set -m
cnt=10
maxJobCnt=5
trap 'trap_SIGCHLD' SIGCHLD
for (( x=0; x<maxJobCnt ; x++ )); do
: $(( --cnt ))
TestProcess_22 $cnt &
done
wait
trap - SIGCHLD

9 sleeping 5
7 sleeping 0
5 sleeping 8
6 sleeping 5
8 sleeping 6
7 exiting 0
SIGCHLD
4 sleeping 9
9 exiting 5
SIGCHLD
3 sleeping 5
6 exiting 5
SIGCHLD
2 sleeping 8
8 exiting 6
SIGCHLD
1 sleeping 8
5 exiting 8
SIGCHLD
0 sleeping 6
4 exiting 9
SIGCHLD
3 exiting 5
SIGCHLD
2 exiting 8
SIGCHLD
0 exiting 6
SIGCHLD
1 exiting 8
SIGCHLD




bash --version
GNU bash, version 4.2.24(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

uname -a
Linux DETH00 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Ben Pfaff

May 5, 2012, 12:36:21 PM
to Colin McEwan, bug-...@gnu.org
Colin McEwan <colin....@gmail.com> writes:

> I frequently find myself these days writing shell scripts, to run on
> multi-core machines, which could easily exploit lots of parallelism (eg. a
> batch of a hundred independent simulations).
>
> The basic parallelism construct of '&' for async execution is highly
> expressive, but it's not useful for this sort of use-case: starting up 100
> jobs at once will leave them competing, and lead to excessive context
> switching and paging.

Autotest testsuites, which are written in Bourne shell, do this
if you pass in a make-like "-j<N>" option. Have you had a look
at how they are implemented?
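GNU make's jobserver hands out job slots as single-byte tokens in a pipe; the sketch below applies that token idea in bash. It is not Autotest's code, `run_with_tokens` is a hypothetical name, it uses file descriptor 3, and it assumes Linux, where opening a FIFO read-write with `<>` does not block (POSIX leaves that undefined).

```bash
#!/usr/bin/env bash
# Jobserver-style token pool (sketch): a FIFO holds one byte per free slot;
# each job takes a byte before starting and puts it back when done.
run_with_tokens() {       # usage: run_with_tokens N 'cmd1' 'cmd2' ...
  local max=$1 dir cmd i
  shift
  dir=$(mktemp -d) || return
  mkfifo "$dir/tokens" || return
  exec 3<>"$dir/tokens"   # keep the FIFO open on fd 3 (Linux behaviour) ...
  rm -rf "$dir"           # ... so its name can be removed right away
  for (( i = 0; i < max; i++ )); do printf x >&3; done
  for cmd in "$@"; do
    read -r -n 1 -u 3     # take a token; blocks while all slots are busy
    { eval "$cmd"; printf x >&3; } &   # run, then return the token
  done
  wait
  exec 3>&-
}

# Example: at most two of the three commands run at any moment.
run_with_tokens 2 'sleep 1; echo one' 'echo two' 'echo three'
```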

John Kearney

May 5, 2012, 11:25:26 PM
to bug-...@gnu.org
On 05.05.2012 06:28, Mike Frysinger wrote:
> On Friday 04 May 2012 16:17:02 Chet Ramey wrote:
>> On 5/4/12 2:53 PM, Mike Frysinger wrote:
>>> it might be a little racy (wrt checking cnt >= 10 and then doing a wait),
>>> but this is good enough for some things. it does lose visibility into
>>> which pids are live vs reaped, and their exit status, but i more often
>>> don't care about that ...
>> What version of bash did you test this on? Bash-4.0 is a little different
>> in how it treats the SIGCHLD trap.
> bash-4.2_p28. wait returns 145 (which is SIGCHLD).
>
>> Would it be useful for bash to set a shell variable to the PID of the just-
>> reaped process that caused the SIGCHLD trap? That way you could keep an
>> array of PIDs and, if you wanted, use that variable to keep track of live
>> and dead children.
> we've got associative arrays now ... we could have one which contains all the
> relevant info:
> declare -A BASH_CHILD_STATUS=(
> ["pid"]=1234
> ["status"]=1 # WEXITSTATUS()
> ["signal"]=13 # WTERMSIG()
> )
> makes it easy to add any other fields people might care about ...
> -mike
Is there actually a guarantee that there will be one SIGCHLD for every
exited process?
Isn't it actually a race condition?
What happens if two subprocesses exit simultaneously,
or if a process exits while the SIGCHLD trap is already running?
My normal interpretation of an interrupt/event/trap is that it is just a
notification that I need to check what has happened: it tells me that there
was an event, not the extent of the event.
I keep feeling that the following is bad practice

trap ': $(( --cnt ))' SIGCHLD

and would be better something like this

trap 'cnt=$(jobs -p | wc -w)' SIGCHLD


As such, you would need something more like this:
declare -a BASH_CHILD_STATUS=([1234]=1 [1235]=1 [1236]=1)

declare -a BASH_CHILD_STATUS_SIGNAL=([1234]=13 [1235]=13 [1236]=13)
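John's alternative of recounting rather than decrementing can be sketched as follows. Instead of a SIGCHLD trap it uses `wait -n` (bash 4.3 or later) purely as a wake-up and then re-asks `jobs -pr` how many jobs are still running; `count_running`, `running` and the limit of 3 are illustrative names and values.

```bash
#!/usr/bin/env bash
# Re-check the number of running jobs after every wake-up instead of
# assuming exactly one job finished per signal. Needs bash >= 4.3.
max_jobs=3
running=0

# Recompute how many of this shell's background jobs are still running.
count_running() {
  local -a live=( $(jobs -pr) )
  running=${#live[@]}
}

for n in {0..9}; do
  count_running
  while (( running >= max_jobs )); do
    wait -n          # wake up when at least one job exits ...
    count_running    # ... then re-check instead of assuming "one less"
  done
  ( sleep $(( RANDOM % 3 )); echo "$n done" ) &
done
wait
```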





Mike Frysinger

May 6, 2012, 2:28:23 AM
to bug-...@gnu.org, John Kearney
On Saturday 05 May 2012 04:28:50 John Kearney wrote:
> Just to clarify what I see with your code, with the extra echos from me
> and less threads so its shorter.

that's not what i was getting. as soon as i saw the echo of SIGCHLD, a new
"sleeping" would get launched.
-mike

Mike Frysinger

May 6, 2012, 2:28:26 AM
to bug-...@gnu.org, John Kearney
On Saturday 05 May 2012 23:25:26 John Kearney wrote:
> Am 05.05.2012 06:28, schrieb Mike Frysinger:
> > On Friday 04 May 2012 16:17:02 Chet Ramey wrote:
> >> On 5/4/12 2:53 PM, Mike Frysinger wrote:
> >>> it might be a little racy (wrt checking cnt >= 10 and then doing a
> >>> wait), but this is good enough for some things. it does lose
> >>> visibility into which pids are live vs reaped, and their exit status,
> >>> but i more often don't care about that ...
> >>
> >> What version of bash did you test this on? Bash-4.0 is a little
> >> different in how it treats the SIGCHLD trap.
> >
> > bash-4.2_p28. wait returns 145 (which is SIGCHLD).
> >
> >> Would it be useful for bash to set a shell variable to the PID of the
> >> just-reaped process that caused the SIGCHLD trap? That way you could
> >> keep an array of PIDs and, if you wanted, use that variable to keep
> >> track of live and dead children.
> >
> > we've got associative arrays now ... we could have one which contains all
> > the relevant info:
> > declare -A BASH_CHILD_STATUS=(
> > ["pid"]=1234
> > ["status"]=1 # WEXITSTATUS()
> > ["signal"]=13 # WTERMSIG()
> > )
> >
> > makes it easy to add any other fields people might care about ...
>
> Is there actually a guarantee that there will be 1 SIGCHLD for every
> exited process.
> Isn't it actually a race condition?

when SIGCHLD is delivered doesn't matter. the child stays in a zombie state
until the parent calls wait() on it and gets its status. so you can have
`wait` return one child's status at a time.
-mike
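A minimal bash sketch of the point above, assuming bash on Linux. In bash's case the "parent calling wait()" is bash's own SIGCHLD handler, which reaps the zombie and records its exit status; from the script's side, a child that exited long before `wait` was called can still be waited for by PID, one child at a time. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# A child that exits long before `wait` is still waitable by PID: bash's
# SIGCHLD handler has already reaped it and saved its exit status.
( exit 3 ) &            # child exits almost immediately with status 3
early_pid=$!

sleep 0.5 &             # a second child that keeps running for a while
late_pid=$!

sleep 0.2               # let the first child finish before we wait for it

wait "$early_pid"       # returns only this child's status, not everyone's
early_status=$?
echo "pid $early_pid exited with status $early_status"

wait "$late_pid"        # blocks until the second child exits
late_status=$?
echo "pid $late_pid exited with status $late_status"
```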

John Kearney

May 6, 2012, 3:25:27 AM5/6/12
to Mike Frysinger, bug-...@gnu.org
But I think my point still stands:
trap ': $(( cnt-- ))' SIGCHLD
is a bad idea. You actually need to verify how many jobs are running, not
just arbitrarily decrement a counter, because you're not guaranteed a trap
for each process. I mean, sure, it will normally work, but it's not
guaranteed to work.
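One way to follow this suggestion is to ask bash how many background jobs are actually running, rather than maintaining a counter in a SIGCHLD trap. The sketch below assumes bash 4.3 or later (for `wait -n`); `max_jobs`, `running_jobs` and `fake_job` are hypothetical names, and `fake_job` only sleeps in place of real work.

```shell
#!/usr/bin/env bash
# Cap concurrency by counting the jobs bash still considers running,
# instead of decrementing a counter from a SIGCHLD trap.
max_jobs=3
peak=0

running_jobs() {
    # -r: running jobs only; -p: print one PID per line
    jobs -rp | wc -l
}

fake_job() {
    sleep "0.$(( RANDOM % 3 + 1 ))"   # stand-in for real work (0.1-0.3 s)
}

for i in $(seq 1 10); do
    # Block while the pool is full; `wait -n` returns when any one job exits.
    while (( $(running_jobs) >= max_jobs )); do
        wait -n
    done
    fake_job "$i" &
    n=$(running_jobs)
    (( n > peak )) && peak=$n
done
wait    # wait for the stragglers
echo "peak concurrency: $peak (limit $max_jobs)"
```

Because the count is re-read from the job table each time, a missed or merged signal cannot make it drift.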

Also, I think the question is: is there any point in forcing bash to
issue one status at a time? It seems to make more sense to issue them in
bulk, so bash could populate an array of all reaped processes in one trap
rather than having to execute multiple traps. Isn't that what bash does
internally anyway?


John Kearney

May 6, 2012, 3:35:13 AM5/6/12
to Mike Frysinger, bug-...@gnu.org
Am 06.05.2012 08:28, schrieb Mike Frysinger:
>>>>>>> it might be a little racy (wrt checking cnt >= 10 and then doing a
>>>>>>> wait), but this is good enough for some things. it does lose
>>>>>>> visibility into which pids are live vs reaped, and their exit status,
>>>>>>> but i more often don't care about that ...
>>>>>> That won't work I don't think.
>>>>> seemed to work fine for me
>>>>>
>>>>>> I think you meant something more like this?
>>>>> no. i want to sleep the parent indefinitely and fork a child asap
>>>>> (hence the `wait`), not busy wait with a one second delay. the `set
>>>>> -m` + SIGCHLD interrupted the `wait` and allowed it to return.
>>>> The functionality of the code doesn't need SIGCHLD, it still waits till
>>>> all the 10 processes are finished before starting the next lot.
>>> not on my system it doesn't. maybe a difference in bash versions. as
>>> soon as one process quits, the `wait` is interrupted, a new one is
>>> forked, and the parent goes back to sleep until another child exits. if
>>> i don't `set -m`, then i see what you describe -- the wait doesn't
>>> return until all 10 children exit.
>> Just to clarify what I see with your code, with the extra echoes from me
>> and fewer threads so it's shorter.
> that's not what i was getting. as soon as i saw the echo of SIGCHLD, a new
> "sleeping" would get launched.
> -mike
OK then, that's weird, because it doesn't really make sense to me why a
SIGCHLD would interrupt the wait command. Oh well.

Mike Frysinger

May 6, 2012, 3:49:10 AM5/6/12
to John Kearney, bug-...@gnu.org
On Sunday 06 May 2012 03:25:27 John Kearney wrote:
> Am 06.05.2012 08:28, schrieb Mike Frysinger:
> > On Saturday 05 May 2012 23:25:26 John Kearney wrote:
> >> Am 05.05.2012 06:28, schrieb Mike Frysinger:
> >>> On Friday 04 May 2012 16:17:02 Chet Ramey wrote:
> >>>> On 5/4/12 2:53 PM, Mike Frysinger wrote:
> >>>>> it might be a little racy (wrt checking cnt >= 10 and then doing a
> >>>>> wait), but this is good enough for some things. it does lose
> >>>>> visibility into which pids are live vs reaped, and their exit status,
> >>>>> but i more often don't care about that ...
> >>>>
> >>>> What version of bash did you test this on? Bash-4.0 is a little
> >>>> different in how it treats the SIGCHLD trap.
> >>>
> >>> bash-4.2_p28. wait returns 145 (which is SIGCHLD).
> >>>
> >>>> Would it be useful for bash to set a shell variable to the PID of the
> >>>> just-reaped process that caused the SIGCHLD trap? That way you could
> >>>> keep an array of PIDs and, if you wanted, use that variable to keep
> >>>> track of live and dead children.
> >>>
> >>> we've got associative arrays now ... we could have one which contains
> >>> all the relevant info:
> >>> declare -A BASH_CHILD_STATUS=(
> >>> ["pid"]=1234
> >>> ["status"]=1 # WEXITSTATUS()
> >>> ["signal"]=13 # WTERMSIG()
> >>> )
> >>>
> >>> makes it easy to add any other fields people might care about ...
> >>
> >> Is there actually a guarantee that there will be 1 SIGCHLD for every
> >> exited process.
> >> Isn't it actually a race condition?
> >
> > when SIGCHLD is delivered doesn't matter. the child stays in a zombie
> > state until the parent calls wait() on it and gets its status. so you
> > can have `wait` return one child's status at a time.
>
> but I think my point still stands
> trap ': $(( cnt-- ))' SIGCHLD
> is a bad idea. You actually need to verify how many jobs are running, not
> just arbitrarily decrement a counter, because you're not guaranteed a trap
> for each process. I mean, sure, it will normally work, but it's not
> guaranteed to work.

if `wait` set up BASH_CHILD_STATUS, then you wouldn't need the SIGCHLD trap at
all. you could just do `wait`, get the info from BASH_CHILD_STATUS as to which
child exactly was just reaped, and then proceed.
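Later bash releases added something close to this: since bash 5.1, `wait` accepts `-p VARNAME`, which stores the PID of the job whose status `wait -n` returns. The sketch below assumes bash 5.1 or later. It is not the `BASH_CHILD_STATUS` array proposed above, and `label_of`/`status_of` are illustrative names.

```shell
#!/usr/bin/env bash
# Requires bash >= 5.1 for `wait -p`.
declare -A label_of=()    # PID -> label of the job running under that PID
declare -A status_of=()   # label -> exit status once the job is reaped

for n in 3 1 2; do
    ( sleep "0.$n"; exit "$n" ) &
    label_of[$!]="job$n"
done

while (( ${#label_of[@]} > 0 )); do
    wait -n -p reaped_pid             # reap any one child, store its PID
    rc=$?
    [[ -n ${reaped_pid-} ]] || break  # -p var stays unset if nothing was reaped
    label=${label_of[$reaped_pid]}
    status_of[$label]=$rc
    unset "label_of[$reaped_pid]"
    echo "$label (pid $reaped_pid) exited with status $rc"
done
```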

as to the underlying question, since it's possible for bash itself to receive
stacked SIGCHLDs, there's no reason it wouldn't be able to execute the trap
the right number of times.
-mike

Chet Ramey

May 7, 2012, 8:39:10 AM5/7/12
to Andreas Schwab, bug-...@gnu.org, chet....@case.edu
On 5/5/12 2:37 AM, Andreas Schwab wrote:

> (bash) Bash POSIX Mode::
>
> 46. The arrival of `SIGCHLD' when a trap is set on `SIGCHLD' does not
> interrupt the `wait' builtin and cause it to return immediately.
> The trap command is run once for each child that exits.
>
> (I think the description is backwards.)

You're right; the description is backwards. When in Posix mode, SIGCHLD
interrupts the wait builtin.

Chet Ramey

May 7, 2012, 8:56:54 AM5/7/12
to John Kearney, bug-...@gnu.org, chet....@case.edu
On 5/5/12 11:25 PM, John Kearney wrote:

> Is there actually a guarantee that there will be 1 SIGCHLD for every
> exited process.

The manual page says, under JOB CONTROL:

Any trap on SIGCHLD is executed for each child that exits.

> Isn't it actually a race condition?

No. waitpid() returns once for each child that exits. That's why you
loop calling waitpid() when you get a SIGCHLD. It keeps returning PIDs
until there are no more unreaped children, then it returns -1.

Chet
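At the script level, the `waitpid()` loop Chet describes can be approximated with `wait -n` (bash 4.3 or later). Each call reaps one child, and the builtin returns 127 once no unwaited-for children remain, much as `waitpid()` returns -1 when there is nothing left to reap. This is a sketch of the idea, not how bash runs its trap internally.

```shell
#!/usr/bin/env bash
# Reap children one at a time until none are left.
# Caveat: a child that itself exits with 127 (e.g. "command not found")
# looks the same as "no children left", so this demo uses small codes.
for n in 1 2 3; do
    ( sleep "0.$n"; exit "$n" ) &
done

reaped=0
sum=0
while :; do
    wait -n
    rc=$?
    (( rc == 127 )) && break          # nothing left to reap
    reaped=$(( reaped + 1 ))
    sum=$(( sum + rc ))
    echo "reaped one child, status $rc"
done
echo "reaped $reaped children"
```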

Chet Ramey

May 7, 2012, 9:00:33 AM5/7/12
to John Kearney, bug-...@gnu.org, chet....@case.edu
>>> Is there actually a guarantee that there will be 1 SIGCHLD for every
>>> exited process.
>>> Isn't it actually a race condition?
>> when SIGCHLD is delivered doesn't matter. the child stays in a zombie state
>> until the parent calls wait() on it and gets its status. so you can have
>> `wait` return one child's status at a time.
>> -mike
> but I think my point still stands
> trap ': $(( cnt-- ))' SIGCHLD
> is a bad idea. You actually need to verify how many jobs are running, not
> just arbitrarily decrement a counter, because you're not guaranteed a trap
> for each process. I mean, sure, it will normally work, but it's not
> guaranteed to work.

It's certainly more robust to keep track of children that are running by
PID, but bash will run the SIGCHLD trap once for each child that exits.
(This doesn't work while using `wait' in Posix mode, since Posix requires
the trapped SIGCHLD to cause `wait' to return.)

Chet Ramey

May 7, 2012, 9:08:33 AM5/7/12
to Mike Frysinger, Colin McEwan, Andreas Schwab, bug-...@gnu.org, chet....@case.edu

On 5/5/12 12:28 AM, Mike Frysinger wrote:
> On Friday 04 May 2012 16:17:02 Chet Ramey wrote:
>> On 5/4/12 2:53 PM, Mike Frysinger wrote:
>>> it might be a little racy (wrt checking cnt >= 10 and then doing a wait),
>>> but this is good enough for some things. it does lose visibility into
>>> which pids are live vs reaped, and their exit status, but i more often
>>> don't care about that ...
>>
>> What version of bash did you test this on? Bash-4.0 is a little different
>> in how it treats the SIGCHLD trap.
>
> bash-4.2_p28. wait returns 145 (which is SIGCHLD).

I wonder if you were running in Posix mode. Posix says

"When the shell is waiting, by means of the wait utility, for asynchronous
commands to complete, the reception of a signal for which a trap has been
set shall cause the wait utility to return immediately with an exit status
>128, immediately after which the trap associated with that signal shall be
taken."

The trapped SIGCHLD has to force `wait' to return.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/

Mike Frysinger

May 11, 2012, 10:53:30 AM5/11/12
to chet....@case.edu, Andreas Schwab, bug-...@gnu.org, Colin McEwan
On Monday 07 May 2012 09:08:33 Chet Ramey wrote:
> On 5/5/12 12:28 AM, Mike Frysinger wrote:
> > On Friday 04 May 2012 16:17:02 Chet Ramey wrote:
> >> On 5/4/12 2:53 PM, Mike Frysinger wrote:
> >>> it might be a little racy (wrt checking cnt >= 10 and then doing a
> >>> wait), but this is good enough for some things. it does lose
> >>> visibility into which pids are live vs reaped, and their exit status,
> >>> but i more often don't care about that ...
> >>
> >> What version of bash did you test this on? Bash-4.0 is a little
> >> different in how it treats the SIGCHLD trap.
> >
> > bash-4.2_p28. wait returns 145 (which is SIGCHLD).
>
> I wonder if you were running in Posix mode. Posix says

yes, i think that is what i was doing: `sh ./test.sh` (bash started as `sh` runs in posix mode)
-mike

Ole Tange

May 11, 2012, 5:57:33 PM5/11/12
to bug-...@gnu.org
On Thu, 3 May 2012 19:49:37, Colin McEwan wrote:

> I frequently find myself these days writing shell scripts, to run on
> multi-core machines, which could easily exploit lots of parallelism (eg. a
> batch of a hundred independent simulations).
>
> The basic parallelism construct of '&' for async execution is highly
> expressive, but it's not useful for this sort of use-case: starting up 100
> jobs at once will leave them competing, and lead to excessive context
> switching and paging.
>
> So for practical purposes, I find myself reaching for 'make -j<n>' or GNU
> parallel, both of which destroy the expressiveness of the shell script as I
> have to redirect commands and parameters to Makefiles or stdout, and
> wrestle with appropriate levels of quoting.
>
> What I would really *like* would be an extension to the shell which
> implements the same sort of parallelism-limiting / 'process pooling' found
> in make or 'parallel' via an operator in the shell language, similar to '&'
> which has semantics of *possibly* continuing asynchronously (like '&') if
> system resources allow, or waiting for the process to complete (';').
>
> Any thoughts, anyone?

Can you explain how that idea would differ from sem (Part of GNU Parallel)?

Example from the man page:

Run one gzip process per CPU core. Block until a CPU core
becomes available.

for i in `ls *.log` ; do
echo $i
sem -j+0 gzip $i ";" echo done
done
sem --wait

For quoting, --shellquote in GNU Parallel may be of help.

/Ole
--
Did you get your GNU Parallel merchandise?
https://www.gnu.org/software/parallel/merchandise.html

Linda Walsh

May 12, 2012, 3:34:48 AM5/12/12
to Ole Tange, bug-bash


Ole Tange wrote:

> Can you explain how that idea would differ from sem (Part of GNU Parallel)?

----
Because gnu parallel is written in perl? And well, writing it in
perl.... that's near easy... did that about ... 8 years ago? in perl...
to encode albums in FLAC or LAME -- about 35-45 seconds/album...on my old
machine. But perl broke the script, multiple times .. (upgrades in perl)...

So am rewriting it...

Doing it in shell... that would be a 'new' challenge... ;-)

And people called me masochistic for trying to write complex
progs in shell ...

Actually, my first parallel encode was about 20 lines of shell...
but it just delayed the launch of each job by some constant 'k'. It wasn't too
efficient, as it usually ran too many jobs at once.

I think I switched to one using job control, keeping track of
the outstanding jobs with the jobs command. With arrays and hashes (assoc
arrays), it would be easier to be more flexible...

maybe 'par' or something similar needs to be ported into a
"dll", that could be dropped into bash as an extension..?

Hey... for that matter... um.. perl.dll... hmmmm...


Ole Tange

May 12, 2012, 8:06:09 AM5/12/12
to Linda Walsh, bug-bash
On Sat, May 12, 2012 at 9:34 AM, Linda Walsh <ba...@tlinx.org> wrote:
>
> Ole Tange wrote:
>
>> Can you explain how that idea would differ from sem (Part of GNU
>> Parallel)?
>
>        Because gnu parallel is written in perl?  And well, writing it in
> perl.... that's near easy... did that about ... 8 years ago? in perl...
> to encode albums in FLAC or LAME -- about 35-45 seconds/album...on my old
> machine.  But perl broke the script, multiple times .. (upgrades in perl)...

I have been the maintainer of GNU Parallel for the past 11 years. It
has cost years of work to get it to work on all platforms for all
versions for all corner cases. It has never broken because of a perl
upgrade. So I am quite baffled when you say it is near easy.

Maybe you really mean that it is easy to get it to work for some
platforms for some versions for some corner cases? I will agree to
that, but I would never characterize that as production quality code.

> So am rewriting it...
>
>        Doing it in shell... that would be a 'new' challenge... ;-)

I fully understand the thrill in doing something again that has
already been done - especially if it can be done better (Case in
point: GNU Parallel vs. xargs). I also understand the concept of doing
something that has already been done - just to see if you can do it
yourself (e.g. I wrote a quine just to see if I could).

What I do not understand is wanting help to do something that has
already been done better.

In the GNU Project we have several projects that could benefit from help:
https://www.fsf.org/campaigns/priority-projects/ so I would find it
wasteful to spend time doing something again - especially if the goal
is not to solve the problem better. I would encourage you to spend
your time doing something that has not been done before, or improve
existing code.

Linda Walsh

May 12, 2012, 11:35:35 AM5/12/12
to Ole Tange, bug-bash
Ole Tange wrote:

> On Sat, May 12, 2012 at 9:34 AM, Linda Walsh <ba...@tlinx.org> wrote:
>> Ole Tange wrote:
>>
>>> Can you explain how that idea would differ from sem (Part of GNU
>>> Parallel)?
>> Because gnu parallel is written in perl? And well, writing it in
>> perl.... that's near easy... did that about ... 8 years ago? in perl...
>> to encode albums in FLAC or LAME -- about 35-45 seconds/album...on my old
>> machine. But perl broke the script, multiple times .. (upgrades in perl)...
>
> I have been the maintainer of GNU Parallel for the past 11 years. It
> has cost years of work to get it to work on all platforms for all
> versions for all corner cases. It has never broken because of a perl
> upgrade. So I am quite baffled when you say it is near easy.
>
> Maybe you really mean that it is easy to get it to work for some
> platforms for some versions for some corner cases? I will agree to
> that, but I would never characterize that as production quality code.
>
>> So am rewriting it...
>>
>> Doing it in shell... that would be a 'new' challenge... ;-)
>
> I fully understand the thrill in doing something again that has

----
I think you missed the 'wink'... and the following comment:

"And people called me masochistic for trying to write complex
progs in shell ... "

Notice the word before 'easy'. It's vital to understanding; it's a classic
example of the relative definition of "near".

Example: Travel to the nearest solar system outside ours is near easy compared
to travel to the next galaxy... *ahem*...*cough* *cough*.


Sorry, but I'm sure it WASN'T easy to get it to work natively on all platforms,
especially if Win32 or strawberry perl is included in that "all".

Even getting it to work correctly on 1 platform is less than trivial --
mine runs jobs based on the number of cpus, and a simple check of /proc/cpuinfo
won't work so well on non-*nix-compatible systems... so there was no attempt at
comparison, and mine isn't even working right now... (didn't start in perl...),
mostly due to its use of 'use Exporter/EXPORTS' and multiple packages in the same
file -- and semantics in those areas changing quite a bit.

If you did it all in 1 package OR did it in straight OO (no Exports/Imports),
you would likely have avoided most of the problems I ran into.

As was suggested by others, I also found a need to use hashes to map the exit
status of each job back to its original caller, to point to which job failed
(if one failed).
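A minimal sketch of that bookkeeping, assuming bash 4 or later for associative arrays: record which job each PID belongs to, then wait on every PID and map non-zero statuses back to job names. `run_job` and the job names are hypothetical.

```shell
#!/usr/bin/env bash
declare -A job_of=()   # PID -> job name
failed=()

run_job() {            # run_job NAME CMD [ARGS...]: start CMD in the background
    local name=$1; shift
    "$@" &
    job_of[$!]=$name
}

run_job encode-a true
run_job encode-b false
run_job encode-c sh -c 'exit 4'

for pid in "${!job_of[@]}"; do
    if wait "$pid"; then
        echo "${job_of[$pid]}: ok"
    else
        rc=$?          # still the status of `wait "$pid"` here
        echo "${job_of[$pid]}: FAILED (status $rc)"
        failed+=("${job_of[$pid]}")
    fi
done
echo "failed jobs: ${failed[*]}"
```

Associative-array key order is unspecified, so the report order may differ from the launch order.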

Also one would want to trap Control-C, so one can terminate outstanding
jobs.
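A sketch of such a trap, assuming bash. It matters because, when job control is not in effect, bash starts asynchronous commands with SIGINT and SIGQUIT ignored, so Ctrl-C alone may leave them running. `kill_outstanding` is a hypothetical helper, and the demo calls it directly rather than simulating Ctrl-C.

```shell
#!/usr/bin/env bash
kill_outstanding() {
    local pids
    pids=$(jobs -rp)              # PIDs of background jobs still running
    if [[ -n $pids ]]; then
        kill $pids 2>/dev/null    # unquoted on purpose: one PID per word
    fi
    wait 2>/dev/null              # reap them so nothing is left behind
}
trap 'kill_outstanding; exit 130' INT
trap 'kill_outstanding; exit 143' TERM

for n in 1 2 3; do
    sleep 30 &                    # stand-ins for long-running jobs
done

# A real script would run its main loop here; the demo just cleans up.
kill_outstanding
remaining=$(jobs -rp | wc -l)
echo "still running: $remaining"
```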

I had no need for a semaphore -- and was able to use sleep without fear of
waking up multiple jobs, as I only scheduled as many jobs as desired at one time.
For example, say I was running 20 convert jobs, but my cpu count was 2 and my
extra loading factor was 2, allowing 4 jobs to run at once. The first four would
run immediately, but the next 16 wouldn't get scheduled for execution until one
of the first ones exited; then they are scheduled one by one as processes finish.

So for me, my 'semaphore' count was 'k1+k2', where k1 was based on the # of cpus
in the system and k2 was specific to how much overlap the particular machine I was
running on needed to keep the cpus @ 100%... Once I decide on those, I just
have to do simple accounting of the number of launched procs and # of child deaths
to maintain a running score. So my limiting semaphore was based on outstanding
children -- if I wanted something not based on outstanding children, I might
have to use a real semaphore... But my algorithm didn't require an arbitrary
number, so there was no need for that added complexity.
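The 'k1+k2' accounting above might look like this in bash 4.3 or later. Assumptions: `getconf _NPROCESSORS_ONLN` supplies k1 (falling back to 1), k2 is an arbitrary overlap factor, `reap_one` is a hypothetical helper, and `sleep 0.05` stands in for a convert job. Because each `wait -n` reaps exactly one child, `launched - finished` tracks the real number of outstanding children.

```shell
#!/usr/bin/env bash
k1=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)  # online CPUs
k2=2                                  # illustrative overlap factor
slots=$(( k1 + k2 ))

launched=0
finished=0
failures=0

reap_one() {                          # block until one child dies, count it
    wait -n
    local rc=$?
    finished=$(( finished + 1 ))
    (( rc == 0 )) || failures=$(( failures + 1 ))
}

total=20
for (( i = 1; i <= total; i++ )); do
    if (( launched - finished >= slots )); then
        reap_one                      # pool is full: free one slot first
    fi
    sleep 0.05 &                      # stand-in for one convert job
    launched=$(( launched + 1 ))
done
while (( finished < launched )); do
    reap_one                          # drain the remaining children
done
echo "launched=$launched finished=$finished failures=$failures slots=$slots"
```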



> already been done - especially if it can be done better (Case in
> point: GNU Parallel vs. xargs). I also understand the concept of doing
> something that has already been done - just to see if you can do it
> yourself (e.g. I wrote a quine just to see if I could).
>
> What I do not understand is wanting help to do something that has
> already been done better.

----
Huh?

I think people were discussing ways it might be done in shell, not
asking for help for any specific implementation -- or some were asking that
something like parallel be a built-in (my suggestion of a ".dll" (or .so)).

The challenge would be writing something as complex as 'parallel' in
shell... which is being discussed in more primitive forms -- anything beyond
that, I think, is left as an exercise for the reader...

And of course, seeing yours was written in perl, I couldn't help but muse a bit
on having a "perl builtin" keyword, as most installations that have perl already
have it built as a dynamic lib. However, not knowing off the top of my head how or
why it would be beneficial/useful to do that, I left that idea with "..."...

Does that make my idiosyncratic and culturally-"contexted"-obfuscating
statement(s) more clear? ;-)


Linda


Greg Wooledge

May 14, 2012, 8:31:25 AM5/14/12
to Ole Tange, bug-...@gnu.org
On Fri, May 11, 2012 at 11:57:33PM +0200, Ole Tange wrote:
> Example from the man page:
>
> Run one gzip process per CPU core. Block until a CPU core
> becomes available.
>
> for i in `ls *.log` ; do
> echo $i
> sem -j+0 gzip $i ";" echo done
> done
> sem --wait

If this example is in the man page, then it should be fixed:

for i in *.log ; do