Shell Command Substitution and fork()

Edgar Fuß

Jun 21, 2016, 2:46:26 PM

I have a shell script that makes heavy use of Command Substitution, i.e.
x="$(some-command)"
The script takes several seconds to execute, mainly because Command
Substitution takes place in a Subshell Environment and that usually means
a fork().
However, the OS X^W^WmacOS ksh only takes tens of milliseconds for the same
job, as does shells/ast-ksh from pkgsrc.
Unsurprisingly, the latter (I'm unable to test the former due to lack of
ktrace or the like) doesn't fork.

Does anyone know how those ksh's achieve that? Are there any drawbacks?
Could ash be taught the same thing?

Or can someone think of a POSIX-compliant way to put the output of a command
(think printf) into a variable without Command Substitution? Unfortunately
printf ... | read x
doesn't do the trick because POSIX allows (and most shells indeed do)
execution of the tail-of-pipe in a Subshell Environment.

Robert Elz

Jun 21, 2016, 4:39:04 PM

Date: Tue, 21 Jun 2016 20:46:11 +0200
From: Edgar Fuß <e...@math.uni-bonn.de>
Message-ID: <20160621184...@trav.math.uni-bonn.de>

| Does anyone know how those ksh's achieve that? Are there any drawbacks?

I haven't looked at their sources, but I would assume they don't fork.

| Could ash be taught the same thing?

It already has code to attempt to do that, but it doesn't currently
work properly in all cases, so has been disabled (long ago). Of course
there is only a big speedup when the command is a shell builtin, but printf
qualifies (as does echo of course) for that. But even for other cases,
when this eventually gets fixed other (simple) commands would be able to
just vfork() - as it is now a full fork() is required.

[As I understand it, the issue is with correct cleanup, especially in the
case of errors - nothing in the sub-shell environ is allowed to affect the
parent shell, so anything in the cmd-sub that changes anything at all has
to be undone - fork() makes that simple to get right, exit() cleans up
everything...]

| Or can someone think of a POSIX-compliant way to put the output of a
| command (think printf) into a variable without Command Substitution?

In general, no, that's what command substitution is for after all
(to get the output of a command into the command line, so it can then
be assigned, or whatever other use you need it for.)

But in some cases, depending upon exactly what the printf is doing,
there can be other ways.

kre

Pierre-Philipp Braun

Jun 21, 2016, 4:48:00 PM

Hello Edgar,

so why don't you use PDKSH (/bin/ksh) or pkgsrc's ast-ksh instead of ash
(/bin/sh) ? Even pkgsrc's GNU Bash would do.

I do x=`command` which is almost the same as $(command), using any KSH
or BASH as interpreter. I don't really care about POSIX here as KSH and
BASH syntax is much more powerful.

Pierre-Philipp

On 21/06/2016 20:46, Edgar Fuß wrote:
> I have a shell script that makes heavy use of Command Substitution, i.e.
> x="$(some-command)"
> The script takes several seconds to execute, mainly because Command
> Substitution takes place in a Subshell Environment and that usually means
> a fork().
> However, the OS X^W^WmacOS ksh only takes tens of milliseconds for the same
> job, as does shells/ast-ksh from pkgsrc.
> Unsurprisingly, the latter (I'm unable to test the former due to lack of
> ktrace or the like) doesn't fork.
>
> Does anyone know how those ksh's achieve that? Are there any drawbacks?
> Could ash be taught the same thing?
>
> Or can someone think of a POSIX-compliant way to put the output of a command

Edgar Fuß

Jun 21, 2016, 4:55:32 PM

> so why don't you use PDKSH (/bin/ksh) or pkgsrc's ast-ksh instead of ash
> (/bin/sh) ? Even pkgsrc's GNU Bash would do.

Err, what?
pdksh performs worse than ash. bash surely performs worse. Both fork.

Overall, I don't write for a specific shell, I write for POSIX shells.
While that particular script (part of a much larger system) performs best
with ksh93, other parts may perform best with another shell. I won't switch
shells from component to component.

> I do x=`command` which is almost the same as $(command), using any KSH or
> BASH as interpreter.

Backticks are POSIX, too. A shell that will fork with $(...) will certainly
fork with `...`, too.

> I don't really care about POSIX here as KSH and BASH syntax is much more
> powerful.

It's fine if you don't care about POSIX.

Edgar Fuß

Jun 21, 2016, 5:27:54 PM

EF> Unsurprisingly, [ast-ksh] doesn't fork.
EF> Does anyone know how those ksh's achieve that?
kre> I haven't looked at their sources, but I would assume they don't fork.
I concur. The most probable way to achieve non-forking is not to fork.

EF> Are there any drawbacks?
kre> [ash] already has code to attempt to do that, but it doesn't currently
kre> work properly in all cases, so has been disabled (long ago).
So it's probably hard to get right.

To re-phrase my question: certain ksh's manage to perform things demanded to
take place in a Subshell Environment without fork()ing, whereas all other
shells known to me fork(). So I can think of three possibilities:
1. They are doing something extremely clever/ugly that only works in the
context of ksh's foo-baz-something internals
2. They are doing something that mostly works, but has non-trivial issues,
either violating POSIX or non-POSIX shell common sense.
3. They just got it right while no other shell has managed to do so.

I would hope for 3. so one could plug that code (or concept) into ash.

> Of course there is only a big speedup when the command is a shell builtin

Or a function.
Whereas, with a function, one could circumvent the problem by making the
function assign the result to a global variable instead of printing it.
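
A minimal sketch of that workaround, for illustration only: the function
name pad4 and the variable result are hypothetical, not taken from the
script under discussion. The function leaves its answer in a global
variable, so the caller needs no Command Substitution and hence no subshell.

```shell
# pad4 HEX: left-pad HEX with zeros to at least 4 characters.
# The answer is left in the global variable "result" (a name chosen
# for this sketch) instead of being printed.
pad4() {
    case $1 in
    ?)   result=000$1 ;;
    ??)  result=00$1 ;;
    ???) result=0$1 ;;
    *)   result=$1 ;;
    esac
}

pad4 2a
x=$result       # replaces: x="$(pad4 2a)"
echo "$x"
```

The price is that every caller has to copy result before calling the
function again, and the function must not be run in a pipeline.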

> [As I understand it, the issue is with correct cleanup, especially in the
> case of errors - nothing [i]n the sub-shell environ is allowed to affect
> the parent shell, so anything in the cmd-sub that changes anything at all
> has to be undone - fork() makes that simple to get right, exit() cleans
> up everything...]

Yes, of course.

> that's what command substitution is for after all

Yes. But although X is for Y after all, Z may as well achieve Y. I always
like to learn new shell trickery.

> But in some cases, depending upon exactly what the printf is doing,
> there can be other ways.

Yes, please?

I'm generally interested in the subject, but the code in question here does
IPv6 address mangling: normalizing to X::Y form, combining (base address,
prefix length, relative address) triples into an address and the like.
The printf's are like
printf "%02X%02X:%02X%02X" "$@"
or
printf "%04X" 0x"$xyz"
or
printf "%0$(( 33 - ${#xyz} ))X" 0
or
printf "%x" 0x"$(printf "%.4s" "${xyz}")"
or
printf "%.$(($2 / 4))s%s\n" "$1" "${3#$(printf "%.$(($2 / 4))s" "$3")}"
(Yes, that's all part of real, working, tested code I wrote. I do declare
it would take me some sec^Wmin^Wwhatever to re-understand it.)

Robert Elz

Jun 21, 2016, 7:44:59 PM

Date: Tue, 21 Jun 2016 23:27:41 +0200
From: Edgar Fuß <e...@math.uni-bonn.de>
Message-ID: <20160621212...@trav.math.uni-bonn.de>

| So it's probably hard to get right.

Non trivial yes, some of the things that can be done which have to be
undone are var assignments, option changes, redirections, and anything
that does a sys call that affects the shell environ (cd, ulimit, ...)
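
A small sketch of what has to be invisible to the parent, whether or not the
shell forks. The variable names are made up for the example; the behaviour
shown is what POSIX requires of a Subshell Environment.

```shell
x=outer
start_dir=$PWD

# Everything inside $( ) runs in a Subshell Environment: the assignment,
# the cd and the option change must all be gone afterwards.
out=$(x=inner; cd /; set -f; printf '%s' "$x")

echo "substitution saw: $out"
echo "parent x:         $x"
case $- in
*f*) echo "noglob leaked into the parent" ;;
*)   echo "noglob still off in the parent" ;;
esac
[ "$PWD" = "$start_dir" ] && echo "working directory unchanged"
```

A shell that skips the fork() has to save and restore each of these pieces
of state itself, including on every error path.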

| I would hope for 3. so one could plug that code (or concept) into ash.

Unfortunately the issues relate to the whole structure of the shell. If
you want ksh93, you can just use it, copying its source code to src/bin/sh
and calling it the NetBSD shell wouldn't really achieve anything useful.

The "whole structure" includes the way the shell does memory management
(it deals with LOTS of temporary strings, etc) error handling, ...

| Or a function.

Yes.

| Whereas, with a function, one could circumvent the problem by making the
| function assign the result to a global variable instead of printing it.

Often, yes.

| > But in some cases, depending upon exactly what the printf is doing,
| > there can be other ways.
| Yes, please?

The stuff that is doing base conversion (or appears to be here) isn't
going to be trivial (of course, one could write a function to do it using
basic arithmetic ( $(( )) ), but that's not going to be very quick most
likely)
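
For what it's worth, a sketch of such a function, assuming the input is a
non-negative decimal integer. The names dec2hex and hex are invented for
this example, and no claim is made about its speed relative to printf.

```shell
# dec2hex N: convert the decimal number N to upper-case hex using only
# shell arithmetic. The answer is left in the global variable "hex".
dec2hex() {
    _n=$1
    hex=
    while :
    do
        _d=$(( _n % 16 ))       # lowest remaining hex digit, as 0..15
        case $_d in
        10) _d=A ;;
        11) _d=B ;;
        12) _d=C ;;
        13) _d=D ;;
        14) _d=E ;;
        15) _d=F ;;
        esac
        hex=$_d$hex             # digits come out least significant first
        _n=$(( _n / 16 ))
        [ "$_n" -eq 0 ] && break
    done
}

dec2hex 255
echo "$hex"
```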

| The printf's are like
| printf "%02X%02X:%02X%02X" "$@"
| or
| printf "%04X" 0x"$xyz"

I think those two are just the same thing right? That is, print
a hex number (string) in a fixed width field with leading 0's, using
upper case A-F for the >=10 hex digits (whether the input is upper or lower)

To take just the second one (the first is just the same thing, more or
less, 4 times, and just 2 digits each, with a : stuck in the middle...)
assuming "$@" contains data that is already explicitly hex - if it is actually
doing base conversion (11 --> B) then it would need extra help.

There are two parts to that, one to convert lower case to upper ...

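while :
do
    case "${xyz}" in
    # %% (not %) so that both cuts are made at the first match
    *a*) xyz=${xyz%%a*}A${xyz#*a} ;;
    *b*) xyz=${xyz%%b*}B${xyz#*b} ;;
    *c*) xyz=${xyz%%c*}C${xyz#*c} ;;
    *d*) xyz=${xyz%%d*}D${xyz#*d} ;;
    *e*) xyz=${xyz%%e*}E${xyz#*e} ;;
    *f*) xyz=${xyz%%f*}F${xyz#*f} ;;
    *) break ;;
    esac
done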

and then to make it (at least) 4 chars long with leading 0's as required

case "${xyz}" in
?)   xyz=000${xyz} ;;
??)  xyz=00${xyz} ;;
???) xyz=0${xyz} ;;
esac

whether that ends up being faster than a fork and printf I have no idea.
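
Whatever the speed, correctness is easy to check by comparing against
printf on a few samples. The sketch below wraps both steps in a function
(to_hex4 and hex4 are names invented here) and uses Command Substitution
only in the checking loop, where the fork does not matter.

```shell
# to_hex4 HEX: upper-case HEX and zero-pad it to at least 4 characters.
# The answer is left in the global variable "hex4".
to_hex4() {
    hex4=$1
    while :
    do
        case $hex4 in
        *a*) hex4=${hex4%%a*}A${hex4#*a} ;;
        *b*) hex4=${hex4%%b*}B${hex4#*b} ;;
        *c*) hex4=${hex4%%c*}C${hex4#*c} ;;
        *d*) hex4=${hex4%%d*}D${hex4#*d} ;;
        *e*) hex4=${hex4%%e*}E${hex4#*e} ;;
        *f*) hex4=${hex4%%f*}F${hex4#*f} ;;
        *) break ;;
        esac
    done
    case $hex4 in
    ?)   hex4=000$hex4 ;;
    ??)  hex4=00$hex4 ;;
    ???) hex4=0$hex4 ;;
    esac
}

# Compare with the printf it is meant to replace.
mismatches=0
for v in 0 a ff 1b2 dead BeeF aaaa
do
    to_hex4 "$v"
    want=$(printf '%04X' "0x$v")
    if [ "$hex4" != "$want" ]
    then
        echo "mismatch for $v: got $hex4, want $want"
        mismatches=$(( mismatches + 1 ))
    fi
done
echo "mismatches: $mismatches"
```

Note that the two differ on input with leading zeros beyond 4 digits
(printf drops them, the shell version keeps them); the samples avoid that.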

| or
| printf "%0$(( 33 - ${#xyz} ))X" 0

That's (I think) making a long string of 0's enough so that when $xyz
is appended it will be 33 chars long

p=      # the result is in p
while :
do
    case $(( 33 - ${#p} - ${#xyz} )) in
    0|-*) break ;;      # done, or $xyz is already 33 chars or more
    1) p=0$p; break ;;
    2) p=00$p; break ;;
    3) p=000$p; break ;;
    4) p=0000$p; break ;;
    *) p=00000$p ;;
    esac
done

Obviously you can add more cases to make the loop iterate less, in the
extreme you could enumerate all 33 cases and remove the loop completely
(in that case you also don't need ${#p} as that would simply be 0)
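
Another loop-free possibility, sketched here under the assumption that
only parameter expansion is wanted: glue 33 zeros in front of the string,
keep the last 33 characters, then strip the string off again. The helper
names (zeros, q33, zero_prefix) are made up for the example.

```shell
z8=00000000
q8='????????'
zeros=$z8$z8$z8$z8'0'       # 33 zeros
q33=$q8$q8$q8$q8'?'         # pattern matching exactly 33 characters

# zero_prefix STRING: set the global variable "p" to the zeros needed to
# bring STRING up to 33 characters (empty if it is already that long).
zero_prefix() {
    if [ "${#1}" -ge 33 ]
    then
        p=
        return
    fi
    _t=$zeros$1             # 33 zeros, then the string
    _head=${_t%$q33}        # everything except the last 33 characters
    _t=${_t#"$_head"}       # the last 33 characters
    p=${_t%"$1"}            # drop the string itself, leaving the zeros
}

zero_prefix abc
echo "${#p}"    # 30
```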

| printf "%x" 0x"$(printf "%.4s" "${xyz}")"

That one is just extracting the first 4 chars of the string,
converting hex to lower case, and omitting leading 0's. I'll omit
the case conversion part (can be done as above, backwards)
what's left is ...

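x=${xyz}        # x is going to be the answer
case "${x}" in
?|??|???|????) ;;
*) y=${x#????}; x=${x%"${y}"} ;;    # quote y so it is not used as a pattern
esac
x=${x#"${x%%[!0]*}"}    # strip all leading 0's (${x##0} removes only one)
test -z "${x}" && x=0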

| or
| printf "%.$(($2 / 4))s%s\n" "$1" "${3#$(printf "%.$(($2 / 4))s" "$3")}"

This is slightly trickier, as the length to be extracted from each string
is variable, but it can be done too. ($2 is obviously a bit length, so $2/4
is the number of hex digits - except that it is broken for any bit count that
isn't an exact multiple of 4: for 17 you'd get 4 chars, whereas 5 are really
needed - with some bits forced 0.)
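
One way it might be done, as a sketch only: build a pattern of $2/4
question marks and use it to split both strings. The names combine and
combined are invented here, both strings are assumed to be at least $2/4
characters long, and the multiple-of-4 problem is left exactly as it is.

```shell
# combine BASE BITS REL: set the global variable "combined" to the first
# BITS/4 characters of BASE followed by REL minus its first BITS/4
# characters. Assumes BASE and REL are each at least BITS/4 characters.
combine() {
    _n=$(( $2 / 4 ))
    _q=
    while [ "${#_q}" -lt "$_n" ]
    do
        _q="${_q}?"             # one "?" per character to skip
    done
    _tail=${1#$_q}              # BASE without its first _n characters
    combined=${1%"$_tail"}${3#$_q}
}

combine abcdefgh 16 12345678
echo "$combined"    # abcd5678
```

The pattern-building loop runs $2/4 times per call; if only a few prefix
lengths occur, the patterns could be precomputed instead.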

Unless any of this was going to be in very heavily used code, I wouldn't
even think of any of this, however. Doing it the simple, clear way almost
always wins - people time is generally far more important than a few
cpu cycles.

kre

Robert Elz

Jun 21, 2016, 8:52:14 PM

I should add a caution - don't just blindly use those code fragments,
they all (or at least most) have bugs - most easy to fix, but I was
just being quick & dirty with them as examples...

kre