Opinions on default built-ins list?

Martijn Dekker

Nov 25, 2021, 4:58:35 PM
to Korn Shell
I'm looking for some community input for ksh 93u+m on the default set of
path-bound commands built in to the ksh binary. Providing commands as
built-ins can greatly increase performance for those commands, but
increases the size of the ksh binary (particularly as they all have
built-in --man documentation as well), so a tradeoff is necessary.
Shipping all of them by default would be far too much bloat.
Distributors and power users can of course make their own choices when
compiling ksh.

The source distribution comes with many such commands as part of libcmd
(these commands are built in when /opt/ast/bin is prefixed to $PATH, or
as of ksh 93u+m, you can also invoke /opt/ast/bin/somecommand directly).
The full list of possibilities is:

basename   cp        head      mv        sync
cat        cut       id        paste     tail
chgrp      date      join      pathchk   tee
chmod      dirname   ln        pids      tty
chown      expr      logname   rev       uname
cksum      fds       md5sum    rm        uniq
cmdinit    fmt       mkdir     rmdir     wc
cmp        fold      mkfifo    stty
comm       getconf   mktemp    sum

Only a few of them are chosen to be compiled in by default -- a
selection originally made by AT&T. However, I don't think their default
selection makes a lot of sense:

$ builtin | grep ^/
/opt/ast/bin/basename
/opt/ast/bin/cat
/opt/ast/bin/chmod
/opt/ast/bin/cmp
/opt/ast/bin/cut
/opt/ast/bin/dirname
/opt/ast/bin/getconf
/opt/ast/bin/head
/opt/ast/bin/logname
/opt/ast/bin/mkdir
/opt/ast/bin/sync
/opt/ast/bin/uname
/opt/ast/bin/wc
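
A script that wants to rely on one of these can check for it first. The following is a minimal sketch, assuming only that `builtin` with no arguments lists path-bound built-ins by full path, as in the output above. The helper name `has_path_builtin` is made up for illustration and is not part of ksh. In shells other than ksh it simply reports the built-in as absent.

```shell
# has_path_builtin NAME
# Succeeds if /opt/ast/bin/NAME appears in the output of 'builtin'.
# Hypothetical helper for illustration; not part of ksh.
has_path_builtin() {
    # The subshell with 2>/dev/null keeps shells that lack a
    # 'builtin' command from printing an error; they just fail.
    (builtin | grep -qxF "/opt/ast/bin/$1") 2>/dev/null
}

if has_path_builtin cat; then
    echo "cat is compiled in"
else
    echo "cat will run as an external command"
fi
```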

IMO, good choices are: basename, cat, cut, dirname. Those are often used
in performance-sensitive code paths like loops, and/or in command
substitutions. Having these as built-ins on $PATH can greatly increase
performance.
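
The kind of code path meant here looks roughly like the sketch below: one `basename` call per loop iteration, each inside a command substitution. With an external `basename`, every iteration has to start a new process. With `/opt/ast/bin` first in `$PATH` under ksh, the same line should run the built-in instead. The function name `list_names` is illustrative only, and in other shells the extra `$PATH` entry is harmless if the directory does not exist.

```shell
# Prefixing /opt/ast/bin lets ksh prefer its path-bound built-ins.
PATH=/opt/ast/bin:$PATH

# list_names FILE...
# Print the last path component of each argument (illustrative helper).
list_names() {
    for f in "$@"; do
        name=$(basename "$f")    # one basename call per iteration
        printf '%s\n' "$name"
    done
}

list_names /etc/passwd /usr/local/bin
```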

We may want to keep getconf as well, since it can report on some of the
internal libast state, which an external getconf clearly cannot. That
said, we might want to consider whether it should be in release builds
by default or perhaps only in development builds. The AST userland
universe is dead except for ksh, so it doesn't seem likely that many
people still run scripts that depend on getconf's AST-specific
functionality.

I'm not so sure about having the following as defaults. I'd be inclined
to remove them unless someone can give me a reason why they should stay:

* chmod. Some scripts do change the permissions of lots of files, but
  if they care about performance then xargs chmod… is typically used
  which, being external, will not invoke any built-ins.
* cmp. I think it's relatively rarely used. People use diff a lot more,
  but it's external.
* head. Why have this, but not tail? Also, neither of these is
  typically performance-sensitive. Some process that produces a lot of
  output gets piped into head or tail and that generally only needs to
  be done once.
* uname. This is fairly frequently used in system scripts but if the
  script is any good, it'll store the value in a variable before using
  it in any performance-sensitive manner.
* wc. This is for counting lines/words in files. I don't think this is
  commonly used in a loop either, though I could be wrong.
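
The uname point can be sketched as follows: call it once, keep the result in a variable, and let the loop consult the variable. The function `describe` and its output format are hypothetical, chosen only to give the loop something to do.

```shell
# Call uname once, outside the loop, and reuse the value.
os=$(uname -s)

# describe FILE...
# Hypothetical per-file work that depends on the OS name.
describe() {
    for f in "$@"; do
        case $os in
            Linux)  printf 'linux:%s\n' "$f" ;;
            *)      printf 'other:%s\n' "$f" ;;
        esac
    done
}

describe a.txt b.txt
```

Written this way, it makes little difference whether uname is built in, because it runs only once per script.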

I think the following are almost certainly unnecessary defaults, and I
would need a pretty solid reason to keep them:

* logname. Getting your login name only needs to be done once.
* sync. This generally only needs to be done at shutdown time by system
  scripts, which will not be using any ksh built-ins. Even if you do
  occasionally want to sync your disk during regular usage, the external
  command should do just fine for that.

The following are *not* included in the current defaults list, but I
think they would be nice to have:

* cp, ln, mv. Having these as built-ins can be a good performance
  optimisation. This is done in loops all the time, often with one
  file at a time.
* fds and pids. These list open file descriptors and process IDs,
  respectively. They could not possibly work unless built in. Also,
  they're very small.
* mktemp. Not for performance reasons (clearly), but because external
  mktemp implementations are so different and incompatible on various
  systems that it would be good for ksh to come with a known interface
  to this important functionality.
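
For the cp/ln/mv case, the typical pattern is a loop that handles one file per iteration, as in this sketch. The helper `add_suffix` is hypothetical. With an external mv, each iteration starts a new process; under ksh with mv compiled in and `/opt/ast/bin` first in `$PATH`, the same loop would use the built-in.

```shell
PATH=/opt/ast/bin:$PATH

# add_suffix DIR SUFFIX
# Rename every regular file in DIR by appending SUFFIX, one mv per file.
# Hypothetical helper for illustration.
add_suffix() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue    # skip directories and unmatched globs
        mv -- "$f" "$f$2"
    done
}

# Example use (not run here): add_suffix ./logs .old
```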

Thoughts/opinions?

--
|| modernish -- harness the shell
|| https://github.com/modernish/modernish
||
|| KornShell lives!
|| https://github.com/ksh93/ksh

Martijn Dekker

Nov 25, 2021, 5:20:39 PM
to korn-...@googlegroups.com
Op 25-11-21 om 22:58 schreef Martijn Dekker:
> The full list of possibilities is:
>
> basename      cp        head      mv        sync
> cat          cut        id          paste        tail
[...]

And of course, giggle grips managed to mess up the formatting of a
perfectly nice columnar list in plain text, even when it's viewed with a
monospaced font. Oh well, sorry about that.

This reminds me I was going to move this list to groups.io. But it's so
little used, I don't know if anyone still cares.

Andras Farkas

Nov 27, 2021, 3:28:21 AM
to Martijn Dekker, Korn Shell
On Thu, Nov 25, 2021 at 1:58 PM Martijn Dekker <mar...@inlv.org> wrote:
> [big snip]
> Thoughts/opinions?

I unfortunately didn't know of these builtins until today: wish I knew
of them earlier, but I don't see them mentioned in the man page at
all. That shows I'm not invested in them, but...
I do have one thought:
When I look at this list, I don't think those /opt/ast/bin/ builtins
are there for performance! Reading the list, it seems as though
they're there for their features, especially when I compare the AST
versions of these commands to the FreeBSD versions.

> This reminds me I was going to move this list to groups.io. But it's so
> little used, I don't know if anyone still cares.

I'd be fine with such a move (as getting rid of Google from one's life
is usually pleasant) but as you already say, I don't know how many
people would be active there.

Martijn Dekker

Nov 27, 2021, 1:51:23 PM
to Andras Farkas, Korn Shell
Op 27-11-21 om 09:28 schreef Andras Farkas:
> On Thu, Nov 25, 2021 at 1:58 PM Martijn Dekker <mar...@inlv.org> wrote:
>> [big snip]
>> Thoughts/opinions?
>
> I unfortunately didn't know of these builtins until today: wish I knew
> of them earlier, but I don't see them mentioned in the man page at
> all.

That's a good point. Their existence should at least be documented there
-- without specifics, as whoever compiles ksh can choose which ones to
compile in. Each AST command has a --man option as usual, so we can just
refer to that.

In AT&T ksh93, you could only access them by putting /opt/ast/bin in
your $PATH, and even that was slightly broken (e.g. a preceding
assignment as in 'PATH=/opt/ast/bin:$PATH somecommand' ... did not work).

In ksh 93u+m, that is fixed and you can also access them directly with
/opt/ast/bin/somecommand as you would expect.
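
The two access routes can be sketched like this. It is only an illustration: the guard assumes that `builtin` lists path-bound built-ins by full path, and in a shell without these built-ins the first command falls through to the external basename while the guarded part is skipped.

```shell
# 1. Preceding assignment: PATH is changed for this one command only.
#    Per the message above, this form did not work in AT&T ksh93.
PATH=/opt/ast/bin:$PATH basename /usr/local/share

# 2. Direct invocation by its /opt/ast/bin path (ksh 93u+m).
#    Guarded so that other shells do not try to run a nonexistent file.
if (builtin | grep -qxF /opt/ast/bin/basename) 2>/dev/null; then
    /opt/ast/bin/basename /usr/local/share
    # '/opt/ast/bin/basename --man' would print its built-in documentation.
fi
```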

> That shows I'm not invested in them, but...
> I do have one thought:
> When I look at this list, I don't think those /opt/ast/bin/ builtins
> are there for performance! Reading the list, it seems as though
> they're there for their features, especially when I compare the AST
> versions of these commands to the FreeBSD versions.

That is certainly a factor. But, when it comes to the default
configuration, we have to weigh that against the need to avoid bloating
the ksh binary too much.

At some point (probably not soon) I'll succeed in reinstating the
ability to build dynamic libraries, at which point it should become
possible to ship them all in dynamically loadable form.

- M.

Andras Farkas

Nov 27, 2021, 8:35:14 PM
to Martijn Dekker, Korn Shell
On Sat, Nov 27, 2021 at 10:51 AM Martijn Dekker <mar...@inlv.org> wrote:
> Op 27-11-21 om 09:28 schreef Andras Farkas:
> > On Thu, Nov 25, 2021 at 1:58 PM Martijn Dekker <mar...@inlv.org> wrote:
> >> [big snip]
> >> Thoughts/opinions?
> >
> > I unfortunately didn't know of these builtins until today: wish I knew
> > of them earlier, but I don't see them mentioned in the man page at
> > all.
>
> That's a good point. Their existence should at least be documented there
> -- without specifics, as whoever compiles ksh can choose which ones to
> compile in. Each AST command has a --man option as usual, so we can just
> refer to that.
>
> In AT&T ksh93, you could only access them by putting /opt/ast/bin in
> your $PATH, and even that was slightly broken (e.g. a preceding
> assignment as in 'PATH=/opt/ast/bin:$PATH somecommand' ... did not work).
>
> In ksh 93u+m, that is fixed and you can also access them directly with
> /opt/ast/bin/somecommand as you would expect.

*thumbs-up*

> > That shows I'm not invested in them, but...
> > I do have one thought:
> > When I look at this list, I don't think those /opt/ast/bin/ builtins
> > are there for performance! Reading the list, it seems as though
> > they're there for their features, especially when I compare the AST
> > versions of these commands to the FreeBSD versions.
>
> That is certainly a factor. But, when it comes to the default
> configuration, we have to weigh that against the need to avoid bloating
> the ksh binary too much.

Of course! I was just mentioning how I analyzed the list in a different way.

> At some point (probably not soon) I'll succeed in reinstating the
> ability to build dynamic libraries, at which point it should become
> possible to ship them all in dynamically loadable form.

Sounds good to me. :D
I'm not a ksh93u+m developer (and already said I haven't used the
/opt/ast/bin utilities until recently) but I wonder: do you think you
should get dynamic libraries working before changing the default
builtin list? Just a thought, and not one that pertains to my own
usage.

Danny Weldon

Nov 28, 2021, 12:13:54 AM
to Andras Farkas, Martijn Dekker, Korn Shell
> At some point (probably not soon) I'll succeed in reinstating the
> ability to build dynamic libraries, at which point it should become
> possible to ship them all in dynamically loadable form.

Even if they were removed from the binary, it would be good if they were all still available just by running "builtin <command>" or even "builtin -f libcmd <command>", which would cause them to be dynamically loaded.  Can libcmd be built and put into one of the directories referred to in the builtin man page so that it can be available as "builtin -f libcmd <command>"?

  -f lib          On systems with dynamic linking, lib names a shared library
                  to load and search for built-ins. Libraries are searched for
                  in ../lib/ksh and ../lib on $PATH and in system dependent
                  library directories. The system dependent shared library
                  prefix and/or suffix may be omitted. Once a library is
                  loaded, its symbols become available for the current and
                  subsequent invocations of builtin. Multiple libraries can be
                  specified with separate invocations of builtin. Libraries are
                  searched in the reverse order in which they are specified.
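
As a rough sketch of how a script might use this, the helper below tries to load a command from libcmd and reports whether that worked. The name `try_load` is made up for illustration, and the `builtin -f libcmd NAME` form is taken from the text above. It assumes a ksh built with dynamic-library support and a libcmd installed where ksh can find it. Anywhere else the attempt fails and the script carries on with the external command.

```shell
# try_load NAME
# Attempt to load NAME from libcmd as a built-in (hypothetical helper).
try_load() {
    if builtin -f libcmd "$1" 2>/dev/null; then
        echo "$1: loaded from libcmd"
    else
        echo "$1: not loadable, using the external command"
    fi
}

try_load cp
```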

Actually, even better would be the ability to have the extra commands automatically dynamically loaded from libcmd if they appear in /opt/ast/bin and it is in $PATH or if they are called as /opt/ast/bin/<command> (thanks for enabling that feature, btw!).

I also think it would be good for the complete and compgen builtins to be backported to this version and they could just be loaded dynamically if the shell is interactive, unless the shell is running on a system that doesn't support dynamic loading.  In fact all the extra commands including grep could be backported but left in libcmd.

I believe sync was made a builtin, as I have discussed in an att thread, because, like the kill builtin, it sometimes needs to be 100% available if a condition arises that prevents the forking of new processes.  But making it dynamically loadable should be fine because a root user or system script would then just have to load it on initialisation before any such memory condition arises.

I won't bother discussing the individual case of any particular builtin as they exist for either speed or portability, and making the non-essential ones dynamically loadable solves the problem.

I suspect that the att maintainers compiled some "essential" builtins in the binary instead of libcmd because they suspected that some 3rd party system package maintainers probably wouldn't distribute libcmd, but it would be good if that barrier could be broken so that /usr/lib/ksh would be the standard place where packages could install any builtins developed by independent developers.

--
Regards

Danny

Martijn Dekker

Dec 16, 2021, 1:59:25 AM
to korn-...@googlegroups.com
Earlier I had written:
>> At some point (probably not soon) I'll succeed in reinstating the
>> ability to build dynamic libraries, at which point it should become
>> possible to ship them all in dynamically loadable form.

I've been working on it and I'm nearly there. See the latest comments at:

https://github.com/ksh93/ksh/pull/357

To test it (temporary instructions, applicable before this is committed
to the main development branch):
$ git clone -b dyn https://github.com/McDutchie/ksh
$ cd ksh
$ bin/package make
$ bin/package use

...and you should now be using a dynamically linked ksh directly from
the build directory. Type `shtests` to run the regression tests.

This is now working fine on most systems. Unfortunately, ksh with
dynamic libraries mysteriously crashes on init on Linux arm and aarch64
systems, and possibly even on Ubuntu x86_64. I've no clue about the
cause yet, so that's holding us back.

I'd be very interested in getting some reports from this list on how
this works for you.

Op 28-11-21 om 06:13 schreef Danny Weldon:
> It would be good even if they were removed from the binary if they were
> all still available just by running "builtin <command>" or even "builtin
> -f libcmd <command>" which would cause them to be dynamically loaded.
> Can libcmd be built and put into one of the directories referred to in
> the builtin man page so that it can be available as "builtin -f libcmd
> <command>"?

Yes, that'll be the idea.

Meanwhile, for the next beta, ksh will remain statically linked I'm
afraid, so the builtins selection issue remains relevant. Some packagers
are also going to continue to prefer static binaries.

> I believe sync was made a builtin, as I have discussed in an att
> thread, because, like the kill builtin, it is sometimes needed to be
> 100% available if a condition arises that prevents the forking of new
> processes. But making it dynamically loadable should be fine because
> a root user or system script would then just have to load it on
> initialisation before any such memory condition arises.

That seems like an incredibly rare use case. I don't think it justifies
incorporating it into the main static binary by default.

Besides, if you can't fork(), yet you desperately need to run 'sync'
manually before rebooting, you can always replace your shell with 'sync'
by running 'exec sync'. Which will terminate your shell, but if your
system is that messed up then there's nothing left to do but reboot anyway.
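
The effect of exec can be seen in the small sketch below. It runs in a throwaway child shell so that the current session survives, which of course involves a fork of its own. A shell that really cannot fork would have to run 'exec sync' directly. The function name `demo_exec_sync` is illustrative only.

```shell
# demo_exec_sync
# Show that 'exec' replaces the shell process instead of forking:
# the echo after it is never reached.
demo_exec_sync() {
    sh -c 'exec sync; echo "never printed"'
}

demo_exec_sync
```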