
ksh(1) does not need to look at LC_CTYPE


Ingo Schwarze

Oct 14, 2016, 9:21:05 AM
to te...@openbsd.org
Hi,

when committing ksh(1) vi input mode UTF-8 support recently,
i added a setlocale(3) call to the shell. Now, when considering
how to document LC_CTYPE in the ksh(1) manual, i realized that
inspecting that variable is not really useful, so we can simplify
things, see the diff below.

First, note that emacs mode doesn't use LC_CTYPE in the first
place, nor does anything else in the shell except vi mode.

Assuming you use vi mode (VISUAL=vi), three settings influence
how things work for you:

1. Whether escaping non-ASCII bytes is disabled (set +o vi-show8,
the default) or enabled (set -o vi-show8).

2. LC_CTYPE=C (the default) or a UTF-8 locale such as en_US.UTF-8.

3. Whether your xterm is UTF-8 enabled (-u8, the default)
or not (+u8).

So there are 2^3 = 8 case combinations. Let's look at them
in turn.

A. Escaping non-ASCII bytes enabled (set -o vi-show8):

In this case, the diff changes nothing because isu8cont()
already returns 0 unconditionally, so LC_CTYPE is effectively
ignored even without the patch.
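
For reference, the byte test that isu8cont() performs can be sketched in isolation. The function name is_cont_sketch and the show8 parameter are made up for illustration; show8 stands in for Flag(FVISHOW8), and only the bit test itself is taken from vi.c.

```c
#include <stdbool.h>

/*
 * Sketch of the continuation byte test.  UTF-8 continuation bytes
 * have the bit pattern 10xxxxxx, so masking with 0xC0 yields 0x80.
 * The show8 parameter is a hypothetical stand-in for Flag(FVISHOW8).
 */
bool
is_cont_sketch(unsigned char c, bool show8)
{
	return !show8 && (c & (0x80 | 0x40)) == 0x80;
}
```

With show8 set, the function returns false for every byte value, so every byte is treated as a character of its own, whatever the locale.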

I considered whether this mode is useful at all or whether
it might be better to just delete the vi-show8 switch
outright and simplify the code.

But there is a potential use case, however rare it may be:
Sometimes, you may want to edit individual raw 8-bit bytes
on shell command lines, either for testing purposes or
to call programs that require binary command line arguments
or input. That wish may occur for any LC_CTYPE setting
and on any terminal.

So 4 of the 8 cases are taken care of so far.

In the following, we know that non-ASCII bytes will not be
escaped (set +o vi-show8).

B. The user wants to use UTF-8:

In that case, having LC_CTYPE=en_US.UTF-8 is required
and the patch changes nothing.

Of course, the xterm must also be UTF-8 enabled.
The combination with +u8 is just useless and dangerous,
with or without the patch.

So far, 6 of the 8 cases are taken care of.

What remains is set +o vi-show8 with LC_CTYPE=C (the default, actually).

C. On a UTF-8 terminal (This is the default case!):

In this case, if the user presses non-ASCII keys on the
keyboard, they will result in UTF-8-encoded multibyte
strings in the shell's internal buffers and they will be
shown as Unicode glyphs.

In that situation, respecting LC_CTYPE=C allows the user
to move the cursor to individual bytes, but without being
able to see where the cursor really is, and to delete and
insert single bytes in the middle of characters, causing
the display to show stuff that disagrees with the actual
content of the buffers. This is not useful at all and
potentially dangerous.

The diff below actually makes things better. It improves
the chances that the display remains consistent with actual
buffer content, by effectively editing in UTF-8 mode.
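
To illustrate what "editing in UTF-8 mode" means for cursor movement, here is a minimal sketch. It is not the code from vi.c; the helper names is_cont and next_char are hypothetical, and only the bit test matches the one in isu8cont().

```c
#include <stddef.h>

/* Same bit test as in vi.c, without the vi-show8 flag. */
int
is_cont(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/*
 * Advance from byte offset pos to the start of the next character
 * in buf (of length len), skipping UTF-8 continuation bytes.
 */
size_t
next_char(const char *buf, size_t len, size_t pos)
{
	if (pos < len)
		pos++;
	while (pos < len && is_cont((unsigned char)buf[pos]))
		pos++;
	return pos;
}
```

For a buffer containing the bytes 61 C3 A4 62 ("a", a two-byte character, "b"), stepping right from offset 1 lands on offset 3. Stepping by single bytes would land on offset 2, in the middle of the two-byte character, which is the inconsistent state described above.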

Admittedly, that may not be what the user wants.
But if the user really wants to mess with arbitrary bytes,
they ought to set -o vi-show8 as explained above.

D. On a terminal in non-UTF-8 legacy latin-1 mode:

The only reason i can imagine for using such a mode combination
is to manipulate arbitrary bytes individually, but actually,
this mode combination is unusable for that purpose because
several bytes will corrupt or lock up the terminal, both with
and without this patch. So it doesn't really matter that the
patch changes behaviour here, the mode is useless and dangerous
in the first place. Besides, the mode is hardly usable at all
because most characters that can be entered are interpreted and
shown as ISO-LATIN-1 characters, which is not a useful way to
represent arbitrary binary bytes.

So, the patch

- makes the code simpler,
- changes nothing for many use cases,
- improves the default use case, and
- besides, only affects a mode combination that is useless anyway.
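
The case analysis can be cross-checked with a small sketch that compares the old and the new test for every byte value. The function names are hypothetical, mb_cur_max stands in for MB_CUR_MAX (1 for LC_CTYPE=C; the value 4 for a UTF-8 locale is an assumption), and show8 stands in for Flag(FVISHOW8). The terminal setting does not enter the test at all, so each combination below covers both the -u8 and the +u8 case.

```c
#include <stdbool.h>

/* Old test: the locale is consulted by way of MB_CUR_MAX. */
bool
old_isu8cont(unsigned char c, int mb_cur_max, bool show8)
{
	return mb_cur_max > 1 && !show8 && (c & (0x80 | 0x40)) == 0x80;
}

/* New test: the locale is no longer consulted. */
bool
new_isu8cont(unsigned char c, bool show8)
{
	return !show8 && (c & (0x80 | 0x40)) == 0x80;
}

/*
 * Return true if the patch changes the result for at least one
 * byte value under the given settings.
 */
bool
patch_changes(int mb_cur_max, bool show8)
{
	int c;

	for (c = 0; c < 256; c++)
		if (old_isu8cont(c, mb_cur_max, show8) !=
		    new_isu8cont(c, show8))
			return true;
	return false;
}
```

Under these assumptions, patch_changes() is true only for mb_cur_max == 1 with show8 unset, that is, for cases C and D.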

OK to put it in?
Ingo


Index: main.c
===================================================================
RCS file: /cvs/src/bin/ksh/main.c,v
retrieving revision 1.81
diff -u -p -r1.81 main.c
--- main.c 11 Oct 2016 19:52:54 -0000 1.81
+++ main.c 14 Oct 2016 12:27:31 -0000
@@ -8,7 +8,6 @@
 
 #include <errno.h>
 #include <fcntl.h>
-#include <locale.h>
 #include <paths.h>
 #include <pwd.h>
 #include <stdio.h>
@@ -152,8 +151,6 @@ main(int argc, char *argv[])
 	pid_t ppid;
 
 	kshname = argv[0];
-
-	setlocale(LC_CTYPE, "");
 
 	if (pledge("stdio rpath wpath cpath fattr flock getpw proc exec tty",
 	    NULL) == -1) {
Index: vi.c
===================================================================
RCS file: /cvs/src/bin/ksh/vi.c,v
retrieving revision 1.40
diff -u -p -r1.40 vi.c
--- vi.c 11 Oct 2016 19:52:54 -0000 1.40
+++ vi.c 14 Oct 2016 12:27:31 -0000
@@ -2224,6 +2224,6 @@ vi_macro_reset(void)
 static int
 isu8cont(unsigned char c)
 {
-	return MB_CUR_MAX > 1 && !Flag(FVISHOW8) && (c & (0x80 | 0x40)) == 0x80;
+	return !Flag(FVISHOW8) && (c & (0x80 | 0x40)) == 0x80;
 }
 #endif /* VI */
