There is no easily reproducible bug on most platforms.
The is*()
and to*()
standard functions declared in <ctype.h>
require an argument that either is equal to EOF
(typically -1) or is in the range of unsigned char
.
If plain char
is signed, as it is on many systems, a character extracted from a command line argument, from user interactive input, or from a file can have a negative value, resulting in undefined behavior. Typically this applies only to non-ASCII characters.
In most implementations, these functions behave reasonably sanely for negative values, so in spite of the undefined behavior, the observed behavior is likely to be benign.
One exception: Microsoft's C implementation (I used Visual Studio 2022 on Windows 11) aborts on an invalid argument to these functions, but only when an application is built in Debug mode. I've built xxd.c
as a standalone application under Visual Studio. The resulting executable crashes when invoked as follows (I named the source file c.c
for this quick-and-dirty test):
Release\c.exe -i héllö.txt
I've created a fork of this repo and a branch with a proposed fix for this issue:
https://github.com/Keith-S-Thompson/vim/tree/ctype-fix
I have not yet created a pull request. I'll be glad to do so.
The fix adds a number of macros, such as:
#define SAFE_isalnum(c) (isalnum ((unsigned char)(c)))
and replaces most calls to the standard functions with invocations of these macros.
Note that there is already some code that's designed to avoid problems calling these functions with negative arguments. This existing code does not cover all cases; my update is intended to do so. For now, I've left most such code in place, though it's probably unnecessary.
(An aside: vim has code that works around implementations where toupper()
and tolower()
operate incorrectly on arguments that are not respectively lowercase or uppercase letters. I'm skeptical that there are any existing implementations that (a) have this bug and (b) are able to compile vim, since toupper()
and tolower()
have been required to operate correctly since the 1989 ANSI C standard. This is not directly related to this issue, but it's a potential opportunity for some cleanup.)
Calls to is*()
and to*()
functions declared in <ctype.h>
should behave correctly -- which is in fact what currently happens 99% of the time. (Behaving correctly is just one possible consequence of undefined behavior.)
9.0.2018
OS: Ubuntu 22.04.3 ; Windows 11
Terminal: xterm, tmux ; cmd.exe
$TERM: tmux-256-color
; N/A
shell: bash 5.3.0 ; cmd.exe
No response
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
I ran make test
with my version. I saw some test failures, but only ones that also occur with the unmodified version 9.0.2018.
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
thanks, I am a bit worried that this may have unwanted consequences on other platforms. Perhaps we can start with a small change in some central place and if this doesn't cause problems, we can include this for the whole repository.
I suppose changing everything in once commit is too messy, so I'll prefer some more granular changes. In any case, that is something for after the Vim9.1 release.
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Since there are no visible symptoms, I understand the desire for caution.
I don't see how applying the entire patch could break anything, but of course that means that any problems would be unanticipated ones.
FWIW, I took a quick look at the neovim sources. Arguments to the is*()
and to*()
functions are generally cast to uint8_t
. (unsigned char*
is IMHO cleaner, but they're almost certain to be equivalent.)
I'll leave my repo and branch in place. Feel free to apply it at your convenience.
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
okay thanks. Can you make a PR out of your repo please?
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Done.
(I didn't see a way to tie the PR to the issue, so I just linked to the issue in the PR.)
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
There were some test failures. I think I have a good idea how to fix them. I'll update the branch soon.
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
I am including this now, except for the change in iscygpty.c
, which should have been fixed since commit ac709e2
Please verify.
Thanks!
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Closed #13332 as completed via 184f71c.
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.