Avoiding error prone usage of isalpha, isdigit, islower, ...

96 views
Skip to first unread message

Darren Shen

unread,
Sep 5, 2022, 7:39:54 PM9/5/22
to cxx
Hi folks,

We recently had a device specific crash when we accidentally used isalpha on a char16_t (no compile error unfortunately). As part of our postmortem, we're thinking about how to prevent this from happening in the future. Is it possible to ban isalpha (and co.) usage in favour of helpers in string_util.h?

I feel like this is a really easy mistake to make, but I could be overestimating the risk here.

Thanks!

dan...@chromium.org

unread,
Sep 6, 2022, 9:27:55 AM9/6/22
to Darren Shen, cxx
This is a signed conversion. With prevent_unsafe_narrowing, you'd get a compiler error IIUC. So that is probably what we should be relying on.


--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/CAMO7KM_XoKsvopAZ-8n%3D4g9-BESTpk%3Dz4Wj8a8%2ByFCHS2TYaVg%40mail.gmail.com.

Peter Kasting

unread,
Sep 6, 2022, 12:40:28 PM9/6/22
to dan...@chromium.org, Darren Shen, cxx
On Tue, Sep 6, 2022 at 6:27 AM <dan...@chromium.org> wrote:
This is a signed conversion. With prevent_unsafe_narrowing, you'd get a compiler error IIUC. So that is probably what we should be relying on.


(1) Unfortunately if the signedness were the same (e.g. passing in an icu::UChar32, which despite the name is a signed type), prevent_unsafe_narrowing would not flag this, but it would still be UB per the standard.
(2) I'm going to be writing a paper that recommends we not spend much further time on prevent_unsafe_narrowing for cost/benefit reasons.  Even if that paper were shot down, it would still be many years before this config would be rolled out widely.  Therefore, something more expedient is probably warranted.

I'd be supportive of ensuring we have sufficient replacements in base/, then PRESUBMIT-banning these functions.

PK

Darren Shen

unread,
Sep 6, 2022, 9:57:18 PM9/6/22
to Peter Kasting, dan...@chromium.org, cxx
Thanks! Yeah I'm not too sure on the details. Seems like isalpha takes an int (!!!), but it's UB if you pass something that is "not representable as unsigned char and is not equal to EOF." I believe u16string and u32string variants use signed chars too, so they will be UB?

Also isalpha has some complexities around locales, so I'm not sure if it's the right default choice regardless of signed conversions?

Sounds good, I'm glad! Should I file a bug for this?
 

PK

Peter Kasting

unread,
Sep 7, 2022, 11:50:49 AM9/7/22
to Darren Shen, dan...@chromium.org, cxx
On Tue, Sep 6, 2022 at 6:57 PM Darren Shen <sh...@chromium.org> wrote:
Thanks! Yeah I'm not too sure on the details. Seems like isalpha takes an int (!!!), but it's UB if you pass something that is "not representable as unsigned char and is not equal to EOF." I believe u16string and u32string variants use signed chars too, so they will be UB?

u16string and u32string use char16_t and char32_t, which are unsigned.  But icu::UnicodeString uses icu::UChar32, which is signed.

Also isalpha has some complexities around locales, so I'm not sure if it's the right default choice regardless of signed conversions?

It's usually not the right choice, no, since it relies on the C locale, while we want to make decisions based on the Chromium UI language.

On Wed, Sep 7, 2022 at 2:40 AM Peter Kasting <pkas...@google.com> wrote:
I'd be supportive of ensuring we have sufficient replacements in base/, then PRESUBMIT-banning these functions.

Sounds good, I'm glad! Should I file a bug for this?

A bug is probably a good place for further discussion, sure.

PK

David Benjamin

unread,
Sep 7, 2022, 12:00:30 PM9/7/22
to Peter Kasting, Darren Shen, dan...@chromium.org, cxx
On Wed, Sep 7, 2022 at 11:50 AM 'Peter Kasting' via cxx <c...@chromium.org> wrote:
On Tue, Sep 6, 2022 at 6:57 PM Darren Shen <sh...@chromium.org> wrote:
Thanks! Yeah I'm not too sure on the details. Seems like isalpha takes an int (!!!), but it's UB if you pass something that is "not representable as unsigned char and is not equal to EOF." I believe u16string and u32string variants use signed chars too, so they will be UB?

u16string and u32string use char16_t and char32_t, which are unsigned.  But icu::UnicodeString uses icu::UChar32, which is signed.

Though, signed or unsigned, none of types, when converted to int, will necessarily fit in {EOF, unsigned char}, so it'll be a UB risk either way. Between that and the locale issue below, I agree we should avoid them. (I believe isdigit and isxdigit are the only ones that are locale-independent. Those are salvageable, except the UB issue is a headache.)

Perhaps we could get Clang to warning around the UB issue. It can probably assume anything passing a signed char into those functions is a mistake. int is maybe less clear since that's the function's own input type. A UBSan check would also be good.
 
Also isalpha has some complexities around locales, so I'm not sure if it's the right default choice regardless of signed conversions?

It's usually not the right choice, no, since it relies on the C locale, while we want to make decisions based on the Chromium UI language.

On Wed, Sep 7, 2022 at 2:40 AM Peter Kasting <pkas...@google.com> wrote:
I'd be supportive of ensuring we have sufficient replacements in base/, then PRESUBMIT-banning these functions.

Sounds good, I'm glad! Should I file a bug for this?

A bug is probably a good place for further discussion, sure.

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

Darren Shen

unread,
Sep 7, 2022, 7:12:39 PM9/7/22
to David Benjamin, Peter Kasting, dan...@chromium.org, cxx
Thanks, I've filed crbug.com/1361094
Reply all
Reply to author
Forward
0 new messages