--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CAHtyhaQ_AWdBB%3Dzggk2fkNhoBEd7_7b4mCAjKUdY3-U1_3qscQ%40mail.gmail.com.
Based on your summary, this seems like a good improvement. Does the change pass all the trybots already?
Just a small note: Java's char type is more like char16_t; Java's byte type is the closest analogue to C++'s char, and is signed.
Does this potentially cause problems linking with third-party or system libraries which were built with the platform convention for char signedness? This switch doesn't affect name mangling, so it seems like it could cause us to undetectably do signed/unsigned conversions at the linking boundary.
On Fri, Jan 14, 2022 at 1:00 PM K. Moon <km...@chromium.org> wrote:

> Just a small note: Java's char type is more like char16_t; Java's byte type is the closest analogue to C++'s char, and is signed.

Thanks for the clarification. I see that String supports working with bytes or chars and requires you to explicitly choose when accessing the data.
Regardless, there's no change for us on any platform that uses Java.
On Fri, Jan 14, 2022 at 1:35 PM Jeremy Roman <jbr...@chromium.org> wrote:

> Does this potentially cause problems linking with third-party or system libraries which were built with the platform convention for char signedness? This switch doesn't affect name mangling, so it seems like it could cause us to undetectably do signed/unsigned conversions at the linking boundary.

It would not change linking. The same bits would be given to such a library as before. If a library explicitly wanted -1 as an input with a char type, for example, no code change is required to pass it that.

For the most part, the compiler switch won't do anything, and we'd have to change our code to pass different values to such a library before we had a problem (e.g. passing 255 when it expects a positive signed char). But thank you, there is one thing that does change as a result: std::numeric_limits<char> will be different. It does not appear outside of tests at this time. It's conceivable that a template makes use of it, then passes something based on that to a linked-in library (glibc/pthread/??) that wants a signed char, and something goes wrong. It does feel like a bit of a stretch, but maybe there's a way to look for potential problems there.
Hello,

In chromium today, `char` is sometimes signed, and sometimes unsigned. In particular, on ARM platforms our `char` is unsigned; otherwise it is signed. Because the vast majority of our code is shared with ARM builds, it already must work with unsigned chars. But the presence of signed chars causes us some problems:

1) Because a char can be signed, comparing it to >= 0 is valid. But on ARM, it's not valid. Clang can provide warnings for us when we make unsigned comparisons like this, via -Wtautological-unsigned-zero-compare. But it's not possible to turn it on while we have both signed and unsigned chars, as it's both correct to warn on ARM and incorrect to warn on Intel for the same code. Yes, we could disable the warning on ARM, but then we exclude the check from ARM-specific code, and this warning is important. It is the only way to automatically catch bugs of the form

for (size_t i = container.size() - 1; i >= 0; --i)

which is something we can expect to encounter as we attempt to improve our signed correctness in order to raise our per-allocation limit above 2GB.
2) FFI is more complicated when char is both signed and unsigned. While C/C++ have the ambiguously-signed `char`, a language like Rust does not, so we need to choose a type to convert to. If char can be signed, then there are many values which we must handle as errors when converting to a u8, for instance. Java also has unsigned chars, though we only use Java on ARM, where we have unsigned chars, so we currently avoid this problem in FFI there.

This means changing all our chars to be consistently signed would make FFI worse on ARM, whereas going the other way, to unsigned, makes FFI better elsewhere.
3) Code we import from Google internals also expects char to be unsigned, so this would make the build/runtime environment more consistent for that code as well.
Therefore, I am proposing to consistently build chrome with unsigned chars across all platforms. On clang that is -funsigned-char. On MSVC that is /J. It builds everywhere, and needed only one change in breakpad to do so.

Have I missed anything?

Thanks,
Dana
This kind of thing gives me the heebie-jeebies. The signedness of unadorned char is ABI, inherent to a system, and to a large extent, unyielding. It’s quite possible to be able to squeak by with a change like this without causing problems at module boundaries because it’s actually quite rare to pass a scalar char into a function, or return one from a function, in the C tradition. I wouldn’t bet the farm on it, though.
For whatever it’s worth, the assertion that char is already unsigned on ARM (also bug 1287175) is incomplete. As a property of the ABI, the OS needs to be considered, not just the CPU. On Apple platforms (specifically, mac-arm64 and ios-arm64), char is, in fact, signed. I mention this in case any of the analysis that brought you to this point was based on the notion that anything we build for an ARM system must already be safe and could get a pass, or in case you tested by asking for -funsigned-char on a variety of x86_64 platforms but not on any arm64 platform. Since that’s not true, at the very least Apple-specific code can’t get a pass on these grounds.
(The fact that Apple went against the ARM-universe grain and declared char to be signed in their various ARM ABIs is a hint that there may be very sharp edges on their platforms in the form of ingrained expectations that unadorned char is signed, and rather than sanding them down, they opted to keep char signed. But it’s also possible that without being shackled by the pre-ARMv4 legacy of inefficient signed loads, they were able to go their own way favoring consistency across their platforms. More: an interesting historical note on why char is usually signed on ARM. This brings up another point, though: a platform’s choice of char signedness may have been made with efficiency in mind, and although this probably isn’t the case for anything we target today, it’s at least worth the inquiry.)
Another caution that comes to mind that I haven’t seen mentioned yet: we may have third-party code that uses unadorned char coupled with something that tells it what the properties of that char are. This might take the form of a macro, which could appear in a .gn file or in a checked-in header such as autoconf configure output. These would need to be updated.
More inline.

dan...@chromium.org wrote:

> Hello, In chromium today, `char` is sometimes signed, and sometimes unsigned. In particular, on ARM platforms our `char` is unsigned; otherwise it is signed. Because the vast majority of our code is shared with ARM builds, it already must work with unsigned chars. But the presence of signed chars causes us some problems:
>
> 1) Because a char can be signed, comparing it to >= 0 is valid. But on ARM, it's not valid. Clang can provide warnings for us when we make unsigned comparisons like this, via -Wtautological-unsigned-zero-compare. But it's not possible to turn it on while we have both signed and unsigned chars, as it's both correct to warn on ARM and incorrect to warn on Intel for the same code. Yes, we could disable the warning on ARM, but then we exclude the check from ARM-specific code, and this warning is important. It is the only way to automatically catch bugs of the form
>
> for (size_t i = container.size() - 1; i >= 0; --i)
>
> which is something we can expect to encounter as we attempt to improve our signed correctness in order to raise our per-allocation limit above 2GB.

How prevalent is the case where we test an unadorned char for ≥0? Given that we can't make any assumptions about signedness, such code smells broken regardless, and having it be flagged by -Wtautological-unsigned-zero-compare where char is unsigned sounds correct. Can we fix those cases and enable -Wtautological-unsigned-zero-compare independently of changing char's signedness contrary to some ABI definitions?
In fact, I think that this is already the case with the clang that we use today (crclang++ is Chrome's clang++):

mark@arm-and-hammer zsh% crclang++ --version
clang version 14.0.0 (https://github.com/llvm/llvm-project/ ed3a4a4948dedd264732105e910405138c9c0f3b)
Target: arm64-apple-darwin21.2.0
Thread model: posix
InstalledDir: /chrome/chrome/src/third_party/llvm-build/Release+Asserts/bin
mark@arm-and-hammer zsh% cat t_tautological_unsigned_zero_compare.cc
bool IsZeroOrPositive(char c) {
  return c >= 0;
}
mark@arm-and-hammer zsh% crclang++ -std=c++17 -funsigned-char -Wall -Wtautological-unsigned-zero-compare -Werror -c -o /dev/null t_tautological_unsigned_zero_compare.cc
mark@arm-and-hammer zsh% crclang++ -std=c++17 -funsigned-char -Wall -Wtautological-unsigned-zero-compare -Wtautological-unsigned-char-zero-compare -Werror -c -o /dev/null t_tautological_unsigned_zero_compare.cc
t_tautological_unsigned_zero_compare.cc:2:12: error: result of comparison of char expression >= 0 is always true, since char is interpreted as unsigned [-Werror,-Wtautological-unsigned-char-zero-compare]
  return c >= 0;
         ~ ^  ~
1 error generated.

It is possible to turn this warning on selectively, and to keep it away from unadorned char until we're ready for that.
> 2) FFI is more complicated when char is both signed and unsigned. While C/C++ have the ambiguously-signed `char`, a language like Rust does not, so we need to choose a type to convert to. If char can be signed, then there are many values which we must handle as errors when converting to a u8, for instance. Java also has unsigned chars, though we only use Java on ARM, where we have unsigned chars, so we currently avoid this problem in FFI there. This means changing all our chars to be consistently signed would make FFI worse on ARM, whereas going the other way, to unsigned, makes FFI better elsewhere.

I would argue that these problems can appear at any language/runtime boundary, and the correct solution isn't to abuse the meaning of a type on either side of any fence, but to be more careful at the boundaries. In no case do I really find it compelling to bend the ABI-defined meaning of a fundamental type in our primary language for the benefit of the few transitions we have into another.
> 3) Code we import from Google internals also expects char to be unsigned, so this would make the build/runtime environment more consistent for that code as well.

I'm pretty sure that this isn't universal. For example, -Wno-char-subscripts and a bug caused by mixing -funsigned-char settings.
Dana pointed out that SwiftShader in Google3 uses -fsigned-char. We should be able to remove that: b/422548590.

On Fri, Jan 14, 2022 at 9:16 PM Nicolas Capens <ca...@google.com> wrote:

> Where did you find SwiftShader using -fsigned-char? I wouldn't be surprised if we did, but I couldn't immediately find it. Keep in mind that SwiftShader is used in Android and Google3 as well, so we have to be careful when any code assumes signedness. Char being signed is the default on a lot of platforms as far as I know, and we use unsigned char for this reason in a lot of places, but I don't know offhand whether we assume plain char to be signed anywhere.
Well, sort order would change, for example, if you coded your own compare function, no?
I'm not going to strongly resist this, but it seems like the shortest path is not to change the definition of char on various platforms to be inconsistent with the platform ABI, but instead to use some combination of std::is_unsigned_v<char> / std::is_signed_v<char> and changing variables to be expressly "signed char" or "unsigned char", rather than baking in a hidden assumption that we're writing in a modified dialect of C++.

I have to imagine any Google code we import will need to be made portable for other reasons. I was actually surprised to learn the prod environment doesn't match the usual x86 ABI; do you know why that was done?
A strong reason for using unsigned chars, and it seems the reason it was done then, was so you could pull a `char` out of a std::string (which forces unqualified char), and use it as an index.
This seems like something the compiler should warn against. Even if Chromium forces char to be unsigned, it doesn't encourage writing portable code if we allow this kind of usage.
It already does, with -Wchar-subscripts, which is part of -Wall.

mark@arm-and-hammer zsh% crclang++ -x c++ -std=c++17 -Wall -Werror -S -c - -o/dev/null <<< 'const int ints[256] = {}; int F(char c) { return ints[c]; }'
<stdin>:1:54: error: array subscript is of type 'char' [-Werror,-Wchar-subscripts]
const int ints[256] = {}; int F(char c) { return ints[c]; }
                                                     ^~
1 error generated.

This warning appears even with -funsigned-char.
It looks like we have this warning enabled for the most part, except for third_party/minizip on NaCl.
I agree that char array subscripts are a bad and nonportable practice and should be avoided. The cast to unsigned at the few points of use is defined and probably typically appropriate.
Hi, Dana. Sorry for the delay in responding.

The sort of inter-module problem I'm most concerned about is where a char crosses a module boundary into a system library that we don't control. The surface area for system libraries beyond our control is quite large on most of our platforms. A char passed directly across this boundary may pose a problem if it's carried in a wider type. This can happen for ABI reasons, such as when a system library accepts a char argument that's passed in a native-width register. These cases will typically appear innocuous, though, since the bit pattern in the least significant byte should still match everyone's expectations.

More concerning are cases like char *strchr(char const *, int). The fact that the "needle" argument is conveyed across the boundary in an int means that the input to that function is much more likely to be treated differently when called with an unsigned or signed char argument. Now, in the case of strchr, it's not actually damaging at all, because the definition of strchr requires the int argument to be converted to char on the other end. I'm hard-pressed to think of a case in the standard C library where this would actually be a problem. This is part of the "maybe it will squeak by" comment I made in my first response on this thread.

But I've really only given this serious thought with respect to the standard C library, and we do consume much more than that on most systems. There are many higher-level platform-specific libraries that we talk to. Often, there's also a higher-level character typedef to wrap characters in these cases, and so long as everyone uses that consistently, there's no problem. But this would be the most mysterious aspect of the proposed change, and it's where most of my heebie-jeebies are concentrated.

I'm most comfortable with cranking up the warnings on questionable uses of unadorned char.
How many of the concerning cases that you’ve found can we cover with warnings that work within the ABI, without trying to twist it?
On closer reading, your example appears to be fine. Feel free to disregard, but it seems like the sort of thing there'd be a diagnostic for, as it seems like there are better alternatives in most cases.
A couple of other options that come to mind:
- p[2 - 1] (index arithmetic instead of pointer arithmetic)
- *(p2 - 1)
Even though p2[-1] (and even (-1)[p2]) is semantically the same as *(p2 - 1), I would consider them stylistically distinct. Clang's AST preserves the distinction, so it's possible to build diagnostics around this. There appear to be very few cases of the pattern "[-" in the code base, so it seems unusual anyway.