Hi build@,

I was playing with vectorization of some of the Blink ASCII functions and ran the following sample code in https://godbolt.org/z/vojz6v to see how the vectorizer treated it:

    #include <stddef.h>

    static inline bool isUpperAscii(char ch) {
      return ch > 'A' && ch < 'Z';
    }

    bool CharacterProperties(const char* str, size_t length) {
      int x = 0;
      int has_upper = 0;
    #pragma clang loop vectorize(enable) interleave(enable)
      for (size_t i = 0; i < length; ++i) {
        x |= str[i] & 0xA0;
        has_upper |= isUpperAscii(str[i]);
      }
      return x;
    }

On armv7-a with -O2, this produces vectorized code. But if I run it with -Oz, then the loop vectorizer says the loop control flow is not understood.

I wouldn't have expected the optimizer profile to affect whether or not the loop vectorizer could analyze the control flow. Am I missing something?

Thanks,
Albert
You received this message because you are subscribed to the Google Groups "build" group.
To unsubscribe from this group and stop receiving emails from it, send an email to build+un...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/build/CALcbsXAmKbQVvpS20Zz414hLiBwsQJ3NeoWwEPbzZvB-qrTEQw%40mail.gmail.com.
From a Chromium build setup perspective, performance-sensitive code should be in a target with the optimize_max config applied (which makes it build with -O2). Trying to optimize code that builds with -Oz for performance is kind of inherently contradictory.
Would -Os work for you instead of -Oz?
At a higher level, two things:
1. The optimization level, and other settings, certainly can affect how the vectorizer sees the loop, and thus whether or not it understands the control flow -- the vectorizer runs near the end of the pipeline and Oz vs O2 can affect things before it.
2. For Oz, the vectorizer should not do anything that might increase code size. This includes, for example, having tail loops. We don't have this kind of hard restriction otherwise. Thus, there are loop structures that we just can't vectorize at Oz that we can at O2.
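To make the tail-loop point concrete, here is a hand-written sketch of the shape a vectorized loop typically takes: a main loop that handles a fixed-width block per iteration, followed by a scalar tail loop for the leftover elements. This is only an illustration, not the vectorizer's actual output; the names OrBytesBlocked, OrBytesScalar and kWidth are made up for this example, and the inner loop merely stands in for a single vector operation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block width; stands in for the number of vector lanes.
constexpr std::size_t kWidth = 16;

// Plain scalar loop: one copy of the loop body.
unsigned OrBytesScalar(const unsigned char* data, std::size_t length) {
  assert(data != nullptr || length == 0);
  unsigned acc = 0;
  for (std::size_t i = 0; i < length; ++i) acc |= data[i];
  return acc;
}

// Blocked form: a main loop plus a tail loop, so the loop body exists twice.
unsigned OrBytesBlocked(const unsigned char* data, std::size_t length) {
  assert(data != nullptr || length == 0);
  unsigned acc = 0;
  std::size_t i = 0;

  // Main loop: consumes kWidth bytes per iteration. In real vectorized code
  // the inner loop would be a single wide load and OR.
  for (; i + kWidth <= length; i += kWidth) {
    unsigned block = 0;
    for (std::size_t j = 0; j < kWidth; ++j) block |= data[i + j];
    acc |= block;
  }

  // Tail loop: handles the remaining (length % kWidth) bytes one at a time.
  for (; i < length; ++i) acc |= data[i];
  return acc;
}
```

Both functions compute the same result, but the blocked form carries a second copy of the loop body for the tail. That duplication is the kind of size increase that is acceptable at -O2 and ruled out at -Oz.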
If you want certain functions to be compiled with particular
optimization-size levels, and for these to get optimized along
with other code compiled with different optimization-size levels,
you need to use LTO. Our LTO can keep track of the Os/Oz of a
function on a per-function basis even during
cross-translation-unit optimization.
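As a rough sketch of what a per-function size setting can look like at the source level, Clang has a minsize function attribute that asks for size-first code generation for one function regardless of the translation unit's -O level. Using it here is my assumption for illustration, not something proposed in this thread; the macro SKETCH_MINSIZE and both function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical macro: expands to Clang's minsize attribute where available,
// and to nothing on other compilers.
#if defined(__clang__)
#define SKETCH_MINSIZE __attribute__((minsize))
#else
#define SKETCH_MINSIZE
#endif

// Cold helper: marked size-first, so the optimizer should avoid
// transformations that grow this function.
SKETCH_MINSIZE std::size_t CountSpacesSmall(const char* s, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (s[i] == ' ') ++count;
  }
  return count;
}

// Hot helper: no attribute, so it follows the translation unit's own
// optimization level (for example -O2).
bool HasNonAscii(const char* s, std::size_t n) {
  unsigned char acc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    acc |= static_cast<unsigned char>(s[i]);
  }
  return (acc & 0x80) != 0;
}
```

Because the setting is attached to the function rather than to the file, it can travel with the function when code from several translation units is optimized together.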
-Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory