Why compiler is able to vectorize on int but not on unsigned

Melzzzzz

unread,

Dec 3, 2014, 5:19:19 AM12/3/14

to

Function in question is this:

int testNormal(const vector_t & ghs, const vector_t & lhs) {
int max = 0;
int tempMax;
for (unsigned k = 0; k < ghs.size(); k++) {
tempMax = lhs[k] + ghs[k];
if (max < tempMax) {
max = tempMax;
}
}
return max;
}

where: typedef vector<int> vector_t;
(could be also plain array so I post this to clc as well)

If k is unsigned both clang and gcc are not able to vectorize, producing
scalar code.
But, if k is int, both clang and gcc vectorize code producing 5x(SSE)
or 10x(AVX2) faster loop!
I am really stumbled as I can't see why type of index variable restricts
this optimization.

James Kuyper

unread,

Dec 3, 2014, 8:02:58 AM12/3/14

to

On 12/03/2014 05:19 AM, Melzzzzz wrote:
...

> int testNormal(const vector_t & ghs, const vector_t & lhs) {

...

> where: typedef vector<int> vector_t;

Cross-posting to comp.lang.c a question about code that can't even be
compiled as C code is pointless.
--
James Kuyper

jacob navia

unread,

Dec 3, 2014, 8:57:36 AM12/3/14

to

He wrote (and you did not quote that)

> (could be also plain array so I post this to clc as well)

Read for comprehension next time.

Tim Prince

unread,

Dec 3, 2014, 9:23:22 AM12/3/14

to

but OP leaves it open to guesswork on what extent he simplifies the
problem for C.
OP also leaves his choice of compiler and target open for guesswork.
However, unsigned indexing is notoriously (perhaps even surprisingly)
difficult to deal with in vectorization, particularly on ia32 target.

Victor Bazarov

unread,

Dec 3, 2014, 9:24:13 AM12/3/14

to

On 12/3/2014 8:57 AM, jacob navia wrote:

The code uses references and a member function. Do those exist in C?
(it's been some time since I touched C) And if vector_t is a "plain
array", how can it have a member ('size') at all?

V
--
I do not respond to top-posted replies, please don't ask

Richard Heathfield

unread,

Dec 3, 2014, 9:37:10 AM12/3/14

to

Victor Bazarov wrote:

> On 12/3/2014 8:57 AM, jacob navia wrote:
>> Le 03/12/2014 14:02, James Kuyper a écrit :
>>> On 12/03/2014 05:19 AM, Melzzzzz wrote:
>>> ...
>>>> int testNormal(const vector_t & ghs, const vector_t & lhs) {
>>> ...
>>>> where: typedef vector<int> vector_t;
>>>
>>> Cross-posting to comp.lang.c a question about code that can't even be
>>> compiled as C code is pointless.
>>>
>>
>> He wrote (and you did not quote that)
>>
>> > (could be also plain array so I post this to clc as well)
>>
>> Read for comprehension next time.
>
> The code uses references and a member function. Do those exist in C?

No, but plain arrays do.

> (it's been some time since I touched C)

It's good for you. Stops you getting scurvy or rickets or something like
that.

> And if vector_t is a "plain
> array", how can it have a member ('size') at all?

In a sense, of course, it can't. Nevertheless, a number of trivial solutions
to the problem do exist. You can use an extra parameter to allow the size to
be passed in, or you can make vector_t a structure type which contains not
only the array but also the size. (If it's dynamic, you keep a count. If
it's static, you can just sizeof arr / sizeof arr[0], provided it's safely
tucked up inside the struct.)

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

James Kuyper

unread,

Dec 3, 2014, 10:03:30 AM12/3/14

to

On 12/03/2014 09:23 AM, Tim Prince wrote:
> On 12/3/2014 8:57 AM, jacob navia wrote:
>> Le 03/12/2014 14:02, James Kuyper a écrit :
>>> On 12/03/2014 05:19 AM, Melzzzzz wrote:
>>> ...
>>>> int testNormal(const vector_t & ghs, const vector_t & lhs) {
>>> ...
>>>> where: typedef vector<int> vector_t;
>>>
>>> Cross-posting to comp.lang.c a question about code that can't even be
>>> compiled as C code is pointless.
>>>
>>
>> He wrote (and you did not quote that)
>>
>>> (could be also plain array so I post this to clc as well)
>>
>> Read for comprehension next time.

I'll admit that I didn't bothering reading any further after running
into the template typedef. However, his comment about plain arrays
doesn't change the inappropriateness of his message.
The wording of his comment could be interpreted as indicating that he
was just guessing about whether or not the same problem would occur with
plain arrays. If so, he should not have posted until after he had tested
it - the test is trivial enough to perform and evaluate.
If, on the other hand, he did already test with plain arrays, he should
have posted the plain array version of his code, not the template
version, particularly if he's posting to comp.lang.c. It could very well
be that the failure to vectorize is due to some detail of how he
converted the code to work with a plain array, so we need to see the
actual code.
However, he should have posted the plain array version even if he was
posting only to comp.lang.c++. By demonstrating the problem with plain
arrays he avoids having to worry whether it's due to some interaction
with the std::vector<> implementation.

> but OP leaves it open to guesswork on what extent he simplifies the
> problem for C.
> OP also leaves his choice of compiler and target open for guesswork.
> However, unsigned indexing is notoriously (perhaps even surprisingly)
> difficult to deal with in vectorization, particularly on ia32 target.

I suspect that an expanded explanation for that last sentence would
directly address the OP's original question.
--
James Kuyper

Melzzzzz

unread,

Dec 3, 2014, 10:03:52 AM12/3/14

to

I guess you are right, if I change function to:

int testNormal(const int* ghs, const int* lhs,unsigned sz) {

int max = 0;
int tempMax;

for (unsigned k = 0; k < sz; k++) {

tempMax = lhs[k] + ghs[k];
if (max < tempMax) {
max = tempMax;
}
}
return max;
}

then both clang & gcc are able to vectorize it no problem. So this is
something related to c++ vector.

Victor Bazarov

unread,

Dec 3, 2014, 10:28:07 AM12/3/14

to

I lean towards its being the use of .size() in the 'for' condition.
Change the 'for' to read

for (unsigned k = 0, sz = ghs.size(); k < sz; ++k)

and it's probably going to make the vectorizer happier.

Melzzzzz

unread,

Dec 3, 2014, 10:43:52 AM12/3/14

to

You are right,I just changed function to:

int testNormal(const vector_t& ghs, const vector_t& lhs,unsigned sz) {

int max = 0;
int tempMax;
for (unsigned k = 0; k < sz; k++) {
tempMax = lhs[k] + ghs[k];
if (max < tempMax) {
max = tempMax;
}
}
return max;
}

and indeed compilers are able to vectorize it. Heh I thought that this
is related to vectors operator[], but apparently it is not ;)

red floyd

unread,

Dec 3, 2014, 12:00:06 PM12/3/14

to

It's a quality of implementation issue. Has nothing to do with the
language itself. Please see FAQ 5.9.

Alain Ketterlin

unread,

Dec 3, 2014, 12:54:12 PM12/3/14

to

Melzzzzz <m...@zzzzz.com> writes:

> int testNormal(const vector_t & ghs, const vector_t & lhs) {
> int max = 0;
> int tempMax;
> for (unsigned k = 0; k < ghs.size(); k++) {
> tempMax = lhs[k] + ghs[k];
> if (max < tempMax) {
> max = tempMax;
> }
> }
> return max;
> }
>
> where: typedef vector<int> vector_t;
> (could be also plain array so I post this to clc as well)

A plain array would have been better, because the explanation has
nothing to do with vectors/templates/whatnot

> If k is unsigned both clang and gcc are not able to vectorize, producing
> scalar code.
> But, if k is int, both clang and gcc vectorize code producing 5x(SSE)
> or 10x(AVX2) faster loop!
> I am really stumbled as I can't see why type of index variable restricts
> this optimization.

Yes: unsigned has well-defined semantics regarding overflow, which
signed has not. Therefore, the compiler is free to make whatever
assumption it likes when the behavior is undefined. See:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

which I think is the best introduction to the subject, and includes lots
of examples.

-- Alain.

James Kuyper

unread,

Dec 3, 2014, 1:13:55 PM12/3/14

to

On 12/03/2014 12:53 PM, Alain Ketterlin wrote:
> Melzzzzz <m...@zzzzz.com> writes:
>
>> int testNormal(const vector_t & ghs, const vector_t & lhs) {
>> int max = 0;
>> int tempMax;
>> for (unsigned k = 0; k < ghs.size(); k++) {
>> tempMax = lhs[k] + ghs[k];
>> if (max < tempMax) {
>> max = tempMax;
>> }
>> }
>> return max;
>> }
>>
>> where: typedef vector<int> vector_t;
>> (could be also plain array so I post this to clc as well)
>
> A plain array would have been better, because the explanation has
> nothing to do with vectors/templates/whatnot

He's already posted (nearly three hours ago) a message indicating that
the problem does not occur when using plain arrays, so the explanation
must indeed have something to do with vectors/templates/whatnot - to be
more precise, it seems to be connected to the use of ghs.size().

Alain Ketterlin

unread,

Dec 4, 2014, 4:04:49 PM12/4/14

to

James Kuyper <james...@verizon.net> writes:

> On 12/03/2014 12:53 PM, Alain Ketterlin wrote:
>> Melzzzzz <m...@zzzzz.com> writes:
>>
>>> int testNormal(const vector_t & ghs, const vector_t & lhs) {
>>> int max = 0;
>>> int tempMax;
>>> for (unsigned k = 0; k < ghs.size(); k++) {
>>> tempMax = lhs[k] + ghs[k];
>>> if (max < tempMax) {
>>> max = tempMax;
>>> }
>>> }
>>> return max;
>>> }
>>>
>>> where: typedef vector<int> vector_t;
>>> (could be also plain array so I post this to clc as well)
>>
>> A plain array would have been better, because the explanation has
>> nothing to do with vectors/templates/whatnot

Sorry for the late answer, I was busy.

> He's already posted (nearly three hours ago) a message indicating that
> the problem does not occur when using plain arrays,

You're right. Some people stop reading in the middle of posts when they
meet a template. I admit I did stop reading in the middle of the thread
for lack of time.

(I come back to plain arrays below.)

> so the explanation must indeed have something to do with
> vectors/templates/whatnot - to be more precise, it seems to be
> connected to the use of ghs.size().

And your suggestion is?

Here is mine: ghs.size() (probably) has return type size_t, which is
(probably) bigger than unsigned on the target machine. (Actually size()
return size_type which is typedefed inside vector, usually as size_t bt
I'm not sure this is mandatory.)

Therefore, the loop may never end (if k wraps around---which it is
allowed to when unsigned), and this is enough for the compiler to decide
*not* to vectorize (because manipulating the loop trip count in that
case is tricky). The same thing does not happen when k is signed: this
is explained in the link I posted earlier.

So, why does it work with plain arrays? Well, it does not, but
Melzzzzz's modified program was vectorized because the size was now an
unsigned (not a size_t) coming in as a parameter, and the risk of
overflow during iteration was gone.

So why do I say it does not work with a "plain-array-like" version of
vector? Because a (rough) translation of vector<int> using a plain array
would be a structure like:

struct vectorlike {
size_t sz;
int * data;
};

and bam! the same would happen in C if the sz field were used as a loop
bound. Also, in the version of Melzzzzz if the parameters were "real"
plain arrays but the size were passed in as a size_t (instead of
unsigned, as it should I think), the same would happen. (And this make
the cross-posting to comp.lang.c not completely irrelevant.)

I'm speculating, because I haven't looked into the respective
vectorizers, but my experience with that unsigned/signed mess in
compilers makes me reasonably sure the problem lies somewhere there. I
maintain what I said, but of course if James or anybody else has a
better explanation I would love to read it.

-- Alain.

Melzzzzz

unread,

Dec 5, 2014, 4:07:44 AM12/5/14

to

You are absolutely right. If I change function
to:
int testNormal(const int* ghs, const int* lhs,size_t sz) {

int max = 0;
int tempMax;

for (unsigned k = 0; k < sz; k++) {

tempMax = lhs[k] + ghs[k];
if (max < tempMax) {
max = tempMax;
}
}
return max;
}

it *does not* vectorize as well!

>
> I'm speculating, because I haven't looked into the respective
> vectorizers, but my experience with that unsigned/signed mess in
> compilers makes me reasonably sure the problem lies somewhere there. I
> maintain what I said, but of course if James or anybody else has a
> better explanation I would love to read it.

Your explanation fits what happens with the code. It has to do
with unsigned is smaller then size_t and can overflow.

>
> -- Alain.

Alain Ketterlin

unread,

Dec 5, 2014, 4:42:37 AM12/5/14

to

Melzzzzz <m...@zzzzz.com> writes:

[...]

>> Also, in the version of Melzzzzz if the parameters were "real" plain
>> arrays but the size were passed in as a size_t (instead of unsigned,
>> as it should I think), the same would happen. (And this make the
>> cross-posting to comp.lang.c not completely irrelevant.)
>
> You are absolutely right. If I change function
> to:
> int testNormal(const int* ghs, const int* lhs,size_t sz) {
> int max = 0;
> int tempMax;
> for (unsigned k = 0; k < sz; k++) {
> tempMax = lhs[k] + ghs[k];
> if (max < tempMax) {
> max = tempMax;
> }
> }
> return max;
> }
>
> it *does not* vectorize as well!

[...]

> Your explanation fits what happens with the code. It has to do
> with unsigned is smaller then size_t and can overflow.

Good, we have understood something. BTW, if you are using vectors, you
can probably keep using size() as the bound, and use a counter of type
vector<int>::size_type (size_t in C):

for ( vector<int>::size_type k = 0 ; k<ghs.size() ; k++ )
....

Or in C with plain arrays, like above:

for ( size_t k = 0 ; k<sz ; k++ )
....

The compiler should be able to not call size() repeatedly.

TL;DR: array/vector sizes are of type size_t/vector<T>::size_type, and
loops iterating over arrays should use counters of type size_t/....

-- Alain.

Melzzzzz

unread,

Dec 5, 2014, 5:29:50 AM12/5/14

to

Yes, that is definitely. Compilers are now able to vectorize.
So for int, it worked because compilers assume undefined behavior
on overflow, but on unsigned, behavior is defined, so
it restricts optimizations?
So moral is that one has to be careful what to use for index variable ;)