atomic_global_add for floats


Tomasz Koziara

Jun 14, 2013, 11:19:22 AM
to ispc-...@googlegroups.com
Hi,

When trying to sum up elements of a vector split among threads as follows:

=========================================
  uniform int start = taskIndex*span;
  uniform int end = min((taskIndex+1)*span, numbod);
  uniform float sum2, sum3;
  float sum1 = 0.0;

  foreach (i = start ... end)
  {
    sum1 += v[i];
  }
  sum2 = reduce_add (sum1);
  atomic_add_global (&sum3, sum2);
=========================================

I am getting:

=========================================
lidem.ispc:84:3: Error: Unable to find any matching overload for call to function "atomic_add_global". 
        Passed types: (uniform float * uniform, uniform float) 
  atomic_add_global (&sum3, sum2);
=========================================

I'd appreciate any advice on how this should be done.

Best regards,
Tomek

Dmitry Babokin

Jun 14, 2013, 12:52:59 PM
to ispc-...@googlegroups.com
Hi,

We are indeed missing global atomics for anything other than the int32
and int64 types (for float/double only swap is available). The reason
is that LLVM doesn't support them. I'm not sure how to work around this
problem.

Best regards,
Dmitry.

Matt Pharr

Jun 14, 2013, 1:00:34 PM
to ispc-...@googlegroups.com
I think that the underlying issue is that the HW doesn't support it: the ISA doesn't offer atomic ops for floats. One reason for this is that floating-point addition isn't associative. In general (a+b)+c != a+(b+c) for floats, so different orderings of atomic float ops wouldn't always give the same result (while they do for ints, etc.).
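
To make the ordering effect concrete, here is a minimal C++ sketch (host-side code, not ISPC). The helper names `sum_left` and `sum_right` are made up for this illustration; they add the same three floats with the two possible groupings.

```cpp
// Hypothetical helpers for illustration only.

// Add the left pair first: (a + b) + c.
float sum_left(float a, float b, float c) { return (a + b) + c; }

// Add the right pair first: a + (b + c).
float sum_right(float a, float b, float c) { return a + (b + c); }
```

With IEEE single precision and default compiler flags (no fast-math), `sum_left(1e20f, -1e20f, 1.0f)` gives 1.0f because the two large terms cancel first, while `sum_right(1e20f, -1e20f, 1.0f)` gives 0.0f because the 1.0f is lost when it is added to -1e20f.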

If you don't care about this issue, you should be able to work around it using atomic_compare_exchange_global:

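// untested
// Declared outside the loop so they are still in scope in the while condition.
uniform float old_val, new_val;
do {
  old_val = sum3;              // snapshot the current value once
  new_val = old_val + sum2;    // compute the sum from that same snapshot
  // Retry if another task changed sum3 between the read and the exchange.
} while (atomic_compare_exchange_global(&sum3, old_val, new_val) != old_val);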
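
For comparison, the same compare-and-swap retry pattern can be sketched on the host side in C++ with `std::atomic<float>`. This is an analog of the loop above, not ISPC code, and the function name `atomic_add_float` is an assumption made for this example.

```cpp
#include <atomic>

// Hypothetical helper: atomically add `value` to `target` using a
// compare-and-swap retry loop. Returns the value that `target` held
// immediately before this call's addition took effect.
float atomic_add_float(std::atomic<float>& target, float value) {
    float old_val = target.load();
    // If another thread changed `target` since old_val was read, the
    // exchange fails and compare_exchange_weak writes the current contents
    // of `target` into old_val, so the next attempt recomputes the sum
    // from a fresh value.
    while (!target.compare_exchange_weak(old_val, old_val + value)) {
    }
    return old_val;
}
```

Each thread would call `atomic_add_float(total, partial)` once with its own partial sum. The final total is still subject to the ordering effect described earlier, since the threads may arrive in any order.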

Thanks,
Matt

Tomasz Koziara

Jun 17, 2013, 4:26:04 AM
to ispc-...@googlegroups.com
Hi,

Thanks for your clarifications. It is not always obvious what is supposed to work with ISPC :)

I think that I will do it the old-fashioned way in the end: sum each part of the split vector in its own thread, write the partial results to some output array,

out[threadIndex] = result

and then sum them up serially outside.

Best regards,
Tomek
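
The partial-results approach described above could look roughly like the following C++ sketch. It is not ISPC code: the names `partial_sums` and `combine_serial` are hypothetical, and the chunks are processed in a plain loop to keep the example self-contained, whereas in the real program each chunk would be handled by one task.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum each chunk of `span` elements independently.
// Chunk t plays the role of one task and writes only out[t], so no
// atomics are needed. Assumes span > 0.
std::vector<float> partial_sums(const std::vector<float>& v, std::size_t span) {
    const std::size_t ntasks = (v.size() + span - 1) / span;  // round up
    std::vector<float> out(ntasks, 0.0f);
    for (std::size_t t = 0; t < ntasks; ++t) {
        const std::size_t start = t * span;
        const std::size_t end = std::min(start + span, v.size());
        for (std::size_t i = start; i < end; ++i) {
            out[t] += v[i];
        }
    }
    return out;
}

// Hypothetical helper: combine the partial results serially, always in
// index order, so the additions happen in the same order on every run.
float combine_serial(const std::vector<float>& partials) {
    float total = 0.0f;
    for (float p : partials) {
        total += p;
    }
    return total;
}
```

For example, `partial_sums({1, 2, 3, 4, 5}, 2)` yields the partial results 3, 7 and 5, and `combine_serial` on those returns 15. Because the final additions always run in index order, this variant also avoids the ordering issue that the atomic approach has.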