How to find the index of the max element in float<4> in avx2-i32x8 efficiently

J Wesker

May 13, 2022, 12:33:31 AM

to Intel SPMD Program Compiler Users

I want to get the index of the max element in a float<4> when compiling for avx2-i32x8.

Here is my code:

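export uniform int YetAnother(uniform float<4>& InValue)
{
    // Copy the four floats into the first four lanes of an 8-wide varying.
    // The upper four lanes of src are left unwritten.
    varying float src;
    *((uniform float<4> *uniform)&src) = *((uniform float<4> *uniform)&InValue);
    varying float max = reduce_max(src);

    // Bitmask of the lanes whose value equals the maximum.
    uniform int IMMask;
    if (src == max)
    {
        IMMask = lanemask();
    }

    // The lowest set bit is the index of the first maximal element.
    return count_trailing_zeros(IMMask);
}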

I get the asm with this command:

ispc -O2 -g --target=avx2-i32x8 --emit-asm

and the resulting asm looks like this:

.def YetAnother;
.scl 2;
.type 32;
.endef
.globl YetAnother # -- Begin function YetAnother
.p2align 4, 0x90
YetAnother: # @YetAnother
.Lfunc_begin14:
.cv_func_id 39
# %bb.0: # %allocas
#DEBUG_VALUE: YetAnother:InValue <- undef
.cv_inline_site_id 40 within 39 inlined_at 1 0 0
.cv_loc 40 2 1354 18 # stdlib.ispc:1354:18
vmovaps (%rcx), %xmm0
.Ltmp256:
#DEBUG_VALUE: test <- undef
#DEBUG_VALUE: __mask <- undef
#DEBUG_VALUE: reduce_max:v <- $ymm0
vmaxps %xmm0, %xmm0, %xmm1
vpermilpd $1, %xmm1, %xmm2 # xmm2 = xmm1[1,0]
vmaxps %xmm1, %xmm2, %xmm1
vmovshdup %xmm1, %xmm2 # xmm2 = xmm1[1,1,3,3]
vmaxss %xmm1, %xmm2, %xmm1
.Ltmp257:
#DEBUG_VALUE: result <- $xmm1
#DEBUG_VALUE: __mask <- undef
vbroadcastss %xmm1, %ymm1
.Ltmp258:
#DEBUG_VALUE: __mask <- undef
#DEBUG_VALUE: max <- $ymm1
vcmpeqps %ymm1, %ymm0, %ymm0
.Ltmp259:
vmovmskps %ymm0, %eax
.Ltmp260:
#DEBUG_VALUE: __mask <- undef
#DEBUG_VALUE: count_trailing_zeros:v <- $eax
#DEBUG_VALUE: IMMask <- $eax
#DEBUG_VALUE: src <- undef
#DEBUG_VALUE: __mask <- undef
.cv_inline_site_id 41 within 39 inlined_at 1 196 9
.cv_loc 41 2 877 12 # stdlib.ispc:877:12
tzcntl %eax, %eax
.Ltmp261:
#DEBUG_VALUE: iflt_neg_max <- undef
#DEBUG_VALUE: __mask <- undef
.cv_loc 39 1 196 9 # simple.ispc:196:9
vzeroupper
.Ltmp262:
retq
.Ltmp263:
.Lfunc_end14:
# -- End function

I'm wondering whether this is the best way to do it.

The asm uses the ymm1 register.

How can I use only xmm1 for vcmpeqps and vmovmskps?

I don't want to change the target to avx2-i32x4.

I want to keep TARGET_WIDTH = 8 so that the other functions in my program stay efficient.

Dmitry Babokin

May 16, 2022, 4:05:09 PM

to ispc-...@googlegroups.com

Hello,

Right now we don't optimize horizontal operations on short vectors (i.e. float<4>), and there's no straightforward way to do reduce_max(float<4>). This topic is being actively discussed, though. Supporting it would require redesigning the stdlib so that functions for all widths are available at the same time, and we are thinking about adding this capability.

It would help if you file an issue with this specific request about reduce_max(float<4>).

P.S. The mailing list is deprecated; it's preferable to use GitHub Discussions and Issues instead.
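
For illustration only (this is not from the thread, and it is C rather than ISPC): a minimal sketch of the same max / compare / movemask / lowest-set-bit sequence written with 128-bit SSE intrinsics, which is roughly the xmm-only code the question asks for. The function name argmax4 and the -1 return for the NaN case are assumptions made for this sketch. In principle such a helper could be declared extern "C" on the ISPC side and called from uniform code while the target stays at avx2-i32x8, but that wiring is untested here.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h> /* SSE intrinsics operating on 128-bit xmm registers */
#define ARGMAX4_USE_SSE 1
#endif

/* Hypothetical helper, not from the thread: index of the largest of four
 * floats, with the lowest index winning ties. NaN inputs are not handled
 * consistently between the two paths. */
int argmax4(const float v[4])
{
#ifdef ARGMAX4_USE_SSE
    __m128 x = _mm_loadu_ps(v);

    /* Horizontal max: swap the 64-bit halves, then adjacent pairs, so that
     * every lane of m ends up holding the maximum of the four inputs. */
    __m128 m = _mm_max_ps(x, _mm_shuffle_ps(x, x, _MM_SHUFFLE(1, 0, 3, 2)));
    m = _mm_max_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)));

    /* 4-bit mask with one bit set per lane equal to the maximum. */
    int mask = _mm_movemask_ps(_mm_cmpeq_ps(x, m));
    if (mask == 0)
        return -1; /* only reachable when NaNs are present */

    /* Lowest set bit; a builtin such as __builtin_ctz would do the same. */
    int idx = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        idx++;
    }
    return idx;
#else
    /* Portable scalar fallback for non-x86 builds. */
    int best = 0;
    for (int i = 1; i < 4; i++) {
        if (v[i] > v[best])
            best = i;
    }
    return best;
#endif
}
```

For example, argmax4 on {1.0f, 7.5f, -3.0f, 2.0f} returns 1, and on {4.0f, 9.0f, 9.0f, 0.5f} it also returns 1, because the lowest index wins ties. This matches what count_trailing_zeros does with the lane mask in the original code.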
