How to find the index of max element in float<4> in avx2-i32x8 effeciently

18 views
Skip to first unread message

J Wesker

unread,
May 13, 2022, 12:33:31 AM5/13/22
to Intel SPMD Program Compiler Users
I'm going to get the index of max element in float<4> in avx2-i32x8.
and here is my code.

export uniform int YetAnother(uniform float<4>& InValue)
{
    varying float src;
    *((uniform float<4> *uniform)&src) = *((uniform float<4> *uniform)&InValue);
    varying float max = reduce_max(src);
   
    uniform int IMMask;
    if(src == max)
    {
        IMMask = lanemask();
    }

    return count_trailing_zeros(IMMask);
}


i get the asm by this command
ispc -O2 -g --target=avx2-i32x8 --emit-asm

and the asm code like below

    .def     YetAnother;
    .scl    2;
    .type    32;
    .endef
    .globl    YetAnother                      # -- Begin function YetAnother
    .p2align    4, 0x90
YetAnother:                             # @YetAnother
.Lfunc_begin14:
    .cv_func_id 39
# %bb.0:                                # %allocas
    #DEBUG_VALUE: YetAnother:InValue <- undef
    .cv_inline_site_id 40 within 39 inlined_at 1 0 0
    .cv_loc    40 2 1354 18                    # stdlib.ispc:1354:18
    vmovaps    (%rcx), %xmm0
.Ltmp256:
    #DEBUG_VALUE: test <- undef
    #DEBUG_VALUE: __mask <- undef
    #DEBUG_VALUE: reduce_max:v <- $ymm0
    vmaxps    %xmm0, %xmm0, %xmm1
    vpermilpd    $1, %xmm1, %xmm2        # xmm2 = xmm1[1,0]
    vmaxps    %xmm1, %xmm2, %xmm1
    vmovshdup    %xmm1, %xmm2            # xmm2 = xmm1[1,1,3,3]
    vmaxss    %xmm1, %xmm2, %xmm1
.Ltmp257:
    #DEBUG_VALUE: result <- $xmm1
    #DEBUG_VALUE: __mask <- undef
    vbroadcastss    %xmm1, %ymm1
.Ltmp258:
    #DEBUG_VALUE: __mask <- undef
    #DEBUG_VALUE: max <- $ymm1
    vcmpeqps    %ymm1, %ymm0, %ymm0
.Ltmp259:
    vmovmskps    %ymm0, %eax
.Ltmp260:
    #DEBUG_VALUE: __mask <- undef
    #DEBUG_VALUE: count_trailing_zeros:v <- $eax
    #DEBUG_VALUE: IMMask <- $eax
    #DEBUG_VALUE: src <- undef
    #DEBUG_VALUE: __mask <- undef
    .cv_inline_site_id 41 within 39 inlined_at 1 196 9
    .cv_loc    41 2 877 12                     # stdlib.ispc:877:12
    tzcntl    %eax, %eax
.Ltmp261:
    #DEBUG_VALUE: iflt_neg_max <- undef
    #DEBUG_VALUE: __mask <- undef
    .cv_loc    39 1 196 9                      # simple.ispc:196:9
    vzeroupper
.Ltmp262:
    retq
.Ltmp263:
.Lfunc_end14:
                                        # -- End function


I' m wondering if it is the best way to do that.
and the asm use ymm1 register. 
how can i use only xmm1 to do vcmpeqps and vmovmskps.

I don‘t want to change target to avx2-i32x4
I want't to keep the TARGET_WIDTH = 8 to get other function in my programe effecient

Dmitry Babokin

unread,
May 16, 2022, 4:05:09 PM5/16/22
to ispc-...@googlegroups.com
Hello,

Right now we don't optimize for horizontal operations on short vectors (i.e. float<4>) and there's no straightforward way to do reduce_max(float<4>). But this topic is being actively discussed, supporting that would require redesigning stdlib (so basically all width functions are avilable atthe same time), we are thinking about adding this capability.

It would help if you file an issue with this specific request about reduce_max(float<4>).

PS mailing list is deprecated and it's preferable to use Github Discussions and Issues instead.

--
You received this message because you are subscribed to the Google Groups "Intel SPMD Program Compiler Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ispc-users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ispc-users/7d3ea55b-0e3f-4305-996b-8b4536fa1331n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages