Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Comparing an array of bytes of constant length efficiently

186 views

Skip to first unread message

Nordlöw

unread,

Oct 1, 2007, 3:15:26 AM10/1/07

How can I compare an array of bytes with a constant length in the most
efficient way if I am sitting on an x86-architecture having
mmx,sse,sse2 (at least).

Example of code to optimized: memcmp(x,y,160).

Should I use a variant of the movXXX instruction?

Does alignment matter?

/Nordlöw

Nordlöw

unread,

Oct 1, 2007, 8:43:39 AM10/1/07

Or perhaps even smarter:
Can I somehow hint GCC
- with a constant value of the third argument (size) and
- alignment of x and y
to make GCC produce optimized code inline? I can assure that x and y
has "maximum" alignment (16 I guess).

/Nordlöw

Gerd Isenberg

unread,

Oct 1, 2007, 1:03:02 PM10/1/07

sse2 seems great to compare 16-bytes at once. If all bytes are equal,
pcmpeqb leaves 16 -1 (0xff) bytes, otherwise unequal bytes leave a
zero result byte. So you can compare inside a loop and intersect all
result-vectors to a -1 initialized xmm-accu. Finally fetch the sign-
bits of the accumulated compare results by pmovmskb into a general
purpose register, to look whether all 16-lower bits are set for
equality. Alignment matters, using movdqa is announced for
performance, none aligned leading and/or trailing bytes require some
additional code, eg. using movdqu. You may try sse3 lddqu (Load
Unaligned Double Quadword) though for the whole loop.

; input: esi - source 1, aligned 16
; edi - source 2, aligned 16
; ecx - byte count is multiple of 16
; return: eax - 1 strings are equal, 0 otherwise

shr ecx, 4 ; byte count / 16
pcmpeqb xmm0, xmm0 ; 16 times 0xff

cmploop:
movdqa xmm1, [esi]
movdqa xmm2, [edi]
add esi, 16
add edi, 16
pcmpeqb xmm1, xmm2
sub ecx, 1
pand xmm0, xmm1
jnz cmploop

cmpready:
pmovmskb eax, xmm0
cmp eax, 0xffff
je equal
xor eax, eax
equal:
and eax, 1
ret 0

If you really have huge streams to compare, you may agressivly unroll
the loop. You may periodically apply the inequality test, to
preliminary break if bytes are unequal. I am not sure about C-
compilers and their intrinsic memcmp - guess it is more about rep
cmps. You may use either gcc inline assembly or SSE2-intrinsics. Intel-
C has vectorization features, but I have no idea whether it can
conduct sse2-memcmp from appropriate C-source.

/Gerd

H. Peter Anvin

unread,

Oct 1, 2007, 3:30:44 PM10/1/07

gcc should do this automatically, at least recent versions of gcc. To
write custom macros that does this kind of things, you can use
__builtin_constant_p(), sizeof, and __alignof__.

-hpa

Terence

unread,

Oct 1, 2007, 5:16:30 PM10/1/07

Surely a simple repeat equal memory compare operation would work,
stopping on string end or first inequality?

0 new messages