It also depends on the array size (between 1 and 10000 in my case).
Assuming the crossover point is, say, 2000, it's probably best to use
AVX "branchless" for the first 2000 elements, and then continue with
AVX hard. I wanted to look into "branchless" some more, but, as
usual, other things needed my attention and so I did not pursue it
further.
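A minimal sketch of that split, in Python for brevity rather than as AVX code. Both kernels are hypothetical stand-ins that just sum their slice; only the dispatch on a fixed crossover index is the point. The names `hybrid`, `head_kernel`, `tail_kernel` and `sum_kernel` are made up for illustration, and 2000 is the assumed value from the text, not a measured one.

```python
from typing import Callable, Sequence

# A kernel processes data[start:stop] and returns the updated running state.
Kernel = Callable[[Sequence[int], int, int, int], int]

CROSSOVER = 2000  # assumed value from the text; should be measured per machine


def sum_kernel(data: Sequence[int], start: int, stop: int, state: int) -> int:
    """Hypothetical stand-in for a real kernel: adds data[start:stop] to state."""
    return state + sum(data[start:stop])


def hybrid(data: Sequence[int], head_kernel: Kernel, tail_kernel: Kernel,
           crossover: int = CROSSOVER, state: int = 0) -> int:
    """Run head_kernel on the first `crossover` elements, tail_kernel on the rest."""
    split = min(len(data), crossover)
    state = head_kernel(data, 0, split, state)
    if split < len(data):  # short arrays never reach the second kernel
        state = tail_kernel(data, split, len(data), state)
    return state
```

With array sizes between 1 and 10000, most short arrays would be handled entirely by the first kernel, and only the longer ones would pay for switching to the second.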
I could not compile it on Debian 11: the linker reports "relocation
R_X86_64_32S against `.data' can not be used when making a PIE object;
recompile with -fPIE". This means that the assembly code contains an
absolute address, which would have to be replaced with a rip-relative
one. So I compiled it on Debian 8 (gcc-4.9.2) instead.
Below is what I see. What does it mean?
On Skylake:
# Ints per cycle
# n normal expect AVX2 AVX2_unroll
128 0.250980 0.198142 0.328205 0.227758
256 0.378698 0.351648 1.000000 0.479401
512 0.498054 0.453901 0.486692 0.609524
1024 0.533889 0.509453 0.499025 0.727273
2048 0.549946 0.558952 0.515869 0.768769
4096 0.560022 0.562174 0.465243 0.821830
8192 0.562560 0.563179 0.616496 0.836260
16384 0.568376 0.566568 0.840464 1.221957
32768 0.569482 0.568612 0.998598 1.522960
65536 0.569640 0.569839 1.227496 2.413316
131072 0.569141 0.570295 1.334039 1.866857
262144 0.570032 0.568262 1.389593 1.929232
524288 0.569357 0.566879 1.508152 1.673972
1048576 0.561443 0.555999 1.533845 1.503037
2097152 0.560509 0.560691 1.458560 1.509459
4194304 0.559187 0.560557 1.456157 1.503564
8388608 0.561024 0.560462 1.494831 1.514211
16777216 0.560297 0.559024 1.496209 1.510765
33554432 0.559756 0.560659 1.501258 1.512948
67108864 0.559765 0.560249 1.507910 1.512386
134217728 0.560098 0.560409 1.506587 1.515123
268435456 0.560284 0.560472 1.509522 1.516031
536870912 0.559883 0.560436 1.508366 1.516430
536870912 0.560183 0.560181 1.509494 1.516893
268435456 0.560113 0.560441 1.507528 1.516041
134217728 0.559948 0.560224 1.509935 1.519144
67108864 0.561124 0.561204 1.505807 1.519437
33554432 0.560492 0.559996 1.518871 1.521890
16777216 0.561216 0.560984 1.501925 1.512587
8388608 0.560717 0.560970 1.481175 1.511185
4194304 0.559531 0.560643 1.456585 1.486993
2097152 0.558511 0.561203 1.401787 1.453253
1048576 0.562318 0.558435 1.330910 1.345867
524288 0.570140 0.567899 1.630715 1.883340
262144 0.570461 0.569180 2.328472 2.177891
131072 0.570325 0.570285 2.357071 2.205857
65536 0.570335 0.569650 1.842244 1.968639
32768 0.569601 0.569621 1.501879 1.559045
16384 0.567431 0.566254 1.169951 1.247259
8192 0.563566 0.562097 0.786936 0.833876
4096 0.564187 0.553214 0.518219 0.807253
2048 0.552617 0.553813 0.516911 0.791957
1024 0.540084 0.522983 0.510469 0.725212
512 0.509960 0.454707 0.490421 0.621359
256 0.477612 0.391437 0.468864 0.481203
128 0.450704 0.421053 0.400000 0.278261
On Zen 3:
# Ints per cycle
# n normal expect AVX2 AVX2_unroll
128 0.198142 0.177285 0.673684 0.374269
256 0.374269 0.336842 1.122807 0.962406
512 0.561404 0.396285 1.924812 1.347368
1024 0.585812 0.402200 3.368421 2.694737
2048 0.612440 0.411410 2.836565 3.592982
4096 0.612440 0.417789 2.629012 4.145749
8192 0.626683 0.419414 2.661468 5.013464
16384 0.629428 0.420232 2.728847 5.258023
32768 0.629887 0.420437 2.703184 5.226156
65536 0.630809 0.420642 3.193762 5.372684
131072 0.631040 0.422911 3.101855 5.331164
262144 0.629772 0.420719 3.108845 5.360160
524288 0.630751 0.420783 2.422235 5.402135
1048576 0.631184 0.420764 2.097454 5.460935
2097152 0.631162 0.422665 1.937176 5.322937
4194304 0.630690 0.421456 1.999682 4.177444
8388608 0.627358 0.420154 2.007738 3.061207
16777216 0.618820 0.418670 2.549933 2.558949
33554432 0.621342 0.418237 2.360456 2.400368
67108864 0.623856 0.418145 2.394890 2.419667
134217728 0.625304 0.417954 2.421189 2.449932
268435456 0.626265 0.417947 2.452580 2.475416
536870912 0.626237 0.417929 2.441393 2.459276
536870912 0.626340 0.417924 2.446799 2.465893
268435456 0.626351 0.417885 2.438281 2.455598
134217728 0.626210 0.417981 2.430257 2.454633
67108864 0.621872 0.418221 2.435699 2.456569
33554432 0.615973 0.418177 2.423352 2.464991
16777216 0.615432 0.418174 2.377893 2.401301
8388608 0.616635 0.418836 2.178726 2.195453
4194304 0.630528 0.420659 2.144731 2.754934
2097152 0.631170 0.420805 4.347582 5.448535
1048576 0.629830 0.420802 4.644690 5.443698
524288 0.631126 0.420757 4.547479 5.419109
262144 0.630809 0.420796 4.442065 5.410609
131072 0.634289 0.420539 4.382799 5.314735
65536 0.630578 0.420744 4.422132 5.475021
32768 0.630809 0.420027 4.311579 5.322937
16384 0.632196 0.419823 4.311579 5.194673
8192 0.628510 0.419414 4.145749 4.899522
4096 0.623061 0.417789 3.992203 4.145749
2048 0.612440 0.411410 3.849624 3.170279
1024 0.585812 0.408293 2.994152 2.245614
512 0.561404 0.374269 2.245614 1.347368
256 0.481203 0.354571 1.347368 0.748538
128 0.421053 0.336842 0.842105 0.421053
On Zen 2:
# Ints per cycle
# n normal expect AVX2 AVX2_unroll
128 0.224561 0.210526 0.561404 0.481203
256 0.449123 0.320802 1.347368 1.122807
512 0.561404 0.384962 2.245614 1.347368
1024 0.573348 0.390542 2.994152 3.368421
2048 0.579513 0.396285 2.245614 0.769925
4096 0.602176 0.399220 1.996101 3.368421
8192 0.597172 0.399961 2.092999 4.311579
16384 0.557052 0.407522 1.774312 4.311579
32768 0.602597 0.402575 2.103209 4.311579
65536 0.611571 0.401077 1.783487 4.175863
131072 0.601651 0.407859 1.825007 4.508841
262144 0.608228 0.405629 1.760277 4.453535
524288 0.608362 0.405533 1.707133 4.460735
1048576 0.603678 0.402904 1.525548 4.465066
2097152 0.601913 0.401237 1.524494 4.422487
4194304 0.552595 0.388290 1.598199 1.880156
8388608 0.502226 0.367968 1.548654 1.485521
16777216 0.518826 0.359816 1.446895 1.508812
33554432 0.533121 0.365163 1.498006 1.497000
67108864 0.497297 0.364485 1.489156 1.489815
134217728 0.503081 0.363182 1.485826 1.497298
268435456 0.502331 0.362487 1.476280 1.483783
536870912 0.497604 0.363058 1.471117 1.486186
536870912 0.501185 0.362092 1.474724 1.484411
268435456 0.505392 0.362086 1.480385 1.487473
134217728 0.505914 0.362678 1.477032 1.491003
67108864 0.502462 0.365261 1.489096 1.491641
33554432 0.510745 0.369752 1.499425 1.512902
16777216 0.288380 0.365712 1.486786 1.508833
8388608 0.508408 0.369453 1.474655 1.501383
4194304 0.549125 0.384958 1.482000 1.811350
2097152 0.608269 0.403455 1.714878 4.018949
1048576 0.608483 0.405545 3.159389 4.377932
524288 0.608416 0.407738 3.235707 4.465066
262144 0.608389 0.407787 3.234190 4.476656
131072 0.611463 0.407859 3.229647 4.473752
65536 0.604710 0.407714 3.241789 4.611314
32768 0.610705 0.407137 3.217596 4.377238
16384 0.606411 0.402951 3.170279 4.268890
8192 0.605559 0.406753 3.079699 3.992203
4096 0.592250 0.405224 3.079699 3.592982
2048 0.592250 0.402200 3.170279 3.170279
1024 0.573348 0.390542 2.994152 2.245614
512 0.538947 0.384962 1.924812 1.347368
256 0.481203 0.374269 1.122807 2.245614
128 0.481203 0.336842 1.684211 0.421053
On Zen:
# Ints per cycle
# n normal expect AVX2 AVX2_unroll
128 0.126984 0.122605 0.053872 0.126984
256 0.187135 0.182336 0.215488 0.374269
512 0.384384 0.278867 0.346883 0.677249
1024 0.270899 0.251721 0.451499 0.917563
2048 0.474074 0.326948 0.462511 0.729345
4096 0.421399 0.336621 0.702332 1.094017
8192 0.434266 0.334641 0.603596 0.702332
16384 0.435097 0.340397 0.682326 0.995867
32768 0.382607 0.336372 0.682837 0.835066
65536 0.247377 0.270015 0.651322 0.729637
131072 0.474074 0.332836 0.756942 0.902998
262144 0.460115 0.331397 0.741676 0.737171
524288 0.386670 0.310371 0.690772 0.679873
1048576 0.447765 0.263413 0.621113 0.639005
2097152 0.457797 0.327805 0.721960 0.666532
4194304 0.463389 0.329218 0.782020 0.794948
8388608 0.467029 0.329897 0.859277 0.855042
16777216 0.484859 0.323049 0.903128 1.051807
33554432 0.485079 0.334300 1.365597 1.332054
67108864 0.484846 0.334430 1.382202 1.341124
134217728 0.484798 0.334393 1.388014 1.344886
268435456 0.483813 0.334464 1.384969 1.343667
536870912 0.484345 0.334486 1.380940 1.349843
536870912 0.484308 0.334349 1.383726 1.346343
268435456 0.482901 0.334368 1.390845 1.350834
134217728 0.483038 0.334411 1.389173 1.348572
67108864 0.483145 0.334543 1.382173 1.339797
33554432 0.482469 0.334347 1.373056 1.339054
16777216 0.484889 0.334234 1.352430 1.320905
8388608 0.485658 0.334543 1.318118 1.295186
4194304 0.482499 0.333650 1.255438 1.245587
2097152 0.499248 0.337774 1.263211 1.287671
1048576 0.513389 0.342309 1.319222 1.791556
524288 0.513271 0.342462 1.909223 1.699365
262144 0.513307 0.342414 2.043148 1.760585
131072 0.513235 0.342350 1.728817 1.918277
65536 0.512801 0.342189 1.603916 1.882569
32768 0.511648 0.341932 1.961686 1.932531
16384 0.509643 0.341163 1.865209 1.850045
8192 0.504558 0.339635 1.996101 1.820444
4096 0.494686 0.336621 1.835125 1.835125
2048 0.474074 0.328838 1.723906 1.723906
1024 0.444444 0.323232 1.497076 1.673203
512 0.374269 0.296296 1.292929 1.422222
256 0.323232 0.263374 1.015873 1.185185
128 0.296296 0.222222 0.888889 0.507937
On Tiger Lake:
# Ints per cycle
# n normal expect AVX2 AVX2_unroll
128 0.507937 0.416938 0.220690 0.066082
256 0.677249 0.448336 0.523517 0.549356
512 0.986513 0.881239 0.702332 0.990329
1024 1.365333 1.253366 0.519007 1.102260
2048 1.505882 1.455579 0.634449 1.074502
4096 1.618972 1.579637 0.647282 1.081595
8192 1.666395 1.436185 0.598349 1.104788
16384 1.697296 1.689420 0.625845 1.077824
32768 1.655786 1.708892 0.699737 1.084818
65536 1.723180 1.713583 0.619556 1.340698
131072 1.718369 1.680906 1.005015 1.388548
262144 1.728601 1.710509 1.225676 1.425385
524288 1.523752 1.505083 1.162664 2.004450
1048576 1.510044 1.498694 1.268084 1.857777
2097152 1.322782 1.301467 1.209695 1.561971
4194304 1.313021 1.283718 1.168607 1.616382
8388608 1.302382 1.315042 1.158387 1.744677
16777216 1.295171 1.300065 1.203086 1.742706
33554432 1.298210 1.300364 1.193553 1.702402
67108864 1.298965 1.298582 1.201180 1.711111
134217728 1.290168 1.295982 1.405890 1.719377
268435456 1.298625 1.290871 1.724766 1.714571
536870912 1.298432 1.286179 1.714161 1.720285
536870912 1.292826 1.297217 1.724536 1.724241
268435456 1.299574 1.298605 1.726204 1.708710
134217728 1.296023 1.292664 1.737698 1.722198
67108864 1.298964 1.297574 1.727979 1.718513
33554432 1.299467 1.193327 1.731739 1.719353
16777216 1.301645 1.293439 1.708782 1.709346
8388608 1.296258 1.290181 1.618332 1.638156
4194304 1.263116 1.282948 1.546646 1.573795
2097152 1.296268 1.273609 1.428083 1.517347
1048576 1.580714 1.527439 1.327631 1.708135
524288 1.722903 1.713628 2.477650 2.922518
262144 1.729501 1.727769 2.557078 2.947558
131072 1.728407 1.725903 1.135826 1.759970
65536 1.725494 1.720240 1.119126 1.474376
32768 1.719383 1.710765 0.812819 1.093615
16384 1.702234 1.686638 0.811491 1.110629
8192 1.682136 1.641354 0.804873 1.096947
4096 1.652279 1.412414 0.803925 1.081881
2048 1.543331 1.426184 0.735104 1.067779
1024 1.216152 1.278402 1.992218 0.959700
512 1.221957 1.201878 0.695652 1.089362
256 1.000000 0.583144 0.744186 0.992248
128 0.761905 0.512000 0.677249 0.882759
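One way to compare two columns in tables like the ones above is to look for the smallest n from which one variant stays ahead of another. Below is a rough sketch in Python, assuming a single pass of a table (n ascending only) in the format shown. The function names `parse_table` and `crossover` are made up for illustration, and the sample rows are copied from the first Skylake pass. These are single runs and look noisy, so the result is only a hint.

```python
COLUMNS = ("normal", "expect", "AVX2", "AVX2_unroll")

# First seven rows of the ascending Skylake pass, copied from the table.
SKYLAKE_SAMPLE = """\
# Ints per cycle
# n normal expect AVX2 AVX2_unroll
128 0.250980 0.198142 0.328205 0.227758
256 0.378698 0.351648 1.000000 0.479401
512 0.498054 0.453901 0.486692 0.609524
1024 0.533889 0.509453 0.499025 0.727273
2048 0.549946 0.558952 0.515869 0.768769
4096 0.560022 0.562174 0.465243 0.821830
8192 0.562560 0.563179 0.616496 0.836260
"""


def parse_table(text):
    """Return a list of (n, {column: ints_per_cycle}); '#' lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        rates = dict(zip(COLUMNS, map(float, fields[1:])))
        rows.append((int(fields[0]), rates))
    return rows


def crossover(rows, baseline, candidate):
    """Smallest n such that candidate beats baseline there and at every
    larger n in rows; None if candidate loses at the largest n."""
    result = None
    for n, rates in sorted(rows, key=lambda row: row[0], reverse=True):
        if rates[candidate] <= rates[baseline]:
            break
        result = n
    return result


rows = parse_table(SKYLAKE_SAMPLE)
print(crossover(rows, "normal", "AVX2_unroll"))
print(crossover(rows, "normal", "AVX2"))
```

Feeding it both the ascending and the descending pass at once would mix two measurements per n, so each pass is best analysed separately.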