Hi,
I've done some local performance testing:
1) Text
./memtier_benchmark --server XYZ --port 12345 -P memcache_text
ARM64 text
=========================================================================
Type          Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets           985.28          ---          ---     20.02700        67.22
Gets          9842.00         0.00      9842.00     20.01900       248.83
Waits            0.00          ---          ---      0.00000          ---
Totals       10827.28         0.00      9842.00     20.02000       316.05
X86 text
=========================================================================
Type          Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets           931.04          ---          ---     20.06800        63.52
Gets          9300.21         0.00      9300.21     20.32600       235.13
Waits            0.00          ---          ---      0.00000          ---
Totals       10231.26         0.00      9300.21     20.30200       298.66
2) Binary
./memtier_benchmark --server XYZ --port 12345 -P memcache_binary
ARM64 binary
=========================================================================
Type          Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets           829.68          ---          ---     23.46500        63.90
Gets          8287.69         0.00      8287.69     23.56100       314.75
Waits            0.00          ---          ---      0.00000          ---
Totals        9117.37         0.00      8287.69     23.55200       378.65
X86 binary
=========================================================================
Type          Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
-------------------------------------------------------------------------
Sets           829.32          ---          ---     23.63600        63.87
Gets          8284.10         0.00      8284.10     23.58600       314.61
Waits            0.00          ---          ---      0.00000          ---
Totals        9113.42         0.00      8284.10     23.59100       378.48
Text is about 6% faster on ARM64 (10827 vs. 10231 total ops/sec). Binary is practically identical on both (9117 vs. 9113 total ops/sec).
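For reference, the relative differences can be worked out from the "Totals" rows with a few lines of Python. This is only a sketch: the helper name relative_gain_percent is my own, and the numbers are copied by hand from the runs above (single runs, so small differences may well be noise).

```python
def relative_gain_percent(candidate: float, baseline: float) -> float:
    """Return how much higher `candidate` is than `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


# "Totals" Ops/sec values copied from the memtier_benchmark output above.
totals = {
    "text": {"arm64": 10827.28, "x86": 10231.26},
    "binary": {"arm64": 9117.37, "x86": 9113.42},
}

for protocol, ops in totals.items():
    gain = relative_gain_percent(ops["arm64"], ops["x86"])
    print(f"{protocol}: ARM64 vs x86 = {gain:+.2f}%")
```

The x86 result is used as the baseline, so a positive value means ARM64 had the higher throughput.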
The benchmarking tool runs on a different machine from the ones running Memcached.
The ARM64 server has this spec:
$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: 0x48
Model: 0
Stepping: 0x1
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 32768K
NUMA node0 CPU(s): 0-3
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
The x86_64 server has this spec:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities
Both have 16 GB of RAM.
Regards,
Martin