Let’s recap the entire thread real quick:
We see HMSET taking increasing amounts of time. The test is:
- generate keys in the form ("data/%019d/xxxxxxxxxxxxxxxxxx", i) with i between 0 and N, in this case 0 to 500,000 (50,000 also works for testing).
- for each key, do a HMSET of the same ~100 field-value pairs
- split the total N tests across K threads, so each thread handles N/K non-conflicting HMSETs
- then, run the same HMSET multiple times, replacing every field-value pair in the hash (all pairs passed in a single HMSET call)
For easier testing, I converted the Java behavior to Lua so we can run it with redis-benchmark directly. See
https://gist.github.com/mattsta/f27f1856f22fab67dc58 (Note: it doesn’t split the input among the 10 parallel clients; each client runs all 50,000 HMSET operations itself).
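The core of the script is roughly the following sketch (reconstructed from the description above, not the actual gist contents; the field names and value sizes are placeholders):

-- hmset sketch: run via EVAL (e.g. through redis-benchmark); keys are built
-- inside the script for test convenience rather than passed through KEYS
local fields = {}
for f = 1, 100 do
  -- ~100 field-value pairs, reused for every key (placeholder names/values)
  fields[#fields + 1] = "field" .. f
  fields[#fields + 1] = string.rep("x", 32)
end
for i = 0, 50000 - 1 do
  local key = string.format("data/%019d/xxxxxxxxxxxxxxxxxx", i)
  -- each HMSET writes (or, on later runs, replaces) every field in one call
  redis.call("HMSET", key, unpack(fields))
end
return 0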
The tests below are on 2.8.9, Linux 3.13.0-22-lowlatency, Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz, 24 GB RAM.
We can see the increasing time behavior here (all defaults, no saving):
matt@neon:~/repos/redis/src% while true; do ./hmset.sh |grep ms; done
1 requests completed in 2957.39 ms
1 requests completed in 3312.97 ms
1 requests completed in 3471.98 ms
1 requests completed in 3531.64 ms
1 requests completed in 3569.08 ms
1 requests completed in 3616.65 ms
1 requests completed in 3614.22 ms
1 requests completed in 3640.54 ms
1 requests completed in 3676.59 ms
1 requests completed in 3691.97 ms
1 requests completed in 3701.63 ms
1 requests completed in 3707.43 ms
1 requests completed in 3715.06 ms
1 requests completed in 3728.09 ms
1 requests completed in 3730.66 ms
1 requests completed in 3755.90 ms
1 requests completed in 3756.98 ms
1 requests completed in 3761.99 ms
1 requests completed in 3773.13 ms
1 requests completed in 3775.28 ms
1 requests completed in 3779.64 ms
1 requests completed in 3794.19 ms
1 requests completed in 3806.37 ms
1 requests completed in 3802.50 ms
1 requests completed in 3802.10 ms
1 requests completed in 3804.84 ms
1 requests completed in 3801.41 ms
1 requests completed in 3809.59 ms
1 requests completed in 3807.96 ms
1 requests completed in 3810.22 ms
1 requests completed in 3815.79 ms
1 requests completed in 3828.82 ms
1 requests completed in 3820.12 ms
1 requests completed in 3830.07 ms
1 requests completed in 3833.39 ms
1 requests completed in 3839.27 ms
1 requests completed in 3836.13 ms
1 requests completed in 3833.77 ms
1 requests completed in 3838.66 ms
1 requests completed in 3842.17 ms
1 requests completed in 3852.43 ms
1 requests completed in 3857.80 ms
1 requests completed in 3855.75 ms
1 requests completed in 3859.72 ms
1 requests completed in 3855.54 ms
1 requests completed in 3857.76 ms
1 requests completed in 3855.38 ms
1 requests completed in 3862.82 ms
1 requests completed in 3863.22 ms
1 requests completed in 3868.30 ms
1 requests completed in 3869.38 ms
1 requests completed in 3871.46 ms
1 requests completed in 3862.30 ms
1 requests completed in 3855.73 ms
1 requests completed in 3862.29 ms
1 requests completed in 3867.02 ms
1 requests completed in 3863.58 ms
1 requests completed in 3861.85 ms
1 requests completed in 3864.25 ms
1 requests completed in 3864.58 ms
1 requests completed in 3868.04 ms
1 requests completed in 3883.58 ms
But, if I try again I see different behavior. We see cycles of increase with drop off, even after a FLUSHALL.
matt@neon:~/repos/redis/src% while true; do ./hmset.sh |grep ms; done
1 requests completed in 2555.01 ms
[increasing...]
1 requests completed in 3066.29 ms
1 requests completed in 3180.87 ms
1 requests completed in 3229.82 ms
1 requests completed in 3263.30 ms
1 requests completed in 3288.94 ms
1 requests completed in 3313.51 ms
1 requests completed in 3346.59 ms
1 requests completed in 3345.49 ms
1 requests completed in 3367.54 ms
1 requests completed in 3379.32 ms
1 requests completed in 3396.67 ms
[drops off then increases again…]
1 requests completed in 3148.03 ms
1 requests completed in 3174.49 ms
1 requests completed in 3166.26 ms
1 requests completed in 3204.28 ms
1 requests completed in 3205.40 ms
1 requests completed in 3201.72 ms
1 requests completed in 3215.64 ms
1 requests completed in 3226.69 ms
[manual FLUSHALL in other console]
[starts increasing again]
1 requests completed in 3104.83 ms
1 requests completed in 3507.88 ms
1 requests completed in 3591.01 ms
1 requests completed in 3634.19 ms
1 requests completed in 3684.58 ms
1 requests completed in 3704.18 ms
1 requests completed in 3737.98 ms
1 requests completed in 3762.40 ms
1 requests completed in 3786.08 ms
1 requests completed in 3795.76 ms
1 requests completed in 3816.11 ms
1 requests completed in 3855.56 ms
1 requests completed in 3880.03 ms
1 requests completed in 3566.14 ms
1 requests completed in 3639.80 ms
[drops off then increases again…]
1 requests completed in 3233.35 ms
1 requests completed in 3261.48 ms
1 requests completed in 3273.89 ms
1 requests completed in 3269.99 ms
1 requests completed in 3275.23 ms
1 requests completed in 3294.85 ms
1 requests completed in 3492.93 ms
1 requests completed in 3522.14 ms
1 requests completed in 3522.19 ms
1 requests completed in 3545.02 ms
The increasing pattern never happens if we delete the hash between additions. (HMSET key f1 v1 f2 v2…; DEL key; HMSET key f1 v1 f2 v2…)
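In the Lua script, the del mode just drops the key before re-adding it, roughly (a sketch, reusing the placeholder fields table from above):

for i = 0, 50000 - 1 do
  local key = string.format("data/%019d/xxxxxxxxxxxxxxxxxx", i)
  redis.call("DEL", key)                    -- free the old hash entirely
  redis.call("HMSET", key, unpack(fields))  -- rebuild it from scratch
end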
Example:
matt@neon:~/repos/redis/src% while true; do ./hmset.sh del |grep ms; done
1 requests completed in 2948.66 ms
1 requests completed in 2927.06 ms
1 requests completed in 2930.17 ms
1 requests completed in 2936.77 ms
1 requests completed in 2927.65 ms
1 requests completed in 2930.67 ms
1 requests completed in 2923.08 ms
1 requests completed in 2929.65 ms
1 requests completed in 2920.07 ms
1 requests completed in 2930.07 ms
1 requests completed in 2927.66 ms
1 requests completed in 2940.00 ms
1 requests completed in 2932.67 ms
1 requests completed in 2931.67 ms
1 requests completed in 2934.20 ms
1 requests completed in 2936.38 ms
1 requests completed in 2935.94 ms
1 requests completed in 2929.72 ms
1 requests completed in 2931.22 ms
1 requests completed in 2935.13 ms
1 requests completed in 2930.59 ms
1 requests completed in 2931.79 ms
1 requests completed in 2939.92 ms
1 requests completed in 2939.28 ms
1 requests completed in 2938.17 ms
1 requests completed in 2946.88 ms
1 requests completed in 2935.52 ms
1 requests completed in 2942.49 ms
1 requests completed in 2938.85 ms
1 requests completed in 2934.49 ms
1 requests completed in 2934.91 ms
1 requests completed in 2950.79 ms
1 requests completed in 2942.07 ms
1 requests completed in 2932.05 ms
1 requests completed in 2946.40 ms
1 requests completed in 2936.98 ms
1 requests completed in 2941.45 ms
1 requests completed in 2943.31 ms
1 requests completed in 2944.87 ms
1 requests completed in 2950.28 ms
1 requests completed in 2937.97 ms
1 requests completed in 2941.62 ms
1 requests completed in 2941.05 ms
So, the issue is with setting the same hash multiple times, perhaps over-fragmenting memory in the allocator. The behavior is worse under libc malloc and better under jemalloc.
A profile of these runs shows the dominant functions are decrRefCount and jemalloc’s free, together accounting for 25% to 35% of the total runtime. Coming in third is dictAdd, with another 10% of total runtime.
We can intentionally make the situation worse overall, but less variable.
If we raise the ziplist limits so *none* of our hashes convert to real hash tables, everything remains equally slow, with perhaps a slight increase-then-drop-off effect.
./redis-server --save "" --hash-max-ziplist-entries 1000 --hash-max-ziplist-value 4096
matt@neon:~/repos/redis/src% while true; do ./hmset.sh |grep ms; done
1 requests completed in 7301.10 ms
1 requests completed in 8831.90 ms
1 requests completed in 8829.57 ms
1 requests completed in 8708.75 ms
1 requests completed in 8692.22 ms
1 requests completed in 8612.53 ms
1 requests completed in 8607.87 ms
1 requests completed in 8613.58 ms
1 requests completed in 8606.86 ms
1 requests completed in 8614.39 ms
1 requests completed in 8611.24 ms
1 requests completed in 8759.91 ms
1 requests completed in 8756.07 ms
1 requests completed in 8870.83 ms
1 requests completed in 8877.55 ms
1 requests completed in 8872.40 ms
1 requests completed in 8874.90 ms
1 requests completed in 8888.93 ms
1 requests completed in 8867.36 ms
1 requests completed in 8865.87 ms
1 requests completed in 8854.49 ms
1 requests completed in 8796.68 ms
1 requests completed in 8798.44 ms
1 requests completed in 8678.47 ms
1 requests completed in 8681.05 ms
Fun fact: after all these tests, when I start Redis it comes up with a fragmentation ratio > 15:
# Memory
used_memory:502304
used_memory_human:490.53K
used_memory_rss:7929856
used_memory_peak:502304
used_memory_peak_human:490.53K
used_memory_lua:33792
mem_fragmentation_ratio:15.79
mem_allocator:jemalloc-3.2.0
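(mem_fragmentation_ratio is used_memory_rss / used_memory: 7929856 / 502304 ≈ 15.79. With so little live data, even a small RSS makes the ratio look huge.)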
The ratio drops back down after more data is allocated inside Redis:
# Memory
used_memory:895477296
used_memory_human:853.99M
used_memory_rss:922492928
used_memory_peak:895701840
used_memory_peak_human:854.21M
used_memory_lua:58368
mem_fragmentation_ratio:1.03
mem_allocator:jemalloc-3.2.0
> I can try DEL-eting the hash as you suggested if you think it can help investigation but it is not what real application will be doing - the hash will always be updated and never deleted.
Will you always be updating every field of the hash at the same time? The test effectively replaces the entire hash every time without deleting it, which is worst-case behavior.
> Actually, if you think the results above are still demonstrating some problem worth investigation, I can give you access to that EC2 box so you can take a look.
> Please let me know if any of you need the credentials and I will email it to you off the list.
Some quick notes:
- Let’s remove a zero from N so each run doesn’t take 60 to 90 seconds. You can still observe the same effect at about 5 seconds per run.
- We need more than 10 runs. It can take 10 runs for the initial ramp-up to stabilize.
You are right that 2.8.9 increases more quickly at first and is slower overall. The 3.0 betas allow short strings to be represented more efficiently, which may account for the speed difference we see.
Here are some tests using the Java approach on 2.8.9:
#00 time = 3873
Press enter to continue
Updating data...
#01 time = 5284
#02 time = 5407
#03 time = 5833
#04 time = 6307
#05 time = 6710
#06 time = 6962
#07 time = 7031
#08 time = 7241
#09 time = 7303
#10 time = 7377
#11 time = 7394
#12 time = 7495
#13 time = 7517
#14 time = 7521
#15 time = 7519
#16 time = 7557
#17 time = 7569
#18 time = 7632
#19 time = 7619
#20 time = 7670
#21 time = 7668
#22 time = 7702
#23 time = 7650
#24 time = 7704
#25 time = 7675
#26 time = 7738
#27 time = 7737
#28 time = 7749
#29 time = 7725
#30 time = 7740
So, still increasing.
If we do a FLUSHALL while the test is still running:
#33 time = 2006
#34 time = 5223
#35 time = 5245
#36 time = 5964
#37 time = 6569
#38 time = 7145
#39 time = 7734
#40 time = 8257
#41 time = 8634
#42 time = 8713
#43 time = 8879
#44 time = 8920
#45 time = 9014
#46 time = 9040
#47 time = 9048
#48 time = 9122
#49 time = 9150
#50 time = 9183
#51 time = 9146
The time used drops, then quickly jumps back to where it left off.
Conclusion: unknown, but the evidence points at the memory allocator. We can do one run and get increasing runtime, then do another run and get no increase at all. We can then do a FLUSHALL during a run with no increasing runtime and the increase comes back. Sometimes runtime climbs over 10 to 20 tests, then drops back down.
-Matt