Redis is slowing down with each test


Dmitry Andrianov

May 1, 2014, 6:45:44 AM
to redi...@googlegroups.com
Hi.
I was running some load testing for Redis when I noticed that the test takes more and more time with each run.
That was a bit puzzling because the test just overwrites the same hashes with the very same data, so the dataset does not really change in size.

Eventually I created a synthetic test which demonstrates the issue. I am attaching the Java code - it uses the Jedis library, which is pretty low-level as I understand it.
What the test does is create a Java HashMap with static data (it is only populated once, when the application starts) and then continuously invoke HMSET with the key "data/%019d/xxxxxxxxxxxxxxxxxx" (the integer goes from 0 to 500000), sending all the fields at once. 5 parallel threads are used (one does 0-100000, another 100000-200000, etc.).
When an iteration is over, the test does everything again, and the time taken by each iteration is measured. As you can see, the time steadily increases:

#00 time = 48282
#01 time = 57322
#02 time = 64775
#03 time = 72576
#04 time = 81076
#05 time = 88446
#06 time = 93550
#07 time = 97255
#08 time = 98475
#09 time = 99898
#10 time = 100081
...
#100 time = 111468
...
#200 time = 114601
...
#300 time = 115964
...
#400 time = 116948
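
For reference, here is a minimal single-threaded sketch of that loop (this is not the attached Rediska.java itself - it assumes the Jedis client and leaves out the five-thread split, but the shape of the workload is the same):

import java.util.HashMap;
import java.util.Map;
import redis.clients.jedis.Jedis;

public class HmsetLoopSketch {
    public static void main(String[] args) {
        // Static field/value map, built once and reused for every HMSET.
        Map<String, String> fields = new HashMap<String, String>();
        for (int f = 0; f < 100; f++) {
            fields.put("field" + f, "value" + f);
        }

        Jedis jedis = new Jedis("localhost");
        for (int iteration = 0; ; iteration++) {
            long start = System.currentTimeMillis();
            for (long i = 0; i < 500000; i++) {
                String key = String.format("data/%019d/xxxxxxxxxxxxxxxxxx", i);
                jedis.hmset(key, fields);   // overwrite the same hash with the same data
            }
            System.out.printf("#%02d time = %d%n", iteration, System.currentTimeMillis() - start);
        }
    }
}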


Granted, the increase becomes smaller and smaller with time, but it is still there.
And even if I should not take the time of the first iteration into account (the DB was absolutely empty, so inserting the first entries was probably a bit faster), you can still see that the time roughly doubles by the end of the test.

My Redis config contains just:

save ""

And the output of the "info memory" command after a few iterations is:

# Memory
used_memory:8948993184
used_memory_human:8.33G
used_memory_rss:9241251840
used_memory_peak:9037430864
used_memory_peak_human:8.42G
used_memory_lua:33792
mem_fragmentation_ratio:1.03
mem_allocator:jemalloc-3.2.0

From what I can see, these numbers do not change as the test progresses - they are pretty much the same whether it is iteration #1 or #500. (That is expected, of course, because the data set does not grow.)

So why does Redis become slower over time?
My only idea would be memory fragmentation, but I am not sure about it. It is also not clear whether anything can be done about it.
I am a bit worried about a runaway process. Well, it does not really run away - it is more like drifting - but still, how can I be sure that after a week of operation it won't slow down to the point where all requests are just timing out?

Thanks in advance.
Dmitry

[Attachment: Rediska.java]

Greg Andrews

May 1, 2014, 11:08:37 AM
to redi...@googlegroups.com
Let me isolate two details of your description and ask you about them:

> it creates a java HashMap with the static data (it is only populated once when application starts)
> When the iteration is over, the test does everything again.
 
When you say "does everything again", does this perhaps include creating a new HashMap on each iteration, rather than re-using the HashMap created before the first iteration?

Or does it mean each iteration creates a new HashMap with the same name, populates it with keys in Redis, then at the end of the iteration deletes the HashMap, so each iteration includes a create/destroy of a HashMap?

If I remember Java well enough, those three approaches can make Java create a new HashMap object and allocate memory for its internal structures on each iteration, rather than re-use the existing object/memory as you might expect.  This could account for the increase in time that the Java code requires to perform the actions.  There might even be garbage collection of deleted data structures slowing down the Java client code.

The real test, though, is to measure just the interactions with Redis.  When the Java code sends commands to the Redis server process, is it really the replies from Redis that are coming back more slowly?
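
For example, something along these lines (a rough sketch assuming Jedis; "jedis", "fields", "items" and "firstItem" stand in for whatever your test already uses) would accumulate only the time spent waiting for Redis replies, leaving out all client-side work:

    // Time only the HMSET round trips, excluding HashMap handling and GC on the client.
    long redisNanos = 0;
    for (long i = 0; i < items; i++) {
        String key = String.format("data/%019d/xxxxxxxxxxxxxxxxxx", firstItem + i);
        long t0 = System.nanoTime();
        jedis.hmset(key, fields);          // blocks until Redis replies
        redisNanos += System.nanoTime() - t0;
    }
    System.out.println("time spent waiting for HMSET replies (ms): " + redisNanos / 1000000);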

  -Greg

Dmitry Andrianov

May 1, 2014, 12:07:22 PM
to redi...@googlegroups.com


On Thursday, 1 May 2014 16:08:37 UTC+1, GregA wrote:
> Let me isolate two details of your description and ask you about them:
>
>> it creates a java HashMap with the static data (it is only populated once when application starts)
>> When the iteration is over, the test does everything again.
>
> When you say "does everything again", does this perhaps include creating a new HashMap on each iteration, rather than re-using the HashMap created before the first iteration?
>
> Or does it mean each iteration creates a new HashMap with the same name, populates it with keys in Redis, then at the end of the iteration deletes the HashMap, so each iteration includes a create/destroy of a HashMap?
>
> If I remember Java well enough, those three approaches can make Java create a new HashMap object and allocate memory for its internal structures on each iteration, rather than re-use the existing object/memory as you might expect.  This could account for the increase in time that the Java code requires to perform the actions.  There might even be garbage collection of deleted data structures slowing down the Java client code.

I populate the HashMap only once and then reuse the same HashMap in all the HMSET calls.
I am almost 100% sure the issue has nothing to do with garbage collection on the Java test client side - it barely allocates any memory as it runs.
 

> The real test, though, is to measure just the interactions with Redis.  When the Java code sends commands to the Redis server process, is it really the replies from Redis that are coming back more slowly?

I am quite sure about that. If I kill my Java client and restart it, the time per iteration will be very close to the time reported by the last iteration before I killed it.
To me, this eliminates the test itself from the equation, because restarting it would cure any memory issues (even if the test had any, which I really doubt).

However, restarting Redis immediately brings iteration times back to their initial level (from which they then grow).

Cheers.

PS: the Java source is attached to my original post, so you can check exactly what I am doing.

Josiah Carlson

May 1, 2014, 1:40:36 PM
to redi...@googlegroups.com
This smells to me like you are primarily exercising the memory allocator, which may or may not be running into fragmentation issues.

Rather than testing repeated bulk inserts with no reads, have you considered testing what you are actually planning on using Redis for?

 - Josiah




Salvatore Sanfilippo

May 1, 2014, 1:48:00 PM
to Redis DB
Hello,

an easy way to check whether this is an artifact of the test or an
actual Redis slowdown is to stop the Java test and re-run it without
restarting Redis.
Does it start fast again, or is it still as slow as the latest iteration
of the previous execution, as it should be if the problem is in Redis?

Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)

Dmitry Andrianov

May 1, 2014, 1:52:58 PM
to redi...@googlegroups.com
Josiah,
that is exactly what I am doing.
There will be two processes - one continuously updating data in Redis, and another periodically reading that data.
And "the data" itself will be thousands/millions of hashes with a hundred keys each.
My current test only simulates the producer, of course. But as I will have far more data updated per second than queried, I would prefer to performance-test this part separately before adding querying.
So the only thing I need is for my producers to have roughly the same latency when updating data, so that latency does not grow uncontrollably over time.

Cheers

Josiah Carlson

May 1, 2014, 1:54:58 PM
to redi...@googlegroups.com
Small correction/redirection: the repeated operations you are performing could cause the Redis process's page tables to keep growing. As the page tables grow, the processor becomes less able to cache the virtual-to-physical address mappings in the TLB, and the growing number of TLB misses would result in slower Redis performance. The fact that restarting Redis helps corroborates this idea (along with general memory fragmentation). You might be able to verify it with the tools described here: http://lwn.net/Articles/379748/ . Alternatively, you could try running with huge pages and see whether Redis performance stops degrading as drastically; if so, you've got your answer.

 - Josiah

Dmitry Andrianov

May 1, 2014, 1:55:23 PM
to redi...@googlegroups.com
Salvatore,
I have already done it and the test starts slow.
Only restarting Redis makes it run faster again.

Cheers

Dmitry Andrianov

May 1, 2014, 1:57:36 PM
to redi...@googlegroups.com
Josiah,
thanks, I will read/try these tomorrow.

However, it is a bit of a surprise to me that the page table grows, given that both vsize and rss stay pretty much the same.

Thanks!

Salvatore Sanfilippo

May 1, 2014, 2:04:40 PM
to Redis DB
On Thu, May 1, 2014 at 7:55 PM, Dmitry Andrianov
<dmitry.a...@gmail.com> wrote:

> Only restarting Redis makes it run faster again.

That's interesting, please could you verify that the memory peak is
constant during the tests?
Also, how does the fragmentation ratio change during them?

If it is a matter of allocation, you should see either the slowdown
progress together with the fragmentation ratio, or, once the
fragmentation ratio becomes constant, the allocation slowdown
should stop as well.

If there is a monotonic increase of the memory usage, it is also
possible that there is a memory leak triggered by the code.
I'm such a huge Java noob that I'm not even able to run your example
code to verify what happens, but tomorrow I can ask for help to do so.

Salvatore

Josiah Carlson

May 1, 2014, 2:09:36 PM
to redi...@googlegroups.com
It's a hypothesis that fits the existing data, but which requires testing. I wouldn't bet my kid on my being right, but I'd wager a beer. :)

 - Josiah

Dmitry Andrianov

May 2, 2014, 6:01:56 AM
to redi...@googlegroups.com
Salvatore,
To answer your questions I ran the following test:
1. run the first iteration (not timed) just to populate the data set
2. get memory info from Redis
3. run a few more iterations (timed)
4. grab memory info again

So, after the first iteration (all data is already loaded)

127.0.0.1:6379> info memory
# Memory
used_memory:8954602608
used_memory_human:8.34G
used_memory_rss:9156677632
used_memory_peak:8954602608
used_memory_peak_human:8.34G
used_memory_lua:33792
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.2.0

$ cat /proc/19433/status
...
VmPeak:     8979628 kB
VmSize:     8979628 kB
VmLck:           0 kB
VmPin:           0 kB
VmHWM:     8942068 kB
VmRSS:     8942068 kB
VmData:     8962412 kB
VmStk:         136 kB
VmExe:         836 kB
VmLib:        2984 kB
VmPTE:       17528 kB
VmSwap:           0 kB
Threads:    3
...

Then goes test:

#01 time = 56495
#02 time = 63967
#03 time = 71984
#04 time = 79716
#05 time = 86748
#06 time = 92847
#07 time = 96445
#08 time = 98373
#09 time = 100583


And then again

127.0.0.1:6379> info memory
# Memory
used_memory:8948695664
used_memory_human:8.33G
used_memory_rss:9156677632
used_memory_peak:8954857056
used_memory_peak_human:8.34G
used_memory_lua:33792
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.2.0

$ cat /proc/19433/status
...
VmPeak:     8979628 kB
VmSize:     8979628 kB
VmLck:           0 kB
VmPin:           0 kB
VmHWM:     8942068 kB
VmRSS:     8942068 kB
VmData:     8962412 kB
VmStk:         136 kB
VmExe:         836 kB
VmLib:        2984 kB
VmPTE:       17528 kB
VmSwap:           0 kB
Threads:    3
...

So
1. the peak increased only slightly
2. fragmentation as it is reported by Redis does not change
3. The RSS is not growing

I think the fragmentation ratio calculation in Redis is very simplistic, so that value is not a good indication of what is really going on with the memory. Under my test, Redis continuously allocates and frees memory blocks of random sizes, and this just has to create lots of fragments. I am not sure how good jemalloc is at dealing with these - I will read an article about jemalloc later today.
Also, I still haven't tried using huge pages - that is on my TODO list too.

Regarding running the test yourself - if you like, I can build you a JAR file so you won't need to compile anything (you will still need Java installed, though).

Thanks

Salvatore Sanfilippo

May 2, 2014, 6:17:50 AM
to Redis DB
Hello Dmitry,

thanks a lot for your additional info. I'm still very perplexed about
this being a jemalloc issue... I would expect a more logarithmic
behavior.
However, today I'm allocating time to understand what is happening, so
expect a report by the end of the day.

About the JAR, maybe it's time I learn, if not Java, at least how to
compile and run a Redis-related program. I'll ask for help in the IRC
channel.

Cheers,
Salvatore




Salvatore Sanfilippo

May 2, 2014, 7:08:50 AM
to Redis DB
Ok, I can compile the code and reproduce the issue on my Linux box.
To reset performance back to the initial state, a FLUSHALL is enough;
no need to restart the server.

Investigating more...

Salvatore

Dmitry Andrianov

May 2, 2014, 7:09:20 AM
to redi...@googlegroups.com

If you have never done Java before, I would feel guilty making you figure out how to add the dependencies and compile the thing.
So I packed everything into a Maven project which should be very easy to build.

$ tar -xzf Rediska.tgz

it will create a Rediska directory for you. Go in there and then just run

$ mvn install

to compile it (assuming you have Maven; if not, "sudo apt-get install maven" for Ubuntu or something similar for other distributions)

Finally,

$ java -jar target/rediska-0.1.jar

to run it. It will generate the initial dataset (connecting to Redis at localhost) - this requires 6 GB of free RAM available to Redis.
Then it will pause to let you capture metrics. When you press ENTER, it will continue running more iterations.

Hope that helps.
[Attachment: Rediska.tgz]

Salvatore Sanfilippo

May 2, 2014, 7:12:15 AM
to Redis DB
On Fri, May 2, 2014 at 1:09 PM, Dmitry Andrianov
<dmitry.a...@gmail.com> wrote:
> If you never did Java before, I will feel guilty making you to figure out
> how to add the dependencies and compile the thing.
> So I packed everything into a Maven project which should be very easy to
> build.

Thank you, actually this was a good exercise :-)

I spent 10 minutes figuring out that I needed to remove ".class" when
trying to run the executable ;-) But now it works.

Yet another data point, now that I can reproduce the issue. The issue
is still there if I compile Redis to use glibc malloc instead of
jemalloc.
Of course, it is possible that different allocators suffer from the
same problem, but I wonder whether it instead has more to do with
something else in the hash data type implementation...

Well, now that I can reproduce the issue, I guess it is just a matter
of time to track down the root cause.

Dmitry Andrianov

May 2, 2014, 7:30:12 AM
to redi...@googlegroups.com
Sorry, I forgot to mention that I tried libc malloc and in my test it was much worse - unlike the 1.02 fragmentation with jemalloc, it was about 1.5, so the amount of memory actually used was a lot bigger.

Salvatore Sanfilippo

May 2, 2014, 8:26:21 AM
to Redis DB
Hello Dmitry, actually I was wrong: with libc malloc, while the
fragmentation ratio is 1.26, the test runs in roughly the same time
again and again.
My samples are at the end of this email. So this points in the direction
of a jemalloc-specific issue. However, I'll try with jemalloc on my
system for a similar number of runs to have more data points.

#00 time = 47913
#01 time = 55707
#02 time = 56036
#03 time = 64595
#04 time = 64263
#05 time = 95277
#06 time = 66504
#07 time = 78023
#08 time = 67147
#09 time = 66644
#10 time = 72622
#11 time = 64403
#12 time = 77629
#13 time = 66823
#14 time = 75559
#15 time = 68355
#16 time = 68973
#17 time = 70365
#18 time = 66513
#19 time = 72503
#20 time = 67431
#21 time = 74072
#22 time = 72226
#23 time = 70468
#24 time = 72433
#25 time = 66625
#26 time = 72087
#27 time = 65946
#28 time = 71951
#29 time = 68404
#30 time = 71691
#31 time = 97123
#32 time = 66155
#33 time = 72828
#34 time = 66426
#35 time = 73305
#36 time = 95107
#37 time = 65675
#38 time = 71839
#39 time = 64720
#40 time = 74692
#41 time = 91147
#42 time = 75335
#43 time = 70435
#44 time = 64694
#45 time = 72476
#46 time = 73758
#47 time = 89848
#48 time = 69886
#49 time = 64304
#50 time = 71186
#51 time = 65715
#52 time = 97313
#53 time = 68610
#54 time = 66842
#55 time = 70840
#56 time = 65535
#57 time = 93396
#58 time = 71078
#59 time = 67715
#60 time = 69595
#61 time = 66374
#62 time = 73074
#63 time = 85355
#64 time = 71900




Salvatore Sanfilippo

May 2, 2014, 8:52:04 AM
to Redis DB
And... a new update. I was trusting too few samples; actually, I'm
not able to reproduce the issue here. I see some increase in the first
samples, but then the time goes down again to ~55 seconds per execution
and remains there.

Please could you try the following two things?

1) Disable persistence entirely (no save entries and no AOF enabled), and
see if you can reproduce.
2) Modify the program so that it accesses items in random order every
time, and see if you can reproduce.

I think this can provide additional hints in case what is happening is
that we are seeing more and more cache misses, either due to the
internal jemalloc arena handling, or because of the kernel's memory
page handling with copy-on-write.

In the meantime I'm doing more tests.

Salvatore

Salvatore Sanfilippo

May 2, 2014, 9:41:49 AM
to Redis DB
I'm at 200 tests now, same jemalloc as yours, unmodified benchmark:

used_memory:8948728048
used_memory_human:8.33G
used_memory_rss:9206165504
used_memory_peak:8996357232
used_memory_peak_human:8.38G
used_memory_lua:33792
mem_fragmentation_ratio:1.03
mem_allocator:jemalloc-3.2.0

Stopping the test now since it is clear that for some reason it is not
happening here.

Salvatore

Dmitry Andrianov

May 2, 2014, 10:25:08 AM
to redi...@googlegroups.com
Salvatore,
I can confirm that MALLOC=libc demonstrated less growth of the time taken per iteration. But if I remember correctly, there were two issues with it:
1. It consumed more memory and the reported fragmentation was higher.
2. The memory consumption was slowly growing. It may not be visible in the very first iterations, but it starts at some point.
I disliked the idea of growing memory consumption even more than that of growing response time, so I did not experiment with libc malloc much.

To answer your questions: I run Redis with a config file consisting of a single line:
save ""

I believe it disables snapshots, so there should be no copy-on-write issues at all. (?)
Also, I worked from the assumption that AOF is disabled by default, so I do not need to disable it explicitly. Do I?

I will try accessing keys in random order and will let you know.

Many thanks.

Salvatore Sanfilippo

May 2, 2014, 10:30:38 AM
to Redis DB
On Fri, May 2, 2014 at 4:25 PM, Dmitry Andrianov
<dmitry.a...@gmail.com> wrote:
> I believe it disables snapshots so there should not no copy-on-write issues
> at all. (?)

Exactly

> Also I worked from the assumption that AOF is disabled by default so I do
> not need to disable it explicitly. Do I?

Again, true, so you are running persistenceless.

> I will try accessing keys in random order and will let you know.

Thanks! I executed multiple tests both on OS X and Linux. I'm not able
to reproduce it in any of the following setups: osx-libc, osx-jemalloc,
linux-libc, linux-jemalloc. Really curious about what triggers it!

Salvatore Sanfilippo

May 2, 2014, 10:32:26 AM
to Redis DB
P.S. It would be great to know if some Redis user with ~8GB available
can reproduce it in some environment, to understand whether it is
something very peculiar to Dmitry's environment or not.
Note how in Dmitry's output the time monotonically increases; it's a
pretty clear pattern if you run ~50 samples or so.

Salvatore

Dmitry Andrianov

May 2, 2014, 11:02:34 AM
to redi...@googlegroups.com
Salvatore,
I tried a quick test. In the application I changed


                for (long i = 0; i < items; i++) {
                    final String key = String.format("data/%019d/xxxxxxxxxxxxxxxxxx", firstItem + i);

to:

                List<Long> itemList = new ArrayList<Long>();
                for (long i = 0; i < items; i++) {
                    itemList.add(firstItem + i);
                }
                Collections.shuffle(itemList);   // visit the keys in random order on every pass

                for (int i = 0; i < items; i++) {
                    final String key = String.format("data/%019d/xxxxxxxxxxxxxxxxxx", itemList.get(i));

and added

import java.util.*;

at the top.
It is a quick-and-dirty change and I am not sure how good it is, but it seems to do its job.
And I have:

Proceeding...
#01 time = 52489
#02 time = 63884
#03 time = 74920
#04 time = 87818
#05 time = 101196
#06 time = 112334
#07 time = 118549
#08 time = 120611
#09 time = 122213

so still not good. Stopping the test and running it again, the time keeps growing from the last result.
Sending FLUSHALL to Redis takes 68.93 s and brings the time back to the initial level.

Also, I realised that I never mentioned my machine configuration. I am testing on an Ubuntu workstation (3.2.0-39-generic #62-Ubuntu SMP) which has 24 GB of RAM. It is physical hardware; no virtualisation is involved.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Core(TM) i7 CPU         960  @ 3.20GHz
stepping        : 5
microcode       : 0x15
cpu MHz         : 1596.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6400.69
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

The test is not absolutely clean - there are obviously other processes running on the same workstation, but I try to minimise all activity during the test. I do not think any other process contributes to the result, and there is plenty of free memory (about 9 GB) even during the test, when the data set is already populated.

My first attempt to run with huge pages was unsuccessful - it rendered that workstation almost unusable, so I had better postpone it for now and run it next week on one of the Amazon EC2 instances.

Cheers

Salvatore Sanfilippo

May 2, 2014, 11:06:22 AM
to Redis DB
On Fri, May 2, 2014 at 5:02 PM, Dmitry Andrianov
<dmitry.a...@gmail.com> wrote:
> Also, I realised that I never mentioned my machine configuration. I am
> testing it on Ubuntu workstation (3.2.0-39-generic #62-Ubuntu SMP) which has
> 24 Gb of RAM. It is a physical hardware, no virtualisation is involved.

Thanks Dmitry, my test was executed on a remarkably similar computer:
Ubuntu 14.04, 24 GB of memory, kernel 3.13.
I hope other users will run the test on Linux computers and share the
results so at least we can understand whether it is a common thing or not.

Matt Stancliff

May 2, 2014, 11:15:18 AM
to redi...@googlegroups.com

On May 2, 2014, at 11:06 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> On Fri, May 2, 2014 at 5:02 PM, Dmitry Andrianov
> <dmitry.a...@gmail.com> wrote:
>> Also, I realised that I never mentioned my machine configuration. I am
>> testing it on Ubuntu workstation (3.2.0-39-generic #62-Ubuntu SMP) which has
>> 24 Gb of RAM. It is a physical hardware, no virtualisation is involved.
>
> Thanks Dmitry, my test was executed in a remarkably similar computer:
> Ubuntu 14.04, 24 GB of memory, Kernel is 3.13.
> I hope other users will run the test on Linux computers and share the
> result so at least we can understand if it is a common thing or not.

Do we know which versions of Redis people are testing under?


-Matt

Salvatore Sanfilippo

May 2, 2014, 11:17:02 AM
to Redis DB
On Fri, May 2, 2014 at 5:15 PM, Matt Stancliff <mstan...@gopivotal.com> wrote:
> Do we know which versions of Redis people are testing under?

Good call. I was testing "unstable", latest commit.

Dmitry Andrianov

May 2, 2014, 11:27:47 AM
to redi...@googlegroups.com
Oops ) Forgot that bit

Redis 2.8.9 (00000000/0) 64 bit

Matt Stancliff

May 2, 2014, 12:13:38 PM
to redi...@googlegroups.com

On May 2, 2014, at 11:27 AM, Dmitry Andrianov <dmitry.a...@gmail.com> wrote:

> Oops ) Forgot that bit
>
> Redis 2.8.9 (00000000/0) 64 bit

Can you try running with this configuration?
./redis-server --save "" --activerehashing no


-Matt

Salvatore Sanfilippo

May 2, 2014, 12:16:57 PM
to Redis DB
On Fri, May 2, 2014 at 6:13 PM, Matt Stancliff <mstan...@gopivotal.com> wrote:
> Can you try running with this configuration?
> ./redis-server --save "" --activerehashing no

+1, sounds a good idea.

Another thing worth trying is to call DEL against the key before HMSETting it,
instead of overwriting the existing key.
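
For example, the inner loop of the test could become something like this (just a sketch; "jedis", "items", "firstItem" and "fields" stand in for the corresponding objects in the test code):

    for (long i = 0; i < items; i++) {
        final String key = String.format("data/%019d/xxxxxxxxxxxxxxxxxx", firstItem + i);
        jedis.del(key);             // drop the old hash so it is rebuilt from scratch
        jedis.hmset(key, fields);   // then write the same fields again
    }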

Salvatore Sanfilippo

May 2, 2014, 12:19:09 PM
to Redis DB
Another thing that can provide a lot of info is to use a profiler:

Profile the first run.

Run another 200 iterations.

Profile it again.

Compare the profiler output to understand what is taking more time.

Just random ideas, but in this specific case the only thing possible is
to observe more and more to get more context / state.

Salvatore

Dmitry Andrianov

May 2, 2014, 1:03:08 PM
to redi...@googlegroups.com
I downloaded 3.0 beta 2 (Redis 2.9.51) and installed it on an Amazon m3.xlarge instance (Ubuntu 14.04, 64-bit), and it gave me:

Updating data...
#01 time = 50866
#02 time = 55370
#03 time = 58052
#04 time = 59071
#05 time = 59893
#06 time = 60225
#07 time = 60468
#08 time = 60644
#09 time = 60964

It still grows, but much more slowly than what I had on my 12.04 box.

Then I started it as Matt suggested: ./redis-server --save "" --activerehashing no
And it gave me

Updating data...
#01 time = 51055
#02 time = 55640
#03 time = 58237
#04 time = 59876
#05 time = 60130
#06 time = 60631
#07 time = 60915
#08 time = 61281
#09 time = 61448

which I would call pretty much the same result.

I can try DEL-eting the hash as you suggested if you think it can help the investigation, but it is not what the real application will be doing - the hash will always be updated and never deleted.

Finally, I built the same 2.8.9 I was using before and ran it with src/redis-server --save ""
(without 'activerehashing no', just to replicate the same environment I was in so far), and that gave me:

Updating data...
#01 time = 55797
#02 time = 62236
#03 time = 69477
#04 time = 78776
#05 time = 83903
#06 time = 88015
#07 time = 90535
#08 time = 91836
#09 time = 92311

These look much closer to what I had before. So 2.8.9 and 3.0 behave differently: 3.0 slows down much more slowly and may even settle at something like 61 seconds if more iterations are run,
while 2.8.9 seems to run away...

Actually, if you think the results above still demonstrate some problem worth investigating, I can give you access to that EC2 box so you can take a look.
Please let me know if any of you need the credentials and I will email them to you off the list.

Many thanks.
Dmitry




Dmitry Andrianov

May 2, 2014, 1:46:02 PM
to redi...@googlegroups.com

Update: out of curiosity I did two more runs against 2.8.9, and the time per iteration seems to be settling down:

Run #2:

Updating data...
#01 time = 92913
#02 time = 93092
#03 time = 93193
#04 time = 93459
#05 time = 93390
#06 time = 93559
#07 time = 93642
#08 time = 93837
#09 time = 93912

Run #3:

Updating data...
#01 time = 94095
#02 time = 94332
#03 time = 94479
#04 time = 94616
#05 time = 94630
#06 time = 94753
#07 time = 95038
#08 time = 95050
#09 time = 95238

still growing, but not as quickly as during the first iterations...

Cheers

Matt Stancliff

May 2, 2014, 5:16:10 PM
to redi...@googlegroups.com
Let’s recap the entire thread real quick:

We see HMSET taking increasing amounts of time. The test is:
- generate keys in the form ("data/%019d/xxxxxxxxxxxxxxxxxx", i) with i between 0 and N, in this case 0 to 500000 (or 50,000 also works for tests).
- for each key, do a HMSET of the same ~100 field-value pairs
- split the total N tests across K threads, so each thread handles N/K non-conflicting HMSETs
- then, do the same HMSET multiple times, triggering a replace of every field-value pair in the set (all in one argument)

For easier testing, I converted the Java behavior to Lua so we can run it with redis-benchmark directly. See https://gist.github.com/mattsta/f27f1856f22fab67dc58 (Note: it doesn’t split the input among all 10 parallel clients; it just lets each client do all 50,000 HMSET operations).

The tests below are on 2.8.9, Linux 3.13.0-22-lowlatency, Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz, 24 GB RAM.

We can see the increasing time behavior here (all defaults, no saving):
matt@neon:~/repos/redis/src% while true; do ./hmset.sh |grep ms; done
1 requests completed in 2957.39 ms
1 requests completed in 3312.97 ms
1 requests completed in 3471.98 ms
1 requests completed in 3531.64 ms
1 requests completed in 3569.08 ms
1 requests completed in 3616.65 ms
1 requests completed in 3614.22 ms
1 requests completed in 3640.54 ms
1 requests completed in 3676.59 ms
1 requests completed in 3691.97 ms
1 requests completed in 3701.63 ms
1 requests completed in 3707.43 ms
1 requests completed in 3715.06 ms
1 requests completed in 3728.09 ms
1 requests completed in 3730.66 ms
1 requests completed in 3755.90 ms
1 requests completed in 3756.98 ms
1 requests completed in 3761.99 ms
1 requests completed in 3773.13 ms
1 requests completed in 3775.28 ms
1 requests completed in 3779.64 ms
1 requests completed in 3794.19 ms
1 requests completed in 3806.37 ms
1 requests completed in 3802.50 ms
1 requests completed in 3802.10 ms
1 requests completed in 3804.84 ms
1 requests completed in 3801.41 ms
1 requests completed in 3809.59 ms
1 requests completed in 3807.96 ms
1 requests completed in 3810.22 ms
1 requests completed in 3815.79 ms
1 requests completed in 3828.82 ms
1 requests completed in 3820.12 ms
1 requests completed in 3830.07 ms
1 requests completed in 3833.39 ms
1 requests completed in 3839.27 ms
1 requests completed in 3836.13 ms
1 requests completed in 3833.77 ms
1 requests completed in 3838.66 ms
1 requests completed in 3842.17 ms
1 requests completed in 3852.43 ms
1 requests completed in 3857.80 ms
1 requests completed in 3855.75 ms
1 requests completed in 3859.72 ms
1 requests completed in 3855.54 ms
1 requests completed in 3857.76 ms
1 requests completed in 3855.38 ms
1 requests completed in 3862.82 ms
1 requests completed in 3863.22 ms
1 requests completed in 3868.30 ms
1 requests completed in 3869.38 ms
1 requests completed in 3871.46 ms
1 requests completed in 3862.30 ms
1 requests completed in 3855.73 ms
1 requests completed in 3862.29 ms
1 requests completed in 3867.02 ms
1 requests completed in 3863.58 ms
1 requests completed in 3861.85 ms
1 requests completed in 3864.25 ms
1 requests completed in 3864.58 ms
1 requests completed in 3868.04 ms
1 requests completed in 3883.58 ms


But, if I try again I see different behavior. We see cycles of increase with drop off, even after a FLUSHALL.
matt@neon:~/repos/redis/src% while true; do ./hmset.sh |grep ms; done
1 requests completed in 2555.01 ms
[increasing...]
1 requests completed in 3066.29 ms
1 requests completed in 3180.87 ms
1 requests completed in 3229.82 ms
1 requests completed in 3263.30 ms
1 requests completed in 3288.94 ms
1 requests completed in 3313.51 ms
1 requests completed in 3346.59 ms
1 requests completed in 3345.49 ms
1 requests completed in 3367.54 ms
1 requests completed in 3379.32 ms
1 requests completed in 3396.67 ms
[drops off then increases again…]
1 requests completed in 3148.03 ms
1 requests completed in 3174.49 ms
1 requests completed in 3166.26 ms
1 requests completed in 3204.28 ms
1 requests completed in 3205.40 ms
1 requests completed in 3201.72 ms
1 requests completed in 3215.64 ms
1 requests completed in 3226.69 ms
[manual FLUSHALL in other console]
[starts increasing again]
1 requests completed in 3104.83 ms
1 requests completed in 3507.88 ms
1 requests completed in 3591.01 ms
1 requests completed in 3634.19 ms
1 requests completed in 3684.58 ms
1 requests completed in 3704.18 ms
1 requests completed in 3737.98 ms
1 requests completed in 3762.40 ms
1 requests completed in 3786.08 ms
1 requests completed in 3795.76 ms
1 requests completed in 3816.11 ms
1 requests completed in 3855.56 ms
1 requests completed in 3880.03 ms
1 requests completed in 3566.14 ms
1 requests completed in 3639.80 ms
[drops off then increases again…]
1 requests completed in 3233.35 ms
1 requests completed in 3261.48 ms
1 requests completed in 3273.89 ms
1 requests completed in 3269.99 ms
1 requests completed in 3275.23 ms
1 requests completed in 3294.85 ms
1 requests completed in 3492.93 ms
1 requests completed in 3522.14 ms
1 requests completed in 3522.19 ms
1 requests completed in 3545.02 ms



The increasing pattern never happens if we delete the hash between additions. (HMSET key f1 v1 f2 v2…; DEL key; HMSET key f1 v1 f2 v2…)
Example:
matt@neon:~/repos/redis/src% while true; do ./hmset.sh del |grep ms; done
1 requests completed in 2948.66 ms
1 requests completed in 2927.06 ms
1 requests completed in 2930.17 ms
1 requests completed in 2936.77 ms
1 requests completed in 2927.65 ms
1 requests completed in 2930.67 ms
1 requests completed in 2923.08 ms
1 requests completed in 2929.65 ms
1 requests completed in 2920.07 ms
1 requests completed in 2930.07 ms
1 requests completed in 2927.66 ms
1 requests completed in 2940.00 ms
1 requests completed in 2932.67 ms
1 requests completed in 2931.67 ms
1 requests completed in 2934.20 ms
1 requests completed in 2936.38 ms
1 requests completed in 2935.94 ms
1 requests completed in 2929.72 ms
1 requests completed in 2931.22 ms
1 requests completed in 2935.13 ms
1 requests completed in 2930.59 ms
1 requests completed in 2931.79 ms
1 requests completed in 2939.92 ms
1 requests completed in 2939.28 ms
1 requests completed in 2938.17 ms
1 requests completed in 2946.88 ms
1 requests completed in 2935.52 ms
1 requests completed in 2942.49 ms
1 requests completed in 2938.85 ms
1 requests completed in 2934.49 ms
1 requests completed in 2934.91 ms
1 requests completed in 2950.79 ms
1 requests completed in 2942.07 ms
1 requests completed in 2932.05 ms
1 requests completed in 2946.40 ms
1 requests completed in 2936.98 ms
1 requests completed in 2941.45 ms
1 requests completed in 2943.31 ms
1 requests completed in 2944.87 ms
1 requests completed in 2950.28 ms
1 requests completed in 2937.97 ms
1 requests completed in 2941.62 ms
1 requests completed in 2941.05 ms


So, the issue is with setting the same hash multiple times and perhaps over-fragmenting the memory allocator. The behavior is worse under libc malloc and better under jemalloc.

A profile of these runs shows the dominant functions are decrRefCount and jemalloc’s free, together accounting for 25% to 35% of the total runtime. Coming in third is dictAdd with another 10% of total runtime.


We can intentionally make the situation worse overall, but less variable.
If we increase ziplist size so *none* of our hashes become hash tables, everything remains equally slow with perhaps a little increase->drop off effect.
./redis-server --save "" --hash-max-ziplist-entries 1000 --hash-max-ziplist-value 4096
matt@neon:~/repos/redis/src% while true; do ./hmset.sh |grep ms; done
1 requests completed in 7301.10 ms
1 requests completed in 8831.90 ms
1 requests completed in 8829.57 ms
1 requests completed in 8708.75 ms
1 requests completed in 8692.22 ms
1 requests completed in 8612.53 ms
1 requests completed in 8607.87 ms
1 requests completed in 8613.58 ms
1 requests completed in 8606.86 ms
1 requests completed in 8614.39 ms
1 requests completed in 8611.24 ms
1 requests completed in 8759.91 ms
1 requests completed in 8756.07 ms
1 requests completed in 8870.83 ms
1 requests completed in 8877.55 ms
1 requests completed in 8872.40 ms
1 requests completed in 8874.90 ms
1 requests completed in 8888.93 ms
1 requests completed in 8867.36 ms
1 requests completed in 8865.87 ms
1 requests completed in 8854.49 ms
1 requests completed in 8796.68 ms
1 requests completed in 8798.44 ms
1 requests completed in 8678.47 ms
1 requests completed in 8681.05 ms


Fun fact: after all these tests, when I start Redis it starts with a fragmentation ratio > 15:
# Memory
used_memory:502304
used_memory_human:490.53K
used_memory_rss:7929856
used_memory_peak:502304
used_memory_peak_human:490.53K
used_memory_lua:33792
mem_fragmentation_ratio:15.79
mem_allocator:jemalloc-3.2.0

The ratio drops back down after more data is allocated inside Redis:
# Memory
used_memory:895477296
used_memory_human:853.99M
used_memory_rss:922492928
used_memory_peak:895701840
used_memory_peak_human:854.21M
used_memory_lua:58368
mem_fragmentation_ratio:1.03
mem_allocator:jemalloc-3.2.0


> I can try DEL-eting the hash as you suggested if you think it can help investigation but it is not what real application will be doing - the hash will always be updated and never deleted.

Will you always be updating every field of the hash at the same time? The test effectively replaces the entire hash every time without deleting it, which is worst-case behavior.

> Actually, if you think the results above are still demonstrating some problem worth investigation, I can give you access to that EC2 box so you can take a look.
> Please let me know if any of you need the credentials and I will email it to you off the list.

Some quick notes:
- Let’s remove a zero from N so each run doesn’t take 60 to 90 seconds per attempt. You can still observe the same effect with 5 seconds per run.
- We need more than 10 runs. It can take 10 runs for the initial ramp up to stabilize.

You are right that 2.8.9 increases more quickly at first and is slower overall. The 3.0 betas allow short strings to be represented more efficiently, which may be the speed difference we see.

Here are some tests using the Java approach on 2.8.9:
#00 time = 3873
Press enter to continue
Updating data...
#01 time = 5284
#02 time = 5407
#03 time = 5833
#04 time = 6307
#05 time = 6710
#06 time = 6962
#07 time = 7031
#08 time = 7241
#09 time = 7303
#10 time = 7377
#11 time = 7394
#12 time = 7495
#13 time = 7517
#14 time = 7521
#15 time = 7519
#16 time = 7557
#17 time = 7569
#18 time = 7632
#19 time = 7619
#20 time = 7670
#21 time = 7668
#22 time = 7702
#23 time = 7650
#24 time = 7704
#25 time = 7675
#26 time = 7738
#27 time = 7737
#28 time = 7749
#29 time = 7725
#30 time = 7740

So, still increasing.


If we do a FLUSHALL while the test is still running:
#33 time = 2006
#34 time = 5223
#35 time = 5245
#36 time = 5964
#37 time = 6569
#38 time = 7145
#39 time = 7734
#40 time = 8257
#41 time = 8634
#42 time = 8713
#43 time = 8879
#44 time = 8920
#45 time = 9014
#46 time = 9040
#47 time = 9048
#48 time = 9122
#49 time = 9150
#50 time = 9183
#51 time = 9146

The time used drops, then quickly jumps back to where it left off.




Conclusion: unknown, but something keeps pointing at the memory allocator. We can do one run and get increasing runtime. Then we can do another run and get no increasing runtime. Then we can do a FLUSHALL when we have no increasing runtime to generate increasing runtime. Sometimes runtime increases over a duration of 10 to 20 tests, then drops back down.


-Matt


Salvatore Sanfilippo

May 2, 2014, 5:39:29 PM
to Redis DB
Matt, thanks, that's some amazing work. Now, after your investigation
and the new evidence from Dmitry, I want to investigate more and trace
this down.
Redis is a simple system that we mostly understand completely; it is a
shame to let the unknown dominate us.

So starting from Monday I'll start investigating it with 2.8 to check
what happens.
It is *very* interesting that if you DEL before setting, there is no
growth. I somewhat expected this, since writing the same data into an
already populated data structure may create some kind of cycle in the
representation that makes cache misses more likely, or something like
that, while with the DEL the resulting data structure will be the same
again and again. This is a very interesting starting point.

About reducing the computation and still observing the effect: this is
something I tried today, without results, as I was testing with
unstable, where the increase most of the time seems cyclic (returning
to lower values after a few runs). This would help debugging since we
could go for faster iterations; it is boring to wait 1 minute per iteration.

So thank you Matt and Dmitry, I'll spend more time myself to
understand this issue better.

Salvatore



Dmitry Andrianov

May 6, 2014, 4:33:31 AM
to redi...@googlegroups.com
Great!
Let me know if I can be of any help in your investigation.

Thanks, Matt, for the research. And to answer your question - no, in the real scenario we won't be replacing all the keys every time; I expect only about 10% of them to be replaced.
I just wanted to test the worst possible case to know the limits.
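
Roughly speaking, instead of replacing the whole hash, a pass closer to the real workload would update only a random subset of fields, something like this (just a sketch, assuming it is the hash fields that get partially replaced; "items" and "firstItem" are from my test code, while "jedis", "fields" and the rest are illustrative):

    // Needs java.util.Random, java.util.Map, java.util.HashMap.
    Random random = new Random();
    for (long i = 0; i < items; i++) {
        final String key = String.format("data/%019d/xxxxxxxxxxxxxxxxxx", firstItem + i);
        Map<String, String> partial = new HashMap<String, String>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (random.nextDouble() < 0.10) {    // touch roughly 10% of the fields
                partial.put(e.getKey(), e.getValue());
            }
        }
        if (!partial.isEmpty()) {
            jedis.hmset(key, partial);   // partial update instead of a full replace
        }
    }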

Thank you.