Redis memory fragmentation after 2 weeks

Mike K

Apr 27, 2011, 6:58:58 PM

to Redis DB

Hey all,

On one of our main Redis instances, our 4.45GB dataset is currently
taking about 14.4GB of RAM after 2 weeks of uptime:

used_memory:4776139496
used_memory_human:4.45G
used_memory_rss:14642315264
mem_fragmentation_ratio:3.07

We haven't seen this behavior from Redis before, and the replica for
this particular redis instance has a more normal-looking memory
profile:

used_memory:4826775848
used_memory_human:4.50G
used_memory_rss:7707877376
mem_fragmentation_ratio:1.60
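In these INFO excerpts, mem_fragmentation_ratio is simply used_memory_rss divided by used_memory. The minimal Python sketch below recomputes it from the raw INFO lines. The helper names parse_info and fragmentation_ratio are hypothetical; they are not part of Redis or any client library.

```python
def parse_info(text):
    """Parse "key:value" lines of Redis INFO output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory" (newer Redis versions).
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def fragmentation_ratio(info):
    """RSS reported by the OS divided by the bytes Redis believes it has allocated."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


master = parse_info("used_memory:4776139496\nused_memory_rss:14642315264\n")
replica = parse_info("used_memory:4826775848\nused_memory_rss:7707877376\n")
print(f"master:  {fragmentation_ratio(master):.2f}")   # 3.07
print(f"replica: {fragmentation_ratio(replica):.2f}")  # 1.60
```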

We failed over to the replica since the high memory usage was concerning
(it's a 17GB machine in total). Two questions:

1. Does this behavior seem indicative of a larger problem? Is a mem
fragmentation ratio of 3+ normal?
2. We didn't get to the point where Redis ran out of memory, but would
it swap at that point, or is this RSS statistic a sort of 'upper
bound' on how much memory is allocated to Redis?

This instance holds about 10 million keys. I've left the 'old' master
up and in its same state if there's any interesting debug information
to pull from it.

Thanks!
Mike

Salvatore Sanfilippo

Apr 27, 2011, 7:03:15 PM

to redi...@googlegroups.com

On Thu, Apr 28, 2011 at 12:58 AM, Mike K <mik...@instagram.com> wrote:
> 1. Does this behavior seem indicative of a larger problem? Is a mem
> fragmentation ratio of 3+ normal?

Hello, are you sure this instance didn't hold much more data at some
point?
This is why I added "peak memory" to INFO in the Redis unstable
branch: many of these reports are actually caused by instances that
are filled with more data and then partially emptied. The RSS does not
go down in such a case (and the reported fragmentation ratio is
misleading).

> 2. We didn't get to the point where Redis ran out of memory, but would
> it swap at that point, or is this RSS statistic a sort of 'upper
> bound' on how much memory is allocated to Redis?

The RSS of Redis is usually the peak memory used multiplied by the
actual fragmentation ratio, which may be 1.3 or so.
So if you see a 3.0 fragmentation ratio because you used to have 12 GB
of data inside Redis and now have just 4 GB, adding 8 GB of data back
will likely not change the RSS, as the same pages will be reused for
the new data.
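Salvatore's explanation fits a deliberately simplified model. It assumes that memory freed after the peak is not returned to the OS, and that the allocator's real fragmentation is a constant factor. The function names are illustrative, and the 12 GB / 4 GB figures come from his example rather than from measurements.

```python
def estimated_rss(peak_used, current_used, true_ratio=1.0):
    """Simplified model: RSS never drops below the historical peak.

    Assumes memory freed after the peak is not returned to the OS, so RSS
    follows max(peak, current) times the allocator's real fragmentation.
    """
    return max(peak_used, current_used) * true_ratio


def apparent_ratio(peak_used, current_used, true_ratio=1.0):
    """What INFO would report as mem_fragmentation_ratio under this model."""
    return estimated_rss(peak_used, current_used, true_ratio) / current_used


GB = 1024 ** 3
print(apparent_ratio(12 * GB, 4 * GB))  # 3.0
# Refilling from 4 GB back to 12 GB reuses the same pages: RSS is unchanged.
print(estimated_rss(12 * GB, 4 * GB) == estimated_rss(12 * GB, 12 * GB))  # True
```

Under this model a 3.0 ratio says nothing about the allocator itself, which is why a "peak memory" field in INFO helps tell the two cases apart.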

Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Mike K

Apr 27, 2011, 7:41:06 PM

to Redis DB

Hey Salvatore,

Thanks for the quick response!

> Hello, are you sure that this instance was not having much more data
> inside at some point?
> This is the reason I added "peak memory" in INFO in Redis unstable
> branch, since many of this reports are actually caused by the fact
> that this instances are fill with more data, then partially emptied,
> and the RSS does not go low in such a case (and the showed
> fragmentation ratio is not correct).

The amount of data in the instance is gradually growing, but hasn't
changed much in the last two weeks (graph: http://d.pr/DoTm). The
access pattern for this machine is a lot of lists that get LPUSH'd,
and then LTRIM'd about 20% of the time; we've set
list-max-ziplist-entries to 1024 since our lists' target length is
500. So, in a way, a lot of data gets thrown out by the LTRIM, but the
number of keys and the amount of data in those keys remain about the
same. Could LTRIMming account for the extra memory usage in this case?

Mike

Salvatore Sanfilippo

Apr 28, 2011, 4:53:25 AM

to redi...@googlegroups.com

Hey Mike, do you have the same graph for the RSS in the same period? Thank you.

On Thu, Apr 28, 2011 at 1:41 AM, Mike K <mik...@instagram.com> wrote:
> http://d.pr/DoTm

Salvatore Sanfilippo

Apr 28, 2011, 5:35:50 AM

to redi...@googlegroups.com

On Thu, Apr 28, 2011 at 10:53 AM, Salvatore Sanfilippo
<ant...@gmail.com> wrote:
> Hey Mike, do you have the same graph for the RSS in the same period? Thank you.

More things that can help for sure:

Output of CONFIG GET * on both the master and the slave.
Full output of INFO. If you are using the unstable branch, please use
"INFO ALL" instead.

Thanks!
Salvatore

Mike K

Apr 28, 2011, 6:31:04 PM

to Redis DB

Hey Salvatore,

> > Hey Mike, do you have the same graph for the RSS in the same period? Thank you.

Sure thing:
http://d.pr/pFO3

The big bump and then drop-off is from when we started running a
different Redis instance on this machine as well, then moved it to
another machine (you'll notice that the graph would have grown at the
same rate without that other instance, as far as I can tell).

>
> More things that can help for sure:
>
> output of CONFIG GET * in both the master and the slave.

Here the master is the one with the large RSS which is no longer
receiving connections, and the slave is the new master:
master CONFIG GET *: https://gist.github.com/5a6c09d93ae1e0682a13
master INFO: https://gist.github.com/e8eb7f5cf04d51e2ec4e

slave/new master CONFIG GET *: https://gist.github.com/2a143941754c11cfbd9c
slave/new master INFO: https://gist.github.com/b80f90419339f2905d5b

(all on 2.2.4)

Thanks again, let me know if there's any other info that would be
helpful.
Mike

Ethan Collins

May 3, 2011, 4:33:12 AM

to Redis DB

I was following this thread. Is this problem solved? If so, can you
put the solution here as well?

Ethan

Salvatore Sanfilippo

May 3, 2011, 4:55:44 AM

to redi...@googlegroups.com

On Tue, May 3, 2011 at 10:33 AM, Ethan Collins <collins...@gmail.com> wrote:
> I was following this thread. Is this problem solved? If so, can you
> put the solution here as well?

Hello, sorry, the problem is not solved. I analyzed the data and at
first glance it does not make sense to me, for one reason: why doesn't
the slave, which gets the same write pattern, show the same problem?

I upgraded lloogg.com, which has exactly the same LPUSH+LTRIM workload,
to Redis unstable (which contains the ziplist implementation), but so
far there is no evidence of this problem. So for now I have no idea...

From the graphs that Mike provided it is also clear that the RSS was
growing linearly, so this really sounds like a fragmentation problem.

My only hypothesis is the following. Consider the ziplist LPUSH+LTRIM
pattern. If you start with an empty DB, LPUSH often reallocs the
ziplist to a bigger block, so you end up with a lot of small freed
allocations that can't be reused when the ziplist is reallocated
again, at least until your lists reach their maximum number of
elements.

So maybe the slave was not showing this problem because it was synced
with the master when the lists were already at max size?

Ziplists are different from all the other strings used inside Redis,
as they don't use the sds.c lib. Sds allocates in powers of two,
minimizing this kind of problem. So the fix would be to allocate
ziplists to the next power-of-two size as well...
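A toy model, not Redis code, shows the trade-off behind the proposed fix. It counts how often a growing block needs a differently sized allocation when sizes are requested exactly, and compares that with rounding up to the next power of two. The growth from 1000 to 2000 bytes in 4-byte steps is an arbitrary assumption standing in for a ziplist receiving LPUSHes.

```python
def next_power_of_two(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def count_reallocs(sizes, round_up):
    """Count size changes that would require a block of a different size."""
    reallocs = 0
    current = round_up(sizes[0])
    for size in sizes[1:]:
        needed = round_up(size)
        if needed != current:
            reallocs += 1
            current = needed
    return reallocs


# Hypothetical ziplist growing from 1000 to 2000 bytes, 4 bytes per push.
growth = list(range(1000, 2001, 4))
exact = count_reallocs(growth, lambda n: n)
pow2 = count_reallocs(growth, next_power_of_two)
print(exact, pow2)  # 250 1
```

Power-of-two rounding costs slack: in this model a 1028-byte ziplist occupies a 2048-byte block. Whether fewer reallocations actually mean less fragmentation depends on the allocator, so treat this only as an illustration of the hypothesis.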

I'll try an example script to verify if this is really the problem or
not and report back.

Salvatore


Mike K

May 8, 2011, 2:36:19 PM

to Redis DB

Hey all,

An update on this issue from our end: the slave ended up showing the
same behavior; it was just somewhat masked because that machine runs 3
slave instances, so the growth of one instance wasn't as apparent.

Salvatore/Pieter, if we wanted to try out an alternative malloc (we're
using glibc malloc right now) to see if it solves the fragmentation
issue, which should we use? There was some mention of jemalloc support
in one of Pieter's branches; what's the closest to 2.2.4 that we could
use for this?

Thanks!
Mike


Salvatore Sanfilippo

May 8, 2011, 3:54:56 PM

to redi...@googlegroups.com

On Sun, May 8, 2011 at 8:36 PM, Mike K <mik...@instagram.com> wrote:
> Salvatore/Pieter, if we wanted to try out an alternative malloc (we're
> using glibc malloc right now) to see if it solves the fragmentation
> issue, which should we use? There was some mention of jemalloc support
> in one of Pieter's branches, what's the closest to 2.2.4 that we could
> use for this?

Hi Mike, we have received past reports that tcmalloc (supported in
2.2.4) is better at fragmentation.
However, jemalloc support is trivial to backport into 2.2.x.

I would try jemalloc: if it works well for you, there is a big chance
that it will get integrated into the Redis source code, to be used when
the target is Linux... so if you agree, we are going to backport the
jemalloc support into the 2.2.x branch and release it as 2.2.7 tomorrow.

Thank you for assisting us with this issue, it is very, very appreciated.
If, time permitting, sooner or later you could send a description of
this instance's workload so that we can try to reproduce the problem,
even in a long-running (weeks) instance, that would be cool.
What would help most is the exact kind of write operations
(LPUSH/RPUSH/LINSERT/LREM/...) and the average list size and list
element size.

If we can also get the output of
https://github.com/antirez/redis-sampler running against the slave,
that would be great.

Thank you!
Salvatore

Mike K

May 8, 2011, 8:28:57 PM

to Redis DB

> I would try jemalloc, as if it works well for you (big) chances are
> that it will get integrated into Redis source code to be used if the
> target is Linux... so if you agree we are going to backport the
> jemalloc support into 2.2.x branch and release it as 2.2.7 tomorrow.

Absolutely--just let me know when you've pushed a 2.2 branch with
jemalloc support, we'll spin up a slave with this tomorrow and
assuming all goes well we can cut over the master to it.

>
> Thank you for assisting us with this issue, this is very very appreciated.
> If, time permitting, soon or later you can send a description of the
> workload of this instance so that we can try to reproduce this problem
> even into a long running (weeks) instance, this would be cool.
> Especially what will help is the exact kind of write operations like
> LPUSH/RPUSH/LINSERT/LREM/... and the average list size and list
> element size.

Sure; the main work done in this instance is LPUSHes of integers and,
20% of the time, LTRIMs back to 500. Target list size is 500 elements,
so up to 600 elements in the worst case. The pushes & trims are often
done in a pipeline (using redis-py). list-max-ziplist-entries is 512,
and list-max-ziplist-value is the default.
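The workload described above might look roughly like this in Python. This is a reconstruction under assumptions, not Instagram's code. TARGET_LEN, TRIM_PROBABILITY and push_item are hypothetical names. The client is assumed to be a redis-py client, or anything with a compatible pipeline() whose pipelines offer lpush(), ltrim() and execute().

```python
import random

TARGET_LEN = 500        # target list length mentioned in the thread
TRIM_PROBABILITY = 0.2  # LTRIM on roughly 20% of the pushes


def push_item(client, key, value, trim_probability=TRIM_PROBABILITY, rng=random):
    """LPUSH value onto key and, with the given probability, LTRIM back to TARGET_LEN.

    Both commands are queued on one pipeline and sent in a single round trip.
    """
    pipe = client.pipeline()
    pipe.lpush(key, value)
    if rng.random() < trim_probability:
        # Keep indexes 0..TARGET_LEN-1, i.e. the TARGET_LEN newest items.
        pipe.ltrim(key, 0, TARGET_LEN - 1)
    return pipe.execute()
```

With redis-py installed, push_item(redis.Redis(), "feed:12345", 67890) would send one LPUSH, plus an LTRIM about 20% of the time. The key and value here are made up. Passing trim_probability=1.0 gives the trim-on-every-push pattern.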

>
> If we can also get the output of https://github.com/antirez/redis-sampler running against the slave,
> that would be great.

Yep, can get that to you tomorrow as well before we update it with the
new branch.

Thanks!
Mike

Mike K

May 8, 2011, 8:45:38 PM

to Redis DB

Small correction: max ziplist length (list-max-ziplist-entries) is 1024, not 512.


Salvatore Sanfilippo

May 9, 2011, 5:42:31 AM

to redi...@googlegroups.com

On Mon, May 9, 2011 at 2:28 AM, Mike K <mik...@instagram.com> wrote:
> Absolutely--just let me know when you've pushed a 2.2 branch with
> jemalloc support, we'll spin up a slave with this tomorrow and
> assuming all goes well we can cut over the master to it.

Hello Mike, thank you for all the information provided!

The 2.2-jemalloc branch is online:

http://github.com/antirez/redis/tree/2.2-jemalloc

To compile just use:

make USE_JEMALLOC=yes

It will also build jemalloc itself (shipped together with the source
code of Redis) and link with it.
Everything seems fine and all tests are passing, but it is still
better to test all this on a slave as a first step.

Cheers,

Mike K

May 16, 2011, 9:21:20 PM

to Redis DB

Hey Salvatore,

Results are in after a week, and unfortunately not very good:

glibc malloc over time: http://d.pr/4IIL
jemalloc over time: http://d.pr/u78K

While it looks a bit better, it still seems to be heading toward
taking up all the memory on the machine. The DB size as reported by
Redis itself is better:

glibc malloc: http://d.pr/RMok
jemalloc: http://d.pr/Ku9W

We're going to try to revert to our old way of LTRIM'ing every time
rather than a percentage of the time, starting from a fresh load of
the data on a new slave, to see if this helps.

Happy to try other ideas / run other diagnostics, just let me know.

Mike

Mike K

May 16, 2011, 9:31:52 PM

to Redis DB

Also, if it's helpful, the INFO stuff from the jemalloc slave:

redis> info
redis_version:2.2.6
redis_git_sha1:bd9c5a16
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:2701
uptime_in_seconds:619789
uptime_in_days:7
lru_clock:536159
used_cpu_sys:6680.02
used_cpu_user:254.84
used_cpu_sys_childrens:0.00
used_cpu_user_childrens:0.00
connected_clients:2
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:5116996840
used_memory_human:4.77G
used_memory_rss:11380084736
mem_fragmentation_ratio:2.22
mem_allocator:jemalloc
loading:0
aof_enabled:0
changes_since_last_save:512764820
bgsave_in_progress:0
last_save_time:1304976044
bgrewriteaof_in_progress:0
total_connections_received:10939
total_commands_processed:584426549
expired_keys:0
evicted_keys:0
keyspace_hits:436302399
keyspace_misses:1845723
hash_max_zipmap_entries:512
hash_max_zipmap_value:64
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:slave
master_host:xxx
master_port:xxx
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
allocation_stats:
6=1,8=151367618,9=33146,10=76130122,11=7284340,12=79793969,13=78012707,14=575814538,15=2362329,16=3463196020,17=1354726639,18=2577063,19=13301491,20=119019400,21=314821884,22=154711,23=6349750,24=813911782,25=31461,26=402712,27=94745,28=4431400,29=7429225,30=6036588,31=15492728,32=146708712,33=368862,34=809809,35=6701363,36=3765485,37=1715822,38=1966358,39=2150110,40=3943281,41=6905742,42=1323839,43=986142,44=2668373,45=569966,46=372909,47=5407009,48=6806817,49=105189,50=84089,51=40272,52=1720447,53=5042547,54=31587,55=28761,56=1595261,57=5024,58=19621,59=4895004,60=1489286,61=2181,62=15686,63=3078,64=1425246,65=4789255,66=11422,67=25357,68=1327823,69=26093,70=9530,71=4674069,72=5565550,73=284368,74=8018,75=118,76=1200193,77=4582375,78=6888,79=60,80=1144797,81=112,82=5976,83=4498916,84=1094127,85=55,86=5160,87=129,88=1091138,89=4421600,90=4564,91=74,92=1004569,93=122,94=4072,95=4350284,96=5019886,97=240,98=3678,99=623,100=927687,101=4283662,102=3326,103=7139,104=892843,105=103,106=2986,107=4220810,108=860084,109=739,110=2700,111=4227,112=829601,113=4168906,114=2432,115=82,116=800948,117=115,118=2208,119=4106276,120=4636562,121=824,122=1986,123=6069,124=748651,125=4056134,126=1820,127=7742,128=747284,129=287,130=1660,131=4003623,132=702178,133=3510,134=1516,135=7940,136=680787,137=3971047,138=1370,139=11637,140=660817,141=5825,142=1280,143=3908601,144=4344289,145=53,146=1168,147=61,148=623064,149=3864314,150=1078,151=37,152=605572,153=75,154=984,155=3822308,156=589415,157=47,158=900,159=58,160=573983,161=3782248,162=826,163=38,164=558866,165=76,166=768,167=3743609,168=4110556,169=48,170=724,171=47,172=530042,173=3706411,174=672,175=32,176=516854,177=53,178=622,179=3670968,180=503908,181=31,182=586,183=64,184=491517,185=3635658,186=544,187=85,188=479771,189=131,190=506,191=3602128,192=3913355,193=138,194=482,195=150,196=457484,197=3569747,198=454,199=1019,200=447006,201=153,202=428,203=3538651,204=436811,205=1282,206=408,207=2682,208=427060,209=3509471,210=378,211=197
,212=417779,213=319,214=370,215=3479178,216=3744445,217=1445,218=342,219=2920,220=399830,221=3449843,222=334,223=416,224=391501,225=1164,226=324,227=3422182,228=383391,229=2248,230=304,231=110,232=375557,233=3393127,234=294,235=307,236=367897,237=380,238=272,239=3368098,240=3596922,241=1794,242=260,243=1512,244=353449,245=3339435,246=246,247=1416,248=346691,249=1499,250=240,251=3314383,252=340073,253=2815,254=226,255=2810,>=256=1291872031
db0:keys=11860830,expires=0
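The allocation_stats field above is a comma-separated list of size=count pairs that ends with a >=256 bucket. These appear to be zmalloc's per-size allocation counters in 2.2, but check the source before relying on their exact semantics. The small Python sketch below parses the field and lists the most frequent sizes; the helper names are hypothetical.

```python
def parse_allocation_stats(value):
    """Turn "6=1,8=151367618,...,>=256=1291872031" into {size_label: count}."""
    stats = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas or line breaks from copy/paste
        size, _, count = entry.rpartition("=")  # rpartition keeps ">=256" intact
        stats[size] = int(count)
    return stats


def top_sizes(stats, n=5):
    """The n most frequent allocation sizes, largest count first."""
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)[:n]


sample = "6=1,8=151367618,16=3463196020,17=1354726639,>=256=1291872031"
for size, count in top_sizes(parse_allocation_stats(sample), 3):
    print(f"{size:>6} bytes: {count}")
```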

Salvatore Sanfilippo

May 17, 2011, 3:03:42 AM

to redi...@googlegroups.com

On Tue, May 17, 2011 at 3:21 AM, Mike K <mik...@instagram.com> wrote:
> Hey Salvatore,
>
> Results are in after a week, and unfortunately not very good:
>
> glib malloc over time: http://d.pr/4IIL
> jemalloc over time: http://d.pr/u78K

Hello Mike, did you try my branch unaltered, or after applying the
patch proposed by Didier Spezia?
Unfortunately, because of a Makefile problem, my branch is *not* using
jemalloc at all. There was a thread about this issue here on the
mailing list, but I made the mistake of not informing you directly.

If you tried my branch as-is, I'll update it today to fix this problem
and ask you to retry. In that case, sorry for the time you wasted with
the broken branch.

Salvatore Sanfilippo

May 17, 2011, 4:47:57 AM

to redi...@googlegroups.com

Hello Mike,

I can confirm there was a problem in the old build. I just applied the
patch provided by Didier to the 2.2-jemalloc branch, so now this should
be the safe one. Build it with:

make USE_JEMALLOC=yes

Apparently this is a huge improvement when I try to load big datasets,
but unfortunately I can't test fragmentation.
Hope you'll get good results with this.

Salvatore

Daniel Mezzatto

May 17, 2011, 9:51:22 AM

to Redis DB

I see the same memory behavior, but less intense.
I have a big cluster with roughly 150 Redis instances. Each one
has 200,000 keys (roughly 53MB).
I've been using the jemalloc branch since May 9, and it went from 53MB
to 58MB per instance in one week.
I will try the new version (with Didier's patch) and see what happens.

Daniel Mezzatto

May 17, 2011, 9:59:03 AM

to Redis DB

I'm building in 32-bit mode:

make USE_JEMALLOC=yes 32bit

and I'm getting this error:

/usr/bin/ld: skipping incompatible ../deps/jemalloc/lib/libjemalloc.a when searching for -ljemalloc
/usr/bin/ld: skipping incompatible /usr/local/lib/libjemalloc.so when searching for -ljemalloc
/usr/bin/ld: skipping incompatible /usr/local/lib/libjemalloc.a when searching for -ljemalloc
/usr/bin/ld: cannot find -ljemalloc


Salvatore Sanfilippo

May 17, 2011, 10:07:57 AM

to redi...@googlegroups.com

Hi Daniel,

unfortunately, you currently can't make a 32-bit build using jemalloc,
at least not out of the box.
You may try compiling jemalloc under /deps for a 32-bit target by
tweaking the Makefile, and then build Redis.

Another solution for these fragmentation issues is to turn zmalloc()
into a slab allocator, but this means a much bigger memory footprint
for the same dataset... IMHO it's better to first see whether jemalloc
can solve our issues.

Salvatore


Salvatore Sanfilippo

May 17, 2011, 10:12:34 AM

to redi...@googlegroups.com

On Tue, May 17, 2011 at 3:51 PM, Daniel Mezzatto
<daniel....@gmail.com> wrote:
> Im using jemalloc branch since May 09 and it went from 53MB per
> instance to 58MB per instance in one week.

Can you please describe your workload? Is it also related to
LPUSH/LTRIM? Thanks.

Also, I think that 53MB -> 58MB can be OK as long as, at some point,
the fragmentation stops growing.
What is your current fragmentation ratio? As long as it is <= 1.4, it
makes sense.

Daniel Mezzatto

May 18, 2011, 9:54:50 AM

to Redis DB

I was able to compile jemalloc in 32-bit mode by changing the Makefile
like this:

../deps/jemalloc/lib/libjemalloc.a:
	cd ../deps/jemalloc && ./configure CFLAGS="-std=gnu99 -Wall -pipe -g3 -fvisibility=hidden -O3 -funroll-loops -m32" --enable-cc-silence && $(MAKE) lib/libjemalloc.a


Daniel Mezzatto

May 18, 2011, 10:09:25 AM

to Redis DB

My system does about 100 million HGETs and about 80 million HSETs per
day on a cluster with 128 Redis instances spread across 13 machines.
Each instance holds about 200,000 keys, with 4 fields in each hash.
These 128 instances are in fact 64 masters and 64 slaves (for
availability purposes).

With glibc, the used_memory is about 53MB per instance (6.7GB total)
and the used_memory_rss is about 58MB per instance (7.4GB total).
mem_fragmentation_ratio is 1.12 (was 1.08 last week).

With jemalloc since yesterday, the used_memory is about 58MB per
instance (7.4GB total) and the used_memory_rss is about 62MB per
instance (7.9GB total). mem_fragmentation_ratio is 1.06 (was 1.04
yesterday).

The redis.conf of these instances is the following:

daemonize yes
pidfile /var/run/redis.pid
port 63701
timeout 120
loglevel notice
logfile /var/log/redis/redis.log
databases 16
save 300 1
maxmemory 512mb
maxmemory-policy volatile-lru
rdbcompression yes
dbfilename redis.rdb
dir /var/lib/redis/
slave-serve-stale-data yes
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
vm-enabled no
hash-max-zipmap-entries 8
hash-max-zipmap-value 2048
list-max-ziplist-entries 512
list-max-ziplist-value 2048
set-max-intset-entries 512
activerehashing yes


Salvatore Sanfilippo

May 18, 2011, 10:14:58 AM

to redi...@googlegroups.com

On Wed, May 18, 2011 at 4:09 PM, Daniel Mezzatto
<daniel....@gmail.com> wrote:
> With glibc, the used_memory is about 53MB per instance (6.7GB total)
> and the used_memory_rss is about 58MB per instance (7.4GB total).
> mem_fragmentation_ratio is 1.12 (was 1.08 last week).
>
> With jemalloc since yesterday, the used_memory is about 58MB per
> instance (7.4GB total) and the used_memory_rss is about 62MB per
> instance (7.9GB total). mem_fragmentation_ratio is 1.06 (was 1.04
> yesterday).

Thank you for the interesting information.

So far both look good: fragmentation up to 1.3/1.4 is OK as long as it
does not keep increasing monotonically, but stops when it reaches such
a value.

I'm working right now on a modification of zmalloc.c that should
prevent fragmentation problems without requiring a non-standard
allocator, but it is just an experiment. We'll see if it works... I
hope to push the branch today.
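Salvatore's rule of thumb can be turned into a simple monitoring check: a ratio up to about 1.3/1.4 is fine as long as it levels off rather than climbing steadily. The sketch below assumes mem_fragmentation_ratio is already sampled periodically. The 1.4 limit comes from this thread, while the five-sample window and the function name are arbitrary choices.

```python
def fragmentation_alert(samples, limit=1.4, window=5):
    """Return a reason string if the ratio series looks worrying, else None.

    samples: mem_fragmentation_ratio readings, oldest first.
    """
    if not samples:
        return None
    latest = samples[-1]
    if latest > limit:
        return f"ratio {latest:.2f} above limit {limit}"
    tail = samples[-window:]
    # Flag a ratio that is still under the limit but has risen at every sample.
    if len(tail) == window and all(b > a for a, b in zip(tail, tail[1:])):
        return "ratio still rising monotonically"
    return None


samples = [1.04, 1.06, 1.06, 1.05, 1.06]  # hypothetical plateau
print(fragmentation_alert(samples))       # None
```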

Cheers,

Daniel Mezzatto

May 18, 2011, 10:50:12 AM

to Redis DB

I have just updated a few of my instances to the 2.2-zmalloc2 branch
(http://groups.google.com/group/redis-db/browse_thread/thread/eb337871f8935781).

The used_memory is about 73MB and the used_memory_rss is about 81MB.
mem_fragmentation_ratio is 1.11.

Let's see what happens after a few days :)


Salvatore Sanfilippo

May 18, 2011, 10:54:11 AM

to redi...@googlegroups.com

Thank you Daniel, very appreciated :)

Salvatore


Mike K

May 18, 2011, 12:54:46 PM

to Redis DB

As an interesting data point, here's what happened once we stopped
LTRIM'ing 20% of the time and moved to LTRIM'ing 100% of the time:

http://d.pr/MJPU

Looks like much more stable mem usage, and seems to isolate it to that
particular set of operations. We'll roll out the jemalloc-fixed branch
to a slave and once stable we'll cut over, and if it looks right we
can try the 20% LTRIM usage pattern again.

Best,
Mike

Salvatore Sanfilippo

May 18, 2011, 1:13:51 PM

to redi...@googlegroups.com

On Wed, May 18, 2011 at 6:54 PM, Mike K <mik...@instagram.com> wrote:
> As an interesting data point, here's what happened once we stopped
> LTRIM'ing 20% of the time and moved to LTRIM'ing 100% of the time:
>
> http://d.pr/MJPU
>
> Looks like much more stable mem usage, and seems to isolate it to that
> particular set of operations. We'll roll out the jemalloc-fixed branch
> to a slave and once stable we'll cut over, and if it looks right we
> can try the 20% LTRIM usage pattern again.

Hello Mike, that makes a lot of sense. Trimming at every push keeps
the ziplist's length fluctuation very small: just the difference
between the list minus the old element and the list plus the new one.
With the trim only 20% of the time the difference is bigger, and this
opens spots for fragmentation.

But this in turn makes it a lot more likely that my zmalloc2 branch
will make a huge difference...
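A quick simulation shows the length fluctuation Salvatore describes. This Python sketch models list lengths only, not bytes or allocator behavior. It assumes each LPUSH is followed, with a fixed probability, by an LTRIM back to 500 elements.

```python
import random


def length_range(trim_probability, pushes=100_000, target=500, seed=1):
    """Smallest and largest list length seen under LPUSH + probabilistic LTRIM."""
    rng = random.Random(seed)
    length = target
    lowest = highest = length
    for _ in range(pushes):
        length += 1                       # LPUSH adds one element
        highest = max(highest, length)
        if rng.random() < trim_probability:
            length = min(length, target)  # LTRIM key 0 target-1
        lowest = min(lowest, length)
    return lowest, highest


print(length_range(1.0))  # (500, 501): trimming on every push
print(length_range(0.2))  # a much wider range when trimming 20% of the time
```

Because the trims are random, trimming 20% of the time puts no hard upper bound on the length. The longest run without a trim, and with it the peak ziplist size, grows slowly with the number of pushes.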

Ciao,
Salvatore


Mike K

May 19, 2011, 1:04:53 PM

to Redis DB

jemalloc success, with our LTRIM workload:

http://d.pr/7FvI with the jemalloc branch
http://d.pr/E1jw with 2.2

In other words, it seems like jemalloc solves the problem. Would
definitely support having it merged into 2.2 (though we're happy to
run this branch as long as needed).

Thanks,
Mike


Salvatore Sanfilippo

May 19, 2011, 2:32:26 PM

to redi...@googlegroups.com

Thanks a lot, Mike! This is an awesome finding.

I know it is not trivial to test things in production environments,
but it would be awesome if you could also try the 2.2-zmalloc2 branch
under the same conditions to see if it also solves the problem. We
will probably go with jemalloc anyway, but it is interesting to see
whether there are alternatives.

Cheers,
Salvatore


Mike K

May 19, 2011, 6:06:14 PM

to redi...@googlegroups.com

Hey Salvatore,

We're now using the jemalloc branch as master, and I've rolled out a 2.2-zmalloc2 slave listening to it—I'll report back on how it does with fragmentation.

Cheers,

Mike

Salvatore Sanfilippo

May 19, 2011, 6:13:42 PM

to redi...@googlegroups.com

On Fri, May 20, 2011 at 12:06 AM, Mike K <mik...@instagram.com> wrote:
> We're now using the jemalloc branch as master, and I've rolled out a
> 2.2-zmalloc2 slave listening to it—I'll report back on how it does with
> fragmentation.

Thank you Mike :)

Daniel Mezzatto

May 20, 2011, 3:29:42 PM

to Redis DB

So far:

jemalloc (uptime 3 days):
- used: 61MB
- real: 64MB
- frag: 1.06

zmalloc (uptime 11 days):
- used: 54MB
- real: 59MB
- frag: 1.09

zmalloc2 (uptime 2 days):
- used: 75MB
- real: 84MB
- frag: 1.11

Mike K

May 21, 2011, 1:43:18 AM

to Redis DB

Some good results with the 2.2-zmalloc2 branch:

2.2 - jemalloc: http://d.pr/8hOC
2.2 - zmalloc2: http://d.pr/G68S

Looks like fragmentation is very low on both; the interesting thing is
that jemalloc's baseline memory usage is much lower, but in terms of
fragmentation prevention both look equally effective.

Cheers!
Mike

Didier Spezia

May 21, 2011, 3:49:16 AM

to Redis DB

In Daniel's environment, the memory footprint of jemalloc is bigger
than that of libc malloc. For small objects, jemalloc usually leads to
a smaller footprint, so I think it is due to mid-size objects
(128-4096 bytes).

The default 64 bits allocation classes of jemalloc are:

Category   Subcategory        Size classes
Small      Tiny               [8]
           Quantum-spaced     [16, 32, 48, ..., 128]
           Cacheline-spaced   [192, 256, 320, ..., 512]
           Subpage-spaced     [768, 1024, 1280, ..., 3840]
Large                         [4 KiB, 8 KiB, 12 KiB, ..., 4072 KiB]
Huge                          [4 MiB, 8 MiB, 12 MiB, ...]

This can be adjusted by tweaking MALLOC_CONF (an environment variable).
The "quantum-spaced" range can be extended with opt.lg_qspace_max.
The "cacheline-spaced" range can be extended with opt.lg_cspace_max.
Both parameters are the base-2 logarithm of the range's upper limit.

For instance:
export MALLOC_CONF="lg_qspace_max:8,lg_cspace_max:11"
(use quantum-spaced up to 256, and cacheline-spaced up to 2048).
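The table above can be turned into a rough Python model of how a request size is rounded up to a jemalloc size class. The model also covers the effect of raising lg_qspace_max and lg_cspace_max as in the MALLOC_CONF example. It encodes only the default 64-bit classes and the two options as described in this message; it is a simplification, not jemalloc's code.

```python
KiB = 1024
MiB = 1024 * KiB


def round_up(n, step):
    """Round n up to a multiple of step."""
    return -(-n // step) * step


def size_class(n, lg_qspace_max=7, lg_cspace_max=9):
    """Size class for an n-byte request, per the 64-bit default table above.

    lg_qspace_max / lg_cspace_max are base-2 logs of the largest
    quantum-spaced and cacheline-spaced classes (128 and 512 by default).
    """
    qspace_max = 1 << lg_qspace_max
    cspace_max = 1 << lg_cspace_max
    if n <= 8:
        return 8                     # tiny
    if n <= qspace_max:
        return round_up(n, 16)       # quantum-spaced
    if n <= cspace_max:
        return round_up(n, 64)       # cacheline-spaced
    if n <= 3840:
        return round_up(n, 256)      # subpage-spaced
    if n <= 4072 * KiB:
        return round_up(n, 4 * KiB)  # large
    return round_up(n, 4 * MiB)      # huge


for request in (200, 600):
    default = size_class(request)
    tuned = size_class(request, lg_qspace_max=8, lg_cspace_max=11)
    print(request, default, tuned)  # prints "200 256 208", then "600 768 640"
```

The gap between the request and its class is internal waste. In this model, a 600-byte object wastes 168 bytes with the default classes but only 40 with lg_cspace_max:11. That is consistent with the suspicion that mid-size objects explain the larger footprint.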

Regards,
Didier.
