Redis at Superfeedr

Julien

Feb 16, 2010, 5:51:27 AM

to Redis DB

Hello!

We're slowly (but surely!) moving one piece of data at Superfeedr away
from our Memcache + MySQL infrastructure to Redis as its datastore, and
I have a few questions.

Superfeedr is a service that does feed fetching and parsing and then
pushes (via PubSubHubbub or XMPP) the _new_ entries to given
subscribers.

One of the key features is distinguishing new entries from 'older'
entries in the feeds. For that, we have a unique ID for each entry
(extracted from the feed itself or computed).

We use ZSETs (sorted sets). Each feed_id (feeds are still stored in a regular RDBMS)
is a key, and its value is a ZSET of all the feed's entries: we store
each entry's unique ID as a ZSET member and use a timestamp (the entry's
publication date) as its score.
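A minimal sketch of the new-entry check described above, in Python. It models each feed's ZSET as an in-memory dict (member -> score) instead of talking to Redis, and relies on ZADD's reply: 1 when the member was added, 0 when it already existed and only its score was updated. The `feed:<id>` key naming and all helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# In-memory stand-in for Redis: key -> sorted set, where a sorted set is
# modeled as {member: score}. Not the Redis API, just its semantics.
Store = Dict[str, Dict[str, float]]


def feed_key(feed_id: int) -> str:
    # Hypothetical naming scheme; the post only says each feed_id is a key.
    return f"feed:{feed_id}"


def zadd(store: Store, key: str, score: float, member: str) -> int:
    """Mimic ZADD for one member: 1 if the member is new, 0 if it existed."""
    zset = store.setdefault(key, {})
    added = member not in zset
    zset[member] = score  # an existing member only gets its score updated
    return 1 if added else 0


def new_entries(store: Store, feed_id: int,
                entries: Iterable[Tuple[str, float]]) -> List[str]:
    """Return IDs of entries never seen before for this feed, in input order.

    `entries` holds (unique_id, publication_timestamp) pairs.
    """
    key = feed_key(feed_id)
    return [uid for uid, published in entries
            if zadd(store, key, published, uid)]


store: Store = {}
print(new_entries(store, 42, [("a", 1.0), ("b", 2.0)]))  # ['a', 'b']
print(new_entries(store, 42, [("b", 2.0), ("c", 3.0)]))  # ['c']
```

Because the membership test and the insert happen in one call, pushing only the returned IDs gives "new entries only" without a separate read.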

To avoid any issues, I wanted to avoid having one big Redis store, so I
started to shard based on the feed id. We can't use hashing, because
that would mean getting duplicates every now and then. However, since
our feed ids are sequential, we just define the range of feed ids that
goes on each server.
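The range-based sharding above can be sketched as a sorted list of range boundaries plus a binary search. The 250,000-feeds-per-server figure comes from the sizing in this post; the helper names are hypothetical. One plausible reading of the "duplicates" concern (an assumption, not stated in the post): with hash-mod-N, adding a server remaps existing feeds to shards that have never seen their entries, so old entries would look new.

```python
import bisect
from typing import List

# Per-server capacity taken from the sizing in the post (250,000 feeds).
FEEDS_PER_SERVER = 250_000


def build_boundaries(num_servers: int,
                     per_server: int = FEEDS_PER_SERVER) -> List[int]:
    """Exclusive upper bound of the feed-ID range owned by each server.

    Server 0 owns [0, per_server), server 1 owns [per_server, 2*per_server), ...
    """
    return [per_server * (i + 1) for i in range(num_servers)]


def server_for(feed_id: int, boundaries: List[int]) -> int:
    """Index of the server whose range contains feed_id."""
    idx = bisect.bisect_right(boundaries, feed_id)
    if idx >= len(boundaries):
        raise ValueError(f"feed {feed_id} is past the last range; add a server")
    return idx


bounds = build_boundaries(3)
print(server_for(1, bounds), server_for(250_000, bounds), server_for(749_999, bounds))
# 0 1 2
```

Adding a server only appends a new range, so no existing feed moves to another store.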

Our Redis servers have 2GB of RAM. I found that each key/value
consumes roughly 5k in Redis, so I decided to go for 250,000 keys
(feeds) per Redis store. That gives roughly 1.2 GB when all the keys
are occupied, which should always be fine since we have 2GB.
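The sizing arithmetic as a quick Python check, assuming "5k" means 5,000 bytes. The second function adds the theoretical worst case of a fork()-based snapshot: with copy-on-write, if the parent modifies every page while the child is still saving, each process ends up holding its own full copy of the dataset. Function names are hypothetical.

```python
GB = 10**9  # decimal gigabytes


def projected_bytes(keys: int, bytes_per_key: int) -> int:
    """Dataset size if every key costs about bytes_per_key."""
    return keys * bytes_per_key


def fork_worst_case(dataset_bytes: int) -> int:
    """Worst-case RAM from copy-on-write: parent and child each hold a full copy."""
    return 2 * dataset_bytes


dataset = projected_bytes(250_000, 5_000)
print(dataset / GB)                   # 1.25
print(fork_worst_case(dataset) / GB)  # 2.5, above the 2 GB of RAM
```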

Yet... our Redis servers don't seem to behave as expected. Here is an
example from one of our Redis servers.
An INFO query returns:
INFO
$381
redis_version:1.2.0
arch_bits:64
multiplexing_api:epoll
uptime_in_seconds:41143
uptime_in_days:0
connected_clients:7
connected_slaves:0
used_memory:756858723
changes_since_last_save:3951
bgsave_in_progress:0
last_save_time:1266316631
bgrewriteaof_in_progress:0
total_connections_received:4355
total_commands_processed:15277037
role:master
db0:keys=170400,expires=0

So, roughly 4442 bytes per key, which is in line with my
assumptions. Also, even though there is a ton of memory available, the
server has started to swap:
                   total       used       free     shared    buffers     cached
Mem:             2164044    1005160    1158884          0        972      34976
-/+ buffers/cache:            969212    1194832
Swap:            4194296     347948    3846348
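The "roughly 4442 bytes per key" figure is used_memory divided by the key count reported for db0. A small Python sketch, with hypothetical helper names, that parses the `field:value` lines of an INFO reply (lines without a colon, such as the `$381` bulk-length header, are skipped) and computes that ratio:

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse the "field:value" lines of an INFO reply into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        field, sep, value = line.partition(":")
        if sep:  # skips lines like the "$381" bulk-length header
            info[field.strip()] = value.strip()
    return info


def db_keys(db_value: str) -> int:
    """Extract the key count from a value such as "keys=170400,expires=0"."""
    pairs = dict(item.split("=", 1) for item in db_value.split(","))
    return int(pairs["keys"])


reply = """$381
redis_version:1.2.0
used_memory:756858723
db0:keys=170400,expires=0"""
info = parse_info(reply)
print(round(int(info["used_memory"]) / db_keys(info["db0"])))  # 4442
```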

We use collectd to monitor our servers, and it's quite clear to me that
the machine is doing very well most of the time and then, every time a
snapshot is taken, it eats a sh*t ton of CPU _and_ memory.
That's also when it starts to swap.

I'm pretty sure I'm missing something, because saving to disk can't be
that bad. Can anyone help?

Thanks!

Sergey Shepelev

Feb 16, 2010, 6:21:19 AM

to redi...@googlegroups.com

I'm sure this must be in some kind of FAQ.

Redis forks a child to save data to disk. Now you have 2
processes with 1.2GB of virtual memory each, but together they still
consume only 1.2GB of real RAM because, on most modern OSes, fork has
copy-on-write memory semantics: the forked processes share their
memory until one of them starts writing. When the "master" Redis
(the one handling connections and commands) updates data, the kernel
separates the two processes' virtual memory by copying pages, which
makes memory usage grow.

If you pause updates until writing is finished, no extra memory will be used.
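A minimal POSIX-only Python sketch of a fork-based snapshot, with a hypothetical helper name. The child serializes the data as it was at fork() time; writes the parent makes afterwards never show up in the snapshot, because the kernel gives the parent private copies of the pages it touches. Those private copies are the extra memory that grows while a save is in progress. (CPython's reference counting also writes to object pages, so this toy triggers more copying than Redis would.)

```python
import json
import os
import tempfile


def snapshot_with_fork(data, path: str) -> int:
    """Fork a child that writes `data` to `path` as JSON; return its PID.

    POSIX only. The child sees `data` as it was at fork() time.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: write the point-in-time view, then leave with
        # os._exit so no parent cleanup or buffered output runs twice.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)
    return pid


if __name__ == "__main__":
    dataset = {"feed:1": ["entry-a", "entry-b"]}
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    child = snapshot_with_fork(dataset, path)
    dataset["feed:1"].append("entry-c")  # parent keeps writing during the save
    os.waitpid(child, 0)
    with open(path) as f:
        print(json.load(f))  # {'feed:1': ['entry-a', 'entry-b']}
    os.remove(path)
```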

Julien Genestoux

Feb 16, 2010, 6:31:11 AM

to redi...@googlegroups.com

Well, that I can't really do :( Any other option?

Thanks!

Sergey Shepelev

Feb 16, 2010, 6:37:36 AM

to redi...@googlegroups.com

> Well, that, I can't really do :( any other option?

Use Redis HEAD with VM and threaded save.

Julien Genestoux

Feb 16, 2010, 6:40:44 AM

to redi...@googlegroups.com

Thanks Sergey! Can you tell me more about that? (Or at least point me to a few links... I have no idea what you're talking about :()

Sergey Shepelev

Feb 16, 2010, 6:45:14 AM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 2:40 PM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Thanks Sergey! Can you tell me more on that? (or at least point to a few
> links... I have no idea what you're talking about :()
> Thanks!

For VM:
http://antirez.com/post/redis-virtual-memory-story.html

The append-only file (AOF) is another option to reduce memory consumption. I
really should have mentioned it first, but I forgot.
http://code.google.com/p/redis/wiki/AppendOnlyFileHowto

Salvatore Sanfilippo

Feb 16, 2010, 7:42:37 AM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 11:51 AM, Julien <julien.g...@gmail.com> wrote:
> Hello!

Hello Julien!

There are different possibilities; I need a bit more information, for instance:

- the output of "ps -C redis-server uw", before and after a save.
- what happens if you run "BGSAVE" by hand a couple of times? Is the
first time slow, but the next times faster?

Btw, the problem appears to be that Linux swaps out pages of the
redis-server instance because there are large contiguous allocations
that are rarely used. When a BGSAVE starts, all these swapped-out
pages are loaded back into memory, as a BGSAVE performs a full memory scan.

There are different solutions, including lowering the save interval,
using AOF, and so forth, but it's better to look at the numbers in
order to be sure. Another thing that may happen is that the memory
reported by Redis is much lower than the real memory usage (but we
can check this from the ps output).
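To compare the memory Redis reports with what the kernel sees, one can take RSS from the `ps ... uw` output (column 6, in KiB) and divide it by `used_memory` from INFO. A sketch with hypothetical helper names; the usage line plugs in figures from this thread, which were captured at different times, so the ratio is illustrative only.

```python
def rss_bytes_from_ps(line: str) -> int:
    """RSS of one process line from `ps ... uw`, converted from KiB to bytes.

    Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    return int(line.split()[5]) * 1024


def rss_to_used_ratio(rss_bytes: int, used_memory: int) -> float:
    """How much more memory the kernel sees than Redis reports (1.0 = equal)."""
    return rss_bytes / used_memory


ps_line = ("redis 2451 0.1 47.8 1180760 1035492 ? Ds Feb15 0:55 "
           "/usr/bin/redis-server /etc/redis/redis.conf")
print(round(rss_to_used_ratio(rss_bytes_from_ps(ps_line), 756858723), 2))  # 1.4
```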

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

Julien Genestoux

Feb 16, 2010, 9:36:37 AM

to redi...@googlegroups.com

Ciao Salvatore,

On Tue, Feb 16, 2010 at 1:42 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> There are different possibilities; I need a bit more information, for instance:
>
> - the output of "ps -C redis-server uw", before and after a save.

# ps -C redis-server uw
USER    PID %CPU %MEM     VSZ     RSS TTY STAT START TIME COMMAND
redis  2451  0.1 47.8 1180760 1035492 ?   Ds   Feb15 0:55 /usr/bin/redis-server /etc/redis/redis.conf

# telnet localhost 6379
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
SAVE
+OK
quit
Connection closed by foreign host.

(It took more or less 2 minutes.)

# ps -C redis-server uw
USER    PID %CPU %MEM     VSZ     RSS TTY STAT START TIME COMMAND
redis  2451  0.1 54.3 1182252 1175848 ?   Ss   Feb15 1:00 /usr/bin/redis-server /etc/redis/redis.conf

> - what happens if you run "BGSAVE" by hand a couple of times? Is the
> first time slow, but the next times faster?

Well, we've reached a state where there is basically _always_ a BGSAVE running:

BGSAVE
-ERR background save already in progress
BGSAVE
-ERR background save already in progress
BGSAVE

> Btw, the problem appears to be that Linux swaps out pages of the
> redis-server instance because there are large contiguous allocations
> that are rarely used. When a BGSAVE starts, all these swapped-out
> pages are loaded back into memory, as a BGSAVE performs a full memory scan.

I think that's what Sergey meant when he asked me to stop updating the data while a save is being performed, right?

> There are different solutions, including lowering the save interval,
> using AOF, and so forth, but it's better to look at the numbers in
> order to be sure. Another thing that may happen is that the memory
> reported by Redis is much lower than the real memory usage (but we
> can check this from the ps output).

It's not significantly higher...

I'll give AOF a shot, but I was under the impression that it was more about reducing the risk of data loss when the server terminates. Am I correct?

Thanks!


Salvatore Sanfilippo

Feb 16, 2010, 10:16:16 AM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 3:36 PM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Ciao Salvatore,

Ciao Julien,

> ps -C redis-server uw
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> redis 2451 0.1 47.8 1180760 1035492 ? Ds Feb15 0:55

Ok, it looks like the problem is indeed not that Redis pages are
swapped out to disk.
I don't know a better way to check whether (and what percentage of) the
process memory space is swapped out to disk.
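On newer Linux kernels (2.6.34 and later), /proc/<pid>/status reports both resident (VmRSS) and swapped-out (VmSwap) memory per process, which answers the "what percentage is swapped" question directly. A Python sketch with hypothetical function names; on kernels without VmSwap it returns 0.0, which should be read as "unknown".

```python
from typing import Dict


def swap_fraction(status_text: str) -> float:
    """Share of a process's memory that is swapped out, from /proc/<pid>/status.

    Reads the VmRSS and VmSwap fields (both reported in kB). Older kernels do
    not have VmSwap; in that case this returns 0.0, which means "unknown".
    """
    fields: Dict[str, int] = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("VmRSS", "VmSwap"):
            fields[name] = int(rest.split()[0])
    rss = fields.get("VmRSS", 0)
    swap = fields.get("VmSwap", 0)
    total = rss + swap
    return swap / total if total else 0.0


def swap_fraction_for_pid(pid: int) -> float:
    """Linux only: read /proc/<pid>/status for a live process."""
    with open(f"/proc/{pid}/status") as f:
        return swap_fraction(f.read())


print(swap_fraction("VmRSS:\t  750000 kB\nVmSwap:\t  250000 kB\n"))  # 0.25
```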

> Well, we've reached a state where, basically there is _always_ a
> running BGSAVE
> BGSAVE
> -ERR background save already in progress
> BGSAVE
> -ERR background save already in progress
> BGSAVE

Ok, but you said that the Redis process is fine most of the time and
starts to have problems once a snapshot is performed. Now it seems that
a background save is actually almost always in progress. Does the problem
happen at a specific moment of the BGSAVE? Just to have more info.

Btw I bet you already set overcommit to the right value, but just for
completeness, the right value is:
sysctl vm.overcommit_memory=1

>> btw the problem appears to be that Linux swaps out pages of the
>> redis-server instance because there are large contiguously allocated
>> stuff that are rarely used. When a BGSAVE starts, all this swapped
>> pages are loaded in memory, as a BGSAVE performs a full memory scan.
>
> I think that's what sergey meant when he asked me to stop updating the data
> while a save is being performed, right?

This is a slightly different issue, actually. To continue the
reasoning started by Sergey: how many updates per second (that is,
write operations) do you perform while Redis is saving? This can indeed
be the cause.

> I'll give a shot to AOF, but I was under the impression that it was more
> about reducing the risk of data-loss upon termination of the server. Am I
> correct?

AOF will almost certainly fix your issue, actually. Basically, with AOF
you don't need SAVE or BGSAVE at all: Redis writes a log and reloads it
when it is restarted. The problem is that this log gets longer and longer,
so it is a good idea to rewrite it every 24 hours or less, using the
BGREWRITEAOF command. This command is very similar to BGSAVE, and you'll
probably notice the same swapping / memory / CPU problem, but once a day
for a few minutes is much better than continuously.
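To make the log-then-rewrite idea concrete, here is a toy Python model (not Redis's real file format or code; record layout and function names are assumptions): every write is appended to a list, a restart replays the list, and a rewrite replaces the history with one record per element that still exists. The log grows with every write, while the rewritten log is proportional to the dataset.

```python
from typing import Dict, List, Tuple

# Toy model of an append-only file for sorted sets. Records are
# ("ZADD", key, score, member) or ("ZREM", key, member). The real AOF stores
# commands in the Redis protocol format; this only illustrates the idea.
Record = Tuple
DB = Dict[str, Dict[str, float]]


def apply(db: DB, rec: Record) -> None:
    """Apply one logged write to the in-memory dataset."""
    if rec[0] == "ZADD":
        _, key, score, member = rec
        db.setdefault(key, {})[member] = score
    elif rec[0] == "ZREM":
        _, key, member = rec
        zset = db.get(key)
        if zset is not None:
            zset.pop(member, None)
            if not zset:
                del db[key]  # empty sorted sets disappear


def replay(log: List[Record]) -> DB:
    """What happens at restart: re-execute every write in the log."""
    db: DB = {}
    for rec in log:
        apply(db, rec)
    return db


def rewrite(db: DB) -> List[Record]:
    """What a log rewrite produces conceptually: one ZADD per live member."""
    return [("ZADD", key, score, member)
            for key, zset in sorted(db.items())
            for member, score in sorted(zset.items())]


log = [("ZADD", "feed:1", 1.0, "a"), ("ZADD", "feed:1", 2.0, "a"),
       ("ZADD", "feed:1", 3.0, "b"), ("ZREM", "feed:1", "b")]
db = replay(log)
compact = rewrite(db)
print(len(log), len(compact))  # 4 1
assert replay(compact) == db
```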

In order to switch from snapshotting to AOF, you need to perform the
following steps:

- make a backup copy of your database. AOF is now in a stable release,
but it is unclear how many instances are running this persistence mode in
production (please reply to this email if you are one of these users).
- Issue a BGREWRITEAOF command, and wait for it to finish. This will
create the append only file.
- stop the server, edit redis.conf, configure it for append only, and set
the "fsync" policy to "every second"
- Restart the server
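The third step (append only, fsync every second) corresponds to the `appendonly yes` and `appendfsync everysec` directives in redis.conf. A hypothetical Python helper that applies that edit to the text of a config file; run it on a copy and keep the backup from the first step.

```python
import re

# Matches an active (uncommented) appendonly/appendfsync directive.
DIRECTIVE = re.compile(r"^\s*(appendonly|appendfsync)\s+\S+")
WANTED = {"appendonly": "yes", "appendfsync": "everysec"}


def enable_aof(conf_text: str) -> str:
    """Return redis.conf text with AOF on and fsync once per second.

    Existing directives are rewritten in place; missing ones are appended.
    Commented-out lines are left untouched.
    """
    out, seen = [], set()
    for line in conf_text.splitlines():
        m = DIRECTIVE.match(line)
        if m:
            name = m.group(1)
            seen.add(name)
            out.append(f"{name} {WANTED[name]}")
        else:
            out.append(line)
    for name, value in WANTED.items():
        if name not in seen:
            out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


print(enable_aof("daemonize yes\nappendonly no\n"), end="")
# daemonize yes
# appendonly yes
# appendfsync everysec
```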

Btw, BGSAVE should work fine, so it would be interesting to understand
what's happening, but if you have a lot of updates while the BGSAVE is
in progress, this is probably the culprit.

Thanks for your help!

Julien Genestoux

Feb 16, 2010, 10:46:35 AM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 4:16 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> Ok, but you said that the Redis process is fine most of the time and
> starts to have problems once a snapshot is performed. Now it seems that
> a background save is actually almost always in progress. Does the problem
> happen at a specific moment of the BGSAVE? Just to have more info.

Yeah, I think the "save" settings were too low for our application... so it was basically saving all the time. Which is indeed wrong.

> This is a slightly different issue, actually. To continue the
> reasoning started by Sergey: how many updates per second (that is,
> write operations) do you perform while Redis is saving? This can indeed
> be the cause.

So, when the Redis store is full (250,000 keys), the QPS will be around 280. Right now it's more like 150, at any time. We don't have any way to reduce that while Redis is saving the snapshot...

> AOF will almost certainly fix your issue, actually. Basically, with AOF
> you don't need SAVE or BGSAVE at all: Redis writes a log and reloads it
> when it is restarted. The problem is that this log gets longer and longer,
> so it is a good idea to rewrite it every 24 hours or less, using the
> BGREWRITEAOF command. This command is very similar to BGSAVE, and you'll
> probably notice the same swapping / memory / CPU problem, but once a day
> for a few minutes is much better than continuously.

Could we also use "save 43200 1" in the redis.conf file, so that it takes a snapshot every 12 hours or so? If so, I think this is the right approach: use AOF for "instant" updates and then, once every few hours, do a full save.
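How a `save <seconds> <changes>` line decides when to snapshot, as a simplified Python model (the function name is hypothetical, and the exact boundary comparisons inside Redis may differ). With "save 43200 1" and steady writes, only the time condition matters, so a snapshot is due about every 12 hours; with thresholds that are easily met at ~150 queries per second, a new save becomes due soon after the previous one ends, which fits the "always saving" state described above.

```python
from typing import List, Tuple

SavePoint = Tuple[int, int]  # (seconds, changes), as in "save <seconds> <changes>"


def should_bgsave(save_points: List[SavePoint], dirty: int,
                  seconds_since_last_save: int) -> bool:
    """True if any save point is satisfied.

    Simplified model of the "save" directive: snapshot once at least
    `changes` writes happened and at least `seconds` have elapsed since the
    last save.
    """
    return any(dirty >= changes and seconds_since_last_save >= seconds
               for seconds, changes in save_points)


writes_per_sec = 150  # the rate mentioned in this thread
twelve_hours = [(43200, 1)]
print(should_bgsave(twelve_hours, writes_per_sec * 3600, 3600))    # False
print(should_bgsave(twelve_hours, writes_per_sec * 43200, 43200))  # True
```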

I guess BGREWRITEAOF would take care of all the log "cleaning"? Is there a way to force "save xxx y" to use BGREWRITEAOF instead of BGSAVE or SAVE? Or does it do that by default when AOF is enabled?

> In order to switch from snapshotting to AOF, you need to perform the
> following steps:
>
> - make a backup copy of your database. AOF is now in a stable release,
> but it is unclear how many instances are running this persistence mode in
> production (please reply to this email if you are one of these users).

yup, will do.

> - Issue a BGREWRITEAOF command, and wait for it to finish. This will
> create the append only file.

Ok.

> - stop the server, edit redis.conf, configure it for append only, and set
> the "fsync" policy to "every second"

ok.

> - Restart the server

ok


Aníbal Rojas

Feb 16, 2010, 11:19:13 AM

to redi...@googlegroups.com

Julien,

Your other option is not saving at all. Set up a slave and save on
the slave only. The main instance will work as a volatile cache.
Redis replication is extremely efficient.

I am not sure whether it will matter if the slave actually swaps.

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Salvatore Sanfilippo

Feb 16, 2010, 12:15:37 PM

to redi...@googlegroups.com

2010/2/16 Aníbal Rojas <aniba...@gmail.com>:

> Julien,
>
> Your other option is not saving at all. Setup a slave and save in
> the slave only. The main instance will be working as a volatile cache.
> Redis replication is extremely efficient.
>
> I am not sure if it will be relevant if the the slave actually swaps.

Indeed, this is also a good idea. Possibly not with 1.2.1, as it
requires more memory, but with VM, in Julien's use case, where
there are not many keys but many big values (sorted sets), it will be
trivial to set up a slave that uses a very small amount of RAM in order
to act as a "saving slave".

An interesting idea that's worth exploring more in the future (with VM).

Cheers,

Salvatore Sanfilippo

Feb 16, 2010, 12:26:23 PM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 4:46 PM, Julien Genestoux
<julien.g...@gmail.com> wrote:

> Yeah, I think the "save" settings were too low for our application... so it
> was bascially saving all the time. Which is indeed wrong.

Yes, but there is still something we don't fully understand about
Redis behavior and the Linux VM, as there are many users (including
myself) with Redis servers configured for snapshotting that show no
delay when the saving child is active, while in some other rare setups
it starts to behave this way. I would like to understand more about
this issue, and I think I have the tools to do it with redis-load.

> So, when the redis store will be full (250000 keys), then, the QPS would be
> around 280. Right now, it's more at 150, at any time. we don't have any way
> to reduce that when redis is saving the snapshot...

Load-wise, 280 queries per second is very little, but the point is:
how many pages can these 280 queries per second touch in the time
required to save (2 minutes, if I'm correct)? Assuming the worst-case
scenario (every query touches a different swapped page):

4096*280*120 = 137,625,600 bytes, about 131 MB

and this is the *worst* scenario. This is why I'm not convinced we
fully understand what's happening.
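The worst-case bound above as a small Python function (the name and the optional cap are additions for illustration). As in the estimate, it assumes each write dirties exactly one distinct page; since copy-on-write can never duplicate more than the dataset itself, an optional cap is included.

```python
from typing import Optional

PAGE_SIZE = 4096  # bytes; the usual page size on x86 Linux


def worst_case_copied_bytes(writes_per_sec: int, save_seconds: int,
                            page_size: int = PAGE_SIZE,
                            dataset_bytes: Optional[int] = None) -> int:
    """Bytes copy-on-write may duplicate during one save.

    Worst case: every write lands on a different, not-yet-copied page.
    The total can never exceed the dataset itself, hence the optional cap.
    """
    copied = page_size * writes_per_sec * save_seconds
    return copied if dataset_bytes is None else min(copied, dataset_bytes)


b = worst_case_copied_bytes(280, 120)
print(b, round(b / 2**20, 2))  # 137625600 131.25
```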

So please let me ask a few more questions: did this behavior start
happening when your used memory grew beyond a given threshold?
Maybe with 200 MB less data in memory it works perfectly?
I guess there is something to investigate. I think I'll post a message
on the kernel mailing list about this issue.

> Could we also use "save 43200 1" in the redis.conf file? so that it takes a
> snapshot every 12 hours or so? if so, I think this is the right approach
> then. use AOF for "instant" updates and then, once every few hours, do a
> full save.
> I guess BGREWRITEAOF would take care of all the log "cleaning"? Is there a
> way to force the "save xxx y" to use BGREWRITEAOF instead of BGSAVE or SAVE?
> or does it do that by default when AOF is enabled?

Basically AOF and BGSAVE/SAVE are two completely separate ways to
handle persistence.
If you are using only AOF, you don't need BGSAVE at all. When you use
BGREWRITEAOF you actually get a new full dump of the DB, it just
happens to be in a different format (in the form of a log).

The only reason to perform a BGSAVE when AOF is on is that the
implementation may not be mature enough yet: I tested it as well as I
could, but there is still 1 AOF user for every 99 SAVE users out there,
even if many people are slowly switching to AOF. So you may want to save
from time to time, just for durability, at first; but after the first
restarts with AOF where everything looks sane, I suggest going ahead and
just using AOF.

But this depends on the importance of this data. So far there have been
zero reports of corruption due to AOF, but still, probably with very few
production users.

Please keep us informed! :)

Cheers,

Julien Genestoux

Feb 16, 2010, 4:33:14 PM

to redi...@googlegroups.com

Hey

On Tue, Feb 16, 2010 at 6:26 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> Yes, but there is still something we don't fully understand about
> Redis behavior and the Linux VM, as there are many users (including
> myself) with Redis servers configured for snapshotting that show no
> delay when the saving child is active, while in some other rare setups
> it starts to behave this way. I would like to understand more about
> this issue, and I think I have the tools to do it with redis-load.

I can probably grant you access to one of our Redis servers, so you can check it out and see what's wrong there (I'll email you directly about that).

> Load-wise, 280 queries per second is very little, but the point is:
> how many pages can these 280 queries per second touch in the time
> required to save (2 minutes, if I'm correct)? Assuming the worst-case
> scenario (every query touches a different swapped page):
>
> 4096*280*120 = 137,625,600 bytes, about 131 MB
>
> and this is the *worst* scenario. This is why I'm not convinced we
> fully understand what's happening.

Yeah... indeed.

> So please let me ask a few more questions: did this behavior start
> happening when your used memory grew beyond a given threshold?

Not that I noticed. It's getting worse, though: the bigger the Redis dataset, the longer and the worse it gets.


> Basically AOF and BGSAVE/SAVE are two completely separate ways to
> handle persistence.
> If you are using only AOF, you don't need BGSAVE at all. When you use
> BGREWRITEAOF you actually get a new full dump of the DB, it just
> happens to be in a different format (in the form of a log).

Ha, ok... I thought AOF would just "complete" a dump with incoming queries.

> But this depends on the importance of this data. So far there have been
> zero reports of corruption due to AOF, but still, probably with very few
> production users.

Ok, I'll keep that in mind.


Julien Genestoux

Feb 16, 2010, 4:34:07 PM

to redi...@googlegroups.com

Aníbal, thanks for the feedback. Yes, I thought about that, and it could be an option, but for now I'd like to avoid running another server just for that purpose.

Thanks though!
--
Julien Genestoux,

http://twitter.com/julien51
http://superfeedr.com

+1 (415) 830 6574
+33 (0)9 70 44 76 29


Julien Genestoux

Feb 16, 2010, 5:19:22 PM

to redi...@googlegroups.com

Following up on AOF: it seems that using AOF forces daemonization?

I'm using runit, and when I set up AOF it just breaks, as the process seems to be forked...

Am I missing something here?

Salvatore Sanfilippo

Feb 16, 2010, 5:33:01 PM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 11:19 PM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Following up on the AOF. It seems that using AOF forces the daemonization?
> I'm using runit and when I set up AOF, it just breaks it, as the process
> seems to be forked...
> Am I missing something here?

Hello Julien, well no, AOF should not force daemonization, but it's
known that Redis and "runit" don't mix well at all!
You can run into other kinds of problems; the best way to run Redis is
to use the built-in daemonization. At some point I'll try to understand
what's wrong in the interaction between Redis and runit, btw.

Cheers,

Julien Genestoux

Feb 16, 2010, 5:35:16 PM

to redi...@googlegroups.com

Hmm... what I like about runit is that it "restarts" Redis if it fails... which is quite handy. What happens with the regular daemon if, for some reason, Redis dies?


Salvatore Sanfilippo

Feb 16, 2010, 5:37:57 PM

to redi...@googlegroups.com

On Tue, Feb 16, 2010 at 11:35 PM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Hum... what I like with runit is the fact that it "restarts" redis if this
> one fails... which is quite handy. What happens if the regulatr deamon if
> for some reason redis dies?

Nothing, it will not come back up, but a crash of stable Redis is very,
very unlikely unless you hit a hard out-of-memory condition.
Btw, at some point I'll try to fix this issue; it is probably related
to how Redis logs to standard output. Using runit + "logfile
<filename>" may actually work, but it's better to avoid it for now.

Sergey Shepelev

Feb 16, 2010, 5:42:33 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 1:35 AM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Hum... what I like with runit is the fact that it "restarts" redis if this
> one fails... which is quite handy. What happens if the regulatr deamon if
> for some reason redis dies?

It's called supervision. There are a lot of supervisors out there;
runit is not the only one. Ubuntu even uses a supervising init
replacement, upstart, as its main init and as one way to run something
"as a service" (I recommend it over any other similar system, btw).

Aside from upstart, you can also try daemontools or supervisord.

If a "regular daemon" (any program, actually) dies, it dies. Period.
If Redis *and* runit die, nothing will happen: your service is stuck.
This means that for a real 100% guarantee, you have to continuously
monitor (ping) your services and do a hardware reset in case one is
really stuck. That's called a watchdog.
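Supervision in its simplest form, as a Python sketch (not how runit, daemontools or upstart are actually implemented; the function name is hypothetical): start the program in the foreground, wait for it to exit, start it again. It also shows why daemonization and supervision clash: to a naive supervisor like this one, a program that forks into the background and lets its parent exit looks "dead", so a second copy gets started. Under a supervisor, Redis would therefore be kept in the foreground (`daemonize no`). And if the supervisor itself dies, nothing restarts anything, which is the watchdog's job.

```python
import subprocess
import sys
import time
from typing import List


def supervise(cmd: List[str], max_restarts: int = 5,
              backoff_seconds: float = 1.0) -> int:
    """Run cmd in the foreground and restart it each time it exits.

    Returns the number of restarts done before giving up. A real supervisor
    would loop forever; the limit just keeps this sketch bounded.
    """
    restarts = 0
    while True:
        code = subprocess.call(cmd)  # blocks until the child exits
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        print(f"{cmd[0]} exited with status {code}; restart #{restarts}",
              file=sys.stderr)
        time.sleep(backoff_seconds)
```

A hypothetical invocation would be `supervise(["redis-server", "/etc/redis/redis.conf"])`, with `daemonize no` in that config file.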

Sergey Shepelev

Feb 16, 2010, 5:43:46 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 1:37 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> Nothing, it will not go up again, but Redis stable crashing is very
> very unlikely, unless you hit a hard out of memory.
> Btw at some point I'll try to fix this issue, it is probably related
> to how Redis logs to standard output. To use runit + "logfile
> <filename>" may actually work, but it's better to avoid it for now.
>

How about runit + "redis > log" ?


Sergey Shepelev

Feb 16, 2010, 5:45:59 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 1:37 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> Nothing, it will not go up again, but Redis stable crashing is very
> very unlikely, unless you hit a hard out of memory.
> Btw at some point I'll try to fix this issue, it is probably related
> to how Redis logs to standard output. To use runit + "logfile
> <filename>" may actually work, but it's better to avoid it for now.
>

BTW, daemontools has some tools to redirect logging and guarantee
that logs will be durably and consistently saved. And you don't have to
switch to the daemontools supervisor to use those tools.


Julien Genestoux

Feb 16, 2010, 6:37:08 PM

to redi...@googlegroups.com

Alright, I'll forget runit for now...

So, I've been able to switch to AOF on one of our Redis servers. So far, so good.

One thing that scares me is the time it takes to restart Redis in that mode. I ran it for 15 minutes and then did a simple /etc/init.d/redis-server restart and... it took more than 30 seconds to read the whole AOF log and come back. Assuming this is linear, if I restart after 24 hours, it may take 48 minutes or more just to start? Huh...

I'll now look at the performance impact of running BGREWRITEAOF.

I'd really love to find out why SAVE and BGSAVE suck so much in our context, and that could help other users too... Let me know if you'd like to come and check it out!

Thanks,

Cheers,

Salvatore Sanfilippo

Feb 17, 2010, 5:55:55 AM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 12:37 AM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Alright, I'll forget runit for now...
> So, I've been able to switch to AOF on one of our redis. So far, so good.
> One thing that scares me is the time that it takes to "restart" redis in
> that mode. As a matter of fact, I ran it for 15 minutes and then did a
> simple /etc/init.d/redis-server restart and... it took more than 30 seconds
> to read all the aof log and come back. Assuming this is linear, that means
> that if I restart after 24 hours, it may take around 48 minutes just to
> start? huh...

Hello Julien,

yes, if you don't run BGREWRITEAOF before a restart and you have many
writes, it will take a lot of time to restart.
To put it a bit more scientifically, the time to restart is
proportional to the number of write queries received since the last
log rewrite. At every rewrite this number is reset to the total
number of objects present in Redis.

So basically, if there are few writes but many reads, and
BGREWRITEAOF is called from time to time, this is not a big issue.
Similarly, if BGREWRITEAOF is called before a scheduled restart, this
is not a big problem, BUT if the server crashes after many write
queries without a BGREWRITEAOF, it will take some time to restart.
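Salvatore's rule of thumb can be written as a tiny cost model. In this sketch (an assumption for illustration, not Redis code) the AOF right after a rewrite holds roughly one command per live object, every later write appends one more command, and replay runs at a constant, hypothetical `commands_per_sec` rate.

```c
/* Hypothetical cost model of AOF replay at startup (not Redis code). */
typedef struct {
    unsigned long objects_at_rewrite;   /* ~commands left by the last BGREWRITEAOF */
    unsigned long writes_since_rewrite; /* write commands appended afterwards */
} aof_state;

/* Number of commands the server must replay when it restarts. */
unsigned long aof_commands_to_replay(const aof_state *s)
{
    return s->objects_at_rewrite + s->writes_since_rewrite;
}

/* A rewrite compacts the log: the count resets to the number of live objects. */
void aof_rewrite(aof_state *s, unsigned long live_objects)
{
    s->objects_at_rewrite = live_objects;
    s->writes_since_rewrite = 0;
}

/* Estimated restart time, assuming a constant replay speed. */
double aof_replay_seconds(const aof_state *s, double commands_per_sec)
{
    return (double)aof_commands_to_replay(s) / commands_per_sec;
}
```

With a write-heavy workload the `writes_since_rewrite` term dominates and grows without bound until a rewrite resets the total to the number of live objects.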

Btw, even if we can consider this time to be asymptotically
proportional, in practice it could be a bit better than that: depending
on the dataset, even if after 30 seconds it took N, after 300 seconds
it may take less than N*10. But there is no better way than direct
testing to figure this out.

> I'll now see what is the performance impact of running BGREWRITEAOF
> I'd really love to find why SAVE and BGSAVE suck so much in our context, and
> that can help other users too... Let me know if you'd like to come and check
> it out!

Sure, I want to investigate it, I'll send you my RSA key today. Thank
you very much for this!

Julien Genestoux

Feb 17, 2010, 6:55:25 AM

to redi...@googlegroups.com

Hello,

So we're now running all of our redis with AOF!

We don't have any more CPU and swapping issues on a "regular" basis. 3 things:

- I just ran a BGREWRITEAOF. It took 25 minutes, during which the machine had to swap a lot and consumed a lot of CPU (we use collectd to monitor our servers, and we saw a lot of IO WAIT while it ran). I guess this is the same issue that we have with BGSAVE and SAVE.

- How do we automate that? cron? Any other technique?

- We see huge latencies on the clients connected to REDIS: up to 30 seconds, while the CPU usage on the server is quite low (about 0.1 on a quad-core). This really sucks. Even if it's not critical (we're not serving web pages with redis, and our clients are evented), it's really not acceptable in our app. Any way to find out where this comes from, or to debug it?

Cheers!

Julien

Salvatore Sanfilippo

Feb 17, 2010, 7:07:33 AM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 12:55 PM, Julien Genestoux
<julien.g...@gmail.com> wrote:
> Hello,
> So we're now running all of our redis with AOF!
> We don't have any more CPU and swapping issues on a "regular" basis. 3
> things :
> - I just ran a BGREWRITEAOF. It took 25 minutes, during which the machine had
> to swap a lot and consumed a lot of CPU (we use collectd to monitor our
> servers, and we had a lot of IO WAIT while doing it). I guess this is the
> same issue that we have for BGSAVE and SAVE.

Exactly, they work in a *very* similar way. One question: is this a
real box or a virtual machine?

> - How to automate that? cron? Any other technique?

Yes, I think automating it via cron is the best thing. You can
just use redis-cli for this.
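Salvatore's suggestion needs no code: a cron entry running redis-cli with BGREWRITEAOF is enough. If you would rather ship a tiny self-contained binary for the cron job, here is a rough POSIX-sockets sketch. It assumes the server accepts the multi-bulk request format (`*<argc>`, then `$<len>` per argument), which current Redis versions do; check it against the version you run. `encode_command` and `send_redis_command` are hypothetical names, and the helper only reads the first chunk of the reply, since BGREWRITEAOF answers immediately with a status line and performs the rewrite in the background.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: encode a one-word command as a multi-bulk request,
 * e.g. "*1\r\n$12\r\nBGREWRITEAOF\r\n". Returns the length written,
 * or -1 if buf is too small. */
int encode_command(char *buf, size_t len, const char *cmd)
{
    int n = snprintf(buf, len, "*1\r\n$%zu\r\n%s\r\n", strlen(cmd), cmd);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical: send one command to host:port and copy the first chunk of
 * the reply into reply. Returns 0 on success, -1 on any error.
 * Minimal on purpose: a single read, no retries, no timeouts. */
int send_redis_command(const char *host, const char *port, const char *cmd,
                       char *reply, size_t reply_len)
{
    struct addrinfo hints, *res, *ai;
    char req[256];
    int fd = -1;
    int n = encode_command(req, sizeof(req), cmd);
    ssize_t got;

    if (n < 0 || reply_len == 0) return -1;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0) return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {   /* try each address */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd == -1) return -1;

    if (write(fd, req, (size_t)n) != (ssize_t)n) { close(fd); return -1; }
    got = read(fd, reply, reply_len - 1);
    close(fd);
    if (got <= 0) return -1;
    reply[got] = '\0';
    return 0;
}
```

A cron job would call something like `send_redis_command("127.0.0.1", "6379", "BGREWRITEAOF", reply, sizeof(reply))` from `main()` and log `reply`; scheduling it for a quiet hour matters more than how it is triggered, since the rewrite itself is the expensive part.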

> - We see huge latencies on the clients connected to REDIS : up to 30
> seconds, while the CPU usage on the server is quite low (about 0.1 on a
> quadcore). This really sucks. Even if it's not critical (we're not serving

This is really strange. Does this happen while Redis is performing a
log rewrite, or under normal work?
Do you use "slow" operations against Redis? A large SORT, or an LRANGE
or ZRANGE involving many elements? (That is, not ZRANGE or LRANGE
against big lists or sorted sets, but asking for a lot of elements.)

> web pages with redis, and our clients are evented), it's really not
> acceptable in our app. Any way to know where this comes from? or to debug?
> Cheers!
> Julien

This is surely unacceptable, and it is another sign that something
strange is happening.
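If a client really does need a large slice of a list or sorted set, one common mitigation (a general technique, not something proposed in this thread) is to read it in bounded pages, so that no single LRANGE or ZRANGE reply is huge. Both commands take inclusive start and stop indexes; `range_page` below is a hypothetical helper that computes them.

```c
/* Hypothetical helper: inclusive [start, stop] indexes of page `page` when
 * reading `total` elements in pages of `page_size`, suitable as LRANGE or
 * ZRANGE arguments (their stop index is inclusive).
 * Returns 0 if the page exists, -1 if it is past the end or the
 * arguments are invalid. */
int range_page(long total, long page_size, long page, long *start, long *stop)
{
    if (page_size <= 0 || page < 0) return -1;
    *start = page * page_size;
    if (*start >= total) return -1;
    *stop = *start + page_size - 1;
    if (*stop > total - 1) *stop = total - 1;   /* last, shorter page */
    return 0;
}
```

For 10 elements in pages of 4 this yields 0-3, 4-7 and 8-9. Pages fetched with separate commands are not an atomic snapshot: the collection may change between calls.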

I'll investigate it myself today, trying to understand what is happening.

Cheers,
Salvatore

Salvatore Sanfilippo

Feb 17, 2010, 11:48:53 AM

to redi...@googlegroups.com

Hello all,

this follow-up is to tell you that we fixed the issue.
Actually the issue was already fixed in the Git version of Redis:
while coding the VM I realized the problem and fixed it.

What was happening? Basically 1.2.x totally ruined the copy-on-write
(COW) semantics of BGSAVE and BGREWRITEAOF.

There was a function, very elegant IMHO in the way it worked, that was
used to play well with specially encoded objects (when a string can be
represented as an integer, Redis is able to encode it in a special
way).
For instance, if you have to call a function that writes an object on
disk, you can't just do:

writeOnDisk(myObject);

because myObject can be a specially encoded object. There are two
solutions: either every function receiving objects must understand
all the kinds of encoding (for now it's just integers, but what about
the future? What about compressed strings?), or such functions must
always be passed "decoded" objects. So I used this trick:

myObject = getDecodedObject(myObject);
writeOnDisk(myObject);
decrRefCount(myObject);

Basically, if myObject was not specially encoded, getDecodedObject()
just increments its reference count, which gets decremented
later, so all is ok.

If instead the object is encoded, getDecodedObject() returns a new
object that is a pure string representation of the original one. The
result is that decrRefCount() will destroy it (as it is a new object
with just 1 reference).

It's cool from a programming point of view, but it totally breaks
copy-on-write: incrementing and decrementing the refcount of all the
objects means touching, more or less, ALL the memory pages, and after
fork() every page the child writes to has to be copied.
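The effect on copy-on-write can be reproduced with a toy model. The sketch below is not Redis source: `obj`, the two encodings, the `shared` flag and the `shared_writes` counter are simplified, hypothetical stand-ins for Redis objects and for "a write the forked child makes to memory it still shares with the parent". It shows that the old idiom writes to the refcount of every non-encoded object twice, and in a real fork() each such write forces the kernel to copy the page holding the object.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for Redis objects (hypothetical, heavily simplified). */
enum { ENC_RAW, ENC_INT };
typedef struct obj {
    int encoding;   /* ENC_RAW: value in str; ENC_INT: value in num */
    int shared;     /* 1 if the object lives in the parent's dataset */
    int refcount;
    long num;
    char str[32];
} obj;

/* Counts writes the forked child would make to pages shared with the parent. */
static long shared_writes = 0;

/* Toy getDecodedObject(): raw objects get an extra reference,
 * encoded objects get a fresh, heap-allocated raw copy. */
static obj *getDecodedObject(obj *o)
{
    if (o->encoding == ENC_RAW) {
        o->refcount++;
        if (o->shared) shared_writes++;
        return o;
    }
    obj *d = malloc(sizeof(*d));
    if (d == NULL) abort();
    d->encoding = ENC_RAW;
    d->shared = 0;
    d->refcount = 1;
    d->num = 0;
    snprintf(d->str, sizeof(d->str), "%ld", o->num);
    return d;
}

/* Toy decrRefCount(): frees the object when the count drops to zero. */
static void decrRefCount(obj *o)
{
    if (o->shared) shared_writes++;
    if (--o->refcount == 0) free(o);
}

/* The 1.2.x idiom, applied to every object while saving. */
static void save_old_style(obj *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        obj *o = getDecodedObject(&objs[i]);
        /* writeOnDisk(o) would go here */
        decrRefCount(o);
    }
}
```

With two raw objects and one integer-encoded object, `save_old_style` performs four writes to shared memory (an increment and a decrement per raw object) even though no value changes; in the real server this happens for every object in the dataset.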
So in Redis git, a few weeks ago, I fixed the issue, but I
underestimated the real effects this problem had in 1.2.x.

Now that the effects are clear, I'm going to patch 1.2.1 and release
1.2.2 ASAP, with ASAP == today.

I'll announce the new release shortly, as it's just a matter of
backporting a few lines of code from git.

Many thanks to Julien at Superfeedr for giving me help and access to
their *production* boxes in order to check what was happening.

Cheers,
Salvatore

Sergey Shepelev

Feb 17, 2010, 11:57:10 AM

to redi...@googlegroups.com

A great case study of how abstractions leak. :)

So how did you fix it? Immutability idiom?

Salvatore Sanfilippo

Feb 17, 2010, 12:26:48 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 5:57 PM, Sergey Shepelev <tem...@gmail.com> wrote:

> A great case study of how abstractions leak. :)

Lol indeed :)

> So how did you fix it? Immutability idiom?

Well, I resorted to a much lamer but safer approach:

encoded = object->encoding != REDIS_ENCODING_RAW;

if (encoded) object = getDecodedObject(object);
doSomethingWithObject(object);
if (encoded) decrRefCount(object);

But only in contexts where we can be running in the child process.
Everywhere else the above form was retained.
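In a simplified model (hypothetical types and names, not Redis source), the conditional idiom leaves raw objects completely untouched: only encoded objects get a private, heap-allocated decoded copy, which is released afterwards. `decodeCopy` stands in for getDecodedObject() applied to an encoded object, and `shared_writes` counts writes that would trigger copy-on-write in a forked child.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy model (hypothetical): objects in the parent's dataset are "shared". */
enum { ENC_RAW, ENC_INT };
typedef struct obj {
    int encoding;
    int shared;     /* 1 = belongs to the parent's dataset */
    int refcount;
    long num;       /* value when ENC_INT */
    char str[32];   /* value when ENC_RAW */
} obj;

static long shared_writes = 0;   /* writes that would trigger copy-on-write */

/* Stand-in for getDecodedObject() on an encoded object: a private raw copy. */
static obj *decodeCopy(const obj *o)
{
    obj *d = malloc(sizeof(*d));
    if (d == NULL) abort();
    d->encoding = ENC_RAW;
    d->shared = 0;
    d->refcount = 1;
    d->num = 0;
    snprintf(d->str, sizeof(d->str), "%ld", o->num);
    return d;
}

static void decrRefCount(obj *o)
{
    if (o->shared) shared_writes++;
    if (--o->refcount == 0) free(o);
}

/* Child-side idiom: only decode (and later release) objects that are
 * actually encoded; raw objects are only read. */
static void save_new_style(obj *objs, size_t n, size_t *decoded)
{
    *decoded = 0;
    for (size_t i = 0; i < n; i++) {
        obj *o = &objs[i];
        int encoded = o->encoding != ENC_RAW;
        if (encoded) { o = decodeCopy(o); (*decoded)++; }
        /* doSomethingWithObject(o) would go here */
        if (encoded) decrRefCount(o);
    }
}
```

Over a mixed array this performs no shared writes at all; the only cost is one temporary heap allocation per encoded object.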

Still, of course, if the object *is* encoded there will be some heap
movement. It should not hurt much, but it's probably much better if we
can actually handle this in a special way, just using the stack.

So basically I think the long-term solution will be that all the
functions called in the context of the child will be "aware" of the
different encodings of all the types. For now it's just a matter of
integers, but soon we'll have Hashes, which will have a double
encoding as well (zipmap, a data structure and C library I'm
designing, for small Hashes, and a real hash table once they are
larger than N elements). It's more code, it's lamer, but it's a
better fit for copy-on-write.

Btw all this was already fixed in Git, so you can already read the new
code if you wish.

Michael Russo

Feb 17, 2010, 1:06:39 PM

to redi...@googlegroups.com

Thanks Salvatore, Julien. Really happy to hear that this is resolved.

A teammate was just getting ready to assemble a bug report (very similar to Superfeedr's) in the hopes that it would help get to the bottom of this issue faster.

We run several Redis servers for different purposes with different datasets. On one of our servers, Redis was constantly crashing during the BGSAVE process (the parent and child were both killed by the kernel due to low memory). The box has plenty of free RAM before the BGSAVE, there is little write activity against the master during the background save (and, in some cases, none at all), and there are no memory usage issues with a synchronous save.

Really looking forward to testing 1.2.2 against this same workload.

Best,

Michael

Sergey Shepelev

Feb 17, 2010, 2:12:49 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 8:26 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> Still of course if the object *is* encoded, there will be some heap
> movement. Should not hurt so much, but probably it's much better if we
> can actually handle this stuff in a special way, just using the stack.
>

Like this?

object2 = copy(object); // which doesn't use malloc, but returns a struct on stack
doSomethingWithObject(object2);


Salvatore Sanfilippo

Feb 17, 2010, 2:19:08 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 8:12 PM, Sergey Shepelev <tem...@gmail.com> wrote:

> Like this?
>
> object2 = copy(object); // which doesn't use malloc, but returns a struct on stack
> doSomethingWithObject(object2);

There is no way to get this working with ANSI-C, you can't return
something on the stack from a C function.
But you can do this:

char buf[1024];

copyDecodedObject(buf,1024,object);

If it returns REDIS_OK there was enough space; otherwise we'll use the
vanilla decoding function.
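`copyDecodedObject()` is only an idea at this point in the thread; here is one possible shape for it, assuming integers are the only special encoding (as Salvatore says above). The toy `obj` type is a hypothetical stand-in, and REDIS_OK/REDIS_ERR are defined locally for the sketch. The decoded string goes into a caller-provided buffer, typically on the caller's stack, so nothing is allocated on the heap and no shared refcount is touched.

```c
#include <stdio.h>
#include <string.h>

#define REDIS_OK 0
#define REDIS_ERR -1

/* Simplified stand-in for a Redis object (hypothetical). */
enum { ENC_RAW, ENC_INT };
typedef struct obj {
    int encoding;
    long num;          /* valid when encoding == ENC_INT */
    const char *str;   /* valid when encoding == ENC_RAW */
} obj;

/* Write the string form of o into buf (size len). Returns REDIS_OK on
 * success, REDIS_ERR if buf is too small, so the caller can fall back
 * to the heap-allocating getDecodedObject(). */
int copyDecodedObject(char *buf, size_t len, const obj *o)
{
    int n;
    if (o->encoding == ENC_INT)
        n = snprintf(buf, len, "%ld", o->num);
    else
        n = snprintf(buf, len, "%s", o->str);
    if (n < 0 || (size_t)n >= len) return REDIS_ERR;   /* truncated */
    return REDIS_OK;
}
```

Called as `copyDecodedObject(buf, sizeof(buf), o)` with `char buf[1024]` on the stack. A real version would probably skip the copy for raw objects and read the string in place; it is kept here only to give the function a uniform interface.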

Sergey Shepelev

Feb 17, 2010, 2:26:47 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 10:19 PM, Salvatore Sanfilippo
<ant...@gmail.com> wrote:
> On Wed, Feb 17, 2010 at 8:12 PM, Sergey Shepelev <tem...@gmail.com> wrote:
>
>> Like this?
>>
>> object2 = copy(object); // which doesn't use malloc, but returns a struct on stack
>> doSomethingWithObject(object2);
>
> There is no way to get this working with ANSI-C, you can't return
> something on the stack from a C function.
> But you can do this:
>
> char buf[1024];
>
> copyDecodedObject(buf,1024,object);
>
> if it returns REDIS_OK there was enough space, otherwise we'll use the
> vanilla decoding function.
>

So you already thought that through.

The funny thing is that the larger the object we're dealing with, the
more we want it not to modify the heap, right?


Sergey Shepelev

Feb 17, 2010, 2:28:45 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 10:19 PM, Salvatore Sanfilippo
<ant...@gmail.com> wrote:
> On Wed, Feb 17, 2010 at 8:12 PM, Sergey Shepelev <tem...@gmail.com> wrote:
>
>> Like this?
>>
>> object2 = copy(object); // which doesn't use malloc, but returns a struct on stack
>> doSomethingWithObject(object2);
>
> There is no way to get this working with ANSI-C, you can't return
> something on the stack from a C function.

On freenode #gcc, I was told that:

"struct hello { }; struct hello f (void) { struct hello res; return
res; }" is valid standard C


Michael

Feb 17, 2010, 2:50:08 PM

to redi...@googlegroups.com

On Thu, Feb 18, 2010 at 4:28 AM, Sergey Shepelev <tem...@gmail.com> wrote:

> On Wed, Feb 17, 2010 at 10:19 PM, Salvatore Sanfilippo
> <ant...@gmail.com> wrote:
>> On Wed, Feb 17, 2010 at 8:12 PM, Sergey Shepelev <tem...@gmail.com> wrote:
>>
>>> Like this?
>>>
>>> object2 = copy(object); // which doesn't use malloc, but returns a struct on stack
>>> doSomethingWithObject(object2);
>>
>> There is no way to get this working with ANSI-C, you can't return
>> something on the stack from a C function.
>
> On freenode #gcc, i was told that:
>
> "struct hello { }; struct hello f (void) { struct hello res; return
> res; }" is valid standard C

This is a very good example of how easy it is to "shoot yourself in the foot". Indeed, you _can_ do that in C, the compiler will be completely ok with it, and everything will be fine as long as you reference this structure within the scope of this function. But once you return it and try to use it somewhere outside, you are on very thin ice (pretty much doomed, I would say). Sometimes it may even work, by accident, until you allocate a new local variable, call another function, or use the stack in any other way.

Chris Streeter

Feb 17, 2010, 2:52:26 PM

to redi...@googlegroups.com

It is valid C. However, it is introducing a memory leak. You are returning an object that was allocated on the stack of the function call f. So accessing it after the function call can potentially work, but will cease to work if a context switch happens, another function call is made or any other operation that modifies the stack pointer on the OS (interrupts coming in can cause this to happen too... ie. a new packet on the ethernet device).

- Chris

Salvatore Sanfilippo

Feb 17, 2010, 4:57:47 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 8:52 PM, Chris Streeter <cjstr...@gmail.com> wrote:
> It is valid C. However, it is introducing a memory leak. You are returning

Hello Chris,

returning stack-allocated objects and memory leaks are a very strange mix!
Actually it's impossible to "leak" from the stack, as it is reused
again and again (the stack pointer is decreased when a function is
called and increased again when it returns).

> an object that was allocated on the stack of the function call f. So
> accessing it after the function call can potentially work, but will cease to

Yes, I don't mean that you can't compile such code. It may even work
in a reproducible and systematic way if you know what you are doing.
For instance, if you never use the non-standard function alloca() in
the caller function, and you don't pass the returned value around,
the caller can actually use a value returned from the stack without
problems.

But if you do this:

object = callFunctionReturningStackStuff();
doSomethingWith(object);

or even

object = callFunctionReturningStackStuff();
doSomethingUnrelated();
if (object->foo == ...) ... /* work with the object without passing it around */

You are in big trouble. The above applies to 99% of the code you can
write, as it's hard to do useful work without calling other functions
at all :)

The reason is that doSomethingWith() and doSomethingUnrelated() will
reuse the part of the stack where the object returned by
callFunctionReturningStackStuff() lives.

> work if a context switch happens, another function call is made or any other

Context switching has nothing to do with this. It's a kernel thing:
when the kernel resumes running your process (or, to be more
specific, your thread, in this context), it restores the original
stack pointer.
The problem, as said, is that every other function call can potentially
ruin the part of the stack where your object lives.

> operation that modifies the stack pointer on the OS (interrupts coming in
> can cause this to happen too... ie. a new packet on the ethernet device).

No, this is totally unrelated, trust me.

Btw, given that we turned this thread into a C topic (and I've nothing
against it, I love talking about programming), for similar reasons the
opposite is perfectly ok, that is *passing* stack-allocated data to
another function, as in this example:

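#include <stdio.h>

void incr(int *i) {
    *i = *i + 10;       /* writes into the caller's stack frame */
}

void myfunction(void) {
    int x;              /* stack allocated */
    x = 0;
    incr(&x);           /* safe: myfunction()'s frame is still live */
    printf("%d\n", x);  /* prints 10 */
}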

This is indeed ok: calling "more" functions only moves the stack
pointer further down (on typical architectures the stack grows in a
reversed fashion, towards lower addresses), never back above
myfunction()'s frame, so our stack-allocated value is always safe as
long as we don't return from myfunction().

Cheers,
Salvatore

Mike Shaver

Feb 17, 2010, 9:02:45 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 4:57 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> object = callFunctionReturningStackStuff()
> doSomethingUnrelated()
> if (object->foo == ...) ... /* work with the object without passing it around */

That's the case if you return the struct by reference, such as

struct object { ... };

struct object *
callFunction()
{
struct object o;
return &o;
}

but if you return the struct by value it should be copied into the
caller's frame appropriately, just like returning a scalar:

struct object /* no * */
callFunction()
{
struct object o;
return o; /* no & */
}

Isn't the latter the form that was given above? Are there ABIs for
which returning a struct by value doesn't work? That would surprise
me, since it would seem that it should be identical to struct
assignment, which AFAIK works everywhere.
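Mike's point can be checked directly: a struct returned by value is copied into the caller, so it stays intact after later calls reuse the stack, unlike a pointer to the callee's local variable. A minimal standard C sketch; `struct point`, `make_point` and `scribble_on_stack` are made-up names for illustration.

```c
#include <string.h>

struct point { int x, y; char label[16]; };

/* Returns the struct by value: the caller receives its own copy. */
struct point make_point(int x, int y, const char *label)
{
    struct point p;                 /* local, lives in make_point's frame */
    p.x = x;
    p.y = y;
    strncpy(p.label, label, sizeof(p.label) - 1);
    p.label[sizeof(p.label) - 1] = '\0';
    return p;                       /* copied out; no pointer to the frame escapes */
}

/* Uses some stack, likely reusing the area make_point's frame occupied. */
int scribble_on_stack(void)
{
    volatile char junk[256];
    int i, sum = 0;
    for (i = 0; i < 256; i++) junk[i] = (char)i;
    for (i = 0; i < 256; i++) sum += junk[i];
    return sum;
}
```

Calling `make_point(3, 4, "origin+")`, then `scribble_on_stack()`, leaves the caller's copy unchanged. Returning `&p` instead would be the dangerous variant discussed earlier in the thread; GCC, for instance, warns that the function returns the address of a local variable. Returning a large struct by value is safe, it just costs a copy.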

Mike

Chris Streeter

Feb 18, 2010, 2:36:29 PM

to redi...@googlegroups.com

On Wed, Feb 17, 2010 at 13:57, Salvatore Sanfilippo <ant...@gmail.com> wrote:

> On Wed, Feb 17, 2010 at 8:52 PM, Chris Streeter <cjstr...@gmail.com> wrote:
>> It is valid C. However, it is introducing a memory leak. You are returning
>
> Hello Chris,
>
> returning stack allocated objects and memory leaks are a very strange mix!
> Actually it's impossible to "leak" from the stack, as it is reused
> again and again
> (subtracting the stack pointer when a function is called, adding when
> it returns).

Oops, yeah, you are correct. I was too liberal with my language. It isn't a memory leak, but it can be a bad reference causing memory corruption unless you're careful about using it.

>> an object that was allocated on the stack of the function call f. So
>> accessing it after the function call can potentially work, but will cease to
>
> Yes I don't mean that you can't compile such a code. It may even work
> in a reproducible and systematic way if you know what you are doing.
> For instance if you never use the non-standard function alloca() in
> the caller function, and you don't pass the returned valued around,
> the caller can actually use a value returned from the stack without
> problems.

Yep, I agree.

> But if you do this:
>
> object = callFunctionReturningStackStuff()
> doSomethingWith(object);
>
> or even
>
> object = callFunctionReturningStackStuff()
> doSomethingUnrelated()
> if (object->foo == ...) ... /* work with the object without passing it around */
>
> You have big troubles. The above applies to 99% of the code you can
> write, as it's hard to do useful work without calling other functions
> at all :)
>
> The reason is that both callFunctionReturningStackStuff and
> doSomethingUnrelated will use part of the stack that the original
> function returned.
>
>> work if a context switch happens, another function call is made or any other
>
> Context switching has nothing to do with this. It's a kernel thing,
> when the kernel will continue running your process (or to be more
> specific, your thread, in this context), it will restore the original
> stack pointer.
> The problem is as said that every other function call can potentially
> ruin the part of the stack where your object lives.

I forgot that the kernel reserves a special memory segment for the ISRs to use. So I was initially thinking that a context switch would cause the ISR to append its stack to the stack of the program. There are a number of issues with my assumption (virtual memory is a big one), so yeah, context switching shouldn't cause problems.

>> operation that modifies the stack pointer on the OS (interrupts coming in
>> can cause this to happen too... ie. a new packet on the ethernet device).
>
> No this is totally unrelated, trust me.
>
> Btw given that we turned this thread into a C topic (and I've nothing
> against it, I love talking about programming), for similar reasons the
> contrary is perfectly ok, that is *passing* a stack allocated stuff to
> another function, like in this example:
>
> void incr(int *i) {
> *i = *i + 10;
> }
>
> void myfunction(void) {
> int x; /* stack allocated */
> x = 0;
> incr(&x);
> printf("%d\n", x);
> }
>
> This is ok indeed, as the stack pointer can only be decremented
> calling "more" functions, and never incremented (the stack works in a
> reversed fashion), so our stack allocated value is always safe if we
> don't return form myfunction().

Yeah, passing a pointer to your own stack-allocated data is fine. The main issue is that one can run into problems by using returned stack data without realizing that it was allocated on the stack. When callFunctionReturningStackStuff() is initially written, it could very well be correct in the cases where it is used. But as a project progresses, one needs to take care that the return value of that function is properly understood; otherwise you can run into memory corruption if the data is accessed improperly. In any case, I'm not saying you're wrong or that there are bugs at all, just that care must be taken and the documentation must be clear, to prevent bugs and memory corruption now and down the road.

- Chris

Joey

Mar 10, 2010, 12:00:26 PM

to Redis DB

After upgrading to 1.2.2, we occasionally experience the same latency
during BGREWRITEAOF that was described prior to the 1.2.2 fixes. This
results in errors from the redis-rb library like:

Errno::EAGAIN: Resource temporarily unavailable - Timeout reading from the socket

Any ideas would be appreciated,
Joey

Our INFO output:

redis_version:1.2.2
arch_bits:64
multiplexing_api:epoll
uptime_in_seconds:830628
uptime_in_days:9
connected_clients:11
connected_slaves:1
used_memory:1166093671
used_memory_human:1.09G
changes_since_last_save:26663882
bgsave_in_progress:0
last_save_time:1267410030
bgrewriteaof_in_progress:0
total_connections_received:1075
total_commands_processed:205724760
role:master
db0:keys=702857,expires=9190
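The INFO reply is plain `field:value` lines, so quick derived numbers such as average memory per key are easy to compute. A minimal sketch; `info_get_ulong` and `info_db0_keys` are hypothetical helpers that handle only numeric fields like those above and the `db0:keys=...` line format of this version.

```c
#include <stdio.h>
#include <string.h>

/* Find a line starting with "field:" in an INFO reply and parse the
 * unsigned integer after the colon. Returns 0 on success, -1 otherwise. */
int info_get_ulong(const char *info, const char *field, unsigned long *out)
{
    size_t flen = strlen(field);
    const char *line = info;
    while (line && *line) {
        /* exact field match: "used_memory" must not match "used_memory_human" */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':')
            return sscanf(line + flen + 1, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line) line++;
    }
    return -1;
}

/* Parse the key count out of a "db0:keys=N,expires=M" line. */
int info_db0_keys(const char *info, unsigned long *keys)
{
    const char *p = strstr(info, "db0:keys=");
    if (p == NULL) return -1;
    return sscanf(p + strlen("db0:keys="), "%lu", keys) == 1 ? 0 : -1;
}
```

With Joey's figures this gives 1166093671 / 702857, about 1659 bytes per key; the same helper can read `bgrewriteaof_in_progress` before a cron job triggers another rewrite.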
