Hello,
5 million keys of 1 KB each is not a lot of memory for Redis. I suggest
doing this in memory.
Don't use VM; it was a legitimate attempt but is IMHO a failure.
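As a rough sketch of the arithmetic (my own illustrative numbers: the
~100 bytes of per-key overhead is an assumption and varies with Redis
version and encoding):

keys = 5_000_000
value_size = 1024          # bytes per value
overhead_per_key = 100     # assumed per-key overhead, illustrative only

total_bytes = keys * (value_size + overhead_per_key)
print(f"~{total_bytes / 2**30:.1f} GiB")   # roughly 5.2 GiB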
To reply to Tim briefly, since I still have not found the time to
reply to his email properly: diskstore is also currently low priority.
I trust Redis-on-disk less every day. It is not impossible that we'll
do more work in this field, but as long as the feature set remains so
limited, and SSD performance is basically trashed by the OS API in the
specific case where you want to use it as safe random-access memory, I
continue to think of Redis on disk as a bad deal.
Gopalakrishnan: either go with Redis in memory or keep Tyrant IMHO :)
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
i was mainly trying to make the point that the existing diskstore
implementation is fully usable -- provided that the use case is
compatible, i.e. more data than fits in memory, small percentage
of hot keys at any given time, and values which are not too small.
tim
On 2011-05-31, at 19:57, Salvatore Sanfilippo wrote:
> To reply to Tim briefly, since I still have not found the time to
> reply to his email properly: diskstore is also currently low priority.
> I trust Redis-on-disk less every day. It is not impossible that we'll
> do more work in this field, but as long as the feature set remains so
> limited, and SSD performance is basically trashed by the OS API in the
> specific case where you want to use it as safe random-access memory, I
> continue to think of Redis on disk as a bad deal.
the other requirement, small percentage of hot keys at any given
time, is far more important i think.
tim
For Redis on disk to work well, you need:
1) Very biased data access.
2) Mostly reads.
3) Dataset consisting of key->value data where values are small.
4) A dataset that is big enough to really pose memory/cost problems on
the ever-growing RAM you find in an entry-level server.
The intersection of 1+2+3+4 is small and fits exactly the case where
Redis for metadata, or as a cache, plus another datastore designed to
work on disk, is the right pick. So why shouldn't we focus, instead,
on doing what we already do (the in-memory but persistent data
structure server) better? It would already be a huge success to
enhance what we already have.
--
Hello, first of all, let's put things in context. In the "what
happens" scenario we are talking about 99% of happy Redis users.
Diskstore + VM together probably don't reach 1% of the user base.
So what happens to the vast majority of Redis instances out there when
they run out of memory?
The system will start to perform poorly, get slower, and so forth, as
the OS starts to swap Redis pages to disk.
If your system is configured with too little (or zero) swap space, the
OOM killer will likely kill the Redis instance.
It is also possible that you misconfigured your system and the
overcommit policy is set the wrong way (but Redis warns about this in
the logs), in which case fork() will start failing when there is not
enough memory, causing the DB save (or the AOF log rewrite) to fail.
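A minimal sketch of checking that overcommit setting on Linux (the
advice mirrors the standard Redis log warning; the script itself is
just an illustration):

# Check the Linux memory overcommit policy (assumes Linux and /proc).
# Redis warns when vm.overcommit_memory is 0, because a BGSAVE fork()
# can then fail under memory pressure.
with open("/proc/sys/vm/overcommit_memory") as f:
    policy = int(f.read().strip())

if policy == 0:
    print("overcommit_memory is 0: fork() for BGSAVE / AOF rewrite may "
          "fail when memory is low; consider 'vm.overcommit_memory = 1'.")
else:
    print(f"overcommit_memory is {policy}")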
But I can't understand why you force a connection between persistence
and having (or not) disk-backed data storage.
Those are two completely different things: if Redis crashes because it
ran out of memory, or if it crashes because you had a power failure in
your data center, there is no difference. Did you have AOF enabled?
You likely have all your data minus the latest few seconds.
Did you have .rdb enabled? You have your .rdb files saved as
configured. Nothing magical or different from usual.
On Wed, Jun 1, 2011 at 3:28 PM, Xiangrong Fang <xrf...@gmail.com> wrote:
Hello Paolo,
I think the reality is more or less *exactly* the contrary.
One of the problems of diskstore is that you need to have a time limit
after you change a key to flush it to disk (otherwise it is not a
store at all).
So even if the same key is changing continuously, as in a
high-performance INCR workload, you end up with the write queue always
populated (imagine having many counters). And what do you do once you
can't write fast enough? The only things you can do are block
clients, or violate the contract with the user that a modified key
will be transferred to disk within a given number of seconds at most.
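A toy sketch of the constraint I am describing (all names and numbers
here are made up for illustration): every dirty key must reach disk
within a deadline, a hot counter keeps re-entering the queue, and when
the disk can't keep up the only options left are blocking clients or
breaking the deadline:

import time
from collections import OrderedDict

FLUSH_DEADLINE = 5.0       # max seconds a dirty key may stay unflushed (illustrative)
dirty = OrderedDict()      # key -> time it first became dirty

def write(store, key, value):
    store[key] = value
    dirty.setdefault(key, time.monotonic())   # hot keys re-enter right after a flush

def flush_due(store, disk_write):
    """Flush keys whose deadline expired; count the ones the disk couldn't absorb."""
    now = time.monotonic()
    overdue = 0
    for key, since in list(dirty.items()):
        if now - since >= FLUSH_DEADLINE:
            if disk_write(key, store[key]):
                del dirty[key]
            else:
                overdue += 1    # disk too slow: block clients or violate the contract
    return overdue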
That's point one. Point two is: there is no such thing as a problem
with BGSAVE saving to disk under high write load.
The *peak* requirement is 2x memory, but in Redis 2.2 (make sure to
use the absolute latest 2.2 patch level) this was improved
significantly, and with 2.4 this is much better as the persistence
itself is much faster. One of the problems with 2.2 was the dict
iterator triggering copy-on-write of pages even without any write
against the key.
So in the worst case you need 2x memory, but this worst case is now
much less likely to happen than in the past. When it does happen, it
is a requirement to keep in mind.
And interestingly, this has nothing to do with diskstore. The
requirement above is a direct consequence of the fact that we create
*point in time* snapshots of the dataset.
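The point-in-time property comes from fork(): the child process sees
a frozen view of the dataset via copy-on-write while the parent keeps
serving writes. A minimal sketch of the idea (assumes a Unix system;
real Redis writes RDB, here we just dump a dict to JSON):

import json
import os

data = {"counter": 1, "user:1": "alice"}

pid = os.fork()
if pid == 0:
    # Child: sees the dataset exactly as it was at fork() time.
    with open("snapshot.json", "w") as f:
        json.dump(data, f)
    os._exit(0)

# Parent: keeps mutating; every touched page gets copied, which is
# where the extra memory during a snapshot comes from.
data["counter"] += 1
os.waitpid(pid, 0)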
> If you take a look at the mailing lists threads in late december [1],
> [2], [3] you'll notice that the development of diskstore was sparked
> by two needs, be a viable alternative to VM allowing datasets larger
> than RAM capacity but also to offer a different model of persistency
> compatible with write heavy loads.
You are wrong: diskstore's only other advantage is *fast restarts*.
And it is also more or less a fake advantage, since the startup will
not provide the same performance as normal running time, as every key
access will produce a disk access.
I can assure you that to make writes faster you don't start flushing
things to disk ;)
> So while you state that persistency is a redis feature you want to
> provide, the descoping of diskstore (or a similar alternative) from
> the roadmap gives redis a weak and problematic persistency support in
> scenarios where instead redis is meant to be shining.
Again, I don't agree. The fact that the peak memory can be 2x the RAM
does not mean we have a persistence problem. It is just that if you
have a lot of writes, and you want point-in-time persistence, the
price to pay is obviously up to 2x memory. But I suggest trying this
with Redis 2.4 in a simulation; it is much better than it used to be.
It is however possible to reason about having an alternative
persistence not guaranteeing point-in-time snapshots, but guaranteeing
instead very low memory usage while saving. In the persistence arena
there is still a lot for us to experiment with, but this has nothing
to do with diskstore in my opinion.
> Hello Paolo,
>
> I think the reality is more or less *exactly* the contrary.
> One of the problems of diskstore is that you need to have a time limit
> after you change a key to flush it to disk (otherwise it is not a
> store at all).
> So even if the same key is changing continuously, as in a
> high-performance INCR workload, you end up with the write queue always
> populated (imagine having many counters). And what do you do once you
> can't write fast enough? The only things you can do are block
> clients, or violate the contract with the user that a modified key
> will be transferred to disk within a given number of seconds at most.
Redis's basic caching behavior wouldn't change: you'd just write to persistent storage after X changes happened, but instead of writing out the entire memory content you'd write out only the values that changed. This can't be slower or worse than writing out GBs of data each time.
You don't need to keep a timer on each key; you simply need to write to disk only the keys that have actually changed, instead of the entire snapshot.
> So in the worst case you need 2x memory, but this worst case is now
> really unlikely to happen compared to the past. When this happens,
> this is a requirement to take in mind.
Worst case might be 2x but even if it was 1.5x it's still 30% of the memory that you can't use for storage but have to reserve for backups.
> And interesting: this has nothing to do with diskstore. The
> requirement above is a direct consequences of the fact that we create
> *point in time* snapshots of the dataset.
Since diskstore is more key-oriented, it should be possible to have the requirement above apply at the level of the number and size of key-value changes rather than at the entire dataset level.
>> If you take a look at the mailing lists threads in late december [1],
>> [2], [3] you'll notice that the development of diskstore was sparked
>> by two needs, be a viable alternative to VM allowing datasets larger
>> than RAM capacity but also to offer a different model of persistency
>> compatible with write heavy loads.
>
> You are wrong: diskstore's only other advantage is *fast restarts*.
> And it is also more or less a fake advantage, since the startup will
> not provide the same performance as normal running time, as every key
> access will produce a disk access.
> I can assure you that to make writes faster you don't start flushing
> things to disk ;)
Why is its only advantage fast restart? Its biggest advantage is better memory efficiency; I think very few people care about fast restarts, and even then, fast restarts matter mostly with big datasets, which would be crippled first by the inefficient use of memory.
>> So while you state that persistency is a redis feature you want to
>> provide, the descoping of diskstore (or a similar alternative) from
>> the roadmap gives redis a weak and problematic persistency support in
>> scenarios where instead redis is meant to be shining.
>
> Again, not agreed. The fact that the peak memory can be 2x the RAM
> does not mean we have a persistence problem. Just if you have a lot of
> writes, and you want point-in-time persistence, the price to pay is
> obviously up to 2x memory. But I suggest trying this into Redis 2.4 in
> a simulation, it is much better than it used to be.
But you can't store stuff that is bigger than RAM even though 60% of the time you use 10% of what you are storing in redis.
> It is however possible to reason about having an alternative
> persistence not guaranteeing point-in-time snapshots, but guaranteeing
> instead very low memory usage while saving. In the persistence arena
> there is a lot to experiment for us still, but this has nothing to do
> with diskstore in my opinion.
RDBMSes have been doing point-in-time snapshots and guaranteeing persistence for a very long time.
--
Valentino Volonghi aka Dialtone
Now Running MacOSX 10.6
http://www.adroll.com/
> Redis basic caching behavior wouldn't change, you'd just write to persistent storage after X changes happened, but instead of writing out the entire memory content you write out only the values that changed. This can't be slower or worse than writing out GBs of data each time.
>
> You don't need to keep a timer on each key, you simply need to write to disk only the keys that have actually changed instead of the entire snapshot.
This is not how it works. Just an example: you set a time of 1000
seconds between saves, and in those 1000 seconds you touch 50% of the
dataset. Then you need to keep half of the dataset in memory, as
modified keys can't be discarded to free memory. So you can no longer
guarantee diskstore-max-memory. It is more complex than that, btw, but
the example is enough to show that things are more interesting than
you may think at first.
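To put illustrative numbers on it (mine, not from any real setup):
with a 20 GB dataset and diskstore-max-memory set to 2 GB, touching
50% of the keys between saves leaves about 10 GB of modified data that
cannot be evicted, far above the configured limit:

dataset_gb = 20          # illustrative dataset size
max_memory_gb = 2        # illustrative diskstore-max-memory
touched_fraction = 0.5   # fraction of keys modified between two saves

dirty_gb = dataset_gb * touched_fraction
print(f"dirty data: {dirty_gb} GB vs limit: {max_memory_gb} GB")
print(dirty_gb > max_memory_gb)   # True: the memory bound cannot be guaranteed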
> Worst case might be 2x but even if it was 1.5x it's still 30% of the memory that you can't use for storage but have to reserve for backups.
If you want an in-memory data store, you can't escape the 2x rule. I
can provide you a mathematical proof of that.
You have two storage media: one is RAM, one is disk. You want to
transfer a point-in-time snapshot from RAM to disk.
Once you start dumping the content of the first medium into the second
one, new writes may arrive in the first medium. In order to guarantee
the point-in-time semantics you need to accumulate these changes in
some way.
So, Law Of Redis #1: an in-memory database takes, to produce an
on-disk point-in-time snapshot of the dataset, an amount of additional
memory proportional to the changes received by the dataset while the
snapshot is being performed.
You can implement that in different ways but the rule does not change.
In our case we use copy-on-write, so it is particularly severe as
every modified byte will copy a whole page. So for instance in the
worst case just 5000 modified keys will COW 19 MB of pages.
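The arithmetic behind that figure, assuming 4 KB pages and the worst
case where every modified key touches a different page:

page_size = 4096          # bytes, typical x86 page size
modified_keys = 5000      # worst case: each key dirties a distinct page

cow_bytes = modified_keys * page_size
print(f"{cow_bytes / 2**20:.1f} MB")   # ~19.5 MB of copied pages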
However the alternative is to duplicate values when needed, and Redis
values can be very large lists for instance, so COW is the best thing
we can do. Now that we have optimized it, in most conditions this is
a non-issue for many users. And for users with a trillion changes per
second, 2x RAM is the requirement.
> Since diskstore is more key-oriented, it should be possible to have the requirement above apply at the level of the number and size of key-value changes rather than at the entire dataset level.
I think that people want simple behavior for persistence, without
weights to assign to keys.
But this is not the point of what I was saying. What I was saying in
the above sentence is that diskstore is unrelated to that, you can
just have, without diskstore, a saving thread that saves the keys one
after the other without guaranteeing any point-in-time feature, and
such a system will use constant additional memory to save.
So we are mixing arguments IMHO. Diskstore was addressing just two things:
1) datasets larger than RAM.
2) fast restarts, as there is no load time since memory is just a cache.
We'll see later in this email how it actually sucks at "1" and at "2" anyway.
> Why is its only advantage fast restart? Its biggest advantage is better memory efficiency, I think very few care about fast restarts and even in that case fast restarts happen mostly with big datasets that will be crippled first by the non efficient use of memory.
I agree that fast restarts are not such a killer feature, but I was
just mentioning one of the advantages.
The other is that you use less memory to store the same amount of
data. For that to work well you need:
1) That you have a very biased access pattern. If access is evenly
distributed the system will behave like an on-disk system, and this is
not the goal of Redis.
2) That you have few writes, otherwise you can either choose to
have a hard memory bound and slow down writes, or you can try to
handle peaks by letting memory usage go higher for a while. But at the
end of the day you need to start slowing down clients to save your
long queue of keys. Remember: you have a fixed memory limit in this
setup.
3) Also, values need to be small. Basically this means turning Redis
into a plain key-value store. You can't have a big sorted set as a
value, as serializing and deserializing it is hard. Otherwise you need
to go much further and implement all the Redis types on disk
directly, efficiently, and without fragmentation. This means creating
a completely different project: it could be nice but it is not Redis.
So I'm pretty shocked to hear that diskstore is the solution to a
persistence problem.
If we can have a different persistence engine that drops point-in-time
optionally in favor of some other kind of persistence mechanism, why
not? This would be cool. But fixing that with a disk-based storage
engine does not make sense if we want a fast in-memory store that can
do writes as fast as reads, can have a key holding a sorted set with
40 million entries without even noticing the load, and so forth.
> But you can't store stuff that is bigger than RAM even though 60% of the time you use 10% of what you are storing in redis.
The secret is being able to accept compromises in systems.
When there is such a big bias, it is at the application level that
you need to get smarter. Use Redis as a cache, but also write against
your cache and transfer your values when needed (I do this with good
results, for instance). But expose all these tradeoffs to the
application.
It is crystal clear that the Redis data model is not compatible with
on-disk storage, at least with the limitations imposed by the OS API
and disk controllers, where you can't have decent guarantees of
consistency without resorting to things like fsync() or journals. So
even SSDs are out of the question: you can't model complex data
structures with pointers on disk and expect it to rock.
> RDMSes have been doing point-in-time snapshots and guaranteeing persistency for a very long time.
The 2x memory problem only exists when you want point-in-time on an
external medium.
When both your "live" data and your "persistence" data are the same
thing, point-in-time is a no-brainer and has been done by RDBMSes
forever.
Ciao,
Salvatore
--
That all comes down to two factors:
1) the time it takes to persist to disk.
2) the number of changes per unit of time.
Additional memory is 1 * 2.
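For example, with invented numbers: if a save takes 60 seconds and you
receive 50 MB of changes per second while it runs, the snapshot needs
on the order of 3 GB of additional memory (copy-on-write page
granularity can amplify this further):

save_time_s = 60               # (1) time to persist, illustrative
changed_bytes_per_s = 50e6     # (2) change rate while saving, illustrative

additional_memory = save_time_s * changed_bytes_per_s
print(f"~{additional_memory / 1e9:.1f} GB held for the point-in-time snapshot")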
When this is not acceptable there are two possible solutions we
should start developing.
SOLUTION A: a persistence method that does not have point-in-time
guarantees. Just a thread that saves one key after the other.
SOLUTION B: using the append-only file, but writing it in segments,
and writing tools that can compact these pieces by tracing DELs and
other commands, so you don't need a background rewrite process.
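A minimal sketch of what SOLUTION A could look like (all names and the
file format are invented for illustration): a background thread walks
the keyspace and writes one key at a time, so there is no
point-in-time guarantee but the additional memory stays roughly
constant:

import json
import threading
import time

def start_background_saver(store, path, interval=300):
    """Persist one key after the other; no point-in-time snapshot."""
    def run():
        while True:
            with open(path, "w") as f:
                for key in list(store.keys()):    # keys may change mid-save:
                    value = store.get(key)        # that is the accepted compromise
                    if value is not None:
                        f.write(json.dumps({key: value}) + "\n")
            time.sleep(interval)
    threading.Thread(target=run, daemon=True).start()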
I like that we are talking about improving that part of Redis, and
there are the tools to do that, and nice challenges as well :)
But diskstore is not the solution for this problem... it is instead
the start of many other issues.
> On Wed, Jun 1, 2011 at 9:07 PM, Valentino Volonghi <dial...@gmail.com> wrote:
>
>> Redis basic caching behavior wouldn't change, you'd just write to persistent storage after X changes happened, but instead of writing out the entire memory content you write out only the values that changed. This can't be slower or worse than writing out GBs of data each time.
>>
>> You don't need to keep a timer on each key, you simply need to write to disk only the keys that have actually changed instead of the entire snapshot.
>
> This is not how it works, just an example: you set a time of 1000
> seconds between saves, in this 1000 seconds you touch 50% of the
> dataset. Then you need to have half of the dataset in memory, as
> modified keys can't be discarded to free memory. So you can no longer
> guarantee diskstore-max-memory. It is more complex than that btw but
> the example is enough to show that things are more interesting than
> what you may think at first.
It is only slightly more complex: if you also have a max-memory requirement, then you can set a flush memory threshold that triggers a flush to disk, the same way you currently have multiple settings for the Redis snapshot depending on how many changes there are or how much time has passed.
If diskstore-max-memory is a hard limit then it's going to have to be part of the flushing decision beyond the timer setting. Likewise for all other hard memory limits that a user sets.
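A sketch of that kind of flushing decision (thresholds invented for illustration), combining a timer, a change counter, and a memory bound, in the spirit of the existing "save <seconds> <changes>" snapshot settings:

import time

FLUSH_AFTER_SECONDS = 300             # illustrative thresholds
FLUSH_AFTER_CHANGES = 10_000
MAX_DIRTY_BYTES = 512 * 1024 * 1024   # hard bound on unflushed data

def should_flush(last_flush, changes_since_flush, dirty_bytes):
    now = time.monotonic()
    return (now - last_flush >= FLUSH_AFTER_SECONDS
            or changes_since_flush >= FLUSH_AFTER_CHANGES
            or dirty_bytes >= MAX_DIRTY_BYTES)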
>> Worst case might be 2x but even if it was 1.5x it's still 30% of the memory that you can't use for storage but have to reserve for backups.
>
> If you want an in-memory data store, you can't escape the 2x rule. I
> can provide you a mathematical proof of that.
The 2x rule is only valid when you save the entire dataset; if you save just what has changed, it would only apply in a very extreme case.
>> Since diskstore is more key-oriented, it should be possible to have the requirement above apply at the level of the number and size of key-value changes rather than at the entire dataset level.
>
> I think that people want simple behavior for persistance, without
> weights to assign to keys.
Nobody assigns weights to keys any more than what is currently done in volatile-lru. The most accessed keys stay in memory, the least accessed don't.
>> Why is its only advantage fast restart? Its biggest advantage is better memory efficiency, I think very few care about fast restarts and even in that case fast restarts happen mostly with big datasets that will be crippled first by the non efficient use of memory.
>
> I agree that fast restarts are not that killer feature, but I just was
> mentioning one of the advantages.
> The other is that you use less memory to store the same amount of
> data. For that to work well you need:
These are the same 2 advantages that I mentioned, just to be clear, so we are on the same page.
> 1) That you have a very biased access pattern. If access is evenly
> distributed the system will behave like an on-disk system, and this is
> not the goal of Redis.
I can't understand this.
Why is it impossible to implement a memory store that uses a background process to save changed keys?
Redis has no power to decide the frequency of access of each key. If you access every key evenly then you'll need more memory, so this point is moot. But the vast majority of use cases access a few keys most of the time, and in this case the proposed solution would work fine. People who enable diskstore do it knowing that this is their use case. All Redis users are already keeping everything in memory, so you'd be providing a new feature to better handle more data without requiring more machines.
> 2) That you have little writes, otherwise you can either select to
> have an hard memory bound and slow down writes, or you can try to
> handle peaks going higher with memory usage for a while. But at the
> end of the day you need to start slowing down clients to save your
> long queue of keys. Remember: you have a fixed memory limit in this
> setup.
Writes get aggregated in memory and then saved only once to disk. If you can save the entire dataset every X minutes today, there is just no way you wouldn't be able to save just a few keys. Guaranteeing fixed memory limits down to the single byte is really hard, but I can't see that as a problem.
> 3) Also values need to be small. Basically this means to turn Redis
> into a plain key-value store. You can't have a big sorted set as a
> value, as to serialize-unserialize that is hard. Otherwise you need to
> go much more forward and implement all the Redis types on disk
> directly, efficiently, and without fragmentation. This means to create
> a completely different project: could be nice but is not Redis.
I can see that this is an issue, but the rest you mentioned is definitely not.
> So I'm pretty shocked to hear that diskstore is the solution to a
> persistence problem.
It's a memory efficiency problem from my point of view: not only can you not manage datasets bigger than RAM, you actually can't manage datasets bigger than 50% of your RAM.
> If we can have a different persistence engine that drops point-in-time
> optionally in favor of some other kind of persistence mechanism, why
> not? This would be cool. But fixing that with a disk-based storage
> engine does not make sense if we want a fast in-memory store that can
> do writes as fast as reads, can have a key holding a 40 millions
> entries sorted set without even noticing the load, and so forth.
I fail to see how aggregated memory writes are connected to disk writes any more than in the current snapshotting solution. It's the equivalent of the problem of what happens when you can't save a snapshot in time before the next snapshot is due.
>> But you can't store stuff that is bigger than RAM even though 60% of the time you use 10% of what you are storing in redis.
>
> The secret is being able to accept compromises in systems.
There are good compromises and bad compromises.
> When there is such a big bias, it is at application level that you
> need to get smarter. Use Redis as a cache, but even write against your
> cache and transfer your values when needed (I do this with good
> results for instance). But expose all this tradeoffs to the
> application.
>
> It is crystal clear that Redis data model is not compatible with on
> disk storage, at least with the limitations imposed by the OS API and
> disk controllers, where you can't have decent guarantee of consistency
> without resorting to things like fsync() or to journals. So even SSD
> are out of questions, you can't model complex data structures with
> pointers on disk and expect it to rock.
Redis is the perfect write-through cache for more complex data structures. I can see all your points regarding the complexity of implementing a disk format that allows for fast updates in place (what most other stores do is use append-only files and compaction, similar to MVCC), but I can't see any other point about consistency, durability, tradeoffs or memory usage. Redis is in a better position than the client to know how to manage its keys, and this is a pretty widespread use case.
>> RDMSes have been doing point-in-time snapshots and guaranteeing persistency for a very long time.
>
> The 2x memory problem only exists when you want point-in-time on an
> external media.
> When both your "live" data and your "persistence" data are the same
> thing to have point-in-time is a non brainer and done by RDMSs
> forever.
Most RDBMSes also write the changes to a buffer and flush it when needed; it's a basic configuration parameter for all of them. There is nothing inherently wrong or slow in RDBMSes; it's the number of features that you use that makes it a problem. If you want full ACID guarantees then you are going to be slow; relax the D (durability) a bit and it will be much faster.
http://www.scribd.com/doc/31669670/PostgreSQL-and-NoSQL
> I like that we are talking about improving that part of Redis, and
> there are the tools to do that, and nice challenges as well :)
> But diskstore is not the solution for this problem... it is instead
> the start of many other issues.
diskstore is just the name of a system that behaves like a write-back cache instead of a VM.
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
Please upgrade to 2.2.8; there was an important change recently that
reduced this problem a lot.
Before that fix, Redis used a lot of memory when saving sets, hashes,
or sorted sets that were not specially encoded, even if there were
*no writes* going on. The new fix finally prevents this issue.
Cheers,