--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
Your question seems simple, but it's not. Unless you've used Redis to
solve a few problems (and even if you have), it's impossible to come
up with a flow chart to determine when Redis is the right
answer, when a standard relational database is the right answer, when
a document database is the right answer, etc. So I'm not going to try,
and I'm going to encourage others not to try. What I am going to do is
to explain to you why your 4 point numbered list is wrong, but I'll
get to that in a bit.
If you have a specific problem: if you can explain your data model,
your access patterns, your ultimate goal, and your money/hardware
limitations, you can post the problem here and we will give you our
opinions (and potential implementation strategies) on whether or not
Redis would work well to solve the problem. If we don't think Redis
can solve the problem well, we will tell you so, and may even direct
you to another piece of software that could solve your problem well.
"My goal is to just start a conversation", always reads like bullshit,
and in this case, it's obviously so. You have opinions, and you stated
them in your post: persistence doesn't get you anything, data
structures are worthless (I'm Dr. Josiah, so thanks for the shout-out
link, but that link doesn't say what you think it does), pub/sub and
messaging is worthless, ... oh, but some server operations are useful.
Here's the thing: you are wrong. Not sort-of wrong, but really wrong
about basically everything you said except for the operations part.
But since you brought it up, let me explain.
But first, the first part of my response to #2 should have the
"condescending Wonka" picture imagined, the last part of my response
to #3 should be read in the voice of Lewis Black, and when you read my
response to #1 and #4, you should picture me as an overweight 55-year
old man with a gray beard (despite the reality being quite the
contrary on all points). Let it not be said that I lack a sense of
humor, even if it's hard to figure out at times.
1. Persistence is not about "warming up the cache", persistence is
about anything from soft data-persistence to hard data-persistence
requirements (incidentally, this is also why membase exists, which is
a disk-backed memcache). If I know that my dump is only ever 1 hour
old, I only need to re-process 1 hour's worth of data to get Redis
back up to date. If I use AOF, then I know (depending on my settings)
that I may only ever lose 1 second (or 0 seconds) of data (1 second
AOF syncing is effectively as fast as no-AOF). While Redis can be used
as a cache, that's not its primary usefulness or goal (if it was,
then all you'd ever see is GET/SET). If that's all you think it is, no
wonder your misconceptions about Redis abound.
2. You can implement "zinterstore key 5 keya keyb keyc keyd keye
weights -3.25 4 -2 7 1 aggregate sum" on top of memcache? Do tell how!
I'm sure that the millions of users of memcache would *love* to have
simple transparent support for scored set intersections. Heck, I know
when I was using memcache, I'd have given a week's pay or more just to
be able to use simple atomic RPUSH and LPOP operations (sadly, Redis
didn't exist at the time, and I personally lacked the imagination to
come up with a remote data structure server).
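To make the point concrete, here is a pure-Python model of what that ZINTERSTORE invocation computes (illustrative only, not how Redis implements it internally): keep the members present in every set, summing each set's score times that set's weight.

```python
# Toy model of "ZINTERSTORE dest 5 keya keyb ... WEIGHTS ... AGGREGATE SUM".
# Each zset is a dict of member -> score; weights line up with the zsets.
def zinterstore_sum(zsets, weights):
    """Keep members present in every set, summing score * weight per set."""
    common = set.intersection(*(set(z) for z in zsets))
    return {m: sum(z[m] * w for z, w in zip(zsets, weights)) for m in common}

keya = {"x": 1.0, "y": 2.0}
keyb = {"x": 3.0, "z": 4.0}
result = zinterstore_sum([keya, keyb], [-3.25, 4])
# "x" is the only common member: 1.0 * -3.25 + 3.0 * 4 = 8.75
```

That one atomic server-side operation is exactly the kind of thing you would otherwise be reimplementing (racily) in client code over memcache.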
3. Support for pubsub/messaging via lists/etc. is all about
convenience. Why should you need to install RabbitMQ (with the 150 meg
Erlang install), ActiveMQ (with the 40-50 meg Java install), or one of
the other queues, when you get a very simple 85-99% solution
(depending on your use-case) for free? Yes, out of the box Redis
doesn't support 100% of the features of those purpose-built messaging
queues, but so what? Should we remove lists because someone *may* use
them for queues, and we want them to use those other queues because
those other queues "do it better" (which I can and have argued against
in the past)? No? Okay, so if we're not going to remove lists, then we
are going to say that Redis lists can be used for queues and
messaging, because to pretend it can't, or to not highlight it as a
feature, is silly. Also, pubsub is a total bonus feature, don't look a
gift horse in the mouth.
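The list-as-queue pattern being defended here is nothing more than FIFO push/pop. A toy in-memory model (in real code these would be `conn.rpush(...)` / `conn.blpop(...)` calls against a Redis list; the class and names here are illustrative):

```python
from collections import deque

# Producers push onto the tail, consumers pop from the head (FIFO),
# mirroring the RPUSH/LPOP queue pattern described above.
class ToyQueue:
    def __init__(self):
        self._items = deque()

    def rpush(self, item):   # producer side
        self._items.append(item)

    def lpop(self):          # consumer side (BLPOP would block instead of returning None)
        return self._items.popleft() if self._items else None

q = ToyQueue()
q.rpush("job-1")
q.rpush("job-2")
first = q.lpop()   # FIFO: "job-1" comes out first
```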
4. Server side support for *all* operations is useful. Adding lua
scripting on the server side is another step towards basically making
Redis a generic remote-execution RPC server, not dissimilar to stored
procedures in a database. The primary difference being the support for
data structures instead of database tables. If you can't imagine a
situation where literally every operation in Redis has a use-case (or
where every operation available in a relational database has a
use-case), then you lack imagination.
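As an illustration of the stored-procedure analogy, here is the sort of atomic check-and-increment a server-side script enables (a rate limiter). The Lua is a sketch of what you would pass to EVAL; the Python function mirrors its decision logic so it can be exercised without a server. The key name, window, and limit are all made up.

```python
# Sketch of a Lua script for EVAL: atomically increment a counter, set its
# expiry on first use, and reject once a limit is exceeded.
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if current > tonumber(ARGV[2]) then
    return 0
end
return 1
"""

def rate_limit_reference(counters, key, limit):
    """Pure-Python mirror of the script's decision: allow up to `limit`
    calls per window (expiry is omitted in this in-memory mirror)."""
    counters[key] = counters.get(key, 0) + 1
    return 1 if counters[key] <= limit else 0

counters = {}
allowed = [rate_limit_reference(counters, "user:42", 3) for _ in range(5)]
# first three calls allowed, the rest rejected
```

The point is the atomicity: the whole check-and-increment runs server-side in one step, like a tiny stored procedure over data structures instead of tables.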
Warmest Regards,
- Josiah
it's not a problem, it's a design choice.
if you want the ultimate in performance, you need enough RAM for your
dataset. if you have that, Redis is a great alternative. if not, no
database would be as fast, and the Redis authors simply chose not to
focus on that case.
--
Javier
[snip]
> Have always been a fan of Lewis Black, do you think you can do a
> George Carlin voice next time?
While I <3 George Carlin, I thought that adding a few f-bombs to stay
in-character would have been a little rude.
>> 1. Persistence is not about "warming up the cache", persistence is
>> about anything from soft data-persistence to hard data-persistence
>> requirements (incidentally, this is also why membase exists, which is
>> a disk-backed memcache). If I know that my dump is only ever 1 hour
>> old, I only need to re-process 1 hour's worth of data to get Redis
>> back up to date. If I use AOF, then I know (depending on my settings)
>> that I may only ever lose 1 second (or 0 seconds) of data (1 second
>> AOF syncing is effectively as fast as no-AOF). While Redis can be used
>> as a cache, that's not its primary usefulness or goal (if it was,
>> then all you'd ever see is GET/SET). If that's all you think it is, no
>> wonder your misconceptions about Redis abound.
> By cache I was not referring to only in-memory cache for disk data,
> but also as a cache of pre-computed information. Your point is taken -
> if you use Redis to pre-compute/cache complex information (that takes a
> long time to compute) repopulating the cache will take a very long
> time, and require a lot of CPU power.
Even beyond that, Redis is quite useful as a primary data store for a
variety of applications. Search engines, analytics engines, web
forums, etc. A smart engineer also uses another type of storage for
backup and recovery (like an AOF slave), which makes Redis not
significantly different than a typical relational database in that
regard.
>> 2. You can implement "zinterstore key 5 keya keyb keyc keyd keye
>> weights -3.25 4 -2 7 1 aggregate sum" on top of memcache? Do tell how!
>> I'm sure that the millions of users of memcache would *love* to have
>> simple transparent support for scored set intersections. Heck, I know
>> when I was using memcache, I'd have given a week's pay or more just to
>> be able to use simple atomic RPUSH and LPOP operations (sadly, Redis
>> didn't exist at the time, and I personally lacked the imagination to
>> come up with a remote data structure server).
> Makes perfect sense. So if that's one of your use cases Redis will be
> the clear winner.
With Redis' ability to read/write portions of string values in-place
with getrange/setrange and getbit/setbit, even if Redis just had
strings, it would be significantly more useful than memcached (for
example). Add in the other 4 data structures, their operations, along
with replication, persistence, etc., and to me the only reason why
memcached is still in the market is because people have used it, are
used to it, and there are commercial support plans for the more
conservative companies/engineers.
On a theoretical point, I can imagine a world where Redis hadn't been
created. And let me tell you that my life would have been far less
pleasant, and I would have spent far more time writing services to
support a subset of the operations that Redis supports out of the box.
Even for those pieces of software that had existing equivalents
(search with Lucene in particular), I would have spent more time
integrating with them than I spent just building it with Redis. So
this is coming from the perspective of someone who has literally saved
months (if not years) of his life by *not* having to implement those
features himself.
The true innovation of Redis is the *idea* itself (shared/distributed
data structure server), which has gotten many thousands of people
(myself included) thinking about problems in a different way. If you
haven't gotten to that point yet, no worries. Either you do or you
don't.
>> 3. Support for pubsub/messaging via lists/etc. is all about
>> convenience. Why should you need to install RabbitMQ (with the 150 meg
>> Erlang install), ActiveMQ (with the 40-50 meg Java install), or one of
>> the other queues, when you get a very simple 85-99% solution
>> (depending on your use-case) for free? Yes, out of the box Redis
>> doesn't support 100% of the features of those purpose-built messaging
>> queues, but so what? Should we remove lists because someone *may* use
>> them for queues, and we want them to use those other queues because
>> those other queues "do it better" (which I can and have argued against
>> in the past)? No? Okay, so if we're not going to remove lists, then we
>> are going to say that Redis lists can be used for queues and
>> messaging, because to pretend it can't, or to not highlight it as a
>> feature, is silly. Also, pubsub is a total bonus feature, don't look a
>> gift horse in the mouth.
> I am not denying the usefulness of all these features, it is just that
> I am trying to compare apples to apples, DBs to DBs. So, for example,
> if I am choosing between Redis and Cassandra I will choose based on
> performance, memory usage, best fit for my app, etc.. Everything else
> being equal I will take the one with the extra "features".
Sure, but Cassandra and Redis are pretty far apart in the world of
databases; even introducing the comparison is like comparing a BMW M5
to a Land Rover. Sure, both are vehicles (like both Redis and
Cassandra are databases), both can get you places, but their primary
uses WRT work are significantly different.
[snip]
> Another point I completely forgot to mention before: What happens to
> all the nice remote operations in Redis when Redis is clustered? A lot
> of the remote operations will require internode communication which
> will slow things down. One way to solve this is to shard the data
> manually, but that is not always possible/easy. Isn't it very similar
> to the problem you start having when you try to distribute a RDBMS?
No multi-node operations. You can manually move shards between
servers, but I suspect that will be generally seen as a "last resort"
sort of thing. Manual sharding by the use of explicit sharding
identifiers will be used primarily to ensure data is all in a related
place (something like {shard}:table:id ).
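The explicit-shard-identifier idea can be sketched simply: derive the shard from the entity whose data must stay together (here a user id), embed it in the key in the {shard}:table:id layout mentioned above, and route every key for that entity to the same instance. The shard count and hash choice are made up for illustration.

```python
import zlib

NUM_SHARDS = 16  # illustrative; pick to match your instance count

def shard_for(user_id):
    # Any stable hash works; crc32 is cheap and deterministic.
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS

def make_key(table, user_id):
    # {shard}:table:id -- all of one user's keys land on one shard,
    # so multi-key operations on them never cross nodes.
    return "{%d}:%s:%d" % (shard_for(user_id), table, user_id)

k1 = make_key("friends", 12345)
k2 = make_key("diggs", 12345)
```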
Regards,
- Josiah
So you are looking for a possible implementation?
1. Friends lists are stored as a set, with properly configured size
information to ensure that the ziplist variant/intlist variant of the
encoding is used. That gives us 800 bytes/user, at 8 gigs total for
friends lists + structure overhead of ~ 600 megs.
2. Since the average article apparently only has 1 digg (10M articles,
10M diggs), you can store the users that dugg the article as a set for
each article, which you can also tune for ziplist/intlist storage
variants, for a total of 40 megs + structure overhead of ~ 600 megs.
3. To find your friends that dug a given article, you simply intersect
your friends list with that of the list for the article, giving you
the friends that dugg the article in basically the time it takes for a
round trip between your process and Redis.
4. Diggs are also stored on a per-user basis with a set as the article
that was dugg. Because this will tend to be small (1 on average),
again use the ziplist/intlist trick to get this down to about 40 megs
+ ~600 megs for structure overhead.
5. Each user also has a "bonus" scoring based on their friends that
dugg an article. It is a hash that uses the article id as the member,
with the number of users that dugg that article. Using the ziplist
trick because this will tend to be ~ 200 items, this is by far the
largest set of structures at roughly 16 gigs + 600 megs of overhead.
6. Whenever someone diggs an article, you update the article digg set,
that user's digg set, and you increment the user's friends' hashes.
Total memory use: a little over 26 gigs. Without the search scoring
stuff, it would be a bit over 9 gigs. Regardless, neither is all that
bad, it can be trivially scaled out horizontally with read slaves,
<1ms query times to find your friends that dugg an article, and <1ms
to pull the score modifications to articles based on friends that dugg
an article.
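The core of steps 1-3 above can be sketched in memory with plain Python sets as a toy stand-in for Redis sets (the in-Redis version would be a single SINTER between the two keys; the names and data here are made up):

```python
# Friends lists and per-article digger lists as sets; "friends who dugg
# article X" is then just a set intersection.
friends = {
    "alice": {"bob", "carol", "dave"},
}
diggers = {
    "article:1": {"carol", "erin"},
    "article:2": {"bob", "carol"},
}

def friends_who_dugg(user, article):
    return friends[user] & diggers[article]

hits = friends_who_dugg("alice", "article:2")
# bob and carol are both friends of alice and diggers of article:2
```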
This beats the hell out of Cassandra because given the above
description, anyone with even modest experience with Redis and their
programming language of choice could implement it in a day or two with
minimal effort, maybe a week if they need to set up all of the proper
chef/puppet commands to deploy Redis, etc. I don't think I've ever
heard of a Cassandra integration (even with someone who has used
Cassandra heavily in the past) being finished in anything close to
that timeframe, and I surely wouldn't expect Cassandra to respond to
queries in <1 ms.
Regards,
- Josiah
Thank you for the excellent post on modelling Digg in Redis. I'm quite
green in my experimentation with Redis and duly blown away by your
mention of the "ziplist/intlist trick". Can you or anyone else explain
this more in detail?
Many thanks,
~ Brice
Makes sense. Thanks.
I assume that the serialization of string values -> base10 integers is
to be done by the programmer (e.g. in some sort of abstraction
library/client adapter), and not handled automatically by the redis
engine itself? I am hoping, however, that there are existing tools, or
perhaps the engine itself, that perform this optimization automatically?
I've read over http://redis.io/topics/memory-optimization and it
appears the general strategy is to:
1. use numeric keys in hashes/lists/sets
2. shard your hashes/lists/sets so that they contain fewer entries (e.g.
come up with a technique to keep entries below the "max-intset-entries"
and related configuration values)
3. serialize your data (key's value) into integers
Is this correct -- or does Redis automatically perform the optimizations
of compressing key values [#3 above]?
~ Brice
Here's the thing: technically speaking, Redis sets store strings. You
have a set of strings. You send strings to Redis. *If* the strings
happen to be something like "38927" or "37236458" or any one of the
2**64 reasonable values, and in a single set they are ALL of that
format, and the set is smaller than the configured size, Redis does the
optimization automatically.
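That rule can be modeled as a predicate: every member must parse as a canonical integer in the representable range, and the set must stay under the configured entry limit (set-max-intset-entries, 512 by default). This mirrors the described behavior, not Redis's actual code.

```python
SET_MAX_INTSET_ENTRIES = 512          # Redis default for set-max-intset-entries
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def would_use_intset(members):
    """Rough model of when a Redis set would get the compact intset encoding."""
    if len(members) > SET_MAX_INTSET_ENTRIES:
        return False
    for m in members:
        try:
            v = int(m)
        except ValueError:
            return False
        # Non-canonical forms like "007" stay strings, as do out-of-range values.
        if not (INT64_MIN <= v <= INT64_MAX) or str(v) != m:
            return False
    return True
```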
> I've read over http://redis.io/topics/memory-optimization and it appears
> the general strategy is to:
>
> 1. use numeric keys in hashes/lists/sets
I believe the int trick only works for sets, but I may be wrong. For
all three, short hashes/lists/sets will use a packed representation
where values (and keys in the case of hashes) are serialized in a
single chunk of memory so that you don't have sub-structures to deal
with.
> 2. shard your hashes/lists/sets so that they contain fewer entries (e.g.
> come up with a technique to keep entries below the "max-intset-entries" and
> related configuration values)
If your only purpose is reducing data set size, then yes, you shard
your data. I have personally found that I'm too lazy to use sharding
except when size becomes a huge issue. Incidentally, it has never
become a huge issue, but that has to do with the data I've worked with
in the past.
> 3. serialize your data (key's value) into integers
If you mean that you start out with "abc", and you turn that into
6382179, which you then send to Redis with the understanding that it's
going to do some magic, then you can do that, but then you may have
representational issues with leading nulls. But that would be a cute
optimization for some short strings.
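For the curious: that "abc" -> 6382179 mapping is just the string's bytes read as a big-endian integer, and the leading-null problem is directly visible, since b"\x00abc" encodes to the same integer and the round trip loses the null.

```python
def str_to_int(s: bytes) -> int:
    # Interpret the raw bytes as one big-endian integer.
    return int.from_bytes(s, "big")

def int_to_str(n: int) -> bytes:
    # Minimal-width decode; leading null bytes cannot be recovered.
    return n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""

n = str_to_int(b"abc")          # 6382179
roundtrip = int_to_str(n)       # b"abc"
clash = str_to_int(b"\x00abc")  # also 6382179 -- the leading null vanishes
```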
> Is this correct -- or does Redis automatically perform the optimizations of
> compressing key values [#3 above]?
I didn't quite understand what you meant in #3, could you clarify your
question and what you meant?
Regards,
- Josiah
There is a slight performance hit, but the trick is that hashes and
zsets are not stored the same way for long zsets; zsets are a hash +
skiplist, whereas a hash is just a hash. Also, zsets get you an
ordering by the bonus score, which was not listed in your spec.
For this case, because the sets have grown: 100 diggs * 200 users ->
at most 20k entries, you can't get the efficient representation for
bonus sizes as simply as I show it. It can be sharded with the
hash-sharding trick, but it adds some mental overhead, and complicates
the fetching of scores for article bonuses.
Because of the scale, and the rarity of searching, it would make sense
to just calculate it on the fly on one of a few dozen slaves via:
conn.zunionstore('score-adjustments', ['votes:' + user for user in
friends], aggregate='sum')
That would be a union over ~200 sets of ~100 entries each. That
actually saves you the 1.6 TB of ram, and would likely be performed in
under 10ms. That brings the total memory use to 16 gigs. I used to run
a pair of 40-60 gig resident Redis processes on a pair of 68 gig EC2
instances.
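What that zunionstore computes can be modeled in memory with a Counter: per article, sum how many of the ~200 friends' digg sets contain it (a union over plain sets with AGGREGATE SUM scores each member 1 per set it appears in). Data here is made up.

```python
from collections import Counter

def score_adjustments(friend_digg_sets):
    """Per-article count of how many friends' digg sets contain it."""
    totals = Counter()
    for dugg in friend_digg_sets:
        totals.update(dugg)   # +1 per friend who dugg the article
    return totals

adjustments = score_adjustments([
    {"article:1", "article:2"},
    {"article:2"},
    {"article:2", "article:3"},
])
# article:2 was dugg by 3 friends; article:1 and article:3 by 1 each
```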
> Another option would be to do exactly what digg did. For each user/article
> pair store that number of common friends. Your hashes solution basically does
> the same, but keeps the key simple, and by reducing the number of keys allows
> for more compact memory utilization. Anything else I am missing?
>
> This solution definitely sounds really good. Why would anybody use Cassandra
> then? The digg example is considered to be one of the classical Cassandra
> use cases... In your opinion, when would it be better to use Cassandra? Write
> intensive applications with a large data set? Complex schemas?
I've not used Cassandra in production for many reasons. The biggest
reasons are related to the fact that any situation where I think that
Cassandra may work well, I'm doing it wrong, and have gone with plain
logs + post-processing and/or map-reduce. The only thing that
Cassandra buys you over something like PostgreSQL is that you can
reshard your data. Which is to say that you can take your data on
machine X, and divide it in half. Seriously. That's the operation. On
the other hand, Riak offers most of the same semantic access patterns
over data, but has much better resharding support and allows you to
add/remove indexes dynamically (which Cassandra doesn't allow, and
incidentally, basically locks up MongoDB while the index creation
process is occurring).
If the world were composed of 3 databases: PostgreSQL, Redis, and
Cassandra, everything I did would be squeezed into a sharded/slaved
PostgreSQL with Redis on the top. Even in the real world of variety, I
still will choose PostgreSQL + Redis over everything else. If I do
have a need for no joins, more space than will fit in memory, and a
need for easy/fast cluster expansion, etc., I'd add Riak to the mix.
If an organization is using Cassandra, they must have had good
reasons. I can't know what all of their reasons are, and they may very
well have requirements that they can't say which makes Cassandra the
right answer. But for me, I would choose another solution (and have,
multiple times).
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/ZU3FCLty9X8J.
I'm now curious how you were thinking about the deployment of this?
Were you thinking sharded redis cluster (and if so how many nodes) or
one write-master with multiple read-slaves, etc.? And what about
persistence?
Thanks,
K.
---
http://blitz.io
@k0ws1k
I wasn't thinking about deploying this at all. I'm not a Digg
engineer, nor do I plan to be (I've got my own startup I'm building
right now). This was strictly a "this is how Josiah would build Digg
if he had Redis" exploration.
To continue down the rabbit hole: I would probably put each type of
data on its own box.
1. friends (8 gigs + overhead)
2. article diggs (4 gigs + overhead)
3. user diggs (4 gigs + overhead)
To be able to scale searches, you would just add slaves to the 'user
diggs' Redis instance. If you get to the point of needing to
continually scale it, and the master can't keep up with writing to
slaves, add some intermediate slaves in a k-branching tree formation
(where k = 4-8), only ever performing queries at leaves. If any of
your intermediate nodes go down, you reslave its slaves in a balanced
manner against the other remaining intermediate slaves.
To handle data integrity, I'd have 2 boxes slaving from each of the
above masters with appendonly-everysec, with AOF rewriting every hour
(alternating every half hour), and I'd back those AOFs up immediately
after the slaving has completed.
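The integrity-slave setup described above might look something like this in each slave's redis.conf (a sketch; the master address is a placeholder, and the rewrite scheduling would live in cron, not the config):

```conf
# on each of the two integrity slaves per master
slaveof <master-ip> 6379
appendonly yes
appendfsync everysec
# BGREWRITEAOF cron'd hourly, offset by 30 minutes between the two
# slaves, with the finished AOF copied off-box after each rewrite.
```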
Base data: 3 machines with a total of around 32 gigs needed.
For data integrity: another 6 machines with 64 gigs total memory.
Add 8 gig boxes for queries as necessary.
Regards,
- Josiah
On Fri, Mar 23, 2012 at 1:51 PM, eugene miretsky
<eugene....@gmail.com> wrote:
> Thanks Josiah!
>
> I missed a couple of zeros for the number of Diggs, I meant to say 1B - 100
> diggs on average per user or article. My bad. With that number in mind
> - Sets of users that dugg each article - 4G.
> - Sets of articles dugg by each user - 4G
> - Bonus scores - 1600G (If I understood your implementation correctly ).
> That's a lot of memory, but I guess it is not that bad considering the
> problem at hand.
>
> For " bonus" scores - why not use Zsets instead of hashes? For each user
> have a zset of articles, where the score is the number of friends that dugg
> the article. I would imagine that it has the same memory requirements. The
> only tradeoff I can see is that writes are going to be slower because the
> set has to be re-sorted.
On Fri, Mar 23, 2012 at 3:50 PM, kowsik <kow...@gmail.com> wrote:
> Josiah,
> Awesome way to architect this with nothing but redis data types. And I
> completely agree with you on the complexity curve of operationalizing
> a cassandra cluster.
>
> I'm now curious how you were thinking about the deployment of this?
> Were you thinking sharded redis cluster (and if so how many nodes) or
> one write-master with multiple read-slaves, etc.? And what about
> persistence?
Apples and Oranges. zipmaps only apply to short zsets, it's an
encoding similar to a ziplist.
>> For this case, because the sets have grown: 100 diggs * 200 users ->
>> at most 20k entries, you can't get the efficient representation for
>> bonus sizes as simply as I show it. It can be sharded with the
>> hash-sharding trick, but it adds some mental overhead, and complicates
>> the fetching of scores for article bonuses.
>>
>> Because of the scale, and the rarity of searching, it would make sense
>> to just calculate it on the fly on one of a few dozen slaves via:
>
> Why are you saying that the search is rare? It has to be performed every
> time the user goes on the search/index page. So for every online user you
> are gonna have to perform it at least once.
This is a situation where you would calculate it maybe once/hour for
an online user and cache it. After calculating the scores, I'd pull
what data I wanted and store it in a separate Redis instance, just for
the sake of keeping memory use down on the query slaves.
Also, if you had a zset that stored the time-sequence of all articles
(10M articles should be <2 gigs) on the same shards as where the
queries take place, you can produce a secondary time series for your
augmented scores (articles by their score changes, and articles by
their time changes). You could then just fetch and use the score
adjustments for only those articles posted in the last X hours. For X
<= 1 week, you'd probably get 99% coverage at 1% of the space of
keeping the whole set around, and could always re-query and cache
longer for those rare cases where you need the full zset.
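The last-X-hours trick above can be sketched in memory: keep (article, posted-at) pairs as the analogue of a time-ordered zset, and apply friend-digg score adjustments only to articles newer than the cutoff. Timestamps and scores here are made up.

```python
import time

def recent_adjustments(posted_at, adjustments, max_age_seconds, now=None):
    """Keep only score adjustments for articles posted within the window."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return {a: s for a, s in adjustments.items()
            if posted_at.get(a, 0) >= cutoff}

posted_at = {"article:1": 1000.0, "article:2": 9000.0}
adjustments = {"article:1": 5, "article:2": 3}
recent = recent_adjustments(posted_at, adjustments, 3600, now=10000.0)
# cutoff = 6400, so only article:2 (posted at 9000) survives
```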
Regards,
- Josiah
You are right, good thing that kind of thing would be discovered
during productionization ;)
Revised:
1. friends (8 gigs + overhead)
2. article diggs (4 gigs + overhead) + user diggs (4 gigs + overhead)
If someone wanted to implement this stuff using scripts instead of
client-side requests, it could be done, in which case all data would
need to be on the same machine.
The reason why I like to separate the data as much in advance as
possible is that as your data grows, the more you have pre-partitioned
it, the easier a time you will have. *Especially* when it comes to
vertical partitioning. If your data is unrelated, start by
partitioning it by database. Then if you ever need to really move it
to another box, you just copy your dump over, flush the dbs that you
don't need, flush the db off of the old one, and you are done.
>> To be able to scale searches, you would just add slaves to the 'user
>> diggs' Redis instance. If you get to the point of needing to
>> continually scale it, and the master can't keep up with writing to
>> slaves, add some intermediate slaves in a k-branching tree formation
>> (where k = 4-8), only ever performing queries at leaves. If any of
>> your intermediate nodes go down, you reslave its slaves in a balanced
>> manner against the other remaining intermediate slaves.
>>
>> To handle data integrity, I'd have 2 boxes slaving from each of the
>> above masters with appendonly-everysec, with AOF rewriting every hour
>> (alternating every half hour), and I'd back those AOFs up immediately
>> after the slaving has completed.
>
> Might be a naive question, but - why do you need the slaves for data
> integrity? Couldn't you just set up AOF on the master? Are you doing it to
> prevent disk writes on the master?
Data integrity lesson 1: A single disk storing data is worse than 0
disks storing data. With zero disks, you know that you're screwed.
With 1 disk, you believe that you are safe, where in fact you are only
just waiting to get f***ed.
Data integrity lesson 2: Two disks storing data with RAID 1 is even
worse than 1 disk, because you have just doubled your chances of a
disk dying on you, and unless you take certain steps, data corruption
bugs can creep in and infect you when you thought you were safe.
My kingdom for ZFS-style RAID-Z pools available on every platform.
That Btrfs isn't just taking the feature set of ZFS and
re-implementing it blows my mind. I expect that when I get to the
point of really needing to care about my data down the line, I'll be
using FreeBSD just for its support of ZFS.
Regards,
- Josiah
With RAID 10, if you lose 2 disks from the same mirror, you lose your
data. With raidz, losing 2 disks is also fatal by default, but you can
add extra parity disks (raidz2/raidz3) to push that out. Maybe it
seems paranoid to go that far, but some data is worth more disks, and
having the option simplifies management.
> There is zfsonlinux which has been pretty good for people using it for over a
> year. (I have only used it for a few months and it does perform as
> expected)
I do like that LLNL has been working on it, though they do still seem
to have a ways to go before having an official stable release,
according to their milestones. Thank you for the reference.
There are methods of sharding your data in other ways that will allow
you to scale beyond one node, but it ultimately requires performing as
many queries as you have shard masters (you don't want to use
clustering for this, as data migration would kill you). That said, if
you are willing to leave Amazon and/or most of the other hosting
providers, I know of a hosting provider that offers a 500 gig memory
box with 48 cores custom-built for about $2k US/month. With a box at
that size, you could just put everything in one Redis instance, as
long as you weren't single-processor limited. And with a box that
size, Postgres may even be able to handle what you needed to do:
http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html
If you are really interested in how you would scale this across X
sharded Redis instances, ping this thread and I can explain the
details after re-reading my earlier solution (but that will have to be
tomorrow). For now, I've forgotten what I wrote 3 weeks ago, and I've
got some other writing I need to do tonight.
Regards,
- Josiah