Ryan Amos
Jul 11, 2009, 1:14:40 AM
to Redis DB
While Redis looks very promising, I think it's missing quite a few
features, the main one being that there is no way to archive data.
Let me give you a quick rundown of what I've been testing. The data
structure used in my test application is very similar to the Twitter
application provided on the Google Code page, so to keep things simple
I'll describe everything in terms of that application. Below are the
numbers from my tests.
18633 users. This is the number returned by DBSIZE. Each user has one
list, which stores the id of every new post made by someone they are
following.
16825 items per list. This number is VERY rough, as I only sampled a
few lists to get a quick estimate. Every user can have 0 to x items in
their list, where x is the total number of posts (if they were
following everyone).
71MB of data on disk. After the inserts completed, I saved the DB and
checked its size on disk. This is absolutely perfect, but it is
nowhere near the size in memory.
1498124952 bytes ~ 1428MB in memory. This is the issue at hand.
There is absolutely no way to scale this if the data really takes up
this much memory. I understand that you can partition the data, but
if we were to use this in production, we would need nearly 10 servers
to store the data we need.
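
For reference, the gap between the two figures above can be worked out
directly. This is only arithmetic on the numbers quoted in this post,
and it assumes MB means 2^20 bytes.

```python
# Figures quoted above; MB is assumed to mean 2**20 bytes.
mem_bytes = 1498124952
disk_mb = 71

mem_mb = mem_bytes / 2**20   # size in memory, in MB
ratio = mem_mb / disk_mb     # how many times larger than the file on disk

print(f"in memory: {mem_mb:.0f} MB")
print(f"memory / disk: {ratio:.1f}x")
```

In other words, the data takes roughly 20 times more space in memory
than it does on disk.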
So now that you have a rundown of the tests I've run, I'd like to
propose some changes.
==========
So in this application, there are over 10,000 list items stored per
user, when they will only ever want to view the latest 10, for
example. While keeping all of that information is important, there
needs to be a way to archive it. The Twitter application would be
absolutely fine working with only the last 200 posts made by the
people I'm following, for example. This would greatly decrease the
number of items stored in memory.
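
A minimal sketch of the "last 200 posts" idea, assuming it is done with
the existing LPUSH and LTRIM list commands. FakeLists is a made-up
in-memory stand-in so the example runs without a server, and
push_capped and the key name timeline:42 are hypothetical names for
illustration only.

```python
class FakeLists:
    """Tiny in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self):
        self.data = {}

    def lpush(self, key, value):
        # Like LPUSH: insert at the head of the list.
        self.data.setdefault(key, []).insert(0, value)

    def ltrim(self, key, start, end):
        # Like LTRIM with non-negative indexes: keep start..end inclusive.
        self.data[key] = self.data.get(key, [])[start:end + 1]

    def lrange(self, key, start, end):
        # Like LRANGE with non-negative indexes: return start..end inclusive.
        return self.data.get(key, [])[start:end + 1]


def push_capped(client, key, post_id, keep=200):
    """Add a post id to a list and drop everything past the newest `keep`."""
    client.lpush(key, post_id)
    client.ltrim(key, 0, keep - 1)


client = FakeLists()
for post_id in range(1000):
    push_capped(client, "timeline:42", post_id)

latest_ten = client.lrange("timeline:42", 0, 9)
```

Note that trimming like this throws the older ids away instead of
archiving them, which is exactly the gap described here.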
While this information is important to keep, it is not vital for us to
store it in memory. There needs to be a way for us to store data on
disk, but also be able to manipulate it. If a user decided to delete
their posts, you would have to delete the references to those posts
from the lists of all of their followers. This is fine, but if older
references are stored on disk there also needs to be a way to remove
them from the archived data on disk.
So what I propose is a command which saves and loads data on disk.
All the other commands would work the same way on the disk-based
data, but there should be a queue system as well so that disk access
doesn't slow the server down completely.
Commands:
SELECT # (disk)?
So essentially it would create a disk-based database rather than a
memory-based one.
Or even build the data in another memory-based database and simply
have a command to send it to disk:
ARCHIVE n (where n is the database number)
LOAD n
This would give the ability to save data to disk and load it back
when / if needed.
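
To make the intended behaviour concrete, here is a toy model of the
proposed ARCHIVE n / LOAD n pair. These commands do not exist in Redis;
ToyStore, the JSON file format and the idea that ARCHIVE frees the
database from memory are all assumptions made for the sake of the
sketch.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical model of numbered databases with ARCHIVE n / LOAD n."""

    def __init__(self, archive_dir):
        self.archive_dir = archive_dir
        self.dbs = {}  # database number -> {key: list of ids}

    def _path(self, n):
        return os.path.join(self.archive_dir, f"db{n}.json")

    def archive(self, n):
        # ARCHIVE n: write database n to disk, then drop it from memory
        # (dropping it is an assumption, not something the proposal states).
        with open(self._path(n), "w") as f:
            json.dump(self.dbs.get(n, {}), f)
        self.dbs.pop(n, None)

    def load(self, n):
        # LOAD n: read database n back from disk into memory.
        with open(self._path(n)) as f:
            self.dbs[n] = json.load(f)


store = ToyStore(tempfile.mkdtemp())
store.dbs[1] = {"timeline:42": [3, 2, 1]}

store.archive(1)
in_memory_after_archive = 1 in store.dbs  # database 1 now lives only on disk

store.load(1)  # bring it back when / if needed
```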
I know this is a long read, but if I hadn't explained the need for
this in detail, you would simply have said "scale". That's just not an
option, and if Redis doesn't get some additional features to help with
this issue, it simply cannot become a production database (at least
for us).