Compression / Archiving Data

Ryan Amos

Jul 11, 2009, 1:14:40 AM

to Redis DB

While Redis looks very promising, I think it's missing quite a few
features. The main one being the fact that there's no ability to
archive data.

Let me give you a quick rundown of what I've been testing. The data
structure used in the test application is very similar to the one in
the twitter application provided on the google code page, so to keep
things simple I'll describe everything in terms of the twitter
application. Below are the numbers from my tests.

18633 Users. This is the number returned by DBSIZE. At this point I
have one list for each user, which stores the id of every new post
made by the people they are following.

16825 Items / List. This number is VERY rough, as I only sampled a few
lists to get a quick estimate. Every user can have anywhere from 0 to
x items in their list, where x is the total number of posts (if they
were following everyone).

71MB of data on disk. After the inserts had completed, I saved the DB
and checked its size on disk. This is absolutely perfect, but it is
nowhere near the size in memory.

1498124952 Bytes ~ 1428MB in memory. This is the issue at hand.
There is absolutely no way to scale this if the data really eats up
this much memory. I understand that you can partition the data, but
if we were to use this in production, we would need nearly 10 servers
to store the data we need.
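
To put the two figures side by side, here is a quick back-of-the-envelope check. It uses only the numbers reported above (the byte count in memory and the 71MB on disk); the variable names are just for illustration.

```python
# Figures reported in the test above.
memory_bytes = 1_498_124_952   # size in memory, in bytes
disk_mb = 71                   # size of the saved DB on disk, in MB

# Convert bytes to MB (1 MB = 1024 * 1024 bytes).
memory_mb = memory_bytes / (1024 * 1024)

# How many times larger the data set is in memory than on disk.
ratio = memory_mb / disk_mb

print(f"in memory: {memory_mb:.2f} MB")
print(f"memory / disk: {ratio:.1f}x")
```

So the same data set takes roughly 20 times more room in memory than it does on disk.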

So now that you have a rundown of the tests I've run, I'd like to
propose some changes.
==========

So in this application, there are over 10,000 list items stored per
user, when they will only ever want to view, for example, the latest
10. While keeping all of that information is important, there needs
to be a way to archive it. The twitter application would work
absolutely fine with, say, only the last 200 posts made by the people
I'm following. This would greatly decrease the number of items stored
in memory.
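
As a rough illustration of the idea, here is a minimal sketch in plain Python (not Redis code) of a per-user list capped at 200 items, where anything that falls off the end is handed over for archiving instead of being dropped. The names `push_post` and `TIMELINE_CAP` are made up for this example. On a real server, LPUSH followed by LTRIM can keep a list at a fixed length, but LTRIM discards the trimmed items rather than archiving them.

```python
from collections import deque

TIMELINE_CAP = 200  # assumed cap, taken from the "last 200 posts" figure above


def push_post(timeline, post_id, cap=TIMELINE_CAP):
    """Put the newest post id at the head; return the ids pushed off the tail."""
    timeline.appendleft(post_id)
    evicted = []
    while len(timeline) > cap:
        evicted.append(timeline.pop())  # oldest ids leave from the tail
    return evicted


timeline = deque()   # the in-memory list for one user
archived = []        # stand-in for whatever the on-disk archive would be

for post_id in range(1, 1001):
    archived.extend(push_post(timeline, post_id))

print(len(timeline), len(archived))
```

After 1000 posts, only the newest 200 ids stay in the in-memory list and the other 800 end up in the archive.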

While this information is important to keep, it is not vital for us to
store it in memory. There needs to be a way for us to store data on
disk, but also be able to manipulate it. If a user decided to delete
their posts, you would have to delete the references to those posts
from the lists of all of their followers. This is fine, but if some
of those lists are archived on disk, there also needs to be a way to
remove the references from the archived data.
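
To make the requirement concrete, here is a small sketch of what such a delete would have to do. It assumes two hypothetical stores, `hot` (in memory) and `archive` (standing in for the on-disk data), each mapping a user to a list of post ids. None of this is an existing Redis feature; it only shows that both places have to be touched.

```python
def remove_post_reference(post_id, follower_ids, hot, archive):
    """Remove post_id from every follower's in-memory and archived list.

    Returns the number of references removed.
    """
    removed = 0
    for user in follower_ids:
        for store in (hot, archive):
            if user not in store:
                continue
            ids = store[user]
            kept = [i for i in ids if i != post_id]
            removed += len(ids) - len(kept)
            store[user] = kept
    return removed


# Hypothetical data: post 7 is still in memory for alice, but already
# archived for bob.
hot = {"alice": [9, 8, 7], "bob": [9, 5]}
archive = {"alice": [3, 2], "bob": [7, 1]}

count = remove_post_reference(7, ["alice", "bob"], hot, archive)
print(count, hot, archive)
```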

So what I propose is a command which saves data to disk and loads it
back. All the other commands would work the same way on the
disk-based data; however, there should also be a queue system so that
disk access doesn't slow everything down completely.
Commands:

SELECT # (disk)?

So essentially it would create a disk-based database rather than a
memory-based one.

Or even build the data in another memory-based database and simply
have a command to send it to disk.

ARCHIVE n (where n is the database number)
LOAD n

This would give the ability to save data to disk and load it back
when / if needed.
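
Here is a toy sketch of the behaviour I have in mind for ARCHIVE and LOAD. Everything in it is hypothetical: the `TinyStore` class, the JSON file format and the file names are stand-ins for illustration only, and neither command exists in Redis.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy stand-in for a server holding numbered in-memory databases."""

    def __init__(self, archive_dir):
        self.archive_dir = archive_dir
        self.dbs = {}  # database number -> {key: list of ids}

    def _path(self, n):
        return os.path.join(self.archive_dir, f"db{n}.json")

    def archive(self, n):
        """ARCHIVE n: write database n to disk and drop it from memory."""
        with open(self._path(n), "w") as f:
            json.dump(self.dbs.pop(n, {}), f)

    def load(self, n):
        """LOAD n: read database n back from disk into memory."""
        with open(self._path(n)) as f:
            self.dbs[n] = json.load(f)


with tempfile.TemporaryDirectory() as archive_dir:
    store = TinyStore(archive_dir)
    store.dbs[1] = {"user:42:timeline": [3, 2, 1]}

    store.archive(1)
    in_memory_after_archive = 1 in store.dbs   # database 1 is now only on disk

    store.load(1)
    restored = store.dbs[1]                    # and back in memory again
```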

I know this is a long read, but if I hadn't explained the need for
this in detail, you would have simply told me to scale out. That's
just not an option for us, and if redis doesn't get some additional
features to help with this issue, it simply cannot become a
production database (at least for us).

Salvatore Sanfilippo

Jul 17, 2009, 6:32:13 AM

to redi...@googlegroups.com

On Sat, Jul 11, 2009 at 7:14 AM, Ryan Amos<amos...@gmail.com> wrote:
>
> While Redis looks very promising, I think it's missing quite a few
> features. The main one being the fact that there's no ability to
> archive data.

Hello!

I'm investigating a way to create different storage engines for Redis.
In the future this could make it possible to run Redis against an
on-disk database with semantics similar to the ones it has today, but
with different time complexity for the operations, of course.

I'm not merging this new code into Redis Git for now. I'm hacking
against a 0.0900 tar.gz to see if I like what I get, and especially
to see the impact this additional layer has on the current in-memory
implementation.

Will post more information later.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay
