Question about approach using redis

365 views
Skip to first unread message

Mason Jones

unread,
Jan 22, 2010, 3:45:45 PM1/22/10
to redi...@googlegroups.com
Hello,

I'm working on a new project and have been testing a few possibilities
to see which fits the problem the best... I've got a prototype working
with MongoDB, but there are some things that don't quite feel right.
So I'm trying it with Redis as well, but I thought I'd ask for an
opinion here.

I can't share the exact app details so I'll go with a similar problem.
Let's say it's a library, with books. Each book has a title and one or
more authors. Authors just have a name. My inclination was to store
each book with a key (let's say ISBN#) and its data, then create a set
for each author and add books to the sets. So for example:

SET book:12345 "Book 1"
SET book:67890 "Book 2"
SADD author:1 12345
SADD author:1 67890
SADD author:2 67890

So then I can use SMEMBERS to get all of the books by a particular
author. However, other queries don't feel right, such as: how to get
all of the authors for a given book? Likewise, how could I get a list
of all authors?

Also, if I have more metadata about a book, such as title, date,
publisher, then to get the details for an author listing, would I have
to do SMEMBERS, then iterate through the results and do a GET for each
one?

Thanks for any ideas, I appreciate it...

Александр Лозовюк

unread,
Jan 22, 2010, 3:51:44 PM1/22/10
to redi...@googlegroups.com
In simple, you may stored addition data in SET:  list of all author's, list's of all boot title. And your application must controling this data and his consistens

2010/1/22 Mason Jones <maso...@gmail.com>

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.




--
C уважением, Александр Лозовюк
Alpha-Beta-Release Blog
http://abrdev.com

Salvatore Sanfilippo

unread,
Jan 22, 2010, 4:06:37 PM1/22/10
to redi...@googlegroups.com
On Fri, Jan 22, 2010 at 9:45 PM, Mason Jones <maso...@gmail.com> wrote:
> Hello,
>
> I'm working on a new project and have been testing a few possibilities
> to see which fits the problem the best... I've got a prototype working
> with MongoDB, but there are some things that don't quite feel right.
> So I'm trying it with Redis as well, but I thought I'd ask for an
> opinion here.

This is the general "pattern" to solve your problem:

Give every book an unique ID (just INCR global:next.book.id), then
store a JSON representation of your book into the book:<id> key.

Now you can get books in O(1) by ID, with all the fields you want.

Then, for every kind of "search" or "listing" you want to support, add
a Set, a List, or a Sorted Set accordingly to your need.

For instance, want to get all the books written by a given author?
For every book you add, also add the (possibly normalized by ID)
author into a Sorted Set:

ZADD author:<id>:books <book-year-in-unix-time> <book:id>

This is more convenient than a Set, as when you paginate, you get the
books sorted by year. Why not?

Need a listing of the recently added books? Use a List. Every time you
add a book, LPUSH global:list.of.latest.books book:id

And so forth. With Redis the pattern is to add additional "views" for
every kind of query you need to perform. This will use more memory,
but will ensure queries are very, very fast.

With Virtual Memory many of this objects will get probably swapped on
disk. For instance if author ID 83424 is rarely accessed the sorted
set for this author will be registered on disk, and reloaded when
needed (this is Redis git land btw, no stable release yet, but we are
not too far. My guess is two/three months at max).

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

Mason Jones

unread,
Jan 22, 2010, 5:21:06 PM1/22/10
to redi...@googlegroups.com
Thanks for your quick reply! That's the pattern that I thought would
be the best, but I still wonder about the cases where for example I
want to get all of the authors -- since an author is really a set, but
as far as I know there isn't a way for example to get a list of the
sets that match a certain pattern. Presumably then I should create
separate items representing the authors, and then add them to a list
of authors?

I think my main question then is getting the full data for an object
like a book. For example, say I get all of the books written by an
author. I'd do that with SMEMBER, which would return the book IDs. I
then need to take those IDs, turn around, and do a GET for each one to
get the full data, correct? I realize of course that the alternative
starts to become somewhat "relational", which isn't the idea! But if
there were a way to associate the results of the SMEMBER and within
redis interpret those as keys and return the results, it would be very
powerful. Let me know if that makes sense, and perhaps there is a way
to do this that I've missed.

Thanks again!

Salvatore Sanfilippo

unread,
Jan 22, 2010, 6:09:01 PM1/22/10
to redi...@googlegroups.com
On Fri, Jan 22, 2010 at 11:21 PM, Mason Jones <maso...@gmail.com> wrote:
> Thanks for your quick reply! That's the pattern that I thought would
> be the best, but I still wonder about the cases where for example I
> want to get all of the authors -- since an author is really a set, but
> as far as I know there isn't a way for example to get a list of the
> sets that match a certain pattern. Presumably then I should create
> separate items representing the authors, and then add them to a list
> of authors?

Hello,

you also need a Set with all the authors IDs. It is interesting to
note that after the VM is complete and 2.0 is released, the work will
focus on specially encoded sets of integers. So even big sets of IDs
will take just few megabytes. Don't know exactly but to give you an
idea, 1 million of elements into a set of integers could take
something like 30 MB (now much more is required), working exactly like
another set.

It's just integer encoding, that we already have. If you SET "x" to
"10" it will take a lot less memory than setting it to "x10", as
integer values are specially encoded.

Btw if your data is about a few millions of books, you probably should
just don't care a lot about this advanced stuff as a few GB of RAM are
already enough *even* without taking VM into account.

> I think my main question then is getting the full data for an object
> like a book. For example, say I get all of the books written by an
> author. I'd do that with SMEMBER, which would return the book IDs. I
> then need to take those IDs, turn around, and do a GET for each one to
> get the full data, correct? I realize of course that the alternative
> starts to become somewhat "relational", which isn't the idea! But if
> there were a way to associate the results of the SMEMBER and within
> redis interpret those as keys and return the results, it would be very
> powerful. Let me know if that makes sense, and perhaps there is a way
> to do this that I've missed.

Redis can do this already.

Imagine that "set" is a set with IDs 1,2,3,4 and you want to get this books:

$ redis-cli set book:1 "my book number one"
OK
$ redis-cli set book:2 "this is my second book"
OK
$ redis-cli set book:3 "my book number 3"
OK
$ redis-cli set book:4 "4 4 4 4 4 4 4 4 4 4 4 4"
OK
$ ./redis-cli sadd books 1
(integer) 1
$ ./redis-cli sadd books 2
(integer) 1
$ ./redis-cli sadd books 3
(integer) 1
$ ./redis-cli sadd books 4
(integer) 1
$ ./redis-cli sort books by nothing get book:*
1. my book number 3
2. 4 4 4 4 4 4 4 4 4 4 4 4
3. my book number one
4. this is my second book
$

Yes, Sort is able to GET also, and yes "by <any kind of string not
containing *>" is optimized away as it's a constant pattern and the
sort command is smart enough to understand this.

Mason Jones

unread,
Jan 22, 2010, 6:44:40 PM1/22/10
to redi...@googlegroups.com
On Fri, Jan 22, 2010 at 3:09 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Fri, Jan 22, 2010 at 11:21 PM, Mason Jones <maso...@gmail.com> wrote:
>> Thanks for your quick reply! That's the pattern that I thought would
>> be the best, but I still wonder about the cases where for example I
>> want to get all of the authors -- since an author is really a set, but
>> as far as I know there isn't a way for example to get a list of the
>> sets that match a certain pattern. Presumably then I should create
>> separate items representing the authors, and then add them to a list
>> of authors?
>
> you also need a Set with all the authors IDs. It is interesting to
> note that after the VM is complete and 2.0 is released, the work will
> focus on specially encoded sets of integers. So even big sets of IDs
> will take just few megabytes. Don't know exactly but to give you an
> idea, 1 million of elements into a set of integers could take
> something like 30 MB (now much more is required), working exactly like
> another set.

That will be interesting, yes!

> It's just integer encoding, that we already have. If you SET "x" to
> "10" it will take a lot less memory than setting it to "x10", as
> integer values are specially encoded.
>
> Btw if your data is about a few millions of books, you probably should
> just don't care a lot about this advanced stuff as a few GB of RAM are
> already enough *even* without taking VM into account.

You're correct, I don't think that the size of data is likely to be a
problem in this case, but we'll see as I begin to multiply the number
of sets, and I can see what it amounts to.

>> I think my main question then is getting the full data for an object
>> like a book. For example, say I get all of the books written by an
>> author. I'd do that with SMEMBER, which would return the book IDs. I
>> then need to take those IDs, turn around, and do a GET for each one to
>> get the full data, correct? I realize of course that the alternative
>> starts to become somewhat "relational", which isn't the idea! But if
>> there were a way to associate the results of the SMEMBER and within
>> redis interpret those as keys and return the results, it would be very
>> powerful. Let me know if that makes sense, and perhaps there is a way
>> to do this that I've missed.
>
> Redis can do this already.

Ah yes, using "sort" I forgot that it has additional options. Thank
you. I am curious, though -- is there a reason why "sort" has options
like limit/get while commands like "smember" don't have those? I would
suppose that "sort" is slower than "smember" on large sets, so there
could be some performance benefit. But perhaps underneath in the code
there's a reason why some sort-type operation is needed to support
those options?

Thank you again, this certainly works for my purposes. I will continue
with my comparison!

Salvatore Sanfilippo

unread,
Jan 22, 2010, 6:50:44 PM1/22/10
to redi...@googlegroups.com
On Sat, Jan 23, 2010 at 12:44 AM, Mason Jones <maso...@gmail.com> wrote:

> Ah yes, using "sort" I forgot that it has additional options. Thank
> you. I am curious, though -- is there a reason why "sort" has options
> like limit/get while commands like "smember" don't have those? I would
> suppose that "sort" is slower than "smember" on large sets, so there
> could be some performance benefit. But perhaps underneath in the code
> there's a reason why some sort-type operation is needed to support
> those options?

That's the trick: when SORT detects that you are sorting by "foobar"
(that is, a fixed key, and not for instance "foo:*") then it does not
sort at all. So it's as fast as SMEMBERS with LIMIT + GET more or
less. But instead to have many complex commands we have many simple
commands and a fat one, that is polymorphic as can work with sets,
lists and sorted sets, fully featured, dirty, but useful :)

Mason Jones

unread,
Jan 22, 2010, 6:54:29 PM1/22/10
to redi...@googlegroups.com
On Fri, Jan 22, 2010 at 3:50 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Sat, Jan 23, 2010 at 12:44 AM, Mason Jones <maso...@gmail.com> wrote:
>
>> Ah yes, using "sort" I forgot that it has additional options. Thank
>> you. I am curious, though -- is there a reason why "sort" has options
>> like limit/get while commands like "smember" don't have those? I would
>> suppose that "sort" is slower than "smember" on large sets, so there
>> could be some performance benefit. But perhaps underneath in the code
>> there's a reason why some sort-type operation is needed to support
>> those options?
>
> That's the trick: when SORT detects that you are sorting by "foobar"
> (that is, a fixed key, and not for instance "foo:*") then it does not
> sort at all. So it's as fast as SMEMBERS with LIMIT + GET more or
> less. But instead to have many complex commands we have many simple
> commands and a fat one, that is polymorphic as can work with sets,
> lists and sorted sets, fully featured, dirty, but useful :)

I understand. I'd definitely recommend that your explanation here
should be added to the wiki/documentation -- it's a great piece of
information! Thanks again for the quick response, this is terrific.

Alex

unread,
Jan 23, 2010, 2:13:39 AM1/23/10
to Redis DB
Hi Mason,

As you can see there are many ways to define your data. If you find
you are doing many 'relational' 'selects' on 'indexes', rather than
storing a book as JSON, XML, or CSV, you might break your book up into
the individual fields. This may reduce write collisions to difference
fields on the same book. For example, if your fields are numbers, you
can perform atomic increments without having to read the entire value
and re-serialize a new string value.

Cheers,
Alex

//
// d.f#KEY means get a field value by primary key
// d.f=VAL means get a primary key by field value
//

set "book.title=Gravitational+Radiation" A
set "book.title#A" "Gravitational Radiation"
sadd book.author#A Thorne
sadd book.author#A Hawking
sadd book.author#A Israel
set "book.year#A" 1987

set "book.title=The+Membrane+Paradigm" B
set "book.title#B" "The Membrane Paradigm"
sadd book.author#B Thorne
sadd book.author#B Price
sadd book.author#B Macdonald
set "book.year#B" 1986

set "author.name=Thorne" A
set "author.name#A" Thorne
sadd author.books#A A
sadd author.books#A B

set "author.name=Hawking" B
set "author.name#B" Hawking
sadd author.books#B A

set "author.name=Israel" C
set "author.name#C" Israel
sadd author.books#C A

set "author.name=Price" D
set "author.name#D" Price
sadd author.book#D B

set "author.name=Macdonald" E
set "author.name#E" Macdonald
sadd author.book#E B

und3f

unread,
Jan 25, 2010, 9:22:17 AM1/25/10
to Redis DB
Good day. I am new to redis and have question about your suggestion.

The problem is to keep data up-to-date. I have a project with a lot of
modifications per minute so unused data will not be stored on disk.
And keep all values sorted by all keys (what i need) could be
expensive.

What i need - to select IDs with specified coutries. Thats a rare
query so it could be better to sort data by query.
Must i do all selections on app side or you can suggest something?


On 22 янв, 23:06, Salvatore Sanfilippo <anti...@gmail.com> wrote:

> Salvatore 'antirez' Sanfilippohttp://invece.org

Aníbal Rojas

unread,
Jan 25, 2010, 9:54:41 AM1/25/10
to redi...@googlegroups.com
und3f,

rare query == unusual? When you talk about expensive you mean in
memory footprint?

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Mason Jones

unread,
Jan 26, 2010, 1:30:14 PM1/26/10
to redi...@googlegroups.com
Thanks for the thoughts on this. Yes, I'm looking at this option. I'm
also playing with the DataMapper-Redis adapter, which seems to do just
this. I tried a test object, and I see that it is creating a separate
k-v for each attribute of the object. The adapter then does the work
of "collecting" them when an object is queried. This is convenient, of
course, and matches the DataMapper/ActiveRecord style of doing things,
but of course it can be a bit less efficient; if I know that for a
particular operation I only need a couple of attributes, it would be
better to just grab those separately from the others.

Salvatore's planned Hashes will change the possibilities, of course,
so that will be very interesting indeed.

Thanks again.

Mason Jones

unread,
Jan 26, 2010, 1:38:39 PM1/26/10
to redi...@googlegroups.com
Salvatore, I have a followup question about this, just out of
curiosity. I see that when I use SORT with GET, it returns the value
of each item, but not the key. This isn't a big problem, but since one
of the common patterns is to use the key to hold an ID, it means that
the results of the SORT-GET don't include the ID information. For
example, if I have

book:12345 == "My Book Title"
book:57687 == "Another Book"

and they're in a set, and I do SORT with GET, I will have the values:

1. My Book Title
2. Another Book

But I won't know which IDs go with the books. If I expand this to
have, for example:

book:id:12345 == 12345
book:title:12345 == "My Book Title"

Then it becomes harder, because I don't know programmatically which
value is the id and which is the title. I'm not sure what the best
solution is for this, unless there is one that I've missed...

Aníbal Rojas

unread,
Jan 26, 2010, 2:21:08 PM1/26/10
to redi...@googlegroups.com
Mason,

Hashes will make modeling problems easier and simplify the code.
But meanwhile you can as an alternative store a JSON representation of
your data if it not required in a SORT or a similar condition.

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Gleicon Moraes

unread,
Jan 26, 2010, 2:28:59 PM1/26/10
to redi...@googlegroups.com
Usually I do both: store a JSON version, and put interesting data in normal SETs.
This way I can do searches using SINTER and SISMEMBER.
You can find some code I did at http://github.com/gleicon/docdb. Its part of a Information Retrieval/barebones document db using Redis as backend.

gm


2010/1/26 Aníbal Rojas <aniba...@gmail.com>



--
More cowbell, please !

Aníbal Rojas

unread,
Jan 26, 2010, 2:35:32 PM1/26/10
to redi...@googlegroups.com
Mason,

book:id:12345 == 12345 ??

You don't need this key, you are wasting memory. Or am I missing something?

Also after a SORT you can you get the numeric IDs and send a MGET command.

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Mason Jones

unread,
Jan 26, 2010, 5:45:32 PM1/26/10
to redi...@googlegroups.com
2010/1/26 Aníbal Rojas <aniba...@gmail.com>:

> Mason,
>
>    book:id:12345 == 12345 ??
>
>    You don't need this key, you are wasting memory. Or am I missing something?

My thinking was that including these would allow for a group SORT-GET
to retrieve all of the IDs. For example, to get the IDs of all of the
books by an author. However, if it is possible to get the keys, then
that could be replaced by, perhaps, a SORT-GET of the titles, and the
IDs would be embodied in the keys. That still seems inefficient,
though, since then I'd have to do the SORT-GET, then an MGET, then
parse the IDs from the keys. Getting all of the IDs with a single
SORT-GET of "book:id:*" seems very efficient.

>    Also after a SORT you can you get the numeric IDs and send a MGET command.

But if the SORT-GET returns a lot of entries, then doing an MGET turns
that into a potentially large data set through the client and then
through the socket to the redis instance, right? That doesn't seem
like a good performance choice...

Thanks for all the thoughts!

Aníbal Rojas

unread,
Jan 26, 2010, 6:45:46 PM1/26/10
to redi...@googlegroups.com
Mason,

Just to completely understand the problem: You want a list of books
from and author sorted by title name?

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Mason Jones

unread,
Jan 26, 2010, 7:28:36 PM1/26/10
to redi...@googlegroups.com
This is a bit tricky, because the book/author example is made up by me
to illustrate the real product issue, which unfortunately I have to
keep private. But essentially, it's a multi-directional problem:

- Get a list of books, sorted by title, for a given author

- Get a list of authors, sorted by name, for a given book

- Get a list of all authors in the database, sorted by name

And this is assuming many-to-many between authors and books.

Thanks!

Aníbal Rojas

unread,
Jan 27, 2010, 9:54:33 AM1/27/10
to redi...@googlegroups.com
Mason,


> This is a bit tricky, because the book/author example is made up by me
> to illustrate the real product issue, which unfortunately I have to
> keep private. But essentially, it's a multi-directional problem:

NP

Authors:

author:1:name -> 'Mark Twain'
author:2:name -> 'Herman Melville'

Lets consider the following books:

book:1:title -> 'The Adventures of Huckleberry Finn'
book:1:authors -> { 1 }

book:2:title -> 'The Adventures of Tom Sawyer'
book:2:authors -> { 1 }

book:3:title -> 'Moby Dick'
book:3:authors -> { 2 }

> - Get a list of books, sorted by title, for a given author

You will need a set of book IDs per author:

author:1:books -> { 1, 2 }
author:2:books -> { 3 }

Then issue a:

SORT author:1:books BY book:*:title

Optionally LIMIT it, as you want the list of books, you don't have to
specify a GET. This will return all the books IDs sorted by each book
title. Later you can fetch the data you want to display.

> - Get a list of authors, sorted by name, for a given book

SORT book:3:authors BY author:*:name

> - Get a list of all authors in the database, sorted by name

I would keep a global SET of authors and issue a sort:

authors -> { 1, 2 }

SORT authors BY author:*:name

> And this is assuming many-to-many between authors and books.
>
> Thanks!

You are welcome, hope this model helps

Mason Jones

unread,
Jan 27, 2010, 1:42:07 PM1/27/10
to redi...@googlegroups.com
Thanks again for these thoughts, they certainly do help! I'm curious,
if I want to get a list of books for an author, as you say I would do

SORT author:1:books BY book:*:title

Which will return the keys for the books, and then I have to do an
MGET to retrieve all of the data for each. Let's say I use something
like ISBN as the ID for each book, and I want to display that as well
as the title. My two choices are to do the SORT + MGET, so that I have
the ID (from the key) and the title (from the data); or to add the ID
to the data, which as you said before, sort of wastes memory. So
either I can perform two calls, or I can use some extra memory, which
is a normal trade-off, of course. I can do some testing to see how
things go but I'm curious if it's worth trying to avoid multiple calls
of this sort.

Thanks again.

2010/1/27 Aníbal Rojas <aniba...@gmail.com>:

Aníbal Rojas

unread,
Jan 27, 2010, 4:43:58 PM1/27/10
to redi...@googlegroups.com
Mason,

Redis is extremely fast, you will be surprised. Unless you are
recovering a _huge_ number of books at once, making multiple calls
should be not noticed.

Your other alternative is storing the whole book data as a one
key, lets say something like: book:1:bundle -> "{ id:1, title: ... }"
and considering the other fields used in sort and such as
denormalizations, so you an do only one GET, something like a SELECT *
FROM books WHERE id = 1;

Take a look at Evolving the Key/Value Programming Model to a
Higher Level: http://bit.ly/csoJL6 it addresses your concerns, its
very interesting.

Best regards,

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Mason Jones

unread,
Jan 27, 2010, 7:22:15 PM1/27/10
to redi...@googlegroups.com
Thanks for the pointer to that presentation, I'll certainly check it
out. I have the basic idea working in my prototype using Redis --
using Ezra's redis-rb, I was even able to make it work very easily
with Paginator to allow paging through the results (using SORT with
LIMIT)...very nice.

Another need that I'm curious for opinions on is: relationships
between books. Let's say a book has a bibliography, and I need to map
links between books. I'm tempted to simply create a set for each book
that contains the keys for the books that it references, but that does
create additional objects, of course. Does that seem reasonable?
Otherwise of course I'll end up putting the references into a data
structure in a book object, which I then need to parse and use to do
an MGET.

Does creating perhaps tens of thousands of sets pose a potential
problem? It's certainly a nice efficient way to get the lists of
references...

Aníbal Rojas

unread,
Jan 28, 2010, 3:20:00 PM1/28/10
to redi...@googlegroups.com
Mason,

Using a bunch of SETs with integers IDs will be very efficient due
to Redis integer encoding. Using SINTER you can find the books
referenced by two or more books also.

Best regards,

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Mason Jones

unread,
Jan 28, 2010, 5:25:35 PM1/28/10
to redi...@googlegroups.com
Great, thanks for the confirmation. I'm glad my thinking makes sense
in this case.


2010/1/28 Aníbal Rojas <aniba...@gmail.com>:

Aníbal Rojas

unread,
Jan 28, 2010, 5:40:01 PM1/28/10
to redi...@googlegroups.com
Mason,

You are welcome. Modeling problems with Redis becomes easier when
you forget normal forms and remember Data Structures classes ;)

--
Aníbal Rojas
Ruby on Rails Web Developer
http://www.google.com/profiles/anibalrojas

Reply all
Reply to author
Forward
0 new messages