Using MongoDB behind recommendation system


bwing

Sep 23, 2011, 1:40:51 AM
to mongodb-user
We are looking for a NoSQL solution to build a system behind a
small e-commerce web site that pushes recommendations to our end
users. The recommendations will be computed elsewhere and loaded
into the system on a daily basis. We are comparing MongoDB and
HBase as two alternative solutions. We understand that MongoDB is good
for small random reads and writes, where each read ideally covers only
a few records. On the other hand, HBase is good at reading a large
number of records with few disk seeks.

As it is a recommendation system for an e-commerce web site, a quick
response time is critical. We will probably have only a few records to
recommend at the beginning, but we would like the system to be able to
scale. With all this said, we felt that HBase might be a good way to
go. But we would love to hear different opinions from the MongoDB
user group. Any comments are highly appreciated.
~ bwing

Karl Seguin

Sep 23, 2011, 2:44:34 AM
to mongod...@googlegroups.com
It'd be interesting to see what your recommendations look like. Is it simply "people who bought this were also interested in X,Y,Z", or is it more tailored per user? I find it hard to give any advice on this type of topic without some vision about what the data looks like, how it's being updated (you gave us a good idea on that), and how it's being queried.

I am surprised you gave the scaling edge to HBase over MongoDB. Again, this might be because I'm thinking of a totally different model than what you guys need, but it seems like MongoDB's auto-sharding would scale well on a "product" key or something. I'm not sure that one really has the edge over another here.

I guess I can start this conversation by giving you my *simplistic/naive* thoughts on this. Are you thinking of something like:

products: [
 {_id: 1234, name: 'peanuts', recommendations: [8484, 2323, 4883]},
 {_id: 1235, name: 'cats', recommendations: [32, 43, 123]}
 ...
]

You could expand each recommendation to be a full-fledged object:

 {_id: 1234, name: 'peanuts', recommendations: [
       {id: 8484, name: 'pistachios', weight: 100},
       ...
 ]}

The choice to embed recommendations within a product depends on a few things, like how you'll be pulling the data out, and how many/how big recommendations are. But again, you get pretty nice scalability here by sharding on the main products _id, and if you are able to store each recommendation into a single cohesive document, it's a very efficient query.
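
To make the embedded layout concrete, here is a minimal sketch in plain JavaScript that models the collection as an in-memory Map keyed by _id. It is not MongoDB code: the helper name `topRecommendations`, the extra product names, and their weights are made up for illustration. In the mongo shell the equivalent read would be roughly `db.products.findOne({_id: 1234})`, assuming a collection called `products`.

```javascript
// In-memory stand-in for a "products" collection (illustrative data only).
const products = new Map([
  [1234, {
    _id: 1234,
    name: 'peanuts',
    recommendations: [
      { id: 8484, name: 'pistachios', weight: 100 },
      { id: 2323, name: 'almonds', weight: 60 }, // hypothetical name and weight
      { id: 4883, name: 'cashews', weight: 80 }, // hypothetical name and weight
    ],
  }],
]);

// Hypothetical helper: a single key lookup returns the product together with
// everything needed to render its recommendations, so no second query is needed.
function topRecommendations(productId, limit) {
  const doc = products.get(productId);
  if (!doc) return [];
  return [...doc.recommendations]
    .sort((a, b) => b.weight - a.weight) // heaviest weight first
    .slice(0, limit)
    .map((r) => r.name);
}

console.log(topRecommendations(1234, 2));
```

The point of the design is that the display data travels with the product document. The trade-off is that a renamed product has to be updated in every document that embeds it, which is cheap here because the data is reloaded daily anyway.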

Beyond that, a key benefit of MongoDB to me has always been its ease of use, both from a practical and a more conceptual level. Visualizing this stuff is really easy (just think of a JSON document). It's almost natural.

Again, if you don't mind providing us with some "teaser" data and queries, it'd probably help us help you :)

Karl


bwing

Sep 23, 2011, 9:05:25 PM
to mongodb-user
Hi, Karl,
Thank you very much for your response. Let me try to provide more
information regarding our use case. At the beginning, we plan to
have only one table in the database. A simple example of what we
would like to insert into the database is

[ userid:1234, recommendations: p1, p2, p3 ]
[ userid:2345, recommendations: p3, p4, p6 ]
...

The table will be indexed on userid. A secondary index based
on purchase history will also be built.

Here is a bit more information based on our evaluation of
the use case. What we are looking for is:
1) the capability to serve real-time queries with low latency
2) a lot of frequent random reads
3) the capability to build indexes for better read performance
4) a distributed key-value store
5) linear scaling. We hope to serve more traffic by adding more
servers.
6) data replication
7) availability (24/7)

What we do NOT really care about is:
1) the ability to serve complex arbitrary queries
2) great write throughput (this is for online serving purposes only;
no offline computation will be done on this system)
3) great performance of sequential reads. Our queries serve
different customers, so there will be a lot of small random queries
rather than range scans.
4) data consistency. We don't really care whether the data is fully
consistent or eventually consistent, and there will not be any writes
to the database during the daytime.

With all of the above said, we now tend to favor MongoDB over HBase
because of two main advantages that MongoDB has:
1) Indexing
MongoDB allows building multiple indexes on one collection. HBase
does not maintain indexes itself; the application has to do it. With our
use case, we do not need complex queries, but we would still like the
database to maintain one or two indexes for us.
2) Read performance for lots of frequent small reads
As this serves an online system, quick response time is key.
We feel that we do not have a great need for sequential reads, and we
do not really care about write throughput since the system is only
updated once a day. However, random read performance is critical.

The only remaining concern is that there have been some great
recommendation systems built on top of HBase, like the one StumbleUpon
has, but we have not been able to find any well-known recommendation
system built on top of MongoDB. Or is there one? After all, unlike a
fully featured recommendation engine, we do not need any offline
analytics performed on this system, nor any smart/real-time
updates during the daytime. What we really need is a low-latency,
distributed key-value store with some indexing ability.
~bwing

bwing

Sep 23, 2011, 9:19:16 PM
to mongodb-user
A couple more pointers:

> The choice to embed recommendations within a product depends on a few things,
> like how you'll be pulling the data out,

The recommendation database is behind a web server. When a page is
served to an end user, a query or a batch of queries will be sent to
the recommendation database.
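
As a rough illustration of that access pattern, the sketch below resolves a batch of user ids against an in-memory Map that stands in for the single user-keyed table, using the two sample rows given earlier in the thread. The helper name `lookupBatch` is hypothetical. Against MongoDB itself, a batch like this would typically be one query using the `$in` operator on the indexed userid field, for example `db.recs.find({userid: {$in: [1234, 2345]}})`, where the collection name `recs` is an assumption.

```javascript
// In-memory stand-in for the user-keyed table (sample rows from the thread).
const recsByUser = new Map([
  [1234, ['p1', 'p2', 'p3']],
  [2345, ['p3', 'p4', 'p6']],
]);

// Hypothetical helper: resolve a batch of user ids in one call.
// Unknown users map to an empty list so the page can fall back to defaults.
function lookupBatch(userIds) {
  const result = {};
  for (const id of userIds) {
    result[id] = recsByUser.get(id) || [];
  }
  return result;
}

console.log(lookupBatch([1234, 9999]));
```

Each lookup touches exactly one key, which is the "lots of small random reads" pattern described above rather than a range scan.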

> and how many/how big recommendations are.
At the beginning, one record is approximately 2 KB. We plan to
have only one table, with one or two indexes for the entire table.

> But again, you get pretty nice scalability here by
> sharding on the main products _id, and if you are able to store each
> recommendation into a single cohesive document,

Supposing our indexes fit in memory, what is the maximum size of a
single document that MongoDB can handle?

Thanks again,
~bwing

On Sep 22, 11:44 pm, Karl Seguin <karlseg...@gmail.com> wrote:

Karl Seguin

Sep 23, 2011, 9:51:26 PM
to mongod...@googlegroups.com
A document is limited to 16MB, and a single collection can hold an unlimited number of documents. Given what you've described (which is close to what I was thinking), you won't run into any MongoDB limits. Given how read-heavy your case is, both replica sets and sharding will let you scale horizontally.

You can start off with 1 master and 2 slaves. Add a slave as your needs grow...and when you hit 6 slaves (or your working set takes up more memory than you can fit in a box), introduce a shard of the same configuration. You can repeat this pattern forever.
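
For a feel of the numbers, here is a back-of-the-envelope sketch of the sizing rule above. It uses the 16MB document limit and the roughly 2 KB record size mentioned in this thread; the user count (10 million), the 16 GB of RAM per box, and the helper name `shardsNeeded` are assumptions chosen only for illustration. It also ignores index size and operating system overhead, so treat the result as a lower bound.

```javascript
const KB = 1024;
const MB = 1024 * KB;
const GB = 1024 * MB;

const DOC_LIMIT_BYTES = 16 * MB; // per-document limit quoted in the thread
const AVG_DOC_BYTES = 2 * KB;    // "approximately 2K" per record

// Headroom: how many times the average record could grow before hitting the limit.
const growthFactor = DOC_LIMIT_BYTES / AVG_DOC_BYTES;

// Hypothetical helper: number of shards needed so that each shard's share of
// the raw data fits in the RAM of one box (indexes and overhead not counted).
function shardsNeeded(docCount, avgDocBytes, ramBytesPerBox) {
  const dataBytes = docCount * avgDocBytes;
  return Math.max(1, Math.ceil(dataBytes / ramBytesPerBox));
}

console.log(growthFactor);
console.log(shardsNeeded(10 * 1000 * 1000, AVG_DOC_BYTES, 16 * GB));
```

With these assumed figures the raw data is about 20 GB, which no longer fits on a single 16 GB box, so the rule of thumb would suggest a second shard.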

As long as you keep your indexes in memory, you'll get very strong read performance. It wouldn't be too hard for you to set it up and give it a try.

You can take a look at http://www.mongodb.org/display/DOCS/Production+Deployments and search for "recommendation"; it shows up 6 times...which is something :)

