Hi, Karl,
Thank you very much for your response. Let me provide more
information about our use case. To begin with, we plan to have only
one table in the database. A simple example of what we would like to
insert into the database is:
[ userid:1234, recommendations: p1, p2, p3 ]
[ userid:2345, recommendations: p3, p4, p6 ]
...
The table will be indexed on userid. A secondary index based on
purchase history will also be built.
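
To make the layout above concrete, here is a minimal in-memory sketch
in Python. It is not tied to any particular database. The "purchased"
field and the helper names are hypothetical, since the sample records
only show userid and recommendations.

```python
from collections import defaultdict

# Sample records in the shape shown above. The "purchased" field is a
# hypothetical stand-in for purchase history; it is not in the original example.
records = [
    {"userid": 1234, "recommendations": ["p1", "p2", "p3"], "purchased": ["p7"]},
    {"userid": 2345, "recommendations": ["p3", "p4", "p6"], "purchased": ["p7", "p9"]},
]

# Primary index: userid -> record (one record per user).
by_userid = {rec["userid"]: rec for rec in records}

# Secondary index: purchased product -> userids that bought it.
by_purchase = defaultdict(list)
for rec in records:
    for product in rec["purchased"]:
        by_purchase[product].append(rec["userid"])


def recommendations_for(userid):
    """Point lookup by primary key; returns an empty list for unknown users."""
    rec = by_userid.get(userid)
    return rec["recommendations"] if rec else []


def users_who_bought(product):
    """Lookup through the secondary index."""
    return by_purchase.get(product, [])
```

Both lookups are single key accesses, which is the small random read
pattern we expect to serve.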
Here is a bit more information based on our evaluation of the use
case. What we are looking for:
1) the ability to serve real-time queries with low latency
2) a lot of frequent random reads
3) the ability to build indexes for better read performance
4) a distributed key-value store
5) linear scaling. We hope to serve more traffic by adding more
servers.
6) data replication
7) availability (24/7)
What we do NOT really care about:
1) the ability to serve complex arbitrary queries
2) great write throughput (this is for online serving only; no
offline computation will be done on this system)
3) great sequential read performance. Our queries serve different
customers, so there will be a lot of small random queries rather than
range scans.
4) data consistency. We don't really care whether the data is fully
consistent or eventually consistent, and there will not be any writes
to the database during the daytime.
With all of the above said, we now tend to favor MongoDB over HBase
because of two main advantages that MongoDB has:
1) Indexing
MongoDB allows building multiple indexes on one collection. HBase
does not maintain secondary indexes itself; the application has to do
it. For our use case, we do not need complex queries, but we would
still like the database to maintain one or two indexes for us.
2) Read performance for lots of frequent small reads
As this is to serve an online system, quick response time is key.
We feel that we do not have a great need for sequential reads, and we
do not really care about write throughput since the system is only
updated once. However, random read performance is critical.
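
For the indexing point above, here is a rough sketch of what asking
MongoDB to maintain the two indexes might look like from Python. It
assumes the pymongo driver and a collection object obtained elsewhere.
The field name "purchased" and the function name are hypothetical.

```python
def ensure_indexes(collection):
    """Ask the database to maintain both indexes on the given collection.

    `collection` is assumed to be a pymongo Collection (or anything with a
    compatible create_index method). Direction 1 means ascending.
    """
    # Index on userid for point lookups; unique because we assume
    # one document per user.
    primary = collection.create_index([("userid", 1)], unique=True)
    # Secondary index on the hypothetical purchase-history field.
    secondary = collection.create_index([("purchased", 1)])
    return [primary, secondary]
```

A lookup by user would then be a single call such as
collection.find_one({"userid": 1234}), served from the userid index.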
The only remaining concern is that there have been some great
recommendation systems built on top of HBase, like the one
StumbleUpon has, but we have not been able to find any well-known
recommendation system built on top of MongoDB. Or is there one? After
all, unlike a fully featured recommendation engine, we do not need any
offline analytics to run on this system, nor any smart/real-time
updates during the daytime. What we really need is a low-latency,
distributed key-value store with some indexing ability.
~bwing