Hi guys,
I need to implement a very fast action counter system on different
objects (like each click on music video is +1 vote)
~1’000’000 different objects
~1’000 votes/sec.
I found the following solutions but neither of them is ideal :-(
1. Each vote is written in DB (not possible as appengine DB performance is too slow)
2. Shared counter - is COOL for example as WEBPAGE Hit Counter,
but not ideal when there are many objects which can be voted for.
3. Using own External WebService to count votes
- possible administration/network/ping/.... problems :-(
4. Memcache + bulk DB write in DB
- increment vote counter for each object in Memcache + write in DB every 5minute
a) during save no voting possible
b) possible loss of votes (Memcache clean Up from Google)
c) not the best performance...
There is one more important issue, we always have to know the sum of the total votes for each object as there will be an option of halving the votes.
I would appreciate any other ideas you may have?
Thanks
> 2. Shared counter - is COOL for example as WEBPAGE Hit Counter,
> but not ideal when there are many objects which can be voted for.
Why isn't that ideal? It sounds like it would be ideal for your use case.
Dave.
1,000,000 fully-consistent counters that are updated many times per
second (each!) is a big ask.
I think you will have to either compromise scale or consistency here.
Andrew
Option 1)
- Long term table with ~1e6 * (object_id, votes, timestamp)
- Recent table with N * (object_id, vote(s), timestamp)
- Background task doing:
* Read recent
* Update long term
* Remove recent
- Add vote by inserting into recent
- Obtain precise votes by summing long_term.votes + count(recent.votes
for object_id > long_term.timestamp).
Timestamp can be a vector clock as well, in case the timing tools
available don't give enough precision. There are also many other
tricks that may be used to reduce the number of rows inserted, but
these should follow up on the base plan.
Option 2)
Use MongoDB and shard enough.
Option 3)
Pay someone to do it. ;-)
--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog
-- I never filed a patent.
- We can accept 1% loss off votes... it is not a problem, BUT after each vote we need to know the actual total sum IMMEDIATELY (so background task is not a solution for us)- MongoDB....I would like find a solution on GAE(Java)
The background task performs parallel consolidation. You can know
what are the actual votes at any point in time by querying as
suggested.
Your IMMEDIATELY emphasis is quite suspicious, though. I don't know
what you're trying to do, but keep in mind that at the very instant
the number is known it's already out of date unless you're
_preventing_ others from voting at the same time, which won't work in
a distributed+scalable scenario as has been pointed out a few times.
You want to rank videos by vote?
http://googleappengine.blogspot.com/2009/01/google-code-jams-ranking-library.html
> 4. Memcache + bulk DB write in DB
You could use backends, which unlike memcache are not flushed:
Sounds like an easy problem to solve, even at that scale, and the
suggestions in this thread should put you in a good path. What's left
to do is sit down and actually figure the details and tests. You'll
need someone comfortable with that kind of problem, or that is willing
to bang head against the problem for a while. No magic answers.
FWIW, I suggested MongoDB not because of the flushing semantics. Even
if votes wait until fsync happens across several replicas, MongoDB
makes this a pretty simple problem because it has atomic increments
and standard sharding logic which enables distributing the workload
across several replica sets until it fits the problem size.
You can give your backends public web addresses and then call them
directly from javascript on the client, skipping the need for urlfetch
from a frontend server.