high performance scalable counter

380 views
Skip to first unread message

Hunka

unread,
Jul 30, 2011, 2:21:38 AM7/30/11
to google-ap...@googlegroups.com

Hi guys,


I need to implement a very fast action counter system on different

objects (like each click on music video is +1 vote)

~1’000’000 different objects

~1’000 votes/sec.


I found the following solutions but neither of them is ideal :-(


1. Each vote is written in DB (not possible as appengine DB performance is too slow)


2. Shared counter - is COOL for example as WEBPAGE Hit Counter,

but not ideal when there are many objects which can be voted for.

 

3. Using own External WebService to count votes

- possible administration/network/ping/.... problems :-(


4. Memcache + bulk DB write in DB

- increment vote counter for each object in Memcache + write in DB every 5minute


a)    during save no voting possible

b)    possible loss of votes (Memcache clean Up from Google)

c)    not the best performance...


There is one more important issue, we always have to know the sum of the total votes for each object as there will be an option of halving the votes.

 

I would appreciate any other ideas you may have?

Thanks

David Symonds

unread,
Jul 30, 2011, 2:47:17 AM7/30/11
to google-ap...@googlegroups.com
On Sat, Jul 30, 2011 at 4:21 PM, Hunka <dev1...@gmail.com> wrote:

> 2. Shared counter - is COOL for example as WEBPAGE Hit Counter,
> but not ideal when there are many objects which can be voted for.

Why isn't that ideal? It sounds like it would be ideal for your use case.


Dave.

Han

unread,
Jul 30, 2011, 10:21:44 AM7/30/11
to google-ap...@googlegroups.com
~1’000’000 different objects......for each object Shared counter?

and "There is one more important issue, we always have to know the sum of the total votes for each object as there will be an option of halving the votes."

j_al...@rocketmail.com

unread,
Jul 30, 2011, 10:27:35 AM7/30/11
to google-ap...@googlegroups.com
Consider using sharded counters as described here: http://code.google.com/appengine/articles/sharding_counters.html  You can extend the counter examples described on that page to support more than one object by setting the counter key to the concatenation of the object identifier, a delimiter and a random number. 

Han

unread,
Jul 31, 2011, 2:52:33 AM7/31/11
to google-ap...@googlegroups.com
- NUM_SHARDS = 50
- 1'000'000 objects
= 50'000'000 counters

1. I think to get the actual ranking in 1second is impossible.
   - get the SUM for each object and SORT
2.Get the actual ranking by filter (country,author,...) ?
  in Datastore easy :-) "select ... where .... order by vote"

Han

unread,
Jul 31, 2011, 7:46:18 AM7/31/11
to google-ap...@googlegroups.com
+ reset counters for each object (50'000'000 counters) every week....long jobs :-(

Han

unread,
Aug 1, 2011, 3:03:57 AM8/1/11
to google-appengine-go
summary - Shared counter is not useful in this situation:

1) after each individual vote, I need the exact total of votes for
each voted object

2) NUM_SHARDS = 50, if 1'000'000 objects -> 50'000'000 counters in DB
- to get the actual ranking in 1 sec = impossible
- reset counters for each object (50'000'000 counters) every day =
impossible
- To filter actual rankings in 1 sec = impossible
(like in DB select ... where .... order by vote)

Any other idea? Thx

Fabs

unread,
Aug 2, 2011, 7:48:08 AM8/2/11
to google-appengine-go
You can just use a mapper job to halve all of the votes? I'm not sure
if mapper is available for go but you can easily deploy a python app
with a different application id, and use it to run these jobs. It also
means while the map job is running the number of votes will be
meaningless but for a system as big as you are designing this is
surely an acceptable loss.

For resetting your votes you can just change a key or something and
have a new set of counter entities each week, then in the background
you can have a job deleting the old entities.

Do these ideas sound helpful?

Han

unread,
Aug 2, 2011, 9:23:13 AM8/2/11
to google-ap...@googlegroups.com
Yes, right, but create 50milion new counter after each reset is impossible(expensive + performance problem)
- and all the other problems below stay the same :-(
but thx for idea :-)

Andrew Gerrand

unread,
Aug 2, 2011, 1:24:08 PM8/2/11
to Han, google-appengine-go
On 1 August 2011 00:03, Han <dev1...@gmail.com> wrote:
> 1) after each individual vote, I need the exact total of votes for
> each voted object

1,000,000 fully-consistent counters that are updated many times per
second (each!) is a big ask.

I think you will have to either compromise scale or consistency here.

Andrew

Gustavo Niemeyer

unread,
Aug 2, 2011, 1:47:24 PM8/2/11
to google-ap...@googlegroups.com
> I would appreciate any other ideas you may have?

Option 1)

- Long term table with ~1e6 * (object_id, votes, timestamp)
- Recent table with N * (object_id, vote(s), timestamp)
- Background task doing:
* Read recent
* Update long term
* Remove recent
- Add vote by inserting into recent
- Obtain precise votes by summing long_term.votes + count(recent.votes
for object_id > long_term.timestamp).

Timestamp can be a vector clock as well, in case the timing tools
available don't give enough precision. There are also many other
tricks that may be used to reduce the number of rows inserted, but
these should follow up on the base plan.

Option 2)

Use MongoDB and shard enough.

Option 3)

Pay someone to do it. ;-)

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/plus
http://niemeyer.net/twitter
http://niemeyer.net/blog

-- I never filed a patent.

Han

unread,
Aug 3, 2011, 2:56:36 AM8/3/11
to google-ap...@googlegroups.com
- We can accept 1% loss off votes... it is not a problem, BUT after each vote we need to know the actual total sum IMMEDIATELY (so background task is not a solution for us)

MongoDB....I would like find a solution on GAE(Java)


Rob 'Commander' Pike

unread,
Aug 3, 2011, 4:58:44 AM8/3/11
to google-ap...@googlegroups.com

On 03/08/2011, at 4:56 PM, Han wrote:

- We can accept 1% loss off votes... it is not a problem, BUT after each vote we need to know the actual total sum IMMEDIATELY (so background task is not a solution for us)

MongoDB....I would like find a solution on GAE(Java)

You can have any pair of:

- accurate
- fast
- large

but you're asking for all three. That can't be done in a distributed system.

-rob

Gustavo Niemeyer

unread,
Aug 3, 2011, 6:25:35 AM8/3/11
to google-ap...@googlegroups.com
> - We can accept 1% loss off votes... it is not a problem, BUT after each
> vote we need to know the actual total sum IMMEDIATELY (so background task is
> not a solution for us)

The background task performs parallel consolidation. You can know
what are the actual votes at any point in time by querying as
suggested.

Your IMMEDIATELY emphasis is quite suspicious, though. I don't know
what you're trying to do, but keep in mind that at the very instant
the number is known it's already out of date unless you're
_preventing_ others from voting at the same time, which won't work in
a distributed+scalable scenario as has been pointed out a few times.

Stephen

unread,
Aug 3, 2011, 6:38:22 AM8/3/11
to google-ap...@googlegroups.com
On Sat, Jul 30, 2011 at 7:21 AM, Hunka <dev1...@gmail.com> wrote:
>
> I need to implement a very fast action counter system on different
> objects (like each click on music video is +1 vote)


You want to rank videos by vote?

http://googleappengine.blogspot.com/2009/01/google-code-jams-ranking-library.html


> 4. Memcache + bulk DB write in DB

You could use backends, which unlike memcache are not flushed:

http://code.google.com/appengine/docs/java/backends/

Ikai Lan

unread,
Aug 3, 2011, 2:08:17 PM8/3/11
to google-ap...@googlegroups.com
MongoDB only does this because the increments are flushed to disk regularly. You can implement a similar mechanism via Memcache's increment operation with periodic flushing.

I suspect, however, you are solving the wrong problem. What is the voting system supposed to do? What is the user supposed to experience? For instance, take a look at Google search results when you look for something long tail. The number of search results is a guess, but it provides the user with a rough guess of how many total results there are, though the number can often be inaccurate.

--
Ikai

Han

unread,
Aug 3, 2011, 4:32:21 PM8/3/11
to google-ap...@googlegroups.com
Imagine a competition between two music videos, and the number of votes will decide which the winner is.

Example:
 Music video 1 has 23’500 votes
 Music video 2 has 23’300 votes

In the lead is Music video 1 with 200 more votes compared to Music video 2. This means, we cannot ‘guess’ the 
number of votes displayed to the users, as this may influence the future incoming votes.

However, we can accept up to 20% of possible loss of votes in extreme case (for example 10’000 votes/sec...)

- ACCURATE (we can deal with up to 20% loss of votes, but cannot accept just a rough estimate)
- FAST (it is relative, displayed response *including updated votes sum of voted Music video* within 5 sec should not be a problem)
- LARGE (this is priority 1, we cannot compromise on this)

Gustavo Niemeyer

unread,
Aug 3, 2011, 4:46:35 PM8/3/11
to google-ap...@googlegroups.com
> Imagine a competition between two music videos, and the number of votes will
> decide which the winner is.
(...)

Sounds like an easy problem to solve, even at that scale, and the
suggestions in this thread should put you in a good path. What's left
to do is sit down and actually figure the details and tests. You'll
need someone comfortable with that kind of problem, or that is willing
to bang head against the problem for a while. No magic answers.

Ikai Lan

unread,
Aug 3, 2011, 5:58:20 PM8/3/11
to google-ap...@googlegroups.com
I agree with Gustavo. If you can lose 20% of votes, that's much easier, and if you can have up to a 5 second delay, that's REALLY easy. I'm all for Memcache increments with flushing every few seconds. You're very unlikely to lose 20% of votes. With a 5 second flush interval, the most you are likely to lose is a few pockets 5 seconds worth of votes of Memcache is flushed or some other catastrophic error occurs.

--
Ikai

Gustavo Niemeyer

unread,
Aug 3, 2011, 6:45:13 PM8/3/11
to Ikai Lan, google-ap...@googlegroups.com
> MongoDB only does this because the increments are flushed to disk regularly.
> You can implement a similar mechanism via Memcache's increment operation
> with periodic flushing.

FWIW, I suggested MongoDB not because of the flushing semantics. Even
if votes wait until fsync happens across several replicas, MongoDB
makes this a pretty simple problem because it has atomic increments
and standard sharding logic which enables distributing the workload
across several replica sets until it fits the problem size.

Han

unread,
Aug 4, 2011, 4:21:28 AM8/4/11
to google-appengine-go
- Memcache increments with flushing every few seconds
- 5 sec ~ save 50’000 votes ~ save 5’000 object vote counter
- 5’000 object update in DB every 5 sec, I don’t think that Datastore
can deal with it
- Possible same object update again and again every 5 sec
- DB re-indexing




On 3 Aug., 23:58, Ikai Lan <i...@google.com> wrote:
> I agree with Gustavo. If you can lose 20% of votes, that's much easier, and
> if you can have up to a 5 second delay, that's REALLY easy. I'm all for
> Memcache increments with flushing every few seconds. You're very unlikely to
> lose 20% of votes. With a 5 second flush interval, the most you are likely
> to lose is a few pockets 5 seconds worth of votes of Memcache is flushed or
> some other catastrophic error occurs.
>
> --
> Ikai
>

Fabs

unread,
Aug 5, 2011, 3:30:06 AM8/5/11
to google-appengine-go
If it's a competition between videos, you can either count results
every few minutes and cache those results (since while the votes are
happening the results will always be inconsistent anyway), or you can
end voting at a certain time, preventing further votes and then tally
everything precisely. You could use both approaches, showing a live
approximation of the results, and it won't affect the final precise
outcome. You know what you are trying to achieve and should be able to
work something out. Best of luck.

Han

unread,
Aug 8, 2011, 1:32:15 PM8/8/11
to google-appengine-go
"You could use backends, which unlike memcache are not flushed"

Communication between default Application and Backend - URL Fetch
1. limits: http://code.google.com/appengine/docs/quotas.html#UrlFetch
2. Outgoing Bandwidth (billable) http://code.google.com/appengine/docs/java/urlfetch/overview.html

If you have a milion of memcache operations per day..... :-(







On Aug 3, 12:38 pm, Stephen <sdeasey+gro...@gmail.com> wrote:
> On Sat, Jul 30, 2011 at 7:21 AM, Hunka <dev133...@gmail.com> wrote:
>
> > I need to implement a very fast action counter system on different
> > objects (like each click on music video is +1 vote)
>
> You want to rank videos by vote?
>
>  http://googleappengine.blogspot.com/2009/01/google-code-jams-ranking-...

Stephen

unread,
Aug 8, 2011, 3:30:01 PM8/8/11
to google-appengine-go
On Mon, Aug 8, 2011 at 6:32 PM, Han <dev1...@gmail.com> wrote:
> "You could use backends, which unlike memcache are not flushed"
>
> Communication between default Application and Backend - URL Fetch
> 1. limits:   http://code.google.com/appengine/docs/quotas.html#UrlFetch
> 2. Outgoing Bandwidth (billable)    http://code.google.com/appengine/docs/java/urlfetch/overview.html
>
> If you have a milion of memcache operations per day..... :-(


You can give your backends public web addresses and then call them
directly from javascript on the client, skipping the need for urlfetch
from a frontend server.

Reply all
Reply to author
Forward
0 new messages