Re: [spain-scalability] unique visitors in a website


Pere Ferrera

Mar 18, 2013, 8:02:12 AM
to spain-scala...@googlegroups.com
This is a well-known Big Data problem, and I recommend reading this article: http://highscalability.com/blog/2012/4/5/big-data-counting-how-to-count-a-billion-distinct-objects-us.html

The article explains several approximate counting methods that have a very small memory footprint. These methods have an interesting property: because they are based on bitmaps, you can merge two of them, and the merged result is as accurate as if you had counted the combined data directly. That lets you combine several partial counters, which I think is what you need for your use case.
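
To make the merge property concrete, here is a minimal sketch, assuming a linear-counting style bitmap (the class name, the bitmap size and the choice of SHA-1 are my own illustrative assumptions, not something taken from the article). Each day's counter is a fixed-size bitmap, and merging two days is just a bitwise OR, so the merged bitmap is identical to the one you would get by counting both days' visitors in a single pass.

```python
import hashlib
import math


class LinearCounter:
    """Hypothetical linear-counting sketch: a fixed-size bitmap that
    estimates the number of distinct items added to it."""

    def __init__(self, num_bits=1 << 16):
        self.num_bits = num_bits
        self.bits = 0  # a Python int used as the bitmap

    def add(self, item):
        # Hash the item to a stable bit position and set that bit.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits |= 1 << position

    def merge(self, other):
        # The union of two visitor sets is the bitwise OR of their bitmaps.
        if self.num_bits != other.num_bits:
            raise ValueError("bitmaps must have the same size")
        merged = LinearCounter(self.num_bits)
        merged.bits = self.bits | other.bits
        return merged

    def estimate(self):
        # Linear counting estimate: -m * ln(fraction of bits still zero).
        zero_bits = self.num_bits - bin(self.bits).count("1")
        if zero_bits == 0:
            raise ValueError("bitmap is saturated; use more bits")
        return -self.num_bits * math.log(zero_bits / self.num_bits)


# Two "days" whose visitor sets overlap (made-up visitor ids).
monday = LinearCounter()
for visitor_id in range(0, 6000):
    monday.add(visitor_id)

tuesday = LinearCounter()
for visitor_id in range(3000, 9000):
    tuesday.add(visitor_id)

both_days = monday.merge(tuesday)
print(round(both_days.estimate()))  # close to 9000, not 12000
```

You would store one bitmap per day instead of the full set of visitor ids, and OR together the days you need at query time. For numbers like yours a plain bitmap would have to be much larger than in this toy example, so one of the more compact sketches from the article (which merge in a similar way) is probably a better fit.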

On Sat, Mar 16, 2013 at 12:58 PM, carlos Hernandez <carl...@gmail.com> wrote:
How would you (incrementally) calculate the unique visitors of a website, during a certain period of time?

Imagine you have precalculated (in a batch process, using Hadoop) that:

 on Monday, you had 130 000 000 unique visitors
 on Tuesday, you had 140 000 011 unique visitors
 on Wednesday, you had 143 000 222 unique visitors

You are then asked to determine how many unique visitors you had over the 3 days without recalculating everything, and the answer shouldn't take more than 2-3 seconds. (Obviously, the 3-day total is not the sum of the daily counts, since the same visitor can show up on more than one day.)

I already have some good ideas about how to do this, but because of the huge number of unique visitors, it is hard to represent a set of 130 000 000 elements (and store it in a database table), even compressed, and then merge it with another set of 140 000 011 elements.

Any comments, ideas, suggestions, etc. would be much appreciated.
Thank you very much.
