How would you incrementally calculate the number of unique visitors to a website over a given period of time?
Imagine you have already precalculated (in a batch process, using Hadoop) that:
on Monday, you had 130 000 000 unique visitors
on Tuesday, you had 140 000 011 unique visitors
on Wednesday, you had 143 000 222 unique visitors
You are then asked how many unique visitors you had over the 3 days combined, without recalculating everything from scratch; the answer shouldn't take more than 2-3 seconds. (Obviously, the 3-day total is not the sum of the daily counts, because the same visitor can show up on more than one day.)
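To make the double counting concrete, here is a toy sketch in Python. The visitor IDs are made up for illustration, and at the real scale the data would not fit comfortably in in-memory sets like these:

```python
# Toy data: hypothetical visitor IDs, not real traffic.
monday = {"alice", "bob", "carol"}
tuesday = {"bob", "carol", "dave"}
wednesday = {"carol", "dave", "erin"}

# Adding the daily unique counts counts returning visitors more than once.
sum_of_daily_uniques = len(monday) + len(tuesday) + len(wednesday)

# The number I actually need is the size of the union of the daily sets.
uniques_over_three_days = len(monday | tuesday | wednesday)

print(sum_of_daily_uniques)     # 9
print(uniques_over_three_days)  # 5
```

So the per-day counts alone are not enough; some representation of who visited each day has to be kept and merged.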
I already have some good ideas about how to do this, but because of the huge number of unique visitors, it is hard to represent a set of 130 000 000 elements (and store it in a database table), even compressed, and then merge it with another set of 140 000 011 elements.
Any comments, ideas, or suggestions would be much appreciated.
Thank you very much.