Re: How to store unique visitors in cassandra

Alain RODRIGUEZ Jan 19, 2012 6:25 AM
Thanks Aaron, I had already looked at these slides and I just went through them again.

I'm still in the dark about how to efficiently get the number of unique visitors between 2 dates (arbitrary ones, since the user picks them).

I could easily count them per hour, day, week, month... But it's harder to give this statistic between 2 arbitrary dates, as explained at the start of this thread.

Am I missing a clue in these slides?

2012/1/19 aaron morton
Some tips here from Matt Dennis on how to model time series data 

On 19/01/2012, at 10:30 PM, Alain RODRIGUEZ wrote:

Hi, thanks for your answer, but I don't want to add another layer on top of Cassandra. I have also built my whole application without Countandra and I would like to continue this way.

Furthermore, there is a Cassandra modeling problem here that I would like to solve, not just hide.


2012/1/18 Lucas de Souza Santos
Why not

On Wed, Jan 18, 2012 at 3:23 PM, Alain RODRIGUEZ wrote:
I'm wondering how to model my CFs (column families) to store the number of unique visitors in a time period so that I can query it quickly.

I thought of sharding them by day (row = 20120118, column = visitor_id, value = '') and performing a getcount. This would work to get unique visitors per day, per week or per month, but it wouldn't work if I want unique visitors between 2 specific dates, because 2 rows can share the same visitors (same columns). I can have 1500 unique visitors today and 1000 unique visitors yesterday, but only 2000 unique visitors when aggregating these two days.
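
To make the double counting concrete, here is a minimal Python sketch that reuses the numbers above. It uses in-memory sets as a stand-in for the per-day rows; the `rows` dict and the visitor ids are made up for illustration, and no Cassandra client is involved.

```python
# Hypothetical in-memory stand-in for the per-day rows:
# row key = day (YYYYMMDD), column names = visitor ids.
rows = {
    "20120117": {f"v{i}" for i in range(0, 1000)},    # 1000 visitors yesterday
    "20120118": {f"v{i}" for i in range(500, 2000)},  # 1500 visitors today
}

# Per-day counts, i.e. what a getcount on each row would return.
per_day = {day: len(visitors) for day, visitors in rows.items()}

# Summing the per-day counts counts returning visitors twice.
naive_total = sum(per_day.values())

# The number of unique visitors over both days is the size of the union.
unique_total = len(rows["20120117"] | rows["20120118"])

print(per_day, naive_total, unique_total)
```

The 500 visitors present on both days are why the per-day counts cannot simply be added up for an arbitrary range.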

I could get all the columns for these 2 rows and perform an intersect in my client language, but performance won't be good with big data.
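
A rough sketch of that client-side approach, in Python, might look like the following. It uses a set union rather than an intersect, since the union gives the distinct count directly. `fetch_columns` is a hypothetical callable standing in for whatever client call reads the column names of one row, and `fake_rows` is toy data, not real output.

```python
from datetime import date, timedelta

def day_keys(start, end):
    """Yield row keys (YYYYMMDD) for every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)

def unique_visitors(fetch_columns, start, end):
    """Count distinct visitor ids across all day rows in [start, end].

    fetch_columns is a hypothetical callable returning the column names
    (visitor ids) of one row; a real client would page through the row.
    """
    seen = set()
    for key in day_keys(start, end):
        seen.update(fetch_columns(key))
    return len(seen)

# Toy data standing in for the column family.
fake_rows = {
    "20120117": ["alice", "bob"],
    "20120118": ["bob", "carol"],
    "20120119": ["alice", "dave"],
}
count = unique_visitors(
    lambda key: fake_rows.get(key, []),
    date(2012, 1, 17),
    date(2012, 1, 19),
)
print(count)  # 4
```

Every column of every row in the range has to be transferred to the client, and the `seen` set grows with the number of distinct visitors, which is the performance concern with large data.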

Has anyone already thought about how to model this?

Thanks for your help ;)