Thanks Aaron, I had already looked at these slides, and I just went through them again.
I'm still in the dark about how to efficiently get the number of unique visitors between 2 arbitrary dates (arbitrary because the user chooses them).
I could easily count them per hour, day, week, month... But it's harder to give this statistic between 2 dates that aren't known in advance, as explained at the start of this thread.
Am I missing a clue in these slides?
2012/1/19 aaron morton <aa...@thelastpickle.com>
Some tips here from Matt Dennis on how to model time series data
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
On 19/01/2012, at 10:30 PM, Alain RODRIGUEZ wrote:
Hi, thanks for your answer, but I don't want to add another layer on top of Cassandra. I have also built my whole application without Countandra, and I would like to continue this way.
Furthermore, there is a Cassandra modeling problem here that I would like to solve, not just hide.
Alain
2012/1/18 Lucas de Souza Santos <luca...@gmail.com>
Why not http://www.countandra.org/
Lucas de Souza Santos (ldss)
On Wed, Jan 18, 2012 at 3:23 PM, Alain RODRIGUEZ <arod...@gmail.com> wrote:
I'm wondering how to model my CFs (column families) to store the number of unique visitors in a time period so that I can query it quickly.
I thought of sharding them by day (row = 20120118, column = visitor_id, value = '') and performing a getcount. This would work to get unique visitors per day, per week or per month, but it wouldn't work if I want unique visitors between 2 specific dates, because 2 rows can share the same visitors (same columns). I can have 1500 unique visitors today and 1000 unique visitors yesterday, but only 2000 unique visitors when aggregating these two days.
I could get all the columns for these 2 rows and perform an intersect in my client language, but performance won't be good with big data.
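To make the problem concrete, here is a minimal sketch of the client-side approach. It is not Cassandra code: the `rows` dict is a hypothetical in-memory stand-in for the day-sharded column family (row key = day, column names = visitor ids), and `day_keys` and `unique_visitors` are names made up for this illustration. The visitor ids are invented so that the counts match the 1000 / 1500 / 2000 figures above.

```python
from datetime import date, timedelta

# Hypothetical stand-in for the column family: one row per day,
# one column (here: one set element) per visitor id.
rows = {
    "20120117": {f"v{i}" for i in range(0, 1000)},     # 1000 visitors yesterday
    "20120118": {f"v{i}" for i in range(500, 2000)},   # 1500 visitors today
}

def day_keys(start, end):
    """Yield the row keys (YYYYMMDD) for every day from start to end inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)

def unique_visitors(rows, start, end):
    """Count distinct visitors over a date range by merging the per-day rows.

    Every column of every row in the range has to be read, which is
    why this gets slow when the rows are large.
    """
    seen = set()
    for key in day_keys(start, end):
        seen |= rows.get(key, set())
    return len(seen)

# Adding the per-day counts double-counts visitors who came on both days.
naive_total = sum(len(visitors) for visitors in rows.values())
print(naive_total)  # 2500

print(unique_visitors(rows, date(2012, 1, 17), date(2012, 1, 18)))  # 2000
```

The per-day counts cannot simply be added, so the client has to pull every visitor id in the range and deduplicate them, which is the cost I would like to avoid.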
Has anyone already thought about this kind of modeling?
Thanks for your help ;)
Alain