|How to store unique visitors in cassandra||Alain RODRIGUEZ||1/18/12 9:23 AM|
I'm wondering how to model my CFs (column families) to store the number of unique visitors in a time period so that I can query it quickly.
I thought of sharding them by day (row = 20120118, column = visitor_id, value = '') and performing a getcount. This would work for unique visitors per day, per week or per month, but it wouldn't work for unique visitors between 2 specific dates, because 2 rows can share the same visitors (same columns). For example, I can have 1500 unique visitors today and 1000 unique visitors yesterday, but only 2000 unique visitors when aggregating these two days.
I could get all the columns for these 2 rows and compute their union (de-duplicating the visitor ids) in my client language, but performance won't be good with big data.
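To make the overcounting concrete, here is a minimal sketch of the client-side approach. It assumes a plain dict standing in for the column family (row key = day, column names = visitor ids); `day_keys`, `unique_visitors` and the sample data are hypothetical names for illustration, not part of any Cassandra client API.

```python
from datetime import date, timedelta


def day_keys(start, end):
    """Yield row keys such as '20120118' for every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


def unique_visitors(rows, start, end):
    """Count distinct visitor ids across all day rows in [start, end].

    `rows` is an in-memory stand-in for the column family:
    row key -> set of column names (visitor ids).
    """
    seen = set()
    for key in day_keys(start, end):
        seen |= rows.get(key, set())  # union removes visitors seen on several days
    return len(seen)


# Hypothetical sample data: bob and carol visit on both days.
rows = {
    "20120117": {"alice", "bob", "carol"},
    "20120118": {"bob", "carol", "dave", "erin"},
}

naive_sum = sum(len(visitors) for visitors in rows.values())
distinct = unique_visitors(rows, date(2012, 1, 17), date(2012, 1, 18))
print(naive_sum, distinct)
```

Adding up the per-day counts counts the returning visitors twice, while the union does not. The cost is that every visitor id in the range has to be pulled to the client, which is the performance concern raised above.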
Has anyone already thought about this modeling problem?
Thanks for your help ;)
|Re: How to store unique visitors in cassandra||Lucas de Souza Santos||1/18/12 9:32 AM|
Why not use Countandra (http://www.countandra.org/)?
Lucas de Souza Santos (ldss)
|Re: How to store unique visitors in cassandra||Alain RODRIGUEZ||1/19/12 1:30 AM|
Hi, thanks for your answer, but I don't want to add another layer on top of Cassandra. I have also built my whole application without Countandra and I would like to continue this way.
Furthermore, there is a Cassandra modeling problem here that I would like to solve, not just hide.
|Re: How to store unique visitors in cassandra||aaron morton||1/19/12 2:31 AM|
Some tips here from Matt Dennis on how to model time series data
|Re: How to store unique visitors in cassandra||Alain RODRIGUEZ||1/19/12 6:25 AM|
Thanks aaron, I had already read these slides and I just looked at them again.
I'm still in the dark about how to efficiently get the number of unique visitors between 2 dates (arbitrary dates, since they are chosen by the user).
I could easily count them per hour, day, week, month... But it's harder to give this statistic between 2 dates that are not known in advance, as explained at the start of this thread.
Am I missing something in these slides?
|Re: How to store unique visitors in cassandra||Tyler Hobbs||1/19/12 1:05 PM|
Sometimes you will be fetching slices of multiple rows.
Basically, here's the procedure, given a start time t1 and an end time t2:
1. Determine all buckets (row keys) that hold data between t1 and t2. Usually this means finding the bucket that t1 falls in, the bucket that t2 falls in, and then all buckets in between.
2. Use t1 as the column slice start, t2 as the column slice end, and multiget all of the buckets that you just calculated.
3. Merge the results by concatenating the rows in order.
Note that the only rows where you will end up getting a partial slice are the first and last row. For all of the rows in between, you will end up fetching the entire row. This is fine, because t1 will be less than all of the columns in those rows, and t2 will be greater than all of the columns in those rows.
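The three steps above can be sketched as follows. This is only an illustration under stated assumptions: buckets are one row per day, column names are timestamps, and a dict of sorted lists stands in for the column family. `bucket_keys`, `slice_row` and `fetch_range` are hypothetical helpers; with a real client, steps 2 and 3 would be a single multiget with a column slice rather than a Python loop.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def bucket_keys(t1, t2):
    """Step 1: row keys of every day bucket that can hold data in [t1, t2], in order."""
    day = t1.date()
    keys = []
    while day <= t2.date():
        keys.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return keys


def slice_row(columns, t1, t2):
    """Step 2: return the columns whose timestamp lies in [t1, t2].

    `columns` is a list of (timestamp, value) pairs sorted by timestamp,
    mimicking how columns are ordered inside a row.
    """
    names = [ts for ts, _ in columns]
    return columns[bisect_left(names, t1):bisect_right(names, t2)]


def fetch_range(rows, t1, t2):
    """Step 3: concatenate the per-row slices in bucket order."""
    result = []
    for key in bucket_keys(t1, t2):
        result.extend(slice_row(rows.get(key, []), t1, t2))
    return result


# Hypothetical sample data: column name = visit time, value = visitor id.
rows = {
    "20120117": [(datetime(2012, 1, 17, 9, 0), "a"), (datetime(2012, 1, 17, 22, 0), "b")],
    "20120118": [(datetime(2012, 1, 18, 8, 0), "c"), (datetime(2012, 1, 18, 15, 0), "a")],
    "20120119": [(datetime(2012, 1, 19, 7, 0), "d"), (datetime(2012, 1, 19, 20, 0), "b")],
}

t1 = datetime(2012, 1, 17, 12, 0)
t2 = datetime(2012, 1, 19, 12, 0)
events = fetch_range(rows, t1, t2)
visitors = [visitor for _, visitor in events]
print(visitors, len(set(visitors)))
```

Only the first and last rows are partially sliced; the middle row comes back whole. Note that in this sketch, counting unique visitors still requires de-duplicating the merged values on the client, since the same visitor can appear in several buckets.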
|Re: How to store unique visitors in cassandra||Milind Parikh||1/19/12 1:21 PM|
You might want to look at the code in countandra.org, regardless of whether you use it. It uses a model of dynamic composite keys (although static composite keys would have worked as well). For the actual query, only one row is hit. This of course only works because the data model is tuned for the query.