That said, if you're not able to provision a larger Redis instance, it
may be easier to keep your uniques in Redis on a daily basis, then
blast them out to a relational database table indexed by uid and run
'SELECT COUNT(DISTINCT uid) FROM uniques' (assuming that the table
only contains the data relevant to your time range). The resolution
could be trimmed down to hourly, or whatever is reasonable, with the
counts calculated and cached offline. Sadly, it's not as clean as a
Redis-only solution, but most relational databases handle that query
pretty well: it's a scan of the index, so the database doesn't need to
build a hash/btree just to count uniqueness.
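A rough sketch of that flush-and-count flow, with plain Python sets
standing in for the daily Redis SETs and sqlite3 standing in for the
relational database (the table and column names here are just
illustrative assumptions, not anything prescribed above):

```python
import sqlite3

# Daily unique uids, as they might look after pulling each day's
# Redis SET down (plain Python sets used as a stand-in here).
daily_uniques = {
    "2011-06-01": {"u1", "u2", "u3"},
    "2011-06-02": {"u2", "u4"},
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE uniques (day TEXT, uid TEXT)")
db.execute("CREATE INDEX idx_uniques_uid ON uniques (uid)")

# Blast each day's set out to the table; after this the Redis key
# for that day could be deleted to reclaim memory.
for day, uids in daily_uniques.items():
    db.executemany("INSERT INTO uniques VALUES (?, ?)",
                   [(day, uid) for uid in uids])

# Count uniques across the whole range; the uid index lets the
# database answer this with an index scan.
(count,) = db.execute("SELECT COUNT(DISTINCT uid) FROM uniques").fetchone()
print(count)  # 4 distinct uids across both days
```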
Regards,
- Josiah
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>
Are these sets necessary? Do you need to aggregate these results by
ad? By location? By (ad, location)? Given the way you are storing
your data, I would imagine the answer to all of these is yes. If
that's the case, and you are trying to explore everything you can do
right now with what you have, I would bet that you can aggregate over
everything for 30 days, over ads for 30 days, and over locations, all
within your existing memory limits. The aggregation keys for the
all/ad/location cases are obvious.
In terms of counting uniques on a daily/monthly basis for
(ad, location) pairs, that's something that could/should be sharded
based on $/gig for memory (High-Memory Extra Large instances at Amazon
are the current winner here, and come with 17 gigs). Alternatively,
you can pre-shard out to logfiles by (ad, location, day) for fast,
low-memory processing (Syslog-ng and/or Flume are good on the logging
side of things, with hooks for automatically counting the uniques for
the largest output files first), and/or stuff the data into a database
(covering indexes would work great here).
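One way the sharding step could look: pick the Redis host for each
(ad, location, day) set by hashing its key. The shard count, key
format, and the commented-out client call below are all assumptions
for illustration:

```python
import hashlib

# Number of Redis hosts to spread the unique-sets across (assumed).
NUM_SHARDS = 4

def shard_for(ad, location, day):
    # Key naming scheme is illustrative, not prescribed.
    key = f"uniq:{ad}:{location}:{day}"
    # Hash the key and map the first 4 digest bytes onto a shard index,
    # so the same (ad, location, day) always lands on the same host.
    digest = hashlib.md5(key.encode()).digest()
    shard = int.from_bytes(digest[:4], "big") % NUM_SHARDS
    return shard, key

shard, key = shard_for("ad42", "us-ca", "2011-06-01")
# With a list of connected clients, the SADD would then go to:
#   redis_clients[shard].sadd(key, uid)
print(shard, key)
```

The same routing function works for reads, so the per-pair unique
count (SCARD) can be fetched from whichever host owns the key.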
As another idea to try out: if you are okay with having a secondary
lookup database, hashing your ~74-byte keys down to even 8 bytes may
help reduce memory use, depending on your number of keys. But as your
system grows, that (ad, location) breakdown is going to grow very
quickly.
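A minimal sketch of that hashing idea, with a dict standing in for the
secondary lookup database (the member string below is made up; note
that 8-byte digests carry a tiny but nonzero collision risk):

```python
import hashlib

# Secondary lookup database: short hash -> original key.
# A dict stands in here; in practice this could be any store.
lookup = {}

def short_key(member):
    # Truncate a SHA-1 digest to 8 bytes to replace the long key.
    h = hashlib.sha1(member.encode()).digest()[:8]
    lookup[h] = member  # record the mapping for reverse lookup
    return h

# A made-up ~74-byte member string of the sort described above.
long_member = "ad42|San Francisco, CA, USA|2011-06-01|some-fairly-long-unique-uid-string"
h = short_key(long_member)
```

Storing `h` instead of `long_member` in the Redis sets cuts each
member from ~74 bytes to 8, at the cost of one lookup when you need
the original key back.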
Regards,
- Josiah