Efficiency in querying? Or efficiency in data storage?
Either way, the difference is relatively minimal. My suggestion is to
explicitly keep hourly, daily, weekly, and monthly ZSETs that are
updated in real time, because with a pipelined request, adding to 4
ZSETs is not significantly slower than adding to 1 ZSET. You can set
expirations of 48 (or 72) hours on your hourly ZSETs. The only thing
you need to be careful about is not writing to an hourly ZSET once
the hour it covers is more than 48 hours old, since that write would
recreate a key that has already expired.

The only question you really have to answer is, do you mean "calendar
week" and "calendar month", or do you mean "weeks since epoch" and
"30-day months since epoch", because that will determine how you
pre-aggregate your data. For the weeks since epoch and 30-day months
since epoch, I've done the following quite a few times (in Python):
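
import time

# (label, bucket size in seconds), ordered from coarsest to finest
AGGS = [
    ('month', 30*86400),
    ('week', 7*86400),
    ('day', 86400),
    ('hour', 3600),
]
# hourly ZSETs expire after 48 hours
CUTOFF = 2*86400

def update_aggregates(conn, entities, timestamp):
    # one key per aggregate, e.g. "hour:<hours since epoch>"
    when = []
    for label, duration in AGGS:
        when.append("%s:%s"%(label, int(timestamp // duration)))
    # too old for the hourly ZSET, which has already expired
    if time.time() - timestamp > CUTOFF:
        when.pop()
    # non-transactional pipeline: one round trip for all writes
    pipe = conn.pipeline(False)
    for agg in when:
        for entity in entities:
            # redis-py >= 3.0 argument order: (name, amount, value);
            # older releases took (name, value, amount)
            pipe.zincrby(agg, 1, entity)
    if len(when) == 4:
        pipe.expire(when[-1], CUTOFF)
    pipe.execute()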
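
If you do mean calendar weeks and calendar months, the bucket keys have
to come from the calendar rather than from integer division. The
following is only a rough sketch of one way to do that, assuming UTC
and ISO 8601 week numbering; the function name calendar_keys and the
key formats are made up for illustration and are not from any library:

```python
from datetime import datetime, timezone

def calendar_keys(timestamp):
    """Return month/week/day/hour keys aligned to the UTC calendar.

    Assumptions (not from the original post): UTC time zone, ISO 8601
    week numbering, and these particular key formats.
    """
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    # An ISO week can belong to a different year than the date itself,
    # so use the ISO year for the week key.
    iso_year, iso_week, _ = dt.isocalendar()
    return [
        "month:%04d-%02d" % (dt.year, dt.month),
        "week:%04d-W%02d" % (iso_year, iso_week),
        "day:%s" % dt.strftime("%Y-%m-%d"),
        "hour:%s" % dt.strftime("%Y-%m-%dT%H"),
    ]
```

The list is ordered month, week, day, hour, so it could stand in for
the epoch-based key list in a function like update_aggregates, with the
hourly key still last.
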
> Is there an upper bound of arguments that can be fed to the ZUNIONSTORE
> command ?

I've not had any issues feeding a few hundred ZSETs into the
ZUNIONSTORE command. That said, I wouldn't use it in this scenario
unless absolutely necessary. Depending on how many named entities you
have (1k? 10k? 100k?), you could find yourself waiting a while for
Redis to calculate the result, and because Redis executes commands one
at a time, your writes of named entities get backed up in the
meantime. You will want to do some testing, and perhaps write to the
master while running queries against a slave (remember to turn off
read-only mode on the slave if you are using Redis >= 2.6, since
ZUNIONSTORE writes its result to a key).
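
If you do end up needing ZUNIONSTORE, for example for a rolling "last
24 hours" view built from the hourly ZSETs, a minimal sketch might look
like the following. It assumes the "hour:<hours since epoch>" key
naming used for the hourly aggregates and a redis-py style connection;
the helper names, the temporary key name, and the 60 second expiration
are placeholders I made up for illustration:

```python
import time

HOUR = 3600

def hourly_keys(now, hours):
    """Keys for the `hours` most recent hourly buckets, newest first."""
    current = int(now // HOUR)
    return ["hour:%s" % (current - i) for i in range(hours)]

def top_entities_last_hours(conn, hours=24, count=10, now=None):
    """Union recent hourly ZSETs into a temporary key and read the top scores."""
    now = time.time() if now is None else now
    dest = "tmp:last:%dh" % hours  # hypothetical temporary key name
    pipe = conn.pipeline(False)
    # ZUNIONSTORE sums scores by default and writes the result to dest
    pipe.zunionstore(dest, hourly_keys(now, hours))
    # let the temporary result clean itself up
    pipe.expire(dest, 60)
    pipe.zrevrange(dest, 0, count - 1, withscores=True)
    # the last reply in the pipeline is the ZREVRANGE result
    return pipe.execute()[-1]
```

Because ZUNIONSTORE writes to dest, pointing conn at a slave only works
if that slave accepts writes, and the cost of the union grows with the
number and size of the hourly ZSETs involved.
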
Regards,
- Josiah