Hello Pedro, Sripathi,
thank you both for explaining your use cases and patterns. As a community
we need to develop a "culture" around the new tool we have, Streams in
this case, so this process is very useful.
Pedro is perfectly correct that in his use case the messages would be
processed out of order when using a single key, because indeed a single
stream with an associated consumer group only cares about balancing the
load efficiently.
It's useless to repeat that, but I just want to stress how much
superior this approach is (more on the ordering solutions later): not
only can single consumers fail while the system is still able to
recover (this is a huge advantage IMHO), but failure is also just a
special case of the more general case of consumers processing at a
different pace.
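To make this concrete, here is a toy simulation in plain Python (no Redis involved; the consumer names and the round-robin delivery are my simplification) of why a consumer group trades per-key ordering for load balancing:

```python
from collections import deque

# Messages appended to a single stream, in order.
messages = deque(f"msg-{i}" for i in range(6))

# Two consumers in the same group; the group delivers each message
# to whichever consumer asks next (round-robin here for simplicity).
assignments = {"alice": [], "bob": []}
for consumer in ["alice", "bob", "alice", "bob", "alice", "bob"]:
    assignments[consumer].append(messages.popleft())

# If bob is much slower than alice, the *completion* order interleaves:
# alice finishes all her messages before bob finishes his first one.
completion_order = assignments["alice"] + assignments["bob"]
print(completion_order)
```

The point is only that the delivery order to each single consumer is preserved, while the global completion order across the group is not.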
I also completely agree with Sripathi about the fact that partitions
in Kafka are a lot more like just different keys in Redis Streams,
even if, to be honest, in that case the user is left handling things a
bit more manually. Consumer groups as a distribution mechanism are
*very* server-side, while orchestrating the partitioning manually
requires the logic to be at least partially on the client.
However note how this design is somewhat imposed by the nature of
Redis: at some point you want to partition to N keys anyway, because
otherwise you are always hitting the same server, which is definitely
a problem when it comes to scalability.
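One way to sketch that client-side partitioning (the key naming and the shard count here are just an example, not a prescription) is to route each message to one of N stream keys by hashing its partition key, so that all messages for the same key land in the same stream and stay ordered there:

```python
import binascii

N_STREAMS = 8  # number of pre-sharded stream keys (example value)

def stream_key_for(partition_key: str) -> str:
    """Map a partition key to one of N stream keys via CRC32.

    All messages sharing the same partition key land in the same
    stream, so their relative order is preserved in that stream.
    """
    shard = binascii.crc32(partition_key.encode()) % N_STREAMS
    return f"mystream:{shard}"

# The same key always maps to the same stream:
assert stream_key_for("user:1234") == stream_key_for("user:1234")
```

The producer would then XADD to `stream_key_for(key)`, and each consumer (or consumer group) would be attached to one or a few of the N streams.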
However Sripathi here introduces a concept that I used in the past but
never figured could be useful in this context, which is "pre-sharding".
This looks like an interesting idea for real world designs
and operations. Btw, listening to a few tens of keys per client is
perfectly supported and is not going to be a CPU issue. The work Redis
needs to do when serving a consumer is not *directly* related to the
number of streams a given consumer is monitoring. However an O(N) cost
(with very small constant times) will be paid every time we re-block
for N keys and unblock again, but while we are blocked the cost is the
same as blocking for a single key from the POV of the producer. In
general, it is possible to block for 30, 40 or 100 keys without
issues, unless there are 100k consumers connected to the same
instance...
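For reference, blocking on many keys is a single XREAD call. A small sketch (plain Python that only builds the command arguments, with invented key names) of what listening to a few tens of keys looks like:

```python
def xread_block_args(keys, last_ids=None, block_ms=0):
    """Build the argument list for XREAD BLOCK over many streams.

    The syntax is: XREAD BLOCK <ms> STREAMS key1 ... keyN id1 ... idN
    The special ID '$' means "only messages arriving from now on".
    """
    if last_ids is None:
        last_ids = ["$"] * len(keys)
    return ["XREAD", "BLOCK", str(block_ms), "STREAMS", *keys, *last_ids]

# Blocking on 40 pre-sharded streams is still one command:
keys = [f"mystream:{i}" for i in range(40)]
args = xread_block_args(keys)
print(" ".join(args[:6]), "...")
```

The N keys only cost the O(N) registration/unregistration mentioned above; while the client is blocked, a producer pushing to one of those keys pays the same as if the client were blocked on a single key.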
About the question asked by Sripathi: yes, within the same stream
there is the guarantee of ordering. In case of reboots or failovers
however, while Redis propagates everything to replicas and to the AOF
as it happens, all the guarantees are limited to the ones that Redis
can provide from the POV of data consistency. So for instance, if we
are using just RDB files and we reboot the instance, the consumer
group state may be stale.
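For completeness, these are the standard redis.conf persistence options involved (shown with example values) when stronger durability for the stream and consumer group state is wanted:

```conf
# Enable the append only file instead of relying on RDB snapshots only.
appendonly yes

# fsync policy: "everysec" loses at most ~1 second on crash,
# "always" is the most durable but the slowest.
appendfsync everysec
```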
I have the feeling that this discussion should be reworked and
inserted into the documentation, specifically in the Streams
tutorial. I'll try to do that.
Thank you for this thread,
Salvatore