We have a use case for which a stream sounds like the right solution (e.g. based on feedback in this thread), but the strictly non-destructive semantics of streams are an issue for us. Here's the scenario...

Customers of our platform send in bursty REST requests that are published to RMQ, and we have two consumers: one that processes and loads each request into our Postgres database, and one that sends it to Elasticsearch. We need REST requests of various types to be reliably processed in the order the customer sent them, and we are relying on either a single queue per consumer or a single stream to serialize the requests for processing. We need to consume every message that is published -- i.e., we must not drop messages if the queue gets too large -- and we need durability and redundancy (quorum).
We could find ourselves with backlogs of 10M+ messages at a time in both queues (or the stream) when processing and loading into Postgres is slower than the bursty REST requests. We expect the consumers to catch up when the REST requests slow down (and the consumers will be optimized to prefetch and group consecutive messages of the same type for bulk upsert).
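To illustrate the "group consecutive messages of the same type" step, here's a rough sketch of what we have in mind (the message shape and type names are placeholders, not our production code):

```python
from itertools import groupby

def batch_consecutive(messages, key="type"):
    """Group consecutive messages sharing the same type into one bulk batch,
    preserving the overall order in which the types arrived."""
    return [(t, list(group)) for t, group in groupby(messages, key=lambda m: m[key])]

# Hypothetical prefetched messages, in arrival order:
msgs = [
    {"type": "upsert_user", "id": 1},
    {"type": "upsert_user", "id": 2},
    {"type": "upsert_order", "id": 3},
    {"type": "upsert_user", "id": 4},
]

batches = batch_consecutive(msgs)
# Three batches: [upsert_user x2], [upsert_order x1], [upsert_user x1] --
# consecutive same-type messages merge into one bulk upsert, order preserved.
```

The point is that ordering is preserved across batches, so a single ordered stream still serializes the work correctly while letting us amortize the Postgres round-trips.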
Our use case appears to fit streams well EXCEPT for the fact that the only way messages get deleted from a stream is when the stream exceeds its size or age limits. We would set high thresholds on both, which we don't expect to exceed, because we must not drop these messages (TBs of disk if needed). However, that means the stream would ALWAYS occupy that much disk space once it first reaches the limit, and that could get expensive.
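For concreteness, this is roughly how we'd cap the stream today. The `x-queue-type`, `x-max-length-bytes`, and `x-max-age` arguments are documented RabbitMQ stream settings; the queue name and the threshold values are placeholders for our hypothetical "high enough that we never hit it" limits:

```python
# Declaration arguments for a stream with deliberately generous retention
# limits (placeholder values -- chosen so retention should never trigger):
stream_args = {
    "x-queue-type": "stream",
    "x-max-length-bytes": 5 * 10**12,  # ~5 TB size ceiling
    "x-max-age": "90D",                # generous age limit
}

# With pika, the declaration would look something like:
# channel.queue_declare(queue="rest-requests", durable=True,
#                       arguments=stream_args)
```

Once the backlog has reached that ceiling even once, the stream keeps roughly that much data on disk indefinitely, which is exactly the cost problem described above.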
What we want is the ability to mark messages in a stream as "fully consumed" and have RMQ delete chunks from disk once every message in a chunk is fully consumed (as opposed to deleting them based on size or age limits). That way we would get stream scale and semantics without a piece of infrastructure that permanently occupies TBs of disk space in multiple data centers.
Is this something you all have considered? Is there a way to get that behavior with the core or plug-in stream implementations (we're in Python)? Thanks!