My preferred approach is to store large data in an object or blob store,
like S3, which is usually an easy HTTP PUT or similar, and then send
only the URL via the message queue. This gives you the best of both worlds.
In the past, we stored large files in the file system. This also works
very well ;-).
That said, there's nothing inherently wrong with sending "large data"
over AMQP, so there's no need to bring in another system like Kafka.
You just need to bear a few things in mind.
https://github.com/rabbitmq/rabbitmq-server/pull/1812
in deps/rabbit_common/include/rabbit.hrl#L256
-define(MAX_MSG_SIZE, 536870912).
- max message size is limited to (currently) 0.5 GiB and defaults to
~128 MiB
- very large files can be split into chunks and sent individually
- you can set queue length and size restrictions to avoid blowing up
your system, but make sure producers use publisher confirms so that
rejected publishes don't go unnoticed
- use a lazy queue, so the data is stored on disk, not in memory
- use a dedicated queue just for these messages, or even a dedicated
rabbitmq instance
- don't use clustering or mirrored queues, as the large data would be
copied to every node, which is very inefficient and expensive, and
clogs up the inter-node distribution links
If your message volume is low and your messages are infrequent, you
will get away with this just fine, without adding another tech stack.
A+
Dave