Which to use for scraping pipeline: NATS or STAN?

Joshua Gardner

Jan 26, 2021, 6:53:26 PM
to nat...@googlegroups.com
I just read this pros/cons list of NATS vs STAN semantics. [0]

I'm not sure which approach best fits my application, a web scraper
(I know, I know, 😢) structured like an ETL pipeline. I'm not even 100%
sold on NATS. I need observability of each stage in the pipeline, and
while replay would be nice, much of the data in question is too large to
fit in a 1MB message, so it will end up in bucket storage anyway.
Request/reply semantics at the edge might be useful.
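
To make the 1MB point concrete, here is a minimal sketch of the
"store the payload in a bucket, send only a reference" idea described
above. It is an illustration only: the `bucket` dict stands in for real
bucket storage, and `make_envelope`, `read_payload`, and `INLINE_LIMIT`
are hypothetical names, not part of any NATS client.

```python
import base64
import hashlib
import json

MAX_MESSAGE_BYTES = 1024 * 1024  # the 1MB message limit mentioned above
INLINE_LIMIT = 512 * 1024        # hypothetical cutoff; leaves room for base64 overhead


def make_envelope(stage: str, payload: bytes, bucket: dict) -> bytes:
    """Build a small JSON message; oversized payloads go to `bucket` instead."""
    if len(payload) <= INLINE_LIMIT:
        # Small enough: carry the data inside the message itself.
        body = {"stage": stage, "data": base64.b64encode(payload).decode("ascii")}
    else:
        # Too large: store it under a content hash and send only the key.
        key = hashlib.sha256(payload).hexdigest()
        bucket[key] = payload
        body = {"stage": stage, "ref": key, "size": len(payload)}
    envelope = json.dumps(body).encode("utf-8")
    assert len(envelope) <= MAX_MESSAGE_BYTES
    return envelope


def read_payload(envelope: bytes, bucket: dict) -> bytes:
    """Recover the original bytes, fetching from `bucket` when only a ref was sent."""
    body = json.loads(envelope)
    if "ref" in body:
        return bucket[body["ref"]]
    return base64.b64decode(body["data"])
```

With this shape, each stage only ever publishes small envelopes, and the
raw pages stay in bucket storage where they were going to end up anyway.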

I was planning to use AWS EventBridge bound to SQS for each stage in the
pipeline, and also to Kinesis Firehose, storing raw data in an S3 bucket
for long-term analysis. I'm now pivoting away from AWS-specific products
toward something Kubernetes-native. I think I can get a similar
experience with either NATS or STAN, since both are pub-sub systems that
can also be used with queue semantics.
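
As a rough illustration of what "pub-sub with queue semantics" means
here, the sketch below is a tiny in-memory model, not the NATS client
API. `MiniBus` and the subject/group names are made up for the example.
Plain subscribers each get every message, while subscribers sharing a
queue group split the messages between them. Round-robin is used only to
keep the example deterministic; a real broker may pick group members
differently.

```python
from collections import defaultdict


class MiniBus:
    """In-memory stand-in for a pub-sub broker with queue groups (illustrative only)."""

    def __init__(self):
        self._plain = defaultdict(list)   # subject -> [handler, ...]
        self._groups = defaultdict(dict)  # subject -> {group: [handler, ...]}
        self._next = defaultdict(int)     # (subject, group) -> delivery counter

    def subscribe(self, subject, handler, queue=None):
        if queue is None:
            self._plain[subject].append(handler)
        else:
            self._groups[subject].setdefault(queue, []).append(handler)

    def publish(self, subject, msg):
        # Fan-out: every plain subscriber sees every message.
        for handler in self._plain[subject]:
            handler(msg)
        # Queue semantics: exactly one member of each group sees the message.
        for group, members in self._groups[subject].items():
            i = self._next[(subject, group)]
            members[i % len(members)](msg)
            self._next[(subject, group)] = i + 1


bus = MiniBus()
seen = []
bus.subscribe("pages.fetched", lambda m: seen.append(("audit", m)))
bus.subscribe("pages.fetched", lambda m: seen.append(("parser-1", m)), queue="parsers")
bus.subscribe("pages.fetched", lambda m: seen.append(("parser-2", m)), queue="parsers")
for url in ("a", "b"):
    bus.publish("pages.fetched", url)
```

In this model the "audit" subscriber plays the observability role (it
sees every message for a stage), while the two parsers behave like
workers pulling from a shared queue.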

I still want to know if anyone has experience or opinions as to whether
plain NATS or Streaming is better for initial development.

[0] https://docs.nats.io/developing-with-nats-streaming/streaming

Colin Sullivan

Jan 26, 2021, 11:35:03 PM
to nats
Thanks for looking at NATS!

We recommend avoiding persistence when possible - highly resilient and scalable systems can be built without it.  As you progress you'll soon know whether or not you'll need persistence, and can decide from there.

I highly recommend looking into NATS JetStream. It will be GA early next month with Go and Java client support, shortly followed by Node and others. It's built into core NATS, so deployment is easy, and we'll have a k8s deployment available. NATS Streaming will be supported for quite a while with bug and security fixes, but new development will occur in JetStream moving forward.
