Prometheus using AWS Timestream

ellis...@googlemail.com

Nov 24, 2020, 4:53:43 AM
to Prometheus Users
Hi all, 

Is anyone using Prometheus for monitoring in AWS, and if so, have you considered using Timestream as a remote storage solution?

Stuart Clark

Nov 24, 2020, 7:47:12 AM
to ellis...@googlemail.com, Prometheus Users
I can see that there is a remote write adapter available at
https://github.com/dpattmann/prometheus-timestream-adapter, but is anyone
aware of a remote read adapter?
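
For reference, the write side is just a remote_write entry in prometheus.yml, something like this minimal sketch (the adapter's service name, port, and path are assumptions on my part; check the project's README for the actual values):

    # prometheus.yml (write side only)
    # The hostname, port, and path below are assumptions, not the
    # adapter's documented defaults.
    remote_write:
      - url: "http://prometheus-timestream-adapter:9201/write"

It's the lack of a remote_read counterpart to point at that I'm asking about.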

ellis...@googlemail.com

Nov 24, 2020, 9:06:53 AM
to Prometheus Users
The guy who wrote the adapter suggests that a Grafana plugin would be used to read the information back out of Timestream in AWS.
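
If reads go through Grafana, the Timestream datasource can also be provisioned from a file rather than clicked together. Something like the sketch below should be close, though the plugin id and jsonData keys are my assumptions based on how Grafana's other AWS datasources are set up, so check the plugin's documentation:

    # /etc/grafana/provisioning/datasources/timestream.yml
    # Plugin id and jsonData keys are assumptions -- verify against the
    # Timestream datasource plugin's docs before relying on this.
    apiVersion: 1
    datasources:
      - name: Amazon Timestream
        type: grafana-timestream-datasource
        access: proxy
        jsonData:
          authType: default        # use the instance/task IAM role
          defaultRegion: eu-west-1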

Stuart Clark

Nov 24, 2020, 11:40:03 AM
to ellis...@googlemail.com, Prometheus Users
On 24/11/2020 14:06, 'ellis...@googlemail.com' via Prometheus Users wrote:
> The guy who wrote the adapter suggests that a Grafana plugin would be
> used to read the information back out of Timestream in AWS.

Yes, but that doesn't help with alerting, recording rules, etc., which are evaluated within Prometheus itself.
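
To make that concrete: recording and alerting rules only see data that Prometheus itself can query, so samples that exist solely in Timestream (written out via remote write, with no remote read path back) are invisible to them. Even a hypothetical rule file like the one below, with made-up metric names, still has to be loaded and evaluated by Prometheus:

    # rules.yml -- evaluated by Prometheus itself, not by Grafana or Timestream.
    # Metric names, job labels, and thresholds are illustrative only.
    groups:
      - name: example
        rules:
          - record: job:http_requests:rate5m
            expr: sum by (job) (rate(http_requests_total[5m]))
          - alert: HighErrorRate
            expr: sum(rate(http_requests_total{status=~"5.."}[5m])) > 5
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "5xx rate above 5 req/s for 10 minutes"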

Ryan Booz

Nov 25, 2020, 11:27:08 AM
to Prometheus Users
As the makers of Promscale, we're very attuned to the needs of effective Prometheus deployments. With that in mind, one thing to consider with Timestream is that ingest performance from a single client seems to be a current limitation. The creator of this adapter doesn't mention his setup or how many metrics he was trying to ingest per minute or second.

In recent benchmarks by CrateDB (https://crate.io/a/amazon-timestream-first-impressions/) and Timescale (not yet published), it appears that Timestream only achieves a consistent ingest rate in the range of 500-800 metrics/second from a single client, especially when lots of attributes are involved. Higher throughput is achieved by using a streaming service (e.g. Kinesis) or adding more clients. In our tests using the open-source Time Series Benchmark Suite (https://github.com/timescale/tsbs), we ended up using 10 EC2 clients to import data for about 36 hours and were only able to achieve (effectively) 5,000 metrics/sec, meaning clients averaged ~550 metrics/sec. So, definitely test your throughput and make sure the system can keep up with ingesting data.
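
On the Prometheus side, if a single remote-write client is the bottleneck, the queue_config block is where batching and parallelism are controlled. The values below are illustrative starting points to experiment with while watching the prometheus_remote_storage_* metrics, not Timestream-specific recommendations (and the adapter URL is again just an assumption):

    # Fragment of the remote_write entry in prometheus.yml.
    # Values are illustrative starting points, not Timestream-specific advice;
    # the adapter URL is an assumption.
    remote_write:
      - url: "http://prometheus-timestream-adapter:9201/write"
        queue_config:
          capacity: 10000             # samples buffered per shard
          max_shards: 10              # parallel senders, akin to adding clients
          max_samples_per_send: 500   # batch size per outgoing request
          batch_send_deadline: 5s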

Also, while storage is really cheap, queries are billed based on the number of GB scanned, so your bill is likely to grow over time unless you're really efficient with removing older data.

Stuart Clark

Nov 25, 2020, 11:36:36 AM
to Ryan Booz, Prometheus Users
On 25/11/2020 16:27, Ryan Booz wrote:
> As the makers of Promscale, we're very attuned to the needs of
> effective Prometheus deployments. With that in mind, one thing to
> consider with Timestream is that ingest performance from a single
> client seems to be a current limitation. The creator of this adapter
> doesn't mention his setup or how many metrics he was trying to
> ingest per minute or second.
>
> In recent benchmarks by CrateDB
> (https://crate.io/a/amazon-timestream-first-impressions/) and
> Timescale (not yet published), it appears that Timestream only
> achieves a consistent ingest rate in the range of 500-800
> metrics/second from a single client, especially when lots of
> attributes are involved. Higher throughput is achieved by using a
> streaming service (e.g. Kinesis) or adding more clients. In our tests
> using the open-source Time Series Benchmark Suite
> (https://github.com/timescale/tsbs), we ended up using 10 EC2 clients
> to import data for about 36 hours and were only able to achieve
> (effectively) 5,000 metrics/sec, meaning clients averaged ~550
> metrics/sec. So, definitely test your throughput and make sure the
> system can keep up with ingesting data.

Was that metrics per second or time series per second? If metrics, how
many labels were there & how many time series did that equate to?

Ryan Booz

Nov 25, 2020, 12:04:52 PM
to Prometheus Users
I can't speak to CrateDB's tests, but the article I linked to said it took them 3 days to load ~3 billion metrics using 20 clients, which is on par with our findings too.

For our tests, we used TSBS with the "cpu-only" use case to simulate 100 hosts. That test creates 1,000 time-series across 10 metrics every 10 seconds.

Ryan Booz

Nov 25, 2020, 12:07:47 PM
to Prometheus Users
Sorry - just realized I mistyped that second sentence in the midst of trying to spell it out and crossed terminology.  It should have read:

For our tests, we used TSBS with the "cpu-only" use case to simulate 100 hosts. That test creates 10 CPU time-series for 100 hosts, every 10 seconds - essentially 1,000 samples every 10 seconds (6,000/minute).

ellis...@googlemail.com

Nov 26, 2020, 2:28:40 AM
to Prometheus Users
I agree. I'm not too concerned about the alerting side of things as this is covered by Grafana, and the recording rules help filter out the noise where possible. I've had a bash at altering the dashboard in Prometheus, but it isn't that user-friendly to configure, hence the swap over to Grafana.

ellis...@googlemail.com

Nov 26, 2020, 2:49:39 AM
to Prometheus Users
And thanks, Ryan. I'm just in the process of building the adapter into a container at the moment and haven't tested throughput yet. Good to know.