Hi All,
I am reaching out to gather some quantitative insights and experiences regarding the scalability of a single Prometheus instance. I understand that performance and scalability can vary significantly with different aspects of the infrastructure, such as whether the backend storage is local disk or NFS, the network bandwidth, the number of targets and metrics per target, the scrape interval, etc.
For the following questions, you can assume ideal conditions, i.e. optimal hardware, maximum network bandwidth, and local disk (say, SSD).
Alternatively, if you can share details of your hardware and the performance metrics you observed, that would help.
Here are the questions:
Q1: What is the practical limit on the number of active series that Prometheus can handle? In other words, what is the maximum number of active series to which Prometheus can scale?
Q2: What are the practical limits for storage and data retention in a single instance?
Q3: What is the highest number of targets, and total metrics per target, that can be efficiently scraped by a single instance?
Q4: How does query performance (latency) scale as the number of metrics and targets increases?
Any shared experiences, benchmarks, or references to relevant documentation would help.
Thank you
Regards,