Hi,
We have a large networking setup with 10k devices. We are already ingesting about 1M metrics every second from just 20% of the devices, so we run 5 Prometheus instances plus one global Prometheus, which uses remote read to evaluate alert rules, and Thanos Querier for visualisation in Grafana.
We have assigned devices to each Prometheus instance by device IP range.
So we have one aggregator that pulls data from all the individual Prometheus instances through remote read.
1. Will remote read cause a problem, given that it loads large numbers of time series over the wire every 1 min?
2. Is it CPU or memory intensive?
What is the best design strategy to handle this scale and to alert across devices or metrics?
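To put rough numbers on question 1, here is a back-of-envelope sketch in Python. It treats the 1M metrics per second as samples per second. The fraction of series that alert rules touch, the 5-minute lookback window, and the 16 bytes per raw sample (8-byte timestamp plus 8-byte value, before compression) are all assumed placeholders, not measured values. The function name remote_read_load is just for illustration.

```python
def remote_read_load(samples_per_sec, rule_fraction, lookback_sec,
                     eval_interval_sec, bytes_per_sample=16):
    """Estimate what one round of alert rule evaluation pulls via remote read.

    samples_per_sec   -- ingest rate across the queried Prometheus instances
    rule_fraction     -- assumed share of series that alert rules select (0..1)
    lookback_sec      -- assumed range window of the rules, e.g. 300 for [5m]
    eval_interval_sec -- rule evaluation interval, e.g. 60
    bytes_per_sample  -- assumed raw size of one sample before compression
    """
    # Each evaluation re-reads the whole lookback window for the selected series.
    samples_per_eval = samples_per_sec * rule_fraction * lookback_sec
    bytes_per_eval = samples_per_eval * bytes_per_sample
    # Average transfer rate if evaluations were spread evenly over the interval.
    avg_bytes_per_sec = bytes_per_eval / eval_interval_sec
    return samples_per_eval, bytes_per_eval, avg_bytes_per_sec


if __name__ == "__main__":
    samples, nbytes, rate = remote_read_load(
        samples_per_sec=1_000_000,  # current ingest rate from the post
        rule_fraction=0.1,          # assumption: rules touch 10% of series
        lookback_sec=300,           # assumption: rules use a 5m window
        eval_interval_sec=60,       # 1 min evaluation from the post
    )
    print(f"samples per evaluation: {samples:,.0f}")
    print(f"raw bytes per evaluation: {nbytes:,.0f}")
    print(f"average raw bytes per second: {rate:,.0f}")
```

The estimate ignores label data, protobuf framing and compression, so it only shows how the load grows with the lookback window and the share of series the rules select.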
Regards,
Rajesh