to Prometheus Users
Hi,
I'm trying to get a sense of what is "normal" behavior on Prometheus 2.0 with respect to startup time. I've set up an experiment with a large number of metrics on pretty beefy machines, and I'm seeing that Prometheus takes fairly long (15-30 minutes, depending on the machine specs) to fully restart (where "TSDB started" appears in the logs and queries are accepted) with the existing data set. Here are some details:
I've done similar experiments with Prometheus 1.8, with 1.5M metrics generated over the course of several days, and never saw the startup time get higher than a couple of minutes. I realize that Prometheus 2.0 loads more metrics into memory in order to be less disk-intensive, which probably accounts for this. But have others been seeing similar results, and is this expected? I am trying to gauge whether to move to Prometheus 2.0, and whether this behavior will be acceptable, since it means tolerating that kind of downtime whenever I need to restart Prometheus.
Thanks in advance for any help on this,
Dan
aaro...@gmail.com
Nov 27, 2017, 3:52:59 PM
I bumped into a similar issue while trying Prometheus 2.0. In general, I like the 2.0 features, but I hesitate to upgrade my current system because the restart time is longer than usual and it scares me off. Is the long start time expected behavior?
Dan Simone
Nov 28, 2017, 1:32:59 PM
Some additional data here, on the non-NVMe machine:
Starting from a Prometheus data directory with no /wal, but only blocks:
* 3GB, 1 block - 8 seconds to load
* 10.7GB, 2 blocks - 55 seconds to load
* 18.5GB, 3 blocks - 109 seconds to load
* 41.5GB, 4 blocks - 174 seconds to load
* 94.5GB, 7 blocks - 220 seconds to load
When a 90GB /wal directory is present, the last experiment above takes 1260 seconds instead of 220. So the bulk of the startup time appears to be spent dealing with /wal.
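To make the comparison concrete, here is a rough sketch of the arithmetic behind these numbers. The figures are the ones listed above; the function and variable names are made up for illustration, and attributing the whole difference between the two runs to /wal is an assumption rather than something measured directly.

```python
# Measurements from the list above (non-NVMe machine, no /wal present).
# Each tuple is (data size in GB, number of blocks, load time in seconds).
block_only_runs = [
    (3.0, 1, 8),
    (10.7, 2, 55),
    (18.5, 3, 109),
    (41.5, 4, 174),
    (94.5, 7, 220),
]

# Startup time of the last data set with and without the 90GB /wal directory.
with_wal_seconds = 1260
without_wal_seconds = 220
wal_size_gb = 90


def seconds_per_gb(size_gb, seconds):
    """Load time normalized by data size (hypothetical helper)."""
    return seconds / size_gb


def wal_overhead(with_wal, without_wal):
    """Return (extra seconds, fraction of total startup) for the /wal run.

    Assumes the entire difference between the two runs is due to /wal.
    """
    extra = with_wal - without_wal
    return extra, extra / with_wal


for size_gb, blocks, seconds in block_only_runs:
    rate = seconds_per_gb(size_gb, seconds)
    print(f"{size_gb:5.1f} GB, {blocks} block(s): {rate:.2f} s/GB")

extra_seconds, wal_share = wal_overhead(with_wal_seconds, without_wal_seconds)
print(f"extra time with /wal: {extra_seconds} s ({wal_share:.0%} of startup)")
print(f"rough /wal cost: {extra_seconds / wal_size_gb:.1f} s/GB")
```

Under that assumption, the /wal run spends 1040 of its 1260 seconds (roughly 83%) on work that the blocks-only run does not do.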
isha girdhar
Nov 29, 2017, 3:59:06 AM
We are facing a similar thing. We have already moved to Prometheus 2.0, but the restart time is longer than with 1.8 for sure. Not sure if that's expected behaviour.