My Prometheus crashed and keeps restarting

郑恺

Jun 20, 2017, 4:38:25 AM

to Prometheus Users

Here is the config:

"cpus": 0.25,
"mem": 1024,
"disk": 0,
"instances": 1,

"args": [
  "-storage.local.memory-chunks=256000",
  "-storage.local.max-chunks-to-persist=128000",
  "-config.file=/etc/prometheus/prometheus.yml",
  "-log.level=debug",
  "-alertmanager.url=http://XXXXXXX:9093/",
  "-web.external-url=http://XXXXXXX:30090/"
]

Prometheus version: 1.5.2

Log details:

time="2017-06-20T08:26:07Z" level=info msg="Check for series without series file complete." source="crashrecovery.go:130"

time="2017-06-20T08:26:07Z" level=info msg="Cleaning up archive indexes." source="crashrecovery.go:402"

time="2017-06-20T08:26:07Z" level=info msg="Clean-up of archive indexes complete." source="crashrecovery.go:493"

time="2017-06-20T08:26:07Z" level=info msg="Rebuilding label indexes." source="crashrecovery.go:501"

time="2017-06-20T08:26:07Z" level=info msg="Indexing metrics in memory." source="crashrecovery.go:502"

time="2017-06-20T08:26:07Z" level=info msg="10000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="20000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="30000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="40000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="50000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="60000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="70000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="80000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="90000 metrics queued for indexing." source="crashrecovery.go:507"

time="2017-06-20T08:26:07Z" level=info msg="Indexing archived metrics." source="crashrecovery.go:510"

time="2017-06-20T08:26:07Z" level=info msg="All requests for rebuilding the label indexes queued. (Actual processing may lag behind.)" source="crashrecovery.go:529"

time="2017-06-20T08:26:07Z" level=info msg="Checkpointing fingerprint mappings..." source="persistence.go:1480"

time="2017-06-20T08:26:07Z" level=info msg="Done checkpointing fingerprint mappings in 1.395266ms." source="persistence.go:1503"

time="2017-06-20T08:26:07Z" level=warning msg="Crash recovery complete." source="crashrecovery.go:152"

time="2017-06-20T08:26:07Z" level=info msg="93337 series loaded." source="storage.go:378"

time="2017-06-20T08:26:07Z" level=info msg="Listening on :9090" source="web.go:259"

time="2017-06-20T08:26:07Z" level=info msg="Starting target manager..." source="targetmanager.go:61"

time="2017-06-20T08:26:08Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=126367 maxChunksToPersist=128000 maxMemoryChunks=256000 memoryChunks=173663 source="storage.go:1660" urgencyScore=0.9872421875

time="2017-06-20T08:26:28Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=128037 maxChunksToPersist=128000 maxToleratedMemChunks=281600 memoryChunks=177136 source="storage.go:927"
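
The counters in the last two log lines can be checked by hand. The sketch below uses only the values printed in those lines; the function names are hypothetical, and the comparison rule (a counter exceeding its logged limit) is an assumption read off the log fields, not taken from the Prometheus source.

```python
# Values copied from the "rushed mode" and "needs throttling" log lines above.
rushed = {
    "chunksToPersist": 126367,
    "maxChunksToPersist": 128000,
    "memoryChunks": 173663,
    "maxMemoryChunks": 256000,
    "urgencyScore": 0.9872421875,
}
throttled = {
    "chunksToPersist": 128037,
    "maxChunksToPersist": 128000,
    "memoryChunks": 177136,
    "maxToleratedMemChunks": 281600,
}


def persist_ratio(sample):
    """Fraction of the persist queue in use (hypothetical helper)."""
    return sample["chunksToPersist"] / sample["maxChunksToPersist"]


def exceeded_limits(sample):
    """Names of the logged counters that are above their logged limit."""
    pairs = [
        ("chunksToPersist", "maxChunksToPersist"),
        ("memoryChunks", "maxToleratedMemChunks"),
    ]
    return [value for value, limit in pairs if sample[value] > sample[limit]]


# The logged urgencyScore matches the persist-queue ratio for this sample.
ratio = persist_ratio(rushed)
matches_logged_score = abs(ratio - rushed["urgencyScore"]) < 1e-12

# Only the persist queue is over its limit; memory chunks are still below
# the tolerated maximum.
over = exceeded_limits(throttled)

print(ratio, matches_logged_score, over)
```

Read this way, the throttling in this log comes from the backlog of chunks waiting to be written to disk (128037 against a limit of 128000), while the number of chunks held in memory is still well under its tolerated maximum.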

Ben Kochie

Jun 20, 2017, 5:03:49 AM

to 郑恺, Prometheus Users

Your Prometheus server is entering rushed mode, probably because you have not allocated enough memory. You have roughly 90k series, but only 256k chunks in memory. I would recommend doubling your memory allocation so that the server has a safe amount of memory. Of course, also double the memory-chunk flags (-storage.local.memory-chunks and -storage.local.max-chunks-to-persist) to match.
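
As a rough illustration of that 2x suggestion, the sketch below doubles the three values from the posted config. The `scale` helper and the dictionary layout are hypothetical; only the flag names and starting values come from the original post, and the factor of 2 is the recommendation above rather than a computed requirement.

```python
# Starting values copied from the posted config.
current = {
    "mem": 1024,  # container memory in MB
    "-storage.local.memory-chunks": 256000,
    "-storage.local.max-chunks-to-persist": 128000,
}


def scale(settings, factor=2):
    """Return a copy of the settings with every value multiplied by factor."""
    return {name: value * factor for name, value in settings.items()}


scaled = scale(current)

# Rebuild the two storage flags in the same "-flag=value" form as the config.
storage_args = [
    f"{name}={value}" for name, value in scaled.items() if name.startswith("-")
]

print(scaled["mem"])
for arg in storage_args:
    print(arg)
```

The remaining entries in "args" (config file, log level, URLs) would stay as they are.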

You may also want to consider upgrading to 1.7.x. It has improved memory handling: you set a target heap size, which simplifies allocation.

See: https://groups.google.com/forum/#!topic/prometheus-users/As-edxwxw38

h...@post-quantum.com

Jul 18, 2017, 3:43:56 AM

to Prometheus Users

Hi, a quick note: I've experienced similar issues and was able to trace them back to the external URL setting. I'm not sure about the exact cause or fix yet.

https://github.com/coreos/prometheus-operator/issues/407
