We started hitting OOM kills after upgrading Prometheus to 2.18.0.
Memory usage spikes suddenly and exceeds the memory limit we have specified.
There are no errors in the logs, only the startup messages logged after each restart.
What could be causing this, and how can we debug it? Any help would be appreciated.
EKS version: 1.14
Prometheus version: 2.18.0
Below are the logs:
level=info ts=2020-10-27T06:31:17.396Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.0, branch=HEAD, revision=a12e96299dcd159ea09b260f1a21e7e4b86e011d)"
level=info ts=2020-10-27T06:31:17.396Z caller=main.go:338 build_context="(go=go1.14.2, user=root@7fbcff55abdb, date=20200505-14:26:04)"
level=info ts=2020-10-27T06:31:17.396Z caller=main.go:339 host_details="(Linux 4.14.181-140.257.amzn2.x86_64 #1 SMP Wed May 27 02:17:36 UTC 2020 x86_64 prometheus-prod-prometheus-server-5688bd7769-xrsrt (none))"
level=info ts=2020-10-27T06:31:17.397Z caller=main.go:340 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-10-27T06:31:17.397Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-10-27T06:31:17.398Z caller=query_logger.go:79 component=activeQueryTracker
level=info ts=2020-10-27T06:31:17.399Z caller=main.go:677 msg="Starting TSDB ..."
level=info ts=2020-10-27T06:31:17.399Z caller=web.go:523 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-10-27T06:31:17.402Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1588053600000 maxt=1588636800000 ulid=01E7H5KPWT66H8PK549SPHDMJ7
level=info ts=2020-10-27T06:31:17.403Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1588636800000 maxt=1589220000000 ulid=01E82HS5EJ9WW34XGN83WPEQRX
level=info ts=2020-10-27T06:31:17.404Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1589220000000 maxt=1589803200000 ulid=01E8KXZBSMVA7D592S051QNQH6
level=info ts=2020-10-27T06:31:17.406Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1589803200000 maxt=1590386400000 ulid=01E95A5E1Q8KCQKPG0ZKB1QRP6
level=info ts=2020-10-27T06:31:17.407Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1590386400000 maxt=1590969600000 ulid=01E9PPBGNYR73Z97V1V38B7SV5
level=info ts=2020-10-27T06:31:17.408Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1590969600000 maxt=1591552800000 ulid=01EA82H5042JTYAB8Z24RY2G9E
level=info ts=2020-10-27T06:31:17.409Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591552800000 maxt=1592136000000 ulid=01EASEQ60C4J84ZEY9R1QSASKS
level=info ts=2020-10-27T06:31:17.410Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1592136000000 maxt=1592719200000 ulid=01EBATWN0KEX63VMFH12XGV83M
level=info ts=2020-10-27T06:31:17.411Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1592719200000 maxt=1593302400000 ulid=01EBWE35327YZVVB1KGTJ9NPJX
level=info ts=2020-10-27T06:31:17.412Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1593302400000 maxt=1593885600000 ulid=01ECDT970HG6WBZTQVKJFKPRAE
level=info ts=2020-10-27T06:31:17.414Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1593885600000 maxt=1594468800000 ulid=01ECZ6GREX8ZPWJB00K1BJ74BX
level=info ts=2020-10-27T06:31:17.415Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1594468800000 maxt=1595052000000 ulid=01EDGJSABWS48E4GWZ5WBVHAQM
level=info ts=2020-10-27T06:31:17.416Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=159505