Prometheus reports Out of memory while loading segment


Naveen Badam

Jun 1, 2021, 12:10:07 PM
to Prometheus Users
Hi Team,

I found Prometheus started reporting "fatal error: runtime: out of memory" recently.

The only change I made was adding a bunch of new scrapers to Prometheus; could that be causing issues loading the segment data?

Below is the error I am getting during Prometheus startup.

I'd appreciate your help fixing this issue.


level=info ts=2021-06-01T15:55:00.619Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=17932 maxSegment=17936
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x29cfda8, 0x16)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.sysMap(0xc338000000, 0x4000000, 0x4201698)
        /usr/local/go/src/runtime/mem_linux.go:169 +0xc6
runtime.(*mheap).sysAlloc(0x41e5d40, 0x400000, 0x7fffffffffff, 0x47191c)
        /usr/local/go/src/runtime/malloc.go:727 +0x1e5
runtime.(*mheap).grow(0x41e5d40, 0x1, 0x0)
        /usr/local/go/src/runtime/mheap.go:1344 +0x85
runtime.(*mheap).allocSpan(0x41e5d40, 0x1, 0xc2c0141e00, 0x42016a8, 0x7f231a7ae438)
        /usr/local/go/src/runtime/mheap.go:1160 +0x6b6
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:907 +0x65
runtime.systemstack(0xc000da38f0)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1116

goroutine 137 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc000bf5090 sp=0xc000bf5088 pc=0x46e260
runtime.(*mheap).alloc(0x41e5d40, 0x1, 0x11e, 0x0)
        /usr/local/go/src/runtime/mheap.go:901 +0x85 fp=0xc000bf50e0 sp=0xc000bf5090 pc=0x427b85
runtime.(*mcentral).grow(0x41f7ff8, 0x0)
        /usr/local/go/src/runtime/mcentral.go:506 +0x7a fp=0xc000bf5128 sp=0xc000bf50e0 pc=0x418d7a
runtime.(*mcentral).cacheSpan(0x41f7ff8, 0x7f231a7ae4c0)
        /usr/local/go/src/runtime/mcentral.go:177 +0x3e5 fp=0xc000bf51a0 sp=0xc000bf5128 pc=0x418b05
runtime.(*mcache).refill(0x7f449c478e98, 0x1e)
        /usr/local/go/src/runtime/mcache.go:142 +0xa5 fp=0xc000bf51c0 sp=0xc000bf51a0 pc=0x4184a5
runtime.(*mcache).nextFree(0x7f449c478e98, 0x1e, 0x28, 0x26bfd60, 0x0)
        /usr/local/go/src/runtime/malloc.go:880 +0x8d fp=0xc000bf51f8 sp=0xc000bf51c0 pc=0x40d44d
runtime.mallocgc(0xe0, 0x28fa440, 0x1, 0xc337dfe600)
        /usr/local/go/src/runtime/malloc.go:1061 +0x834 fp=0xc000bf5298 sp=0xc000bf51f8 pc=0x40de34
runtime.newobject(0x28fa440, 0xc337dfe600)
        /usr/local/go/src/runtime/malloc.go:1195 +0x38 fp=0xc000bf52c8 sp=0xc000bf5298 pc=0x40e2d8
github.com/prometheus/prometheus/tsdb.newMemSeries(...)
        /app/tsdb/head.go:1931
github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID(0xc000a7a000, 0x4e10079, 0x86070cba00fdc483, 0xc337df2540, 0x7, 0x7, 0x0, 0x500, 0x0, 0x0)
        /app/tsdb/head.go:1681 +0xef fp=0xc000bf5370 sp=0xc000bf52c8 pc=0xbe5ccf
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL(0xc000a7a000, 0xc3291da000, 0xc000bf5a80, 0xc0024f3110, 0x0, 0x0)
        /app/tsdb/head.go:523 +0xdfa fp=0xc000bf56a8 sp=0xc000bf5370 pc=0xbdc81a
github.com/prometheus/prometheus/tsdb.(*Head).Init(0xc000a7a000, 0x179c7040500, 0x0, 0x0)
        /app/tsdb/head.go:707 +0x9c5 fp=0xc000bf5b18 sp=0xc000bf56a8 pc=0xbddc05
github.com/prometheus/prometheus/tsdb.open(0x7ffcc4bcfcd8, 0xb, 0x30867c0, 0xc000da39b0, 0x30c18c0, 0xc00004c780, 0xc000d90f50, 0xc000c6d040, 0x4, 0xa, ...)
        /app/tsdb/db.go:660 +0x6fa fp=0xc000bf5d10 sp=0xc000bf5b18 pc=0xbce63a
github.com/prometheus/prometheus/tsdb.Open(0x7ffcc4bcfcd8, 0xb, 0x30867c0, 0xc000da39b0, 0x30c18c0, 0xc00004c780, 0xc000d90f50, 0xc000dafe48, 0x7f67a2, 0xc0005af8c0)
        /app/tsdb/db.go:530 +0xbc fp=0xc000bf5d88 sp=0xc000bf5d10 pc=0xbcdddc
main.openDBWithMetrics(0x7ffcc4bcfcd8, 0xb, 0x30867c0, 0xc000c8e360, 0x30c18c0, 0xc00004c780, 0xc000d90f50, 0x0, 0x0, 0x0)
        /app/cmd/prometheus/main.go:805 +0x10d fp=0xc000bf5e58 sp=0xc000bf5d88 pc=0x21cdf2d
main.main.func20(0x0, 0x0)
        /app/cmd/prometheus/main.go:718 +0x1ff fp=0xc000bf5f98 sp=0xc000bf5e58 pc=0x21d2b3f
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc000d94700, 0xc000da38f0)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27 fp=0xc000bf5fc8 sp=0xc000bf5f98 pc=0x9102e7
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000bf5fd0 sp=0xc000bf5fc8 pc=0x46fea1
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 1 [chan receive, 1 minutes]:
github.com/oklog/run.(*Group).Run(0xc000bf1b60, 0xc000da1030, 0x8)
        /app/vendor/github.com/oklog/run/group.go:43 +0xed
main.main()
        /app/cmd/prometheus/main.go:797 +0x71c8

goroutine 47 [syscall, 1 minutes]:
os/signal.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:147 +0x9d
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
        /usr/local/go/src/os/signal/signal.go:150 +0x45

goroutine 83 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000268300)
        /app/vendor/go.opencensus.io/stats/view/worker.go:276 +0x105
created by go.opencensus.io/stats/view.init.0
        /app/vendor/go.opencensus.io/stats/view/worker.go:34 +0x68

goroutine 89 [select]:
github.com/prometheus/prometheus/pkg/logging.(*Deduper).run(0xc000535f00)
        /app/pkg/logging/dedupe.go:75 +0x1e5
created by github.com/prometheus/prometheus/pkg/logging.Dedupe
        /app/pkg/logging/dedupe.go:61 +0xcf

goroutine 96 [chan receive]:
github.com/prometheus/prometheus/storage/remote.(*WriteStorage).run(0xc000cee6c0)
        /app/storage/remote/write.go:93 +0xb6
created by github.com/prometheus/prometheus/storage/remote.NewWriteStorage
        /app/storage/remote/write.go:86 +0x2d8

goroutine 130 [select, 1 minutes]:
main.main.func6(0xc0000657b8, 0x8fc209)
        /app/cmd/prometheus/main.go:560 +0xff
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc0000a3080, 0xc0004d2db0)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 131 [chan receive, 1 minutes]:
github.com/prometheus/prometheus/discovery.(*Manager).Run(0xc0000b8f00, 0x0, 0x0)
        /app/discovery/manager.go:142 +0x74
main.main.func8(0x0, 0x0)
        /app/cmd/prometheus/main.go:580 +0x4e
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc0003e9860, 0xc0003e9880)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 132 [chan receive, 1 minutes]:
github.com/prometheus/prometheus/discovery.(*Manager).Run(0xc0000b9040, 0x0, 0x0)
        /app/discovery/manager.go:142 +0x74
main.main.func10(0x0, 0x0)
        /app/cmd/prometheus/main.go:594 +0x4e
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc0003e98e0, 0xc0003e9920)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 133 [chan receive, 1 minutes]:
main.main.func12(0x0, 0x0)
        /app/cmd/prometheus/main.go:612 +0x6c
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc00037bf80, 0xc0003e9940)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 134 [chan receive, 1 minutes]:
main.main.func14(0x0, 0x0)
        /app/cmd/prometheus/main.go:636 +0xa5
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc000da62a0, 0xc000da0ff0)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 135 [select, 1 minutes]:
main.main.func16(0x0, 0x0)
        /app/cmd/prometheus/main.go:669 +0x105
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc000da6300, 0xc000da1000)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 136 [chan receive, 1 minutes]:
main.main.func18(0x0, 0x0)
        /app/cmd/prometheus/main.go:697 +0x3f
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc000da4a20, 0xc000da1010)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 138 [select, 1 minutes]:
github.com/prometheus/prometheus/web.(*Handler).Run(0xc000ced680, 0x30d8180, 0xc0004660c0, 0x0, 0x0)
        /app/web/web.go:560 +0xbc5
main.main.func22(0x0, 0x0)
        /app/cmd/prometheus/main.go:765 +0x45
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc000da4a40, 0xc000da1020)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 139 [chan receive, 1 minutes]:
main.main.func24(0x0, 0x0)
        /app/cmd/prometheus/main.go:786 +0x6c
github.com/oklog/run.(*Group).Run.func1(0xc000da6360, 0xc000da3920, 0xc000da1030)
        /app/vendor/github.com/oklog/run/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /app/vendor/github.com/oklog/run/group.go:37 +0xbb

goroutine 140 [select]:
github.com/prometheus/prometheus/discovery.(*Manager).sender(0xc0000b8f00)
        /app/discovery/manager.go:234 +0x125
created by github.com/prometheus/prometheus/discovery.(*Manager).Run
        /app/discovery/manager.go:141 +0x45

goroutine 141 [select]:
github.com/prometheus/prometheus/discovery.(*Manager).sender(0xc0000b9040)
        /app/discovery/manager.go:234 +0x125
created by github.com/prometheus/prometheus/discovery.(*Manager).Run
        /app/discovery/manager.go:141 +0x45

goroutine 75 [IO wait]:
internal/poll.runtime_pollWait(0x7f4475607ee8, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:220 +0x55
internal/poll.(*pollDesc).wait(0xc00010a218, 0x72, 0x0, 0x0, 0x29afa8f)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc00010a200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:394 +0x1fc
net.(*netFD).accept(0xc00010a200, 0x1, 0xc000103100, 0x100000001)
        /usr/local/go/src/net/fd_unix.go:172 +0x45
net.(*TCPListener).accept(0xc0003e9980, 0xc000aae060, 0xc000ab4cf0, 0x2)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc0003e9980, 0x30a5901, 0x30a5980, 0x13, 0xc328041c50)
        /usr/local/go/src/net/tcpsock.go:261 +0x65
golang.org/x/net/netutil.(*limitListener).Accept(0xc000ab8150, 0x0, 0xc000ab4e00, 0x46d05b, 0x3fd961d4756)
        /app/vendor/golang.org/x/net/netutil/listen.go:48 +0x4e
github.com/mwitkow/go-conntrack.(*connTrackListener).Accept(0xc0003e9a40, 0xc000ab4e78, 0x18, 0xc000001500, 0x7138ac)
        /app/vendor/github.com/mwitkow/go-conntrack/listener_wrapper.go:100 +0x7f
net/http.(*Server).Serve(0xc000aec000, 0x30c1800, 0xc0003e9a40, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2937 +0x266
github.com/prometheus/prometheus/web.(*Handler).Run.func2(0xc000aea1e0, 0xc000aec000, 0x30c1800, 0xc0003e9a40)
        /app/web/web.go:557 +0x3f
created by github.com/prometheus/prometheus/web.(*Handler).Run
        /app/web/web.go:556 +0xaf4

goroutine 156 [select, 1 minutes]:
github.com/prometheus/prometheus/tsdb/wal.(*WAL).run(0xc000cef8c0)
        /app/tsdb/wal/wal.go:322 +0xdb
created by github.com/prometheus/prometheus/tsdb/wal.NewSize
        /app/tsdb/wal/wal.go:291 +0x327

goroutine 336 [chan receive]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc000a7a000, 0x179c7040500, 0xc2c0140a80, 0xc2c0140a20, 0x0)
        /app/tsdb/head.go:357 +0x197
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func5(0xc000a7a000, 0xc32917c868, 0xc32917c870, 0xc2c0140a80, 0xc2c0140a20)
        /app/tsdb/head.go:465 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /app/tsdb/head.go:464 +0x385

goroutine 157 [IO wait]:
internal/poll.runtime_pollWait(0x7f4475607e08, 0x72, 0x308ba60)
        /usr/local/go/src/runtime/netpoll.go:220 +0x55
internal/poll.(*pollDesc).wait(0xc000206018, 0x72, 0x308ba00, 0x412b740, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000206000, 0xc0002e7000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000206000, 0xc0002e7000, 0x1000, 0x1000, 0x7f4475607e08, 0x320, 0xc000acda00)
        /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00009a018, 0xc0002e7000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:182 +0x8e
net/http.(*connReader).Read(0xc001d7c030, 0xc0002e7000, 0x1000, 0x1000, 0xc000acdc10, 0x90efda, 0xc001cfc060)
        /usr/local/go/src/net/http/server.go:798 +0x1ad
bufio.(*Reader).fill(0xc000c1a000)
        /usr/local/go/src/bufio/bufio.go:101 +0x105
bufio.(*Reader).Peek(0xc000c1a000, 0x4, 0x615f535941, 0x41cd320, 0x0, 0x0, 0x41cd320)
        /usr/local/go/src/bufio/bufio.go:139 +0x4f
net/http.(*conn).serve(0xc0000b8000, 0x30d8180, 0xc0001bf100)
        /usr/local/go/src/net/http/server.go:1950 +0xa14
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2969 +0x36c

goroutine 158 [IO wait]:
internal/poll.runtime_pollWait(0x7f4475607d28, 0x72, 0x308ba60)
        /usr/local/go/src/runtime/netpoll.go:220 +0x55
internal/poll.(*pollDesc).wait(0xc000206098, 0x72, 0x308ba00, 0x412b740, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000206080, 0xc000b41000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000206080, 0xc000b41000, 0x1000, 0x1000, 0x7f4475607d28, 0x40, 0xc000079a00)
        /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00009a118, 0xc000b41000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:182 +0x8e
net/http.(*connReader).Read(0xc001cfc210, 0xc000b41000, 0x1000, 0x1000, 0xc000079c10, 0x90efda, 0xc001cfc180)
        /usr/local/go/src/net/http/server.go:798 +0x1ad
bufio.(*Reader).fill(0xc000a24000)
        /usr/local/go/src/bufio/bufio.go:101 +0x105
bufio.(*Reader).Peek(0xc000a24000, 0x4, 0x5ad40204b4, 0x41cd320, 0x0, 0x0, 0x41cd320)
        /usr/local/go/src/bufio/bufio.go:139 +0x4f
net/http.(*conn).serve(0xc0000b8140, 0x30d8180, 0xc0000a26c0)
        /usr/local/go/src/net/http/server.go:1950 +0xa14
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2969 +0x36c

goroutine 385 [runnable]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc000a7a000, 0x179c7040500, 0xc2c0140b40, 0xc2c0140ae0, 0x0)
        /app/tsdb/head.go:362 +0x271
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func5(0xc000a7a000, 0xc32917c868, 0xc32917c870, 0xc2c0140b40, 0xc2c0140ae0)
        /app/tsdb/head.go:465 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /app/tsdb/head.go:464 +0x385

goroutine 335 [runnable]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc000a7a000, 0x179c7040500, 0xc2c01409c0, 0xc2c0140960, 0x0)
        /app/tsdb/head.go:362 +0x271
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func5(0xc000a7a000, 0xc32917c868, 0xc32917c870, 0xc2c01409c0, 0xc2c0140960)
        /app/tsdb/head.go:465 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /app/tsdb/head.go:464 +0x385

goroutine 386 [runnable]:
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6(0xc2c0140840, 0xc3291da000, 0x41fede8, 0xc3291d4a80, 0xc00ef3a820, 0xc00ef3a840, 0xc3291d4ab0, 0xc3291d4ae0)
        /app/tsdb/head.go:487 +0x752
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /app/tsdb/head.go:471 +0x4eb

goroutine 334 [runnable]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc000a7a000, 0x179c7040500, 0xc2c0140900, 0xc2c01408a0, 0x0)
        /app/tsdb/head.go:362 +0x271
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func5(0xc000a7a000, 0xc32917c868, 0xc32917c870, 0xc2c0140900, 0xc2c01408a0)
        /app/tsdb/head.go:465 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /app/tsdb/head.go:464 +0x385
level=warn ts=2021-06-01T15:55:07.254Z caller=main.go:304 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2021-06-01T15:55:07.254Z caller=main.go:353 msg="Starting Prometheus" version="(version=2.22.1, branch=HEAD, revision=00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)"
level=info ts=2021-06-01T15:55:07.254Z caller=main.go:358 build_context="(go=go1.15.3, user=root@516b109b1732, date=20201105-14:02:25)"
level=info ts=2021-06-01T15:55:07.254Z caller=main.go:359 host_details="(Linux 4.14.231-173.361.amzn2.x86_64 #1 SMP Mon Apr 26 20:57:08 UTC 2021 x86_64 ddd803d69a93 (none))"
level=info ts=2021-06-01T15:55:07.254Z caller=main.go:360 fd_limits="(soft=1024, hard=4096)"
level=info ts=2021-06-01T15:55:07.254Z caller=main.go:361 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-06-01T15:55:07.257Z caller=main.go:712 msg="Starting TSDB ..."
level=info ts=2021-06-01T15:55:07.257Z caller=web.go:516 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-06-01T15:55:07.260Z caller=repair.go:56 component=tsdb msg="Found healthy block" mint=1619935200000 maxt=1620129600000 ulid=01F4VYFJM9Y9QWPHTC2596NSXQ

Stuart Clark

Jun 1, 2021, 12:15:40 PM
to Naveen Badam, Prometheus Users
On 2021-06-01 17:09, Naveen Badam wrote:
> Hi Team,
>
> I found Prometheus started reporting "fatal error: runtime: out of
> memory" recently.
>
> The only change I made was adding a bunch of new scrapers to
> Prometheus; could that be causing issues loading the segment data?
>
> Below is the error I am getting during Prometheus startup.
>
> I'd appreciate your help fixing this issue.
>

If extra scrape targets are the only recent change (the other big consumer of
memory is queries), then that is likely what is causing you to run out of
memory. You either need to increase the amount of memory available or reduce
the number of targets/metrics/time series being scraped.
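
As a rough sketch of how you might check (the `job="prometheus"` label here is an assumption about your config; adjust to match your setup), these queries show which targets expose the most samples and how much memory the server itself is using:

```promql
# Top 10 scrape targets by number of samples ingested per scrape
topk(10, scrape_samples_scraped)

# Approximate resident memory of the Prometheus process itself
process_resident_memory_bytes{job="prometheus"}
```

The first query is a quick way to spot which of the newly added targets contribute the most series.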

--
Stuart Clark

Naveen Badam

Jun 1, 2021, 7:02:08 PM
to Stuart Clark, Prometheus Users
Thanks Stuart.

That would be the obvious thing to do.

Is there any way to manage the memory/segment size limits? Does the error indicate a problem loading one specific segment, or is memory exhausted after loading some number of segments once the limit is exceeded?


Stuart Clark

Jun 2, 2021, 2:03:25 AM
to Naveen Badam, Prometheus Users
On 02/06/2021 00:01, Naveen Badam wrote:
> Thanks Stuart.
>
> That would be the obvious thing to do.
>
> Is there any way to manage the memory/segment size limits? Does the
> error indicate a problem loading one specific segment, or is memory
> exhausted after loading some number of segments once the limit is
> exceeded?

Memory usage is controlled via the number of time series being scraped
and the queries performed (there are a couple of settings to reject
large queries), so in this case you could reduce the number of targets
or use the configuration to reject some of the metrics.
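
As a sketch of that configuration (the job name, target, limit value, and metric regex below are placeholders, not taken from this thread), metrics can be rejected at scrape time with `metric_relabel_configs`, and `sample_limit` can cap how many samples a single scrape is allowed to ingest:

```yaml
scrape_configs:
  - job_name: example              # placeholder job name
    sample_limit: 50000            # reject any scrape exceeding this many samples
    static_configs:
      - targets: ['host:9100']     # placeholder target
    metric_relabel_configs:
      # Drop unneeded high-cardinality metrics before they are stored
      - source_labels: [__name__]
        regex: 'go_gc_.*|node_cpu_guest_.*'   # placeholder metric name pattern
        action: drop
```

Dropped metrics never reach the TSDB, so they cost neither memory nor disk.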

--
Stuart Clark

Naveen Badam

Jun 2, 2021, 10:44:45 AM
to Stuart Clark, Prometheus Users
Thanks Stuart. Yep, I have excluded a couple of scrape targets and all is good now.

Just to understand: is there any way to determine a limit on the number of instances/targets Prometheus can scrape? Or do we just need to add gradually and adjust?

Stuart Clark

Jun 2, 2021, 10:54:21 AM
to Naveen Badam, Prometheus Users
On 02/06/2021 15:44, Naveen Badam wrote:
> Thanks Stuart. Yep, I have excluded a couple of scrape targets and all
> is good now.
>
> Just to understand: is there any way to determine a limit on the
> number of instances/targets Prometheus can scrape? Or do we just need
> to add gradually and adjust?
It is really based on the number of time series, so 100 targets each exposing
a single metric with no labels would have a lower impact than a single target
exposing hundreds of metrics, each with a selection of labels.
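
One way to keep an eye on that while adding targets (a suggestion beyond what was discussed above) is to watch the total number of active series in the head block:

```promql
# Total series currently held in memory
prometheus_tsdb_head_series

# Series count per job; note this can be expensive on large servers
count by (job) ({__name__=~".+"})
```

If that number grows toward what your memory allowance can hold, stop adding targets or start dropping metrics.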

--
Stuart Clark

Naveen Badam

Jun 2, 2021, 8:03:19 PM
to Stuart Clark, Prometheus Users
Cool, thanks! 