Prometheus snapshot restore doesn't show the old metrics

KARIM MANAOUIL

Apr 24, 2020, 8:29:11 AM
to Prometheus Users
Hi all,

I am analyzing a Kubernetes cluster, and because I don't have permanent access to it, I had the idea of exporting the Prometheus TSDB and analyzing the data with a local Prometheus instance. I managed to do that using snapshots, as described here. I copied the snapshot to my local machine, started a Prometheus container with tsdb.path set to the snapshot's path, and it worked great: I could see the old data and run queries. The next morning I uncompressed the snapshot again and launched a fresh Prometheus instance, but now I can't see the old data. I have no idea why.

The documentation says something about heap-size and memory-chunks, but that doesn't work: the arguments are rejected as invalid when I try them.

I noticed that after the server starts, the head is garbage collected. Maybe this is the reason, but honestly I have no idea.

So, is there any way to fix this and use my restored snapshot?

Thanks

sayf eddine Hammemi

Apr 24, 2020, 8:37:24 AM
to KARIM MANAOUIL, Prometheus Users
Hello, according to the article you linked, older blocks are only snapshotted as hardlinks, while open blocks are dumped. Maybe you didn't copy the hardlinks correctly? Check the size of your folder.
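
One way to act on this suggestion is to compare the copied snapshot with the original: the total size on disk and the time range each block claims to cover. The sketch below is illustrative only. It assumes the usual TSDB layout, in which every block directory contains a meta.json with ulid, minTime and maxTime fields (milliseconds since the epoch); the helper names dir_size_bytes and list_blocks are made up for this example.

```python
import json
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def list_blocks(snapshot_dir):
    """Return (ulid, minTime, maxTime) for every subdirectory holding a meta.json."""
    blocks = []
    for entry in sorted(os.listdir(snapshot_dir)):
        meta_path = os.path.join(snapshot_dir, entry, "meta.json")
        if not os.path.isfile(meta_path):
            continue  # not a block directory (for example, a stray file)
        with open(meta_path) as f:
            meta = json.load(f)
        blocks.append((meta["ulid"], meta["minTime"], meta["maxTime"]))
    return blocks
```

Calling both helpers on the server-side snapshot directory and on the local copy, and comparing the results, would show whether a block or part of one went missing during the copy.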

Julius Volz

Apr 24, 2020, 9:51:43 AM
to KARIM MANAOUIL, Prometheus Users
Did you set the retention (--storage.tsdb.retention.time) long enough on your new Prometheus instance so that the old data doesn't slide out of the retention window?

Julius Volz

Apr 25, 2020, 5:47:48 AM
to KARIM MANAOUIL, Prometheus Users
The log suggests that the one TSDB block that was found ends at timestamp 1587686596918 (2020-04-24T00:03:16+00:00), while the log timestamps (and thus, I guess, the timestamp your query gets from date +%s) are from 2020-04-24T02:45:51.557Z, a couple of hours after the end of that block. Still, you might expect the WAL to contain the data you're looking for at that timestamp. Are you sure you are not querying for data outside even the WAL range? First, maybe try changing your "query" parameter to "apiserver_request_count[999d]" to see whether you get anything back at all when looking further back.
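
The arithmetic behind this observation can be reproduced with a few lines of Python. This is only a sketch: the constants are copied from the "found healthy block" line and the ts= field of the log quoted below, and the helper names are invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the quoted log: block bounds in milliseconds, server start as ts=.
BLOCK_MINT_MS = 1587683361392
BLOCK_MAXT_MS = 1587686596918
SERVER_START = "2020-04-24T02:45:51.557Z"


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def parse_log_ts(ts):
    """Parse the ts= field of a Prometheus log line (UTC, fractional seconds)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


block_start = ms_to_utc(BLOCK_MINT_MS)
block_end = ms_to_utc(BLOCK_MAXT_MS)
gap = parse_log_ts(SERVER_START) - block_end

print("block start:", block_start.isoformat(timespec="seconds"))
print("block end:  ", block_end.isoformat(timespec="seconds"))
print("server start is after block end by:", gap)
```

With the values from the log, the block covers roughly 23:09 to 00:03 UTC, and the server start falls about 2 hours 42 minutes after the end of the block.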

On Sat, Apr 25, 2020 at 11:36 AM KARIM MANAOUIL <fk_ma...@esi.dz> wrote:
Sayf, everything was copied correctly: the total uncompressed size is around 55MB, the same as on the server. Moreover, that shouldn't be the problem, because the whole thing worked initially, on the first night.

Julius, I set the retention time to 30 days; still the same. I tricked the system time by setting it to the time of the collection (the container basically gets the same time as the host). Now the garbage collector isn't called anymore, but I still can't see the data. Prometheus returns status:"success", but the result vector is empty.

Here is the process log by the way (jsony.sh --prometheus is just a wrapper around the docker command):

$ ./jsony.sh --prometheus 20200424T000317Z-64e116ebd138b60/
level=info ts=2020-04-24T02:45:51.551Z caller=main.go:298 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-04-24T02:45:51.551Z caller=main.go:333 msg="Starting Prometheus" version="(version=2.17.2, branch=HEAD, revision=18254838fbe25dcc732c950ae05f78ed4db1292c)"
level=info ts=2020-04-24T02:45:51.551Z caller=main.go:334 build_context="(go=go1.13.10, user, date=20200420-08:27:08)"
level=info ts=2020-04-24T02:45:51.551Z caller=main.go:335 host_details="(Linux 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 30d5e4b721e8 (none))"
level=info ts=2020-04-24T02:45:51.551Z caller=main.go:336 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-24T02:45:51.551Z caller=main.go:337 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-04-24T02:45:51.552Z caller=main.go:667 msg="Starting TSDB ..."
level=info ts=2020-04-24T02:45:51.552Z caller=web.go:515 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-04-24T02:45:51.553Z caller=repair.go:59 component=tsdb msg="found healthy block" mint=1587683361392 maxt=1587686596918 ulid=01E6MQXRNVTXTBM3CQ5BJX0BKY
level=info ts=2020-04-24T02:45:51.556Z caller=head.go:575 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-04-24T02:45:51.556Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-04-24T02:45:51.556Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=147.345µs
level=info ts=2020-04-24T02:45:51.557Z caller=main.go:683 fs_type=9123683e
level=info ts=2020-04-24T02:45:51.557Z caller=main.go:684 msg="TSDB started"
level=info ts=2020-04-24T02:45:51.557Z caller=main.go:788 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-04-24T02:45:51.557Z caller=main.go:816 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-04-24T02:45:51.557Z caller=main.go:635 msg="Server is ready to receive web requests."

Server response on my request:

$ curl -X GET -s "http://localhost:9090/api/v1/query?query=apiserver_request_count&time=$(date +%s)" && echo
{"status":"success","data":{"resultType":"vector","result":[]}}
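
An empty vector may simply mean that no sample falls within the lookback window (5 minutes by default) before the evaluation time, since $(date +%s) evaluates the query at the host's current clock time rather than inside the block. The sketch below builds two alternative requests: one evaluated at the block's maxt taken from the log above, and one using the wide range selector Julius suggests. The helper instant_query_url is hypothetical; only the /api/v1/query endpoint and its query and time parameters come from the Prometheus HTTP API.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9090"  # address used in the curl command above
BLOCK_MAXT_MS = 1587686596918       # maxt from the "found healthy block" log line


def instant_query_url(base_url, query, time_s=None):
    """Build an /api/v1/query URL; omit time to let the server use its current time."""
    params = {"query": query}
    if time_s is not None:
        params["time"] = time_s
    return f"{base_url}/api/v1/query?{urlencode(params)}"


# Evaluate at the end of the block (in seconds) instead of at the current time.
at_block_end = instant_query_url(BASE_URL, "apiserver_request_count", BLOCK_MAXT_MS // 1000)

# Wide range selector, to check whether any samples exist at all.
wide_range = instant_query_url(BASE_URL, "apiserver_request_count[999d]")

print(at_block_end)
print(wide_range)
```

Either URL can be passed to curl -s in place of the one used above.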