Data backfilling issue.


Yagyansh S. Kumar

Oct 30, 2020, 4:46:20 AM
to victorametrics-users
Hi. I have a Prometheus instance that writes data to VM using remote_write, and VM is added as a datasource in Grafana. For testing purposes, I stopped the VM agent for 5 minutes and then started it again. Now all my dashboards show No Data even though the VM agent is UP. Why is this happening? Am I missing something?
Also, in this case VM won't have those 5 minutes of data. Is there any way to backfill the data from the Prometheus instance to cover that gap?

Thanks in advance.

hage...@gmail.com

Oct 30, 2020, 6:03:48 PM
to victorametrics-users
Hi Yagyansh,

It is unclear from your message whether Prometheus writes directly to VM or to vmagent, which then writes to VM. It would be great to have clarity here.
Nevertheless, it is very important to understand whether Prometheus pushed the data for those 5 minutes or not. You can verify this by checking the Prometheus metrics with the prefix `prometheus_remote_storage_`. If, for example, the metric `prometheus_remote_storage_dropped_samples_total` increased during the test, then Prometheus dropped data for remote storage and won't retry it.
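
A minimal sketch of that check, assuming Prometheus is reachable at `http://localhost:9090` (a placeholder address) and using the standard `/api/v1/query` HTTP API. The helper names are made up for illustration, and depending on the Prometheus version the counter may be exposed under a slightly different name, so adjust the metric name to whatever your instance reports.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder Prometheus address; replace with your own.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, window: str = "10m") -> str:
    """Build an instant-query URL asking how much the dropped-samples
    counter grew over the given window."""
    expr = f"increase(prometheus_remote_storage_dropped_samples_total[{window}])"
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def total_dropped(response: dict) -> float:
    """Sum the values of all series in a Prometheus instant-query response."""
    return sum(float(series["value"][1]) for series in response["data"]["result"])


def check_dropped(base_url: str = PROMETHEUS_URL, window: str = "10m") -> float:
    """Query Prometheus and return the number of samples dropped during the window."""
    with urllib.request.urlopen(build_query_url(base_url, window)) as resp:
        return total_dropped(json.load(resp))
```

If `check_dropped()` returns a value above zero for a window covering the test, Prometheus gave up on some samples and they would have to be backfilled separately.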
Regarding VM, it is recommended to configure monitoring for the VM components; please see how to do this here: https://github.com/VictoriaMetrics/VictoriaMetrics#monitoring. Having monitoring and Grafana dashboards in place would help you understand the state of components like vmagent or VM itself.
I'd also recommend reading the Troubleshooting section for VM here: https://github.com/VictoriaMetrics/VictoriaMetrics#troubleshooting
And the backfilling tips here: https://github.com/VictoriaMetrics/VictoriaMetrics#backfilling. It could be that you just need to reset the VM cache after a backfilling event.

> Also, in this case VM won't have those 5 minutes of data. Is there any way to backfill the data from the Prometheus instance to cover that gap?

Yes, you can use vmctl to backfill data from Prometheus to VM; see https://github.com/VictoriaMetrics/vmctl#migrating-data-from-prometheus. With the flags `--prom-filter-time-start` and `--prom-filter-time-end` you can backfill just the missing part of the data.
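
As a rough sketch, the command line for such a backfill could be assembled like this. The snapshot path, the VM address and the helper names are placeholders, not values from this thread. It assumes vmctl's `prometheus` mode reading from a Prometheus snapshot (`--prom-snapshot`) and writing to `--vm-addr`, with the time filters given in RFC3339 format.

```python
from datetime import datetime, timezone
from typing import List


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def vmctl_backfill_args(snapshot_dir: str, vm_addr: str,
                        start: datetime, end: datetime) -> List[str]:
    """Build the argument list for a vmctl run limited to [start, end]."""
    if end <= start:
        raise ValueError("end must be later than start")
    return [
        "vmctl", "prometheus",
        f"--prom-snapshot={snapshot_dir}",  # placeholder snapshot path
        f"--vm-addr={vm_addr}",             # placeholder VM address
        f"--prom-filter-time-start={rfc3339(start)}",
        f"--prom-filter-time-end={rfc3339(end)}",
    ]


# Example: cover a gap with a small margin on both sides.
args = vmctl_backfill_args(
    "/path/to/snapshot",
    "http://localhost:8428",
    datetime(2020, 10, 30, 8, 40, tzinfo=timezone.utc),
    datetime(2020, 10, 30, 8, 50, tzinfo=timezone.utc),
)
print(" ".join(args))
```

The printed command can be reviewed and then run by hand (or passed to `subprocess.run(args, check=True)`). After the import finishes, the VM cache may need to be reset as described in the backfilling tips linked above.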

Yagyansh S. Kumar

Oct 30, 2020, 6:27:05 PM
to hage...@gmail.com, victorametrics-users
Hi,



It is unclear from your message whether Prometheus writes directly to VM or to vmagent, which then writes to VM. It would be great to have clarity here.
    >> Data is being written directly to VM from Prometheus.

Nevertheless, it is very important to understand whether Prometheus pushed the data for those 5 minutes or not. You can verify this by checking the Prometheus metrics with the prefix `prometheus_remote_storage_`. If, for example, the metric `prometheus_remote_storage_dropped_samples_total` increased during the test, then Prometheus dropped data for remote storage and won't retry it.
    >> So far I see that there have been no dropped samples. I stopped VM 2-3 times and each time I didn't see any gaps in the data. When does Prometheus start dropping samples? I couldn't find suitable documentation on this.

Regarding VM, it is recommended to configure monitoring for the VM components; please see how to do this here: https://github.com/VictoriaMetrics/VictoriaMetrics#monitoring. Having monitoring and Grafana dashboards in place would help you understand the state of components like vmagent or VM itself.
I'd also recommend reading the Troubleshooting section for VM here: https://github.com/VictoriaMetrics/VictoriaMetrics#troubleshooting
And the backfilling tips here: https://github.com/VictoriaMetrics/VictoriaMetrics#backfilling. It could be that you just need to reset the VM cache after a backfilling event.
    >> Thanks, I will go through these. I guess I missed them while going through the VM documentation.


Okay, during further testing I noticed 2 issues. First, once VM is UP again after a downtime period, it takes a little time (5-10 minutes) for data to appear in Grafana. For the first 5-10 minutes I don't see any current data. After that initial period, all the data appears without any gaps. Is it expected behaviour that VM takes a little time to make the data available?

Secondly, when I refresh the dashboard or change the time frame, the panels for which I have selected the "Instant" option in Grafana sometimes stop showing data at all. If I refresh 2-3 times the data is displayed again, and then it disappears for a while on the next refresh. Could this be solved by changing some command-line flags? Attaching a snapshot of this.

image.png







Roman Khavronenko

Nov 2, 2020, 3:39:07 AM
to Yagyansh S. Kumar, victorametrics-users
    >> So far I see that there have been no dropped samples. I stopped VM 2-3 times and every time I didn't see any gaps in the data. When does Prometheus start dropping the samples? Couldn't find suitable documentation for the same.

It starts dropping samples after the remote storage has been unresponsive for 2h; see https://prometheus.io/docs/practices/remote_write/#remote-write-characteristics
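
To make the rule concrete, here is a tiny sketch that classifies an outage by comparing its length with the 2h window mentioned above. The threshold comes from this thread and the linked page; the function name is made up for illustration, and this is only a rough guide, not an exact model of Prometheus behaviour.

```python
from datetime import timedelta

# Window taken from the explanation above: Prometheus keeps retrying
# for roughly this long before unsent samples are lost.
RETRY_WINDOW = timedelta(hours=2)


def needs_backfill(outage: timedelta, window: timedelta = RETRY_WINDOW) -> bool:
    """Return True if the outage lasted longer than the retry window,
    meaning Prometheus has probably dropped samples."""
    return outage > window


print(needs_backfill(timedelta(minutes=5)))  # the 5-minute test from this thread
print(needs_backfill(timedelta(hours=3)))    # a longer outage
```

This matches the observation that stopping VM for a few minutes left no gaps: the outage was well inside the window, so Prometheus resent the data on its own.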

> Okay, during further testing I noticed 2 issues. First, once VM is UP again after a downtime period, it takes a little time (5-10 minutes) for data to appear in Grafana. For the first 5-10 minutes I don't see any current data. After that initial period, all the data appears without any gaps. Is it expected behaviour that VM takes a little time to make the data available?

Datasources do not populate Grafana with data. Grafana acts like a proxy: it sends queries from the dashboards to the configured datasources and plots the responses it receives. VM itself should start up quickly; you can verify this by looking at the logs. The delay before data becomes visible may depend on the lag between Prometheus and VM. I'd recommend configuring monitoring for VM so you can see when it starts up and begins receiving data: https://github.com/VictoriaMetrics/VictoriaMetrics#monitoring. If you use the single-node version of VM, just set the `-selfScrapeInterval=10s` flag and configure the official dashboard https://grafana.com/grafana/dashboards/10229. This should provide some visibility into what is happening.
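
One way to observe that lag directly, without Grafana in between, is to ask VM for the timestamp of the newest sample it can return. This is a sketch only: it assumes a single-node VM at `http://localhost:8428` (a placeholder address), uses the Prometheus-compatible `/api/v1/query` endpoint with the `timestamp()` function, and picks `up` as an example selector. The helper names are made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: placeholder address of a single-node VM; replace with your own.
VM_URL = "http://localhost:8428"


def ingestion_lag(response: dict, now: float) -> Optional[float]:
    """Return the seconds between `now` and the newest sample timestamp
    in an instant-query response for `timestamp(<selector>)`.
    Returns None if the query matched no series."""
    result = response["data"]["result"]
    if not result:
        return None
    newest = max(float(series["value"][1]) for series in result)
    return now - newest


def query_lag(base_url: str = VM_URL, selector: str = "up") -> Optional[float]:
    """Run the instant query against VM and compute the current lag."""
    params = urllib.parse.urlencode({"query": f"timestamp({selector})"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}") as resp:
        return ingestion_lag(json.load(resp), time.time())
```

Calling `query_lag()` every few seconds after a restart would show whether VM returns nothing at all (`None`, which is what an empty panel corresponds to) or returns data that is simply a few minutes behind.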

> Secondly, when I refresh the dashboard or change the time frame, the panels for which I have selected the "Instant" option in Grafana sometimes stop showing data at all. If I refresh 2-3 times the data is displayed again, and then it disappears for a while on the next refresh. Could this be solved by changing some command-line flags? Attaching a snapshot of this.

Can you confirm which version of VM you are running? Some instant query bug fixes were included in 1.45.0; it would be great if you could check whether the issue still exists for you.

Yagyansh S. Kumar

Nov 2, 2020, 4:54:34 AM
to Roman Khavronenko, victorametrics-users
Hi Roman,

Thanks for the detailed explanation.

I have already set up VM monitoring, and I have observed from the logs as well as the dashboards that data is being ingested into VM once it is UP, which is why this is more of a concern. I'd be happy to take VM down again to capture fresh stats if you like.

As for the instant query bug, I am currently running 1.44.0. I'll update the version and post here if the issue persists.