The problem is this. Suppose you're recording blackbox_exporter output; for simplicity I'll use probe_success, which looks something like this (1 for OK, 0 when there's a problem):
---------_-------_----------------------_-------------------------_----------------
You then view it in Grafana across a very wide time range, so Grafana picks out a single data point for each pixel:
- - - - - - - - _ - - - - - - - -
If you zoom out a long way, the graph is likely to skip over points where the value was zero. This is bound to happen when the data is sampled in this way.
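To make the effect concrete, here is a small Python sketch of the two traces above. The failure positions (9, 17, 40, 66) and the step of 5 are just what I read off the diagrams, and pick_every is a made-up helper, not anything Grafana actually runs:

```python
# 1 = probe OK, 0 = probe failed. Positions are illustrative, read off the trace above.
samples = [1] * 83
for i in (9, 17, 40, 66):
    samples[i] = 0

def pick_every(series, step):
    """Keep one raw sample per plotted point and ignore everything in between."""
    return series[::step]

zoomed_out = pick_every(samples, 5)

print(samples.count(0))     # failures in the raw data
print(zoomed_out.count(0))  # failures that survive the sampling
```

Only the failure that happens to land exactly on a sampled position shows up; the others fall between points and vanish from the graph.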
In an ideal world, you'd make each failure event increment a counter:
                                                                  _________________
                 _______________________--------------------------
_________--------
Then when you look over any time period, you can see how many failures occurred within that window. I think that's the best way to approach the problem. Since blackbox_exporter doesn't expose a counter like this, you'd have to synthesise one, e.g. using a recording rule.
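As a rough illustration of why a counter survives zooming out, here is a Python sketch (not a recording rule). The sample values are invented, and for simplicity it counts every failed probe rather than trying to group consecutive failures into one event:

```python
from itertools import accumulate

# Invented probe_success samples: 1 = OK, 0 = failed
samples = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]

# Running total of failed probes: rises by one for every 0 sample
failures_total = list(accumulate(1 - s for s in samples))

def failures_between(counter, start, end):
    """Failed probes after index `start` up to and including index `end`."""
    return counter[end] - counter[start]

print(failures_total)
print(failures_between(failures_total, 0, 11))  # whole range
print(failures_between(failures_total, 3, 7))   # a narrower window
```

However sparsely you read the counter, the difference between two readings still tells you how many failures happened in between, which is the property the 0/1 gauge lacks.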
Assuming you only have the existing timeseries, then as a workaround for probe_success, you could try using something like this:
min_over_time(probe_success[$__interval])
$__interval is the time span in Grafana covered by one data point (it changes with the graph resolution). With this query, each point "looks back" over that span: if *any* of the samples in it is zero, the result for that point is zero; if they are all 1, the result is 1. But you may find that if you zoom in too close, you get gaps in your graph, because $__interval can then become shorter than the scrape interval and some windows contain no samples at all.
Or you can use:
avg_over_time(probe_success[$__interval])
In this case, if one point covers 4 samples, and the samples were 1 1 0 1, then you will get a data point showing 0.75 as the availability.
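If it helps to see the two aggregations side by side, this Python sketch mimics them on plain lists. It assumes evenly spaced samples and non-overlapping windows of 4 samples per plotted point; over_time and avg are made-up helpers, not PromQL:

```python
def over_time(series, window, agg):
    """Apply `agg` to each consecutive block of `window` samples, one block per plotted point."""
    return [agg(series[end - window:end])
            for end in range(window, len(series) + 1, window)]

def avg(values):
    return sum(values) / len(values)

samples = [1, 1, 0, 1, 1, 1, 1, 1]

print(over_time(samples, 4, min))  # like min_over_time: [0, 1]
print(over_time(samples, 4, avg))  # like avg_over_time: [0.75, 1.0]
```

The min version tells you "was there any failure at all in this point's window", while the avg version tells you what fraction of probes succeeded.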
Now, averaging isn't going to work for probe_http_status_code, which has values like 200 or 404 or 503; an "average" of these isn't helpful. But you could do:
max_over_time(probe_http_status_code{instance="https://xxxxxxx",job="blackbox-generic-endpoints"}[$__interval])
Then you'll get whatever is the highest status code over that time range. That is, if the results for the time window covered by one point in the graph were 200 200 404 200 503 200, then you'll see 503 for that point. That may be good enough for what you need.
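The same idea in a few lines of Python, using the status codes from the example above plus an invented all-200 window for contrast:

```python
# One list per plotted point; the second window is invented for contrast.
windows = [
    [200, 200, 404, 200, 503, 200],
    [200, 200, 200, 200, 200, 200],
]

# Highest status code seen in each window, as max_over_time would report it
worst_per_point = [max(codes) for codes in windows]

print(worst_per_point)  # [503, 200]
```

Note that the 404 in the first window is hidden behind the 503: you only ever see the numerically highest code, not every distinct error.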