I'm merely a user and sometimes contributor, but I happen to disagree with Brian on this particular issue.
First of all, to get this out of the way, it's true you can't get 100% accurate values from a metrics-based system. There are many reasons for that, but I'll only touch on 2:
(1) Metrics-based systems necessarily use sampling, and you may not be able to get 2 samples that are exactly 30 days apart (there will always be a few seconds or minutes of slip). You can either interpolate/extrapolate (which is what Prometheus does) or take an exact difference between two samples, which then doesn't cover _exactly_ 30 days.
(2) When a monitored process gets restarted, the counter drops to zero (which rate()/increase() can account for), but it's very likely that a handful more events/messages/whatever happened between the moment of the last scrape and the moment the process actually terminated. Those events/messages/whatever never happened from the point of view of Prometheus.
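To put some made-up numbers on (2): if the last scrape saw messages_received_total at 1000 and the process handled another 7 messages before it died, the new process starts back at 0. rate()/increase() will bridge the counter reset correctly, but only across the values that were actually scraped, so those 7 messages are invisible to Prometheus.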
You could argue that the same kind of issues exist with logs, i.e. it's possible that the timestamp of log events is skewed (because the clock on one machine is off by a few seconds/minutes), so you may end up including/excluding events that happened (strictly speaking) outside/inside the exact 30 day range. Similarly, when a machine crashes and burns, it's possible that some of the events it has logged locally just before the crash are lost because they were never pushed into/ingested by the logs analytics system. Less of an effect, but still not 100% by any stretch of the imagination.
Now going back to your question: I am taking your statement that "graph [...] results will be very accurate" to mean that you/your users are perfectly happy with the accuracy of the numbers provided by Prometheus and you're simply looking for a way of getting one number for the last 30 days, accurate to the extent that Prometheus' numbers are.
The simplest way to get what you want is to do:
sum(increase(messages_received_total[30d]))
and represent that in Grafana as a singlestat panel with "Instant" checked (you only need the last value, and it's expensive to compute on the fly anyway). There will be some minor artifacts because Prometheus takes the values strictly within the interval and extrapolates to cover the full range, but that error is proportional to scrape_interval / 30d (it's much worse for very short ranges). Unless you run into performance issues with Prometheus, you're all set.
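To get a rough sense of scale (assuming, say, a 30s scrape interval, which I'm making up): the extrapolation error over a 30d range is on the order of 30s / 30d ≈ 0.001%, i.e. almost certainly well below the accuracy you care about.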
The other alternative, which you hint at, is to compute the increase over shorter ranges (say 5 minutes) as a recording rule, then sum those values over the last 30 days to get the total number of messages. There are a couple of pitfalls here, though:
(1) If one of your systems is not scraped for longer than 5 minutes (or whatever your choice of interval is) then you'll lose all increases from that system for that period.
(2) As noted above, Prometheus takes the values falling strictly within the range and extrapolates to the whole range. This estimation error will be much more visible here (just as it is on your graphs), as it's now proportional to scrape_interval / 5m. To work around that, what I actually record is
increase(metric[5m + scrape_interval]) * 5m / (5m + scrape_interval)
with all the values hardcoded. :( (There's a concrete sketch after this list.)
(3) You will either have to make sure the rule is evaluated exactly every 5 minutes, or, if it's evaluated more frequently, divide the final number by 5m / eval_interval (because you're counting every message multiple times).
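For completeness, here's a minimal sketch of that recording rule, assuming the Prometheus 2.x rule-file format and a 1m scrape interval (the rule name messages_received:increase5m is just something I picked; adjust the hardcoded numbers to your actual scrape interval):

groups:
  - name: message_totals
    interval: 5m    # evaluate exactly every 5 minutes, one sample per 5m bucket -- addresses (3)
    rules:
      - record: messages_received:increase5m
        # increase over 5m + scrape_interval, scaled back down to 5m -- addresses (2);
        # with a 1m scrape interval that's 5m / (5m + 1m) = 300 / 360
        expr: increase(messages_received_total[6m]) * 300 / 360

and the 30 day total then becomes

sum(sum_over_time(messages_received:increase5m[30d]))

which is cheap to evaluate because it only has to add up pre-computed 5m increases.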
Cheers,
Alin.