The reason this came up at all is that Grafana was crashing and wouldn't load the 12 hours of data collected since I set this up yesterday afternoon.
The Grafana folks always point the finger at "too many data points in your TSDB" when people say it's slow.
Running the metric query on the Prometheus side shows that I do indeed have the same sample repeated every 10 seconds even though the metric is stale, which works out to something around 17k samples across my 4 test machines for the last 12 hours (12 hours at a 10-second scrape interval is 4,320 samples per machine, times 4 machines, or roughly 17,280). Of course most of those samples are duplicates... which is why I was wondering what the point of the timestamp in the exposition format is, if not to tell Prometheus when the data is from and so aid de-duplication. And of course, half the point of this thread is that the timestamp in the exposition format isn't allowed by the textfile collector.
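For concreteness, this is the kind of line I had assumed I could drop into a .prom file; the metric name and timestamp are made-up examples, and the trailing integer is the exposition format's optional timestamp in milliseconds since the epoch:

    backup_success 1 1700000000000

As I understand it, that trailing timestamp is exactly the part the textfile collector rejects.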
Perhaps 17k data points are just too many for Grafana to deal with, or perhaps the performance problem lies elsewhere. My first instinct, on realizing that many of these data points are duplicates due to oversampling, was that this is a performance problem caused by oversampling data points that were never updated. Without a timestamp on each metric, the exporter has no way to indicate to Prometheus that this is the same sample it already scraped, rather than an updated sample that happens to have the same value. Right?
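One way I figure I can sanity-check the raw sample count on the Prometheus side is something like the following, where the metric name is just a stand-in for one of my textfile metrics:

    count_over_time(backup_success[12h])

At a 10-second scrape interval that should come back as roughly 4,320 per series on each machine.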
I don't want to write custom exporters and open/track more metric ports for every little thing (e.g. cron job status, security updates available, etc.). For our own edge applications we will of course do custom instrumentation, and for well-known services that have a specialized exporter we will of course use that. But getting basic server/service health monitoring in place is something that, I would think, should work fairly easily out of the box.
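To be clear about what I mean by "every little thing": the kind of setup I have in mind is a cron job dropping a one-line file into the textfile collector directory, roughly like this (the script, metric name and directory are placeholders for whatever --collector.textfile.directory points at):

    /usr/local/bin/nightly_backup && \
      echo "backup_last_success_timestamp $(date +%s)" > /path/to/textfile_dir/backup.prom

That is, recording the completion time as the metric's value, since the exposition timestamp itself is off limits, and then alerting when time() - backup_last_success_timestamp grows too large. (In practice I'd write to a temp file and mv it into place so node_exporter never scrapes a half-written file.)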
Perhaps in real-world deployments people are still using Nagios/Naemon for all of that basic red-light/green-light stuff and only using Prometheus for more advanced whitebox monitoring? Or perhaps most companies using Prometheus have large dedicated devops teams, everything runs in a DC behind firewalls, and they just open a large number of ports and manage the corresponding exporter installs/configs on the servers.
As I said, I am new to Prometheus, and I am sure you guys have put many more years/months of thought into how Prometheus should work and why. So I defer to your expertise... which is why I am asking questions: to better understand the hows and whys so that my intuition about how to do things gets better. The more I explain how and why I expect it to work, the easier it is for you to point out exactly where my line of thinking goes wrong.
Thanks for your reply. Happy to hear any further insight or suggestions you may have.