Preamble: I've just started using Prometheus. I got started because my DBA wanted Percona's PMM set up, and it came with all this jazz. When I saw the dashboards and so forth, I liked them. So now I am setting it up in a testbed/POC we are doing at the moment in AWS.
Before continuing,
i) the AWS setup is in a different account [with VPC peering] and pretty restrictive security group/port/inbound/outbound rules, so the simpler I can make this the better.
ii) I don't have a huge amount of time to spend on this atm, so easy wins help until I can revisit at a later point in the year.
iii) Autodiscovery looks cool. However we have just started moving to AWS so it is a slow process :P
1. I liked the look of the Percona versions of the Grafana dashes [System Overview etc]. I originally hoped to use telegraf as the remote metrics agent, as it seemed modular and I could just add whatever I wanted. However the labels are obviously different, and I just can't set aside the time to even look at it. I wanted to use the Percona dashes so that we could have some homogenization and make it easy for users to get used to them.
Q: Unless I am overlooking something simple? If I use node_exporter I get the machines into these pre-created dashes, but is there any simple relabelling setup to convert metric names from telegraf to node_exporter? Or am I being a moron?
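In case it helps, Prometheus can mechanically rewrite metric names at scrape time with `metric_relabel_configs` acting on `__name__`. A sketch, assuming a hypothetical telegraf target and one hand-picked mapping — note that even with matching names the dashboards may still not line up, because telegraf and node_exporter often disagree on units and metric type (gauge percentages vs raw counters), so each metric needs checking individually:

```yaml
scrape_configs:
  - job_name: 'telegraf'
    static_configs:
      - targets: ['myhost:9273']   # hypothetical telegraf listener
    metric_relabel_configs:
      # Rewrite one telegraf metric name to a node_exporter-style one.
      # Only the name changes; units and metric type do not.
      - source_labels: [__name__]
        regex: 'mem_available'
        target_label: __name__
        replacement: 'node_memory_MemAvailable'
```

Doing this for every metric the Percona dashboards use would be a long config, which is part of why most people just run node_exporter instead.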
2. Currently I am adding targets via static_configs pointed at <host>:<port>. I know the Percona stuff does autodiscovery via consul, but once again time is against me so I can't delve too deep into attempting this, although I want to.
Q: I attempted to rename __address__ and remove the port [currently using 42000 as that is what PMM uses, and for simplicity it was easy to re-use], so just the hostname is visible. However this didn't work at all. I was kinda hoping I could pull "nodename" from the OS metric outputs and slap that into "target_label: instance", but went down a rabbit hole on that one.
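For reference, a common pattern is to leave `instance` as `host:port` (it needs to stay unique per target) and copy just the hostname into an extra label instead. A sketch using the 42000 port from above and hypothetical hostnames:

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'db01.internal:42000'   # hypothetical hosts
          - 'db02.internal:42000'
    relabel_configs:
      # Capture the host part of __address__ into a separate "node" label,
      # leaving the instance label (host:port) untouched and unique.
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: node
        replacement: '${1}'
```

Dashboards can then template on `node` while Prometheus keeps its unique target identity.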
3. When I get more au fait with the system, I plan on using DNS discovery later, with DNS peering across VPCs.
Q: Any gotchas or opinions on this? [Our build scripts populate Route 53 with our instance DNS info]
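For when you get there, the Route 53 approach maps onto `dns_sd_configs`. A sketch with a hypothetical record name — one gotcha is that only SRV records carry a port, so with plain A records the port has to be hard-coded in the config:

```yaml
scrape_configs:
  - job_name: 'node-dns'
    dns_sd_configs:
      - names: ['nodes.example.internal']  # hypothetical Route 53 A record
        type: A           # A/AAAA records have no port info...
        port: 42000       # ...so it has to be spelled out here
        refresh_interval: 30s
```

The other gotcha is staleness: Prometheus re-resolves on `refresh_interval`, so removed records can linger as targets for up to that long.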
4. In my environment, I won't *really* be paying attention to these metrics on a 10s [default, I believe] scrape setting. They are most likely to be used after the fact, or if needed in real time it would be known in advance, allowing me to change the scrape config and reload. However 10s data points would be handy after the fact.
Q: Is it possible to set up the node_exporter to collect every 10s, and collect the data from the node every minute/2 mins etc? Is there much point to this, as the data size would be the same, other than the number of network connections?
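On this one: node_exporter is stateless and gathers everything at scrape time, so there is no internal 10s collection to decouple from; whatever interval Prometheus scrapes at is the resolution you get, and a 2m interval simply stores 12x fewer points than 10s would. The interval is set per job on the Prometheus side, e.g.:

```yaml
global:
  scrape_interval: 10s      # default for all jobs
scrape_configs:
  - job_name: 'node'
    scrape_interval: 2m     # override: scrape these targets less often
    static_configs:
      - targets: ['db01.internal:42000']   # hypothetical host
```

So there is a real saving (fewer samples stored, fewer connections), but you cannot get 10s resolution back after the fact.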
5. Q: Can I collect Java info via the regular node_exporter - i.e. use client_java [which I just discovered on the GitHub page] to get this data into Prometheus [we currently use Nagios/Cacti on-prem to monitor heap usage, GC etc], and just have a new dash for Java-related stuff? Or will I have to open another port to allow this data to be collected? Creating a single page per account for the devs to see Java stats in a nice graph/dash would be good :)
Or should I be using the jmx_exporter? [I'll probably be looking at apache_exporter and ha_exporter also when I get time, so extra ports are not a massive issue]
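If you go the jmx_exporter route, it is usually run as a Java agent inside the application JVM (roughly `java -javaagent:jmx_prometheus_javaagent.jar=9404:config.yaml ...`, with 9404 an arbitrary port choice here), which does mean one extra port per process. A minimal config sketch:

```yaml
# config.yaml for the jmx_exporter java agent.
# Heap and GC metrics come from the agent's built-in JVM collectors,
# regardless of the rules below.
lowercaseOutputName: true
rules:
  # Catch-all: expose every readable MBean attribute.
  - pattern: ".*"
```

client_java is the other option, but it requires code changes inside the application, whereas the agent is drop-in.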
6. Q: [kinda related to 4] Cost in AWS: We use multiple accounts, with a central management account. [All instances are in the same region, across 3 AZs, all on private IPs using VPC peering.] I understand that cross-AZ data transfer is $0.01/GB. If I have say 30 instances scraped, with 10 in each AZ, we are only looking at a few KB a scrape.
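A back-of-envelope check suggests the cost is negligible; every number here is an assumption rather than a measurement (a node_exporter scrape response is typically a few tens of KB uncompressed, much less gzipped):

```python
# Rough cross-AZ transfer cost for Prometheus scrapes.
# Every number below is an assumption, not a measurement.

SCRAPE_BYTES = 10 * 1024     # assume ~10 KiB on the wire per scrape
SCRAPE_INTERVAL_S = 10       # default 10s interval
CROSS_AZ_INSTANCES = 20      # 30 total, 10 share the Prometheus server's AZ
PRICE_PER_GB = 0.01          # inter-AZ price per GB, charged each direction

def monthly_transfer_and_cost():
    scrapes_per_month = 30 * 24 * 3600 / SCRAPE_INTERVAL_S
    gb = CROSS_AZ_INSTANCES * scrapes_per_month * SCRAPE_BYTES / 1024**3
    # AWS bills inter-AZ traffic in both directions (in + out).
    return gb, 2 * gb * PRICE_PER_GB

gb, cost = monthly_transfer_and_cost()
print(f"~{gb:.1f} GB/month, ~${cost:.2f}/month")
```

Even with generous assumptions this lands around a dollar a month, so it is unlikely to matter next to the instance costs themselves.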
7. Q: Is there any easy way to visually browse the incoming data? I fiddled about with Chronograf when I was looking at InfluxDB for a different project and found it handy enough. Actually, I might look at that tomorrow. :)
I understand some of these questions are probably rudimentary, and I have read the manual, and tried a few things, but I could spend a lot of time barking up the wrong tree, when a few questions here could point me in the right direction.
Thanks for any help,
B
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/1541038a-3fd2-48ba-b1b4-cce2067d16b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Wow Ben, thanks for all that!
>We recommend running a Prometheus instance inside each VPC, this makes security simpler, and also makes sure that VPC networking isn't a problem for monitoring.
Good to know, and we will explore this. We did want to be able to correlate data across accounts, but can investigate this method.
>We don't recommend telegraf, because it doesn't fit with the Prometheus best practices.
I read that article and understand. Thanks for the info
>Instance labels are designed to be unique identifiers of a target (in combination with job). So keeping the port there is important. However many setups use relabel configs to add a `node` label for convenience. This usually works quite well.
Ok. In the end I have actually just used the pmm-admin tool to put my non-DB instances into consul, so they will all have the same labels. Easy win for me, even if it is probably not the best way of doing things in the long run :) I'll revisit when I have time to learn more about Prometheus.
>You will likely want a very short TTL, say 5 seconds, for this to behave well.
We have a pretty static inventory currently. I just want to minimize the work required, so once we stand up a set of systems, they tend to stay as they are.
>They have no concept of internal polling intervals. This makes them much easier to deploy, work well with HA or ad-hoc polling.
Understood. Only thing here is CPU was running extremely hot in my tests, and then when I checked, the scrapes were set to the 1s PMM default. :)
>You will want to have every application process expose a port for metrics, either via client_java or the jmx_exporter agent.
Got it. Hoping to look at a Java exporter today if I get a chance.
Thanks again for everything, I have enough to work with :)
--
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/2c58d604-b8d9-4902-b97f-98267962b473%40googlegroups.com.