Hi all,
Does anyone know a good way to do continuous performance monitoring using JFR (JDK8)? I am interested in using this on some Apache data pipeline projects (Spark, Flink, etc.). I have used JFR for fixed-duration perf profiling before, but continuous monitoring would be quite different.
The ideal scenario would be to have JFR write to UDP <ip:port> destinations at a configurable update frequency. Obviously JFR does not support that as it stands today. So I tried setting up a continuous recording with maxage=30s and running JFR.dump every 30s. To my surprise, the time range covered by the dumped .jfr files does NOT correspond to the maxage parameter I gave. Instead, the time ranges (FlightRecordingLoader.loadFile(new File("xyz.jfr")).timeRange) from successive JFR.dump runs can overlap and be much longer than maxage.
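For concreteness, the periodic dump could be driven by something like the sketch below. It is only an illustration under assumptions: the recording is assumed to have been started under the name "continuous" (a made-up name), jcmd is assumed to be on the PATH, the /tmp output path is arbitrary, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PeriodicJfrDump {
    // Builds the jcmd command line for one JFR.dump of a named recording.
    static List<String> dumpCommand(String pid, String recordingName, String outFile) {
        return Arrays.asList("jcmd", pid, "JFR.dump",
                "name=" + recordingName, "filename=" + outFile);
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: PeriodicJfrDump <pid>");
            return;
        }
        final String pid = args[0];
        final String name = "continuous"; // assumption: name given at JFR.start
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Dump every 30 seconds, matching the maxage=30s setting described above.
        timer.scheduleAtFixedRate(() -> {
            String out = "/tmp/dump-" + System.currentTimeMillis() + ".jfr";
            try {
                new ProcessBuilder(dumpCommand(pid, name, out))
                        .inheritIO()
                        .start()
                        .waitFor();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 30, 30, TimeUnit.SECONDS);
    }
}
```

Each run produces one timestamped file, and those are the files whose time ranges I then compare.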
So, a couple of questions for experienced JFR users:
-- What exactly are the semantics of maxage?
I imagined that maxage has 2 effects: discarding events older than maxage and aggregating certain metrics (like stacktrace sample counts) over the time interval. It appears my understanding was way off.
-- How does the event pool/buffer under consideration for next JFR.dump get reset?
I was hoping every JFR.dump would reset the pool and allow the next JFR.dump to output non-overlapping time range. I was also wrong here.
-- Is there any way to do continuous perf monitoring with JFR with a configured aggregation and output interval?
One thing I did notice is that JFR periodically (the default seems to be 60s) flushes to chunk files and then rotates them according to the maxchunksize parameter. I could use that mechanism: inotify-watch the repository dir and read and parse the chunk files myself. However, a few things are missing if I go down this route: there is no way to set a "maxchunkage" (I would like to be able to set one as low as 10s); I would need to write a custom chunk file parser; and I am not sure whether the chunk files contain all the symbols needed to resolve the type IDs.
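The watching half of that route could look roughly like the sketch below, using java.nio's WatchService (backed by inotify on Linux). It is a sketch under assumptions, not something I have validated against JFR: the directory passed in must be the one that actually contains the chunk files, the ".jfr" suffix check for finished chunks is my guess at the naming, and the class and method names are hypothetical. The parsing step is left as a placeholder print.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

class ChunkWatcher {
    // Assumption: completed chunk files end in ".jfr"; anything else is skipped.
    static boolean looksLikeFinishedChunk(Path file) {
        return file.getFileName().toString().endsWith(".jfr");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: ChunkWatcher <chunk-dir>");
            return;
        }
        Path repo = Paths.get(args[0]);
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            // Watches only this directory; WatchService is not recursive.
            repo.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // blocks until something appears
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a rescan would be needed
                    }
                    Path created = repo.resolve((Path) event.context());
                    if (looksLikeFinishedChunk(created)) {
                        // Placeholder: hand the file to a chunk parser here.
                        System.out.println("new chunk: " + created);
                    }
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }
}
```

This only covers detection; the chunk interval and parser problems above remain open.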
Thanks!