Hi everybody,
After getting a bunch of feedback in the last 6 months (and also after coming back to m2.0 with a fresher, more critical pair of eyes, reviewing the spec/docs I've written, and re-reading the threads here), I've realized that there's some valid criticism and a few things need rethinking.
While the tagging and standardisation ideas enable a lot of useful features, the questions of how we define the tags, how we format
our metric identifiers, etc. need to be re-evaluated. Some of the ideas (metric id = set of unordered key-value pairs only, unit tag, what tag, etc) haven't really changed since the beginning, and
* they are suboptimal: awkward (especially for newcomers; it's too much of a paradigm shift compared to graphite/opentsdb/etc, where you have a "key"), sometimes too verbose, and lacking support for things that are not key=value pairs, such as regular words, and order (for natural language, or to describe a logical ordering)
* I've also noticed that much of this syntax doesn't really matter for the end goal: as long as some key tags/values ultimately get assigned so you can leverage them, it doesn't matter much how the metric was emitted and made its way into the system.
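To make the contrast concrete, here's a minimal sketch (the tag names and values are made up for illustration, not taken from any spec) of a traditional graphite-style key next to a tag-set identifier, showing where the paradigm shift comes from:

```python
# A traditional graphite-style identifier: an ordered, opaque string,
# where position in the dotted path implies meaning.
graphite_key = "servers.db1.memory.used"

# A tag-based identifier (hypothetical tag names, in the spirit of the
# ideas discussed here): an unordered set of key=value pairs.
tag_id = {"what": "memory", "type": "used", "server": "db1", "unit": "B"}

# In the dotted key, order is the only structure; in the tag set, order
# is gone entirely, so two spellings of the same id are equal:
assert {"server=db1", "what=memory"} == {"what=memory", "server=db1"}

# But plain words and logical ordering (which the dotted key gets for
# free) need explicit support in a pure key=value model.
```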
When defining what metrics should look like, we should also differentiate between the ingest phase (coming out of your apps and agents), for which we could potentially even support multiple formats/protocols, vs what they should look like in the index (coming out of carbon-tagger, structured-metrics, etc, i.e. the metrics with metadata, to be used for querying etc).
When working all this out, we often like to use a few examples in the discussions to demonstrate a certain idea, but whatever formats we come up with *have* to apply to a broad range of metrics. So I want to start a body of known, different things we want to measure, describe each, and then see how they would look according to different formats, and how well each format fares across the various cases.
So let's take a step back and first identify which properties of *indexed* metrics are useful; then we can have another look at how we format the metrics on ingest and process them, to enable this.
1 Interoperability: the ability to switch out different agents, aggregators or dashboards with no or minimal need to change metric names or graph definitions.
2 self-describing: the ability to see a metric and understand exactly what it means, and for aggregation/render to leverage metadata when consolidating or rendering. (technically, the metric id could also be a short key that corresponds to an external piece of information describing it more fully)
3 ability to search/filter metrics by any of these words/tags. (greatly reinforced by point 1)
4 tag keys (server=.. etc), i.e. a named dimension for words, so that you can do 'sum by', 'group by', 'avg by' etc (note that in many cases, auto-assigned keys such as n1, n2, n3 work well too, as long as the assignments are reliable)
5 automatic unit/scale conversion in visualisations (convert scales (prefixes) like G to k to T, but also automatically derive/integrate like MB to MB/day or Mbps to Mb), and both combined
6 dashboards automatically knowing what to put in legends (the properties that the rendered things don't have in common) vs the y-axis label and graph title (the stuff they have in common; splitting what goes on the y-axis vs in the title can be a bit tricky sometimes. for now it's unit, type, target_type on the y-axis label, the rest in the title)
7 finding duplicate metrics (different things sending the same metric shows that something might be wrong in your setup, or that metrics are not descriptive enough to differentiate themselves from each other). this is all about the fine art of choosing which words/tags are intrinsic, which are metadata, whether order matters for the storage key, etc.
8 validate correctness of keys, where validate means more like "a UI that shows you which are non-standard" (which can be totally fine, but may give a clue about typos, bad formats, or things you should try to add to the spec). for some keys (like unit, with the current spec), we can do the same for values.
9 ability to graph/aggregate/correlate very different metrics together by getting them from different areas of the metric space (unlike the tree model, where you're limited). whatever syntax is used, it can be hard to pin down the exact metrics you want based on search terms, so strong what/unit/key information is useful here.
10 metric types. the main useful thing is that something with type=counter can be shown as a derivative by default; that's about it. aggregators like statsd use types to instruct which operations/statistical summaries to perform, but that's something else.
11 expressing equivalence (metric for all cores is equivalent to sum(all metrics for each core))
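To illustrate points 4 and 11 above, here's a minimal sketch (the tag names server/core and the sum_by helper are mine, purely for illustration) of how named dimensions enable 'sum by', and how summing over one dimension expresses the equivalence between per-core metrics and an all-cores aggregate:

```python
from collections import defaultdict

# Hypothetical indexed metrics: each is a tag set plus a value.
metrics = [
    ({"what": "cpu_usage", "server": "db1", "core": "0"}, 10.0),
    ({"what": "cpu_usage", "server": "db1", "core": "1"}, 30.0),
    ({"what": "cpu_usage", "server": "web1", "core": "0"}, 5.0),
]

def sum_by(metrics, key):
    """Sum metric values grouped by a tag key ('sum by' / 'group by')."""
    totals = defaultdict(float)
    for tags, value in metrics:
        totals[tags[key]] += value
    return dict(totals)

# group by server: per-server totals across cores
per_server = sum_by(metrics, "server")
print(per_server)  # {'db1': 40.0, 'web1': 5.0}

# equivalence (point 11): the "all cores" value for db1 is equivalent
# to the sum of the per-core metrics for db1.
db1_all_cores = sum(v for tags, v in metrics if tags["server"] == "db1")
assert db1_all_cores == per_server["db1"]
```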
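For point 5, a minimal sketch of automatic prefix/scale conversion (the lookup table and function names are mine, not part of any spec). Deriving, e.g. MB to MB/day, would additionally divide consecutive datapoints by their time delta; that part is omitted here:

```python
# SI prefix factors (a subset, for illustration).
PREFIXES = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def convert(value, from_prefix, to_prefix):
    """Convert a value between SI prefixes, e.g. G to k."""
    return value * PREFIXES[from_prefix] / PREFIXES[to_prefix]

# 2 GB expressed in kB:
print(convert(2, "G", "k"))  # 2000000.0

def best_prefix(value):
    """Pick the largest prefix that keeps the scaled value >= 1,
    which is what a renderer would do to keep axis labels readable."""
    for p in ("T", "G", "M", "k", ""):
        if value >= PREFIXES[p]:
            return p
    return ""

assert best_prefix(3e10) == "G"
```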
There are a lot of possible implementations for a format/protocol/spec, and a bunch of subtle implementation details.
I have some fresh ideas, some people have suggested some good stuff via earlier threads or in person.
I'll try to gather them in a followup post. In the meantime, feel free to let me know your thoughts.