Extending metrics to support tags...

Kevin Burton

Sep 17, 2014, 3:26:15 PM
to metric...@googlegroups.com
Afternoon.  We use KairosDB internally to host our metrics backend.  It's a time series database similar to OpenTSDB.

A number of modern metrics systems support pivoting on tags.  This includes the older OpenTSDB and the newer KairosDB, but also hosted analytics / metrics platforms like DataDog.

We're considering supporting DataDog as they do a good job at machine stats...

The problem is that metrics-datadog, which I forked here:


from bazaarvoice, supports tags, but in a way that is completely incompatible with how I supported them in metrics-kairosdb-burtonator.

The approach I took was to encode them in the metric name.

For example


where the tags are foo=bar and cat=dog, and the name of the metric is mymetric.name.

The problem is that metrics-datadog took a completely different approach to encoding the tags in the name.

They use:

mymetric.name[foo:bar,cat:dog]
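To make the bracket convention concrete, here is a minimal sketch of splitting such a name into a bare metric name and a tag map. The class and method names are hypothetical and are not taken from metrics-datadog; the sketch also assumes that tag keys and values contain no commas, colons, or brackets, and that a tag without a colon gets an empty value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of metrics-datadog.
final class BracketTagParser {

    /** A bare metric name plus the tags that were encoded after it. */
    static final class Parsed {
        final String name;
        final Map<String, String> tags;

        Parsed(String name, Map<String, String> tags) {
            this.name = name;
            this.tags = tags;
        }
    }

    /** Splits "name[k1:v1,k2:v2]" into its name and tags. */
    static Parsed parse(String encoded) {
        Map<String, String> tags = new LinkedHashMap<>();
        int open = encoded.indexOf('[');
        if (open < 0 || !encoded.endsWith("]")) {
            // No tag suffix: the whole string is the metric name.
            return new Parsed(encoded, tags);
        }
        String name = encoded.substring(0, open);
        String body = encoded.substring(open + 1, encoded.length() - 1);
        if (!body.isEmpty()) {
            for (String pair : body.split(",")) {
                int colon = pair.indexOf(':');
                if (colon < 0) {
                    // Assumption: a tag without a value maps to the empty string.
                    tags.put(pair.trim(), "");
                } else {
                    tags.put(pair.substring(0, colon).trim(),
                             pair.substring(colon + 1).trim());
                }
            }
        }
        return new Parsed(name, tags);
    }
}
```

A reporter for a different backend could run names through a parser like this and re-emit the tags in its own format, but that only works if every library agrees on one encoding.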

I propose that we extend the MetricRegistry to assign tags to metrics in some sort of standardized way.

I think this is going to come up more and more... so might as well start resolving it now.

Thoughts?

Ryan Tenney

Sep 17, 2014, 9:01:43 PM
to metric...@googlegroups.com
Hi Kevin,

It is imperative that some form of tagging be introduced in v4.  A very interesting proposal for metric tags was submitted as a pull request not too long ago: https://github.com/dropwizard/metrics/pull/561

I haven't had the luxury of devoting any time to v4 just yet, having just gotten v3.1 out the door and actively working on v3.1.1.

I'd encourage everyone to share their use cases for metric tags in this thread.


Ryan

Kevin Burton

Sep 18, 2014, 6:55:57 PM
to metric...@googlegroups.com
Thanks Ryan.

Here's my implementation of metrics tags:


It supports KairosDB.

Here are my takeaways:

- It might require a significant refactor, which might end up with lots of breakage.

To avoid that, it might be possible to just encode the tags in the metric name, for example with ?foo=bar as a suffix.

- Many systems like OpenTSDB/KairosDB take advantage of sparse tags.  So you might emit a metric + tag once... and NEVER emit it again.

The problem here is that you're eventually going to run out of heap memory. I implemented a GC mechanism whereby if a metric doesn't change then I just remove it from the registry.  It's not pretty as there's a race between removal and re-adding it... 

I think this could reasonably be added by having a timestamp updated on each metric.
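A minimal sketch of the timestamp idea, assuming a simple counter-only registry. The ExpiringCounters class and its methods are hypothetical and are not part of the Metrics library. Both the update and the eviction go through ConcurrentHashMap's per-key atomic operations, so in this sketch an increment and a removal of the same key cannot interleave; a registry that hands out references to metric objects would still need more care.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry sketch, not part of the Metrics library.
final class ExpiringCounters {

    /** A counter plus the time of its most recent update. */
    static final class Entry {
        final AtomicLong count = new AtomicLong();
        volatile long lastUpdatedMillis;

        Entry(long nowMillis) {
            this.lastUpdatedMillis = nowMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    ExpiringCounters(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Increments the counter, creating it if needed, and stamps the update time. */
    void increment(String name, long nowMillis) {
        // compute() is atomic per key, so it cannot interleave with an eviction of this key.
        entries.compute(name, (key, entry) -> {
            if (entry == null) {
                entry = new Entry(nowMillis);
            }
            entry.count.incrementAndGet();
            entry.lastUpdatedMillis = nowMillis;
            return entry;
        });
    }

    /** Removes every counter that has been idle longer than maxIdleMillis; returns how many. */
    int evictIdle(long nowMillis) {
        int removed = 0;
        for (String name : entries.keySet()) {
            boolean[] evicted = {false};
            // Returning null from computeIfPresent removes the mapping atomically.
            entries.computeIfPresent(name, (key, entry) -> {
                if (nowMillis - entry.lastUpdatedMillis > maxIdleMillis) {
                    evicted[0] = true;
                    return null;
                }
                return entry;
            });
            if (evicted[0]) {
                removed++;
            }
        }
        return removed;
    }

    long count(String name) {
        Entry entry = entries.get(name);
        return entry == null ? 0 : entry.count.get();
    }

    int size() {
        return entries.size();
    }
}
```

The clock value is passed in explicitly so the eviction logic can be tested without sleeping; a real implementation would more likely read a clock internally and run evictIdle from a scheduled task.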

Salil Surendran

Jun 10, 2015, 1:14:44 PM
to metric...@googlegroups.com
In this particular PR the name has been replaced with a class, MetricName. Instead of this, wouldn't it be simpler to have tags supplied when one creates a counter, gauge, timer etc. via MetricRegistry.counter(), .gauge() etc.? This would lead to far fewer changes in the overall source code.

Ryan Rupp

Jun 11, 2015, 11:16:11 AM
to metric...@googlegroups.com
I don't think it would be simpler. A good amount of those code changes are there because anywhere you see a "String name" it needs to be replaced with MetricName, and anything that reads metrics, including every reporter, would need to be updated. So even if you instead used a String name + Map<String, String> tags (which is effectively what the MetricName class is), you would still have to hand that map of tags to all the reporters and to anyone reading from the metric registry. Either way it's going to propagate quite a bit through the code base, just because it's a fundamental part of the library. It makes sense, then, to encapsulate this inside the MetricName class. Looking at the proposed pull request - https://github.com/dropwizard/metrics/pull/561/files - the MetricRegistry still supports registering metrics with just a String name (internally it just builds a MetricName with no tags).
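As a rough illustration of "a String name + a map of tags" wrapped in one type, here is a minimal sketch. It is not the code from the pull request; the class name SketchMetricName and its methods are made up for this example. The point is only that the pair works as a single map key, and that a plain String name is the special case with no tags.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical illustration only; not the MetricName class from the pull request.
final class SketchMetricName {
    private final String key;
    private final Map<String, String> tags;

    SketchMetricName(String key, Map<String, String> tags) {
        this.key = Objects.requireNonNull(key);
        // Sorted, unmodifiable copy so equality does not depend on insertion order.
        this.tags = Collections.unmodifiableMap(new TreeMap<>(tags));
    }

    /** A name with no tags, which is what registering by plain String would build. */
    static SketchMetricName of(String key) {
        return new SketchMetricName(key, Collections.emptyMap());
    }

    String getKey() {
        return key;
    }

    Map<String, String> getTags() {
        return tags;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchMetricName)) {
            return false;
        }
        SketchMetricName other = (SketchMetricName) o;
        return key.equals(other.key) && tags.equals(other.tags);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, tags);
    }
}
```

Because equals and hashCode cover both the key and the tags, a registry could use this type directly as the key of its internal map, and reporters would receive the tags along with the name without a second parameter.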

I think moving to a MetricName being a key + tags probably makes sense, given this is where tools like OpenTSDB, InfluxDB, Datadog etc. are heading to allow for better slicing/dicing of metrics stored in a time series database. Netflix's Servo library already uses this concept, see - https://github.com/Netflix/servo/blob/master/servo-core/src/main/java/com/netflix/servo/monitor/MonitorConfig.java#L148 - One potential issue is that anything that is aggregated locally, e.g. percentiles for a timer, isn't that useful for slicing and dicing because you end up aggregating twice: percentiles are aggregated locally on the JVM and then reported to a time series database, and if you then aggregate across a cluster you're taking an aggregate of an already aggregated value. There's an interesting post on the Netflix Servo GitHub about why they currently use bucket timers for this - https://github.com/Netflix/servo/issues/321#issuecomment-101115722
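A small worked example of the double-aggregation problem, using made-up numbers and a nearest-rank percentile (an assumption for this sketch; timers in real libraries typically use sampling reservoirs instead). One node serves 990 fast requests and another serves 10 slow ones, and the mean of the two per-node p99 values is compared with the p99 of all raw samples.

```java
import java.util.Arrays;

// Hypothetical illustration with made-up latency samples.
final class PercentileOfPercentiles {

    /** Nearest-rank percentile; pct is a whole percent in 1..100. */
    static long percentile(long[] samples, int pct) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (pct * sorted.length + 99) / 100; // integer ceil(pct / 100 * n)
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Returns an array of n copies of value. */
    static long[] filled(int n, long value) {
        long[] out = new long[n];
        Arrays.fill(out, value);
        return out;
    }

    /** Concatenates two sample arrays. */
    static long[] concat(long[] a, long[] b) {
        long[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        long[] nodeA = filled(990, 10);   // 990 requests at 10 ms
        long[] nodeB = filled(10, 1000);  // 10 requests at 1000 ms
        long p99A = percentile(nodeA, 99);
        long p99B = percentile(nodeB, 99);
        System.out.println("mean of per-node p99s: " + (p99A + p99B) / 2);
        System.out.println("p99 over raw samples:  "
                + percentile(concat(nodeA, nodeB), 99));
    }
}
```

With these numbers the per-node p99 values are 10 ms and 1000 ms, so their mean is 505 ms, while the p99 over all 1000 raw samples is 10 ms. The two answers differ by a factor of about 50 because the per-node percentiles have already thrown away how many requests each node handled.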

Russell Allen

Jan 21, 2016, 2:02:12 PM
to metrics-user
Here's an additional use case for tags:

I have an AOP pointcut that applies a Timer around the execution of methods at key layers of the code (controller, service, repository).  It works great for reporting performance... except when an exception is thrown.  In that case, the timer is still updated (the pointcut advice uses a try-finally block), but the timing value reflects the time to exception, not the time to completion.  From a statistics point of view, this pollutes the data.  I'd like to be able to tag the timer update call as "exception" or not.

Ryan Rupp

Jan 21, 2016, 11:00:44 PM
to metrics-user
With Metrics 3 you could do that by appending ".exception" to the name or something along those lines e.g.:

mymethod.success // timer name for successful calls
mymethod.exception // timer name for exception calls

Stuffing dimensions into the name itself is sort of messy though, whereas in Metrics 4 you could have:

mymethod;status=success
mymethod;status=exception

But yeah, logically you'll still end up with the same thing in 3 vs 4 (two different timers; just the naming styles are different). Not sure if maybe you meant using one timer where the "update" method call would take some tag context that would drive the aggregation?
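For the two-timers variant, a minimal sketch of what the around-advice could do: decide the status inside the try-finally and record the duration under the matching name. OutcomeTimer is a hypothetical stand-in that just stores raw durations per name; it is not the Metrics Timer API, and the ";status=" suffix simply follows the naming shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a timer registry; not the Metrics Timer API.
final class OutcomeTimer {
    private final Map<String, List<Long>> durationsNanos = new ConcurrentHashMap<>();

    /** Runs body and records its duration under baseName;status=success or ;status=exception. */
    <T> T time(String baseName, Callable<T> body) throws Exception {
        long start = System.nanoTime();
        String status = "exception"; // assume failure until the body returns normally
        try {
            T result = body.call();
            status = "success";
            return result;
        } finally {
            long elapsed = System.nanoTime() - start;
            durationsNanos
                    .computeIfAbsent(baseName + ";status=" + status,
                            key -> new CopyOnWriteArrayList<>())
                    .add(elapsed);
        }
    }

    /** Number of recorded durations for a fully qualified name. */
    int count(String fullName) {
        List<Long> recorded = durationsNanos.get(fullName);
        return recorded == null ? 0 : recorded.size();
    }
}
```

Calling time("mymethod", ...) with a body that throws records one sample under mymethod;status=exception and rethrows, so the time-to-exception samples never mix with the time-to-completion samples.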