Required keys

Florian Forster

May 30, 2014, 7:08:08 PM
to metr...@googlegroups.com
TL;DR: Proposal: Instead of the "unit" tag, make the "what" tag
mandatory. Make a recommendation when and how to set the "unit" tag and
split out a "scale" tag.

Hello,

I've actually thought about starting something along the lines of
"Metrics 2.0" myself, so thanks for getting this discussion started!

I wanted to discuss the current proposal of having one required key
only, and that key being "unit". In particular, I think that "unit" is
not important enough to be made mandatory and I want to argue that
having a small set of mandatory descriptive tags, for example "what" and
"server", would be very convenient.

* First, having an "unknown" unit for use with legacy clients already
sidesteps the "mandatory" requirement.

Think about it this way: If you really, absolutely want to make "unit"
mandatory, you would have to remove overly generic "legacy" values,
like "unknown". This means that a lot of existing projects out there
will likely not be able to use the "Metrics 2.0" schema correctly. The
question thus is: Are units really so important that we would rather
have existing clients not implement our schema at all than have clients
send data without a "unit" tag? If your answer is "yes", I'm anxious
to hear your argument :)

* Personally, I don't think that units are as important as that (see the
above argument). The "unit" is not required for processing the metric.
When looking at the data interactively, e.g. when debugging, it is
more often than not possible to derive the unit from more descriptive
tags. For example, "server=foo what=processes type=blocked" can easily
be understood without a "unit" tag.

That said, I do think that units are important. For example
"server=foo what=network_traffic direction=out" would greatly benefit
from either "unit=byte" or "unit=bit". Still, having unit-less data
is, if in doubt, better than having no data at all.

* The spec throws "unit" and "order of magnitude" into the same tag. If
you want to write a graphing front-end based on this schema, you will
end up parsing the string provided in the "unit" tag, for example to
determine if the leading "m" is part of "ms" (milliseconds) or just
coincidence ("mo" for month, for example). I think having a separate
label would simplify this dramatically. The easiest to use from within
a computer program would be "e=-3" for "milli", or "scale=1024" or
"scale=2^10" for "Ki". Whatever the notation, the separation from
"unit" is what's important to me.

If your metric is tagged with "what=latency", it's a reasonably safe
bet that the unit is going to be "seconds". It's the order of
magnitude that's really in question here, for example whether this
time is measured in "µs" or "100 ns", which is mostly not obvious. [*]

* Last, there are unit-less metrics, for example "percent" or "ratio".
Assigning them a "unit" feels wrong.
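
To illustrate the parsing problem from the third point above, here is a minimal sketch in Python. The prefix table, the set of base units, the "e" exponent tag and both function names are assumptions made up for this example; none of them are taken from the spec.

```python
# Hypothetical tables for illustration; the real spec may list other symbols.
SI_PREFIXES = {"k": 1e3, "M": 1e6, "G": 1e9, "m": 1e-3, "µ": 1e-6}
BASE_UNITS = {"s", "B", "b", "mo"}  # "mo" (month) starts with "m" but has no prefix


def parse_combined(unit):
    """Split a combined unit string such as "ms" into (scale, base_unit)."""
    # The whole string must be checked first, otherwise "mo" would be
    # misread as milli-"o".
    if unit in BASE_UNITS:
        return 1.0, unit
    prefix, base = unit[:1], unit[1:]
    if prefix in SI_PREFIXES and base in BASE_UNITS:
        return SI_PREFIXES[prefix], base
    raise ValueError(f"cannot parse unit {unit!r}")


def parse_separate(tags):
    """With a separate exponent tag ("e"), no string parsing is needed."""
    return 10.0 ** int(tags.get("e", "0")), tags["unit"]
```

With the combined form, a front-end needs the full table of known base units to tell "ms" from "mo"; with the separate tag it only reads two values.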

I think that there are good reasons to make the "what" tag mandatory. It
might also make sense to ask for a second mandatory tag.

* A metric without a "what" tag is very hard to interpret, to say the
least. For example, what is "server=foo unit=s"? The question ("What
is?") already hints at what's missing. With the "what=network_rtt" tag
you can take much more informed guesses what this metric is tracking.

* A lot of the proposed units would be a better fit for the "what" tag,
for example the "Err", "Conn", "Process" and "Ticket" units. I think a
better schema would be to make "what" mandatory and ask for "unit" if
it is different from "number of ${what}s".

For example:
* what=errors
(implied unit: number of errors)
* what=response http_code=5xx
(implied unit: number of HTTP 5xx responses)
* what=traffic interface=eth0 unit=byte
(we're not counting "traffics" => specify "unit")
* what=physical_memory type=available unit=byte
(we're not counting "physical memories" => specify "unit")

* When aggregating, you typically want to aggregate metrics that
represent one type of value, for example the sum over requests handled
by all of your web servers. Adding the sent network traffic and
available physical memory is much less common, even though both share
the same unit (bytes). So having a "what" label is very useful in that
case. This assumes, of course, that metrics with the same "what" label
implicitly share the same unit.

* In my experience, when aggregating metrics you most often don't want
to aggregate them "globally", i.e. over all machines in your fleet.
It is more useful to group metrics logically, for example the I/O load
of "all database machines".

I don't have a good proposal for a tag name, unfortunately. The best I
can come up with is "server_type" which is not quite appropriate for
routers and PDUs.
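
The "implied unit" rule and the aggregation by "what" described in the list above could be sketched roughly as follows. This is only an illustration of the proposal: the function names and the (tags, value) representation of a metric are assumptions, not part of any existing tool.

```python
def implied_unit(tags):
    """Return the explicit unit, or fall back to "number of <what>"."""
    if "what" not in tags:
        raise ValueError('missing mandatory "what" tag')
    if "unit" in tags:
        return tags["unit"]
    # No explicit unit: the metric counts instances of whatever "what" names.
    return "number of " + tags["what"]


def sum_by_what(metrics):
    """Sum values per "what" tag; metrics is a list of (tags, value) pairs."""
    totals = {}
    for tags, value in metrics:
        totals[tags["what"]] = totals.get(tags["what"], 0) + value
    return totals
```

Grouping on "what" keeps network traffic and physical memory apart even though both are measured in bytes.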

So, what are your thoughts? I'm happy to update the web page once a
consensus has been reached, I just thought that an email thread serves
this discussion better than a pull request.

Best regards,
—octo

P.S.: I also have some thoughts on (some) of the proposed unit values
and serialization, but I'll save that for a separate post to keep
this mail focused on one issue.


[*] Kudos for using "µ", by the way.
--
collectd – The system statistics collection daemon
Website: http://collectd.org
Google+: http://collectd.org/+
GitHub: https://github.com/collectd
Twitter: http://twitter.com/collectd

Dieter Plaetinck

Jun 1, 2014, 5:50:50 PM
to metr...@googlegroups.com
Hi Florian,

Interesting read. You're thinking deeply about all the things I'm thinking about, and I'm happy to meet such people :)

Just so you know, originally "what" was mandatory and there was no "unit" tag.  But the value of "what" was often enough a unit (mostly because I decided to standardize things like "processes" into a standardized unit like "Process"), so I basically ended up renaming the tag.
In this sense, a lot of what you want to use "what" for is basically a unit right now.  I think you already know this, but still, an example:


> For example, "server=foo what=processes type=blocked" can easily
> be understood without a "unit" tag.

with the current spec we would just say "server=foo unit=Process type=blocked", so it's basically the same.
The idea is that it should be possible to add a whole lot of custom units.
Right now the spec has a unit for processes, messages in a queue, queries, etc.  A lot of things that, following your reasoning, would go in "what" now go in "unit", which is basically equivalent.
As people need more units for different things, we would add them to the spec.  And for things that only apply to one environment, you could specify a few units of your own that don't necessarily make it into the spec.  An advantage here is that you can make compound units: if I want to know how many processes are forked per second, I can query for "Process/s" (see also compound units below), and if it's in the thousands, the dashboard could automatically put kProcess/s on the y-axis label.
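
As a rough sketch of that last point, a dashboard could pick the prefix for the y-axis label from the largest value on the graph. The prefix list and the function name below are assumptions for illustration only, not how any particular dashboard does it.

```python
PREFIXES = [(1e9, "G"), (1e6, "M"), (1e3, "k")]  # largest first


def axis_label(max_value, unit):
    """Pick an SI prefix for a y-axis label based on the largest value shown.

    Returns (factor, label); plotted values would be divided by factor.
    """
    for factor, prefix in PREFIXES:
        if abs(max_value) >= factor:
            return factor, prefix + unit
    return 1.0, unit
```

Note that the base unit ("Process/s" here) is never inspected; the prefix is simply prepended to whatever string the unit tag holds.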


> A metric without a "what" tag is very hard to interpret, to say the
> last. For example, what is "server=foo unit=s"? The question ("What
> is?") already hints at what's missing. With the "what=network_rtt" tag
> you can take much more informed guesses what this metric is tracking.

Metrics should be self-describing, i.e. when looking at a metric it should be clear what it means.  It's the responsibility of the person who adds new metrics to make sure they are self-describing by adding sensible tags.
Often adding a "what" tag is appropriate and the right choice.  In some cases there seem to be tag keys that are more descriptive than "what", and filling in their values makes the metric self-describing, in which case "what" is no longer needed.  For now I think we should make these nuances a bit more explicit, so that people better know how to construct their metrics the proper way.
Either way, once we have more experience with the metrics out there, we can make "what" mandatory if that's the right way forward. But in that case it wouldn't change anything: if the instructions and process are clear enough, they would already result in the best metrics, which would then already happen to have the "what" tag all the time.

I often use this metric in my presentations:

{ server: dfvimeodfsproxy5, http_method: GET, http_code: 200, unit: ms, metric_type: gauge, stat: upper_90, swift_type: object }

Can you tell what it means?

It means response time in ms, but that's not very explicit.  There might be other things here expressed in ms.  I think in this case adding what=response_time is the right way.  But "time" is already implicit via the unit tag, so we could make it what=response.  But "response" doesn't fully answer the question "*what* is this?", so maybe "type" is a better key, i.e. type=response.  But "type" is such a generic key, useful to describe a bunch of different dimensions, that I often try to pick a different name for my tags until I hit a property for which it's really hard to find a fitting keyword; in that case I'll use "type" (or no key, which is supported too).

In this case I think what=response_time is the best. Even though the "time" is a bit redundant, this overlap doesn't really hurt: it makes the metric key a bit longer, but that shouldn't make a difference.
(Many systems right now are optimized for short, "easy to memorize/type" keys, but I think it's much better to have a solid, fully descriptive, longer key, as you won't be typing them from scratch anyway, but will use advanced UIs instead.)


> Think about it this way: If you really, absolutely want to make "unit"
>  mandatory, you would have to remove overly generic "legacy" values,
>  like "unknown". This means that a lot of existing projects out there
>  will likely not be able to use the "Metrics 2.0" schema correctly. The
>  question thus is: Are units really so important that we would rather
>  that existing clients not implement our schema at all than clients
>  sending data without a "unit" tag. If your answer is "yes", I anxious
>  to hear your argument :)


So you're thinking of making "what" mandatory and optionally adding a unit to further clarify/describe the metric.  I'm thinking of using "unit" for some of the cases where you suggest "what" (see above), and optionally using "what" to further clarify/describe the metric.  So basically we're talking about the same ideas, with only a practical difference (see further down).


> I think a
>  better schema would be to make "what" mandatory and ask for "unit" if
>  it is different from "number of ${what}s".

Why would this be better?
The main reason I can come up with now to prefer "unit" is compound units (see below), which easily allow automatic conversions (e.g. for a metric with unit=B, a query for unit Mb/s or MiB/d etc., which does scaling, deriving and integration as needed).

I want to strongly encourage people to have a unit tag.  The statement "unit tag is mandatory", as it stands right now, is ambiguous (my mistake).  The data model requires a unit tag,
but you're right, unit=unknown is basically for scenarios where specifying a "real" unit is too time-consuming or too hard (which will hopefully never happen, except when mass importing lots of legacy metrics, so some of the tooling actually does allow unset units and then just stores them as unit=unknown).  This still gives a bunch of advantages, because you can query for unit=Munknown/s and it will do the scaling and derivatives automatically.

Projects that wish to support metrics 2.0 will need updates anyway; it's not like dropping the mandatory unit restriction will make things easier.  We still need well-formed, self-describing metrics, using carefully chosen tag keys and values.  And to do this you need to study the spec.  How the spec settles details such as the tradeoffs between unit/what/scale leads to different pros and cons for metrics usability, but doesn't make adopting metrics 2.0 any easier or harder for things that submit metrics.


> In my experience, when aggregating metrics you most often don't want
>  to aggregate them "globally", i.e. over all machines in your fleet.
>  It is more useful to group metrics logically, for example the I/O load
>  of "all database machines".
>  I don't have a good proposal for a tag name, unfortunately. The best I
>  can come up with is "server_type" which is not quite appropriate for
>  routers and PDUs

I've never seen this turn into an issue.
For me at Vimeo it's easy, because our hostnames contain "web", "db", etc. and I can query on that.  If the hostnames of all machines look similar, you would probably have extra metadata tags that describe the function the server performs, which can get use-case-specific, or you might have puppet/chef integration or something. Or maybe you run containers on a server.  There's a bunch of different deployment paradigms out there and I'm also not sure yet which tags to recommend for this; luckily I haven't needed to so far.
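
Querying on a hostname substring, as described above, might look roughly like this. It is a sketch under assumptions: the `select` helper, the (tags, value) representation, and the hostnames and values are all made up for illustration.

```python
def select(metrics, server_contains, **required_tags):
    """Return the (tags, value) pairs matching a hostname substring and exact tags."""
    return [
        (tags, value)
        for tags, value in metrics
        if server_contains in tags.get("server", "")
        and all(tags.get(k) == v for k, v in required_tags.items())
    ]


# Hypothetical fleet: hostnames and values are invented for this example.
metrics = [
    ({"server": "db1", "what": "io_load"}, 0.7),
    ({"server": "db2", "what": "io_load"}, 0.5),
    ({"server": "web1", "what": "io_load"}, 0.1),
]
db_io = select(metrics, "db", what="io_load")
total = sum(value for _, value in db_io)
```

This only works while the naming convention holds; with uniform hostnames a dedicated metadata tag would have to take the place of the substring match.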


> The spec throws "unit" and "order of magnitude" into the same tag. If
>  you want to write a graphing front-end based on this schema, you will
>  end up parsing the string provided in the "unit" tag, for example to
> determine if the leading "m" is part of "ms" (milliseconds) or just
> coincidence ("mo" for month, for example). I think having a separate
> label would simplify this dramatically. The easiest to use from within
> a computer program would be "e=-3" for "milli", or "scale=1024" or
> "scale=2^10" for "Mi". Whatever the notation, the separation from
> "unit" is what's important to me.

> If your metric is tagged with "what=latency", it's a reasonably save
> bet that the unit is going to be "seconds". It's the order of
> magnitude that's really in question here, for example whether this
> time is measured in "µs" or "100 ns", which is mostly not obvious. [*]

>  Last, there are unit-less metrics, for example "percent" or "ratio".
>  Assigning them a "unit" feels wrong.


As per the SI standard, things like "ms" (millisecond) fall well within the scope of units and compound units (see also http://physics.nist.gov/Pubs/SP811/sec04.html).  I want to stick with standardized terminology here.  As for percent, I'm not entirely sure how to deal with it.  http://physics.nist.gov/Pubs/SP811/sec07.html applies a bit here.  Also section 4.2 (see above) mentions "The symbols for derived units are obtained by means of the mathematical operations of multiplication and division", so maybe we could include % in the unit.
As per the SI standard, scale and unit go hand in hand (note also that the SI standard unit for mass is kg).
Note that it's possible to have an empty unit tag, because some things inherently don't have a unit, like a ratio.  Just like %, I'm not sure how to deal with this yet.

It is true that separating scale (~ SI prefix) from base unit (except for kg) would make it a bit easier, so software doesn't have to parse.  But OTOH parsing a unit is not that complex; there are some rules and exceptions but it's still easy stuff. The benefit of having them together (besides being a more natural extension of the SI standard) is that it's a shorter notation which can be very powerful (i.e. a metric stored with unit=MB can be queried for as unit=GiB/d and have scaling and deriving done automatically).
It's perhaps a little unfortunate that there's only 1 SI unit that has a SI prefix in the unit itself, kg, and this unit might seem not very relevant to the world of IT operations, but actually, I might want to track my rack weight in graphite, for example.
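
To make the MB to GiB/d example concrete, here is a minimal sketch of the two steps involved (scaling and deriving). It assumes the stored series is a cumulative counter sampled as (timestamp in seconds, value in MB); the function name and that representation are assumptions, not how any existing tool implements it.

```python
MB = 10**6               # SI megabyte, in bytes
GIB = 2**30              # binary gibibyte, in bytes
SECONDS_PER_DAY = 86400


def mb_to_gib_per_day(samples):
    """Convert (timestamp_seconds, value_in_MB) samples to GiB/d rates.

    Returns one (timestamp, rate) pair per consecutive pair of samples.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        gib = (v1 - v0) * MB / GIB            # scaling: MB -> GiB
        days = (t1 - t0) / SECONDS_PER_DAY    # deriving: per day
        rates.append((t1, gib / days))
    return rates
```

The two steps are independent: the scaling only needs the prefixes ("M" and "Gi"), and the deriving only needs the "/d" suffix.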


> Adding the sent network traffic and
> available physical memory is much less common, even though both share
> the same unit (bytes)

I think the bottom line for this (and the other examples you mention), at least according to my current perspective on things, is that such metrics will have tags that separate them enough from each other (and that might very well be a what=physical_memory vs what=network_traffic).  I'm not 100% sure yet that it will always be the "what" tag; maybe they'll have the same information but with a different tag key.  We could standardize on making "what" mandatory (in addition to the mandatory "unit", which can possibly be empty or "unknown"), which might introduce redundancy, although probably not so much that it would become an issue. I'm just not 100% sold yet on using "what" for all information that describes the "what is this" aspect: sometimes other tags could be more fitting, or there could be several attributes that describe the "what is this", which can be better off in multiple, different tags.  I don't have concrete examples for this yet, but I can start actively looking for them, and if I don't find any within a certain timeframe, we can make "what" mandatory ;-)

John-John Tedro

Jun 2, 2014, 12:22:36 AM
to Dieter Plaetinck, metr...@googlegroups.com
FTR: In the systems we are implementing, we don't use a 'what' tag; instead we've inherited KairosDB and OpenTSDB terminology with a 'key'.
This key is user-defined, and the only restriction we set is that each unique key should be its own 'thing' and of the same 'unit'.

I do however support the notion of a unit tag, but would like to avoid the standardization effort of it.
An organization is always bound to have a unit which is not covered by the standard, and either a) listing all possible cases, b) defining semantics that inherently cover all cases, or an a + b hybrid would currently put unneeded strain on implementing systems.

I am confident that a reasonable scheme of units would eventually grow organically from the best implementation around.


--
You received this message because you are subscribed to the Google Groups "metrics2.0" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metrics20+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
John-John Tedro - udoprog
Hero at Spotify

Dieter Plaetinck

Jun 2, 2014, 9:39:16 AM
to metr...@googlegroups.com
hey udoprog!

"key" doesn't seem like a good tag key to me. here's why:
"unit" conveys "the value of this tag is the unit the metric is in"
"what" conveys "the value of this tag describes what the metric is in"
"server" conveys "the value of this tag describes the server this metric relates to"
"key" conveys... that it's a key, but every tag key is a key? So why would you pick the word "key" to use as tag key, and not something more clear?


> I am confident that a reasonable scheme of units would eventually grow organically from the best implementation around.

and the unit standardisation just tries to capture that.

John-John Tedro

Jun 2, 2014, 10:50:11 AM
to Dieter Plaetinck, metr...@googlegroups.com
On Mon, Jun 2, 2014 at 3:39 PM, Dieter Plaetinck <dieterp...@gmail.com> wrote:
> hey udoprog!
>
> "key" doesn't seem like a good tag key to me. here's why:
> "unit" conveys "the value of this tag is the unit the metric is in"
> "what" conveys "the value of this tag describes what the metric is in"
> "server" conveys "the value of this tag describes the server this metric relates to"
> "key" conveys... that it's a key, but every tag key is a key? So why would you pick the word "key" to use as tag key, and not something more clear?


It is a common, required, and unique identifier defined by the user of the system meant to encapsulate what makes sense to them.
It embraces that users are uneducated on the subject of semantic tagging.

We implement a soft enforcement of the structure of the key, so that
  <service> <dot> <timeseries-name>
... is defined as the key, which does not exclude that service=<service> can also be a tag.

We also actively look at which keys are defined by the users, and if they contain so much information that they would benefit from using tags instead, we inform them that they should use tags for it.

So the primary difference is that we see the identity of a timeseries as the key plus all possible permutations of tags, where the key is both a catch-all for old time-series specifications (like those provided by collectd, graphite, etc.) and an end-user simplification to lower the barrier of entry into the system.
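
A soft check of that key structure could be sketched as below. The regular expression, the allowed characters, the "two or more dots" heuristic and the function name are all assumptions for illustration; the thread does not say how the real system does this.

```python
import re

# Hypothetical pattern for "<service>.<timeseries-name>"; the character set
# allowed in each part is an assumption, not taken from the thread.
KEY_PATTERN = re.compile(
    r"^(?P<service>[A-Za-z0-9_-]+)\.(?P<name>[A-Za-z0-9_.-]+)$"
)


def check_key(key):
    """Return (service, name, warnings); problems are reported, never raised."""
    match = KEY_PATTERN.match(key)
    if match is None:
        msg = f"key {key!r} does not look like <service>.<timeseries-name>"
        return None, None, [msg]
    warnings = []
    # Many dotted segments suggest information that would be better off in tags.
    if match["name"].count(".") >= 2:
        warnings.append(f"key {key!r} encodes a lot of structure; consider tags")
    return match["service"], match["name"], warnings
```

Returning warnings instead of rejecting the key is what makes the enforcement "soft": legacy graphite-style names still get through, but their owners can be told about tags.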


>> I am confident that a reasonable scheme of units would eventually grow organically from the best implementation around.
>
> and the unit standardisation just tries to capture that.


But it also establishes that any system implementing this specification has to handle the units available in it.
Any changes to the specification would therefore have to be reflected in these systems.
By making unit recommended instead of mandatory and link to external specifications - like one specifying the way graph-explorer treats units - implementing systems would not be required to care.

Units could be seen as a possible extension to the specification.

Dieter Plaetinck

Jun 11, 2014, 8:58:06 AM
to metr...@googlegroups.com, dieterp...@gmail.com
On Monday, 2 June 2014 10:50:11 UTC-4, John-John Tedro wrote:
> On Mon, Jun 2, 2014 at 3:39 PM, Dieter Plaetinck <dieterp...@gmail.com> wrote:
>> hey udoprog!
>>
>> "key" doesn't seem like a good tag key to me. here's why:
>> "unit" conveys "the value of this tag is the unit the metric is in"
>> "what" conveys "the value of this tag describes what the metric is in"
>> "server" conveys "the value of this tag describes the server this metric relates to"
>> "key" conveys... that it's a key, but every tag key is a key? So why would you pick the word "key" to use as tag key, and not something more clear?
>
> It is a common, required, and unique identifier defined by the user of the system meant to encapsulate what makes sense to them.
> It embraces that users are uneducated on the subject of semantic tagging.


One of the core ideas of metrics 2.0 is that it's well worth it to spend time familiarizing yourself with the spec, recommended tags, and esp. units, and to put a bit more thought into the metric when you're adding it to your code, because this pays off in the long run (compatibility across tools and people, clearer metrics, ability to generate graphs/dashboards, etc.).

I think it's reasonable to expect modern programmers to familiarize themselves with instrumentation, just as they also typically know SQL (as opposed to just using an ORM), for example.




>>> I am confident that a reasonable scheme of units would eventually grow organically from the best implementation around.
>>
>> and the unit standardisation just tries to capture that.
>
> But it also establishes that any system implementing this specification has to handle the units available in it.
> Any changes to the specification would therefore have to be reflected in these systems. By making unit recommended instead of mandatory and link to external specifications - like one specifying the way graph-explorer treats units - implementing systems would not be required to care.
> Units could be seen as a possible extension to the specification.

No,
a system that implements metrics 2.0 can be agnostic of the different kinds of unit values.  If it wants to convert units (like graph-explorer does), then it needs to be aware of prefixes (M, Ki, etc.) and suffixes (/s, ps, /d) and how to convert between them, but even in this case, for a unit "M<base-unit>/s" it doesn't need to care about what the values of base-unit can be. Same idea for statsdaemon, which basically just adds "/s" for rates per second and stat=<statistic> for statistical aggregates (timers).
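
A minimal sketch of what "agnostic of the base unit" can mean in practice is shown below. It is not statsdaemon's actual code: the function names and the dict-based tag representation are assumptions, and the only behaviour taken from the text above is appending "/s" and setting a stat tag.

```python
def to_rate(tags, count, interval_seconds):
    """Turn a count over an interval into a per-second rate.

    The base unit is treated as an opaque string: "/s" is simply appended.
    """
    rate_tags = dict(tags, unit=tags["unit"] + "/s")
    return rate_tags, count / interval_seconds


def with_stat(tags, stat):
    """Tag a statistical aggregate (for example "upper_90") of a timer."""
    return dict(tags, stat=stat)
```

Neither function contains a list of known units, so a new base unit added to the spec (or a site-specific one) passes through unchanged.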

John-John Tedro

Jun 11, 2014, 2:57:51 PM
to Dieter Plaetinck, metr...@googlegroups.com


On 11 Jun 2014 14:58, "Dieter Plaetinck" <dieterp...@gmail.com> wrote:
>
> On Monday, 2 June 2014 10:50:11 UTC-4, John-John Tedro wrote:
>>
>> On Mon, Jun 2, 2014 at 3:39 PM, Dieter Plaetinck <dieterp...@gmail.com> wrote:
>>>
>>> hey udoprog!
>>>
>>> "key" doesn't seem like a good tag key to me. here's why:
>>> "unit" conveys "the value of this tag is the unit the metric is in"
>>> "what" conveys "the value of this tag describes what the metric is in"
>>> "server" conveys "the value of this tag describes the server this metric relates to"
>>> "key" conveys... that it's a key, but every tag key is a key? So why would you pick the word "key" to use as tag key, and not something more clear?
>>>
>>
>> It is a common, required, and unique identifier defined by the user of the system meant to encapsulate what makes sense to them.
>> It embraces that users are uneducated on the subject of semantic tagging.
>
>
>
> one of the core ideas of  metrics 2.0 is that it's well worth it to spend time familiarizing yourself with the spec, recommended tags, and esp. units, and putting a bit more thought into the metric when you're adding it to your code.  Because this pays off in the long run (compatibility across tools and people, clearer metrics, ability to generate graphs/dashboards, etc)
>
> I think it's as reasonable to expect modern programmers to familiarize themselves with instrumentation, just as they also typically know SQL (as opposed to just using an ORM) for example.
>

Indeed. And they will.

It is however possible to build a system that scales with the experience of the user. I imagine the "what" tag will be widely (ab)used for this same reason.

>
>
>>
>> > I am confident that a reasonable scheme of units would eventually grow organically from the best implementation around.
>>
>> and the unit standardisation just tries to capture that.
>>
>
>> But it also establishes that any system implementing this specification has to handle the units available in it.
>>
>> Any changes to the specification would therefore have to be reflected in these systems.By making unit recommended instead of mandatory and link to external specifications - like one specifying the way graph-explorer treats units - implementing systems would not be required to care.
>>
>> Units could be seen as a possible extension to the specification.
>
>
> No,
> a system that implements metrics 2.0 can be agnostic of which are the different kinds of unit values.  If it wants to convert units (like graph-explorer does) than it needs to be aware of prefixes (M, Ki, etc) and suffixes (/s, ps, /d) and how to convert between them, but even in this case, for a unit "M<base-unit>/s" it doesn't need to care about what the values of base-unit can be. Same idea for statsdaemon which basically just adds "/s" for rates per second and stat=<statistic> for statistical aggregates (timers),
>

I re-read the spec, and I feel that what you write above has a different tone regarding the requirements surrounding unit.

Indeed it can be wholly transparent to various parts of the system, which is something I feel should be encouraged at this phase of the development. I see "unit" as always having strong dependencies on the specific components in use until there is one definition that is a clear winner (I'm not even sure it's possible to get there). Do consider that this definition will have to be implemented by multiple people and organizations for this to gain traction, and the momentum of a large group will always be difficult to change if it at some point needs to.

I do see your arguments, and the effort is worthwhile. But I do feel it is worth considering either not making the definitions around how units are structured part of the core spec, or making them looser. The biggest gain for me with semantic time series was always tags, not units.

Dieter Plaetinck

Jun 18, 2014, 2:27:35 PM
to metr...@googlegroups.com, dieterp...@gmail.com


On Wednesday, 11 June 2014 14:57:51 UTC-4, John-John Tedro wrote:


> Indeed. And they will.
>
> It is however possible to build a system that scales with the experience of the user. I imagine the "what" tag will be widely (ab)used for this same reason.

> I re-read the spec and I feel that what you write above has another tone about the requirements surrounding unit.


What do you mean by this? Maybe it's because in the spec I emphasize that one should pick a fitting unit, and talk about the prefixes and unit conversion,
whereas in this context I was talking more about how most tools in the pipeline don't need explicit support for additional base units (i.e. it's pretty much only the sending side that needs to pay much attention to the correct base unit).  Is there any other bigger difference you noticed?
 

> Indeed it can be wholly transparent to various parts of the system, which is something I feel should be encouraged at this phase of the development. I see "unit" as always having strong dependencies on the specific components in use until there is one definition that is a clear winner (I'm not even sure it's possible to get there). Do consider that this definition will have to be implemented by multiple people and organizations for this to gain traction, and the momentum of a large group will always be difficult to change if it at some point needs to.
>
> I do see your arguments, and the effort is worthwhile. But I do feel it is worth considering either not making the definitions around how units are structured part of the core spec, or making them looser. The biggest gain for me with semantic time series was always tags, not units.

You say the biggest gain for you is tags, not units. Do you mean you more commonly leverage tags (to group by <key>, avg by <key>, whatever by <key>) than unit information (for, say, conversion)? If so, yes, me too.
So what precisely do you mean by "how units are structured"?  What do you think we should make looser (or optional)?

The main thing that often bothers me about the current 2.0 spec is that it can be mentally taxing to try to put all the words that describe the metric into key-value tags.
Often enough, it seems, there's interdependence between words, and the pieces that describe a metric can follow more of a natural language grammar, so it can get awkward to split them into tags.  So usually that information goes into "what".  So my main take on it is that we should support the "natural text" way a bit better, and not try to turn everything into a tag, because the value decreases when you push the idea too far.
So when you consider that "what" becomes a very common thing, but "unit" is useful too because it allows conversions (even when not done commonly), there is a certain overlap between them.

Maybe the unit can be an annotated piece of information within the "what" tag. Hmmm.