special characters in metric names?


bernd

May 21, 2013, 3:02:18 PM
to open...@googlegroups.com
We are currently writing an application where users can define metrics, tagk and tagv, and found out that special characters like "ä, ö, ü, ß" or blanks in metric names, tagk or tagv are not persisted correctly.
I read the documentation but I can't find anything about that topic.
Is there a restriction on which characters can be used in metric names, tagk or tagv?

ManOLamancha

May 21, 2013, 5:42:04 PM
to open...@googlegroups.com

I'll add it to the documentation, but right now the metric, tagk and tagv names are all limited to a-z, A-Z, 0-9, "-", "_", "." and "/" characters. Some folks have asked to support Unicode and I'll look into what changes we'd have to make to do that correctly.
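
As a rough illustration of the character set described above, here is a minimal sketch of an ASCII white-list check. It is not copied from the OpenTSDB source; the class name `AsciiNameCheck`, the method name `isAllowed`, and the decision to reject empty strings are assumptions made for the example.

```java
// Illustrative sketch only; not copied from the OpenTSDB source.
final class AsciiNameCheck {
  /** Returns true if every character is a-z, A-Z, 0-9, '-', '_', '.' or '/'. */
  static boolean isAllowed(final String s) {
    if (s == null || s.isEmpty()) {
      return false;  // assumption: null and empty names are rejected as well
    }
    for (int i = 0; i < s.length(); i++) {
      final char c = s.charAt(i);
      final boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9')
          || c == '-' || c == '_' || c == '.' || c == '/';
      if (!ok) {
        return false;
      }
    }
    return true;
  }

  public static void main(final String[] args) {
    System.out.println(isAllowed("sys.cpu.user"));      // true
    System.out.println(isAllowed("gr\u00f6\u00dfe"));   // false: umlaut and sharp s are outside the list
    System.out.println(isAllowed("disk usage"));        // false: a blank is not in the list
  }
}
```

Under this rule the umlauts and blanks from the original question are rejected, which matches the behaviour reported there.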

ff_angwonson

May 28, 2013, 7:19:56 PM
to open...@googlegroups.com
Any update on this? I have run into a similar problem with tag values. Tag names are easier to control since we don't allow just anyone to create a tag, but the values should support any Unicode value.

On a related note, what would be the best practice for sending a double quote in a tag value? Something like using quotemeta before sending data to MySQL? For example:

somemetric 1369783067 2320616 hostname=a1.test.com stringtag="some value contains a \" quote character"

Shrek Fan

Jul 4, 2013, 4:37:40 AM
to open...@googlegroups.com
I have the same requirement. We need metric names to support Unicode.

Vasiliy Kiryanov

Jul 4, 2013, 2:05:55 PM
to open...@googlegroups.com
Hello Chris.

We already have it in the documentation: http://opentsdb.net/metrics.html (Precisions on metrics and tags).

--
Vasiliy Kiryanov | http://twitter.com/vasiliykiryanov

ManOLamancha

Jul 5, 2013, 12:00:45 PM
to open...@googlegroups.com
On Thursday, July 4, 2013 2:05:55 PM UTC-4, Vasiliy Kiryanov wrote:
> Hello Chris.
>
> We already have it in the documentation: http://opentsdb.net/metrics.html (Precisions on metrics and tags).

Oh cool, I missed that part. It's also now in http://opentsdb.net/docs/build/html/user_guide/writing.html

ManOLamancha

Jul 5, 2013, 12:10:31 PM
to open...@googlegroups.com
On Tuesday, May 28, 2013 7:19:56 PM UTC-4, ff_angwonson wrote:
> Any update on this? I have run into a similar problem with tag values. Tag names are easier to control since we don't allow just anyone to create a tag, but the values should support any Unicode value.
>
> On a related note, what would be the best practice for sending a double quote in a tag value? Something like using quotemeta before sending data to MySQL? For example:
>
> somemetric 1369783067 2320616 hostname=a1.test.com stringtag="some value contains a \" quote character"

Anyway, to follow up for everyone, here are a few reasons why it's currently limited to ASCII:

- It's much easier to create a simple white-list character filter with ASCII than with Unicode. If someone wants to investigate doing so with UTF-8, that would be great. It must be performant, though, to keep up with current write speeds.
- Tsuna wanted to be able to use special characters in the future for queries (like {}+-, etc.)

So if someone wants to investigate what it would take to implement a performant UTF parser that only allows secure characters on write, we'd appreciate it. Otherwise, a workaround is to store an ASCII value as your metric or tag name/value, and store the Unicode value in the Meta data "displayName" field.

P.S. Right now quotes aren't allowed, so there's no escaping.
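
The workaround above (an ASCII value as the stored name, with the Unicode text kept for display) could be sketched roughly as follows. This is only one possible mapping and an assumption on my part: the class `AsciiKeyWorkaround` and the method `toAsciiKey` are hypothetical, and how the display text is actually written to the "displayName" field is not shown. Different labels can collapse to the same key, so a real implementation would need to handle collisions.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the mapping rule is an assumption, not an OpenTSDB feature.
final class AsciiKeyWorkaround {
  /** Derives an ASCII-only key from a Unicode label. */
  static String toAsciiKey(final String label) {
    // Decompose accented letters into base letter + combining mark.
    final String decomposed = Normalizer.normalize(label, Normalizer.Form.NFD);
    // Drop the combining marks, keeping the base letters.
    final String stripped = decomposed.replaceAll("\\p{M}", "");
    // Replace anything still outside the allowed ASCII set with '_'.
    return stripped.replaceAll("[^a-zA-Z0-9_./-]", "_");
  }

  public static void main(final String[] args) {
    // ASCII key -> original text that would be kept for display purposes.
    final Map<String, String> displayNames = new LinkedHashMap<String, String>();
    for (final String label : new String[] {"gr\u00f6\u00dfe", "Temp \u00b0C"}) {
      displayNames.put(toAsciiKey(label), label);
    }
    for (final Map.Entry<String, String> e : displayNames.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```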

Vasiliy Kiryanov

Jul 10, 2013, 11:36:36 AM
to open...@googlegroups.com
Hello Chris.

With the following naive character filter, validating a 70-character string takes 0.120 ms vs 0.010 ms with the current validateString, and it is hard to improve the performance. How much performance are we ready to lose to support Unicode?

  /**
   * Ensures that a given Unicode string is a valid metric name or tag name/value.
   * @param what A human readable description of what's being validated.
   * @param s The string to validate.
   * @throws IllegalArgumentException if the string isn't valid.
   */
  static void validateUnicodeString(final String what, final String s) {
    if (s == null) {
      throw new IllegalArgumentException("Invalid " + what + ": null");
    }
    final int n = s.length();
    for (int i = 0; i < n; i++) {
      final Character c = s.charAt(i);
      if (!((Character.isLetterOrDigit(c) || c.equals('-') ||
            c.equals('_') || c.equals('.') || c.equals('/') ))) {
        throw new IllegalArgumentException("Invalid " + what
            + " (\"" + s + "\"): illegal character: " + c);
      }
    }
  }

--
Vasiliy

Vasiliy Kiryanov

Jul 10, 2013, 12:46:42 PM
to open...@googlegroups.com
Using char instead of the Character wrapper gives 0.076 ms vs 0.010 ms (current validateString):


  /**
   * Ensures that a given Unicode string is a valid metric name or tag name/value.
   * @param what A human readable description of what's being validated.
   * @param s The string to validate.
   * @throws IllegalArgumentException if the string isn't valid.
   */
  static void validateUnicodeString(final String what, final String s) {
    if (s == null) {
      throw new IllegalArgumentException("Invalid " + what + ": null");
    }
    final int n = s.length();
    for (int i = 0; i < n; i++) {
      final char c = s.charAt(i);
      if (!((Character.isLetterOrDigit(c) || c == '-' ||
            c == '_' || c == '.' || c == '/' ))) {

        throw new IllegalArgumentException("Invalid " + what
            + " (\"" + s + "\"): illegal character: " + c);
      }
    }
  }

--
Vasiliy

ManOLamancha

Jul 10, 2013, 7:32:12 PM
to open...@googlegroups.com
On Wednesday, July 10, 2013 12:46:42 PM UTC-4, Vasiliy Kiryanov wrote:
> Using char instead of the Character wrapper gives 0.076 ms vs 0.010 ms (current validateString):

Wow, thanks for checking those Vasiliy. Did you run the comparison multiple times to get a good average? I know folks will complain about micro-benchmarking. With differences that large, if folks really want unicode, we could *maybe* add a config flag that would perform the different checks.
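
For anyone who wants to repeat the comparison over many runs, here is a minimal timing sketch with a warm-up phase. The two checks are simplified stand-ins that return a boolean instead of throwing, and the class and method names (`ValidationTiming`, `asciiOk`, `unicodeOk`, `averageNanos`) are made up for the example. Hand-rolled loops like this are still sensitive to JIT effects, so a dedicated harness such as JMH would likely give more trustworthy numbers.

```java
import java.util.function.Predicate;

// Timing sketch only; the two checks are simplified stand-ins, not OpenTSDB code.
final class ValidationTiming {
  /** ASCII white-list: a-z, A-Z, 0-9, '-', '_', '.', '/'. */
  static boolean asciiOk(final String s) {
    for (int i = 0; i < s.length(); i++) {
      final char c = s.charAt(i);
      if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9')
          || c == '-' || c == '_' || c == '.' || c == '/')) {
        return false;
      }
    }
    return true;
  }

  /** Unicode variant, following the check posted earlier in the thread. */
  static boolean unicodeOk(final String s) {
    for (int i = 0; i < s.length(); i++) {
      final char c = s.charAt(i);
      if (!(Character.isLetterOrDigit(c)
          || c == '-' || c == '_' || c == '.' || c == '/')) {
        return false;
      }
    }
    return true;
  }

  /** Average nanoseconds per call, measured after a warm-up phase. */
  static double averageNanos(final Predicate<String> check, final String input,
                             final int warmup, final int runs) {
    boolean sink = true;  // use the results so the JIT is less likely to discard the calls
    for (int i = 0; i < warmup; i++) {
      sink &= check.test(input);
    }
    final long start = System.nanoTime();
    for (int i = 0; i < runs; i++) {
      sink &= check.test(input);
    }
    final long elapsed = System.nanoTime() - start;
    if (!sink) {
      throw new IllegalStateException("input was unexpectedly rejected");
    }
    return (double) elapsed / runs;
  }

  public static void main(final String[] args) {
    final StringBuilder sb = new StringBuilder();
    while (sb.length() < 70) {
      sb.append('a');
    }
    final String input = sb.toString();  // 70 characters, as in the thread
    System.out.println("ascii   ns/call: "
        + averageNanos(ValidationTiming::asciiOk, input, 100000, 1000000));
    System.out.println("unicode ns/call: "
        + averageNanos(ValidationTiming::unicodeOk, input, 100000, 1000000));
  }
}
```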

tsuna

Jul 11, 2013, 1:33:07 AM
to ManOLamancha, open...@googlegroups.com
On Wed, Jul 10, 2013 at 4:32 PM, ManOLamancha <clars...@gmail.com> wrote:
> Wow, thanks for checking those Vasiliy. Did you run the comparison multiple
> times to get a good average? I know folks will complain about
> micro-benchmarking. With differences that large, if folks really want
> unicode, we could *maybe* add a config flag that would perform the different
> checks.

My main concern here isn't so much the performance of the check, but
it's just that it's going to make things a whole lot more complicated
for us down the road because we'll have to support people who turned
the unicode support on.

The main reason why the character set allowed is fairly restrictive
was that it meant that
1. no escaping was needed anywhere
2. it left the door open to future extensions, including the
possible creation of a query language

If we start allowing all sorts of UTF-8 or unicode characters, I
believe it will be much harder to introduce additional notations in
the future. For example if "+" is allowed and in the future we wanna
be able to say "appserver.requests{status=internal_error}+appserver.requests{status=request_timeout}"
then things become ambiguous because "+" could be part of the metric
name, or it could be a binary arithmetic operator, for example. Of
course this example is trivial, and even for a program it wouldn't be
too hard to disambiguate, but things can easily get highly ambiguous
for programs.

We could maybe turn this around and declare that there is a reserved
set of characters that we specifically do not want to allow (which
today would already need to include "{}=:-" amongst possibly other
ones) so we could exclude all the things like "!@#$%^&*()+[]:;'\"<>/?"
and such.
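
A reserved-set (deny-list) check along those lines might look like the sketch below. The exact reserved set is an assumption for illustration: it is taken loosely from the characters mentioned above, with "-" and "/" left out because they are listed earlier in the thread as currently allowed. Rejecting whitespace and control characters is also my assumption, and the names `ReservedCharCheck` and `isAllowed` are hypothetical.

```java
// Sketch of a deny-list ("reserved characters") check. The reserved set is an
// assumption for illustration, not a list that OpenTSDB defines.
final class ReservedCharCheck {
  // Assumed reserved set; '-' and '/' are deliberately not included.
  private static final String RESERVED = "{}=:!@#$%^&*()+[];'\"<>?";

  /** Returns true if the string contains no reserved, blank or control character. */
  static boolean isAllowed(final String s) {
    if (s == null || s.isEmpty()) {
      return false;  // assumption: null and empty names are rejected
    }
    for (int i = 0; i < s.length(); i++) {
      final char c = s.charAt(i);
      if (RESERVED.indexOf(c) >= 0
          || Character.isWhitespace(c)
          || Character.isISOControl(c)) {
        return false;
      }
    }
    return true;
  }

  public static void main(final String[] args) {
    System.out.println(isAllowed("\u6e29\u5ea6"));  // true: CJK letters are not reserved
    System.out.println(isAllowed("disk-usage"));    // true: dash is not in the assumed set
    System.out.println(isAllowed("a+b"));           // false: '+' is reserved
  }
}
```

Compared with a white-list, this accepts any character that is not explicitly reserved, so the reserved set has to anticipate future syntax up front.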

--
Benoit "tsuna" Sigoure

Vasiliy Kiryanov

Jul 11, 2013, 8:27:27 AM
to open...@googlegroups.com, ManOLamancha

Hello Folks.

As you can see, we use a white-list filter that allows only Unicode letters, digits, and - _ . / (not all Unicode symbols).

So we just need to remove + - / from the white-list to disallow them. As for the backslash (\), we allowed it recently as it is useful for disk names.

For performance testing I used a Java profiler, so 0.076 ms is a good average for a 70-character string.


--

Vasiliy Kiryanov

John A. Tamplin

Jul 13, 2013, 9:49:35 AM
to open...@googlegroups.com, ManOLamancha
On Thursday, July 11, 2013 1:33:07 AM UTC-4, tsuna wrote:
> We could maybe turn this around and declare that there is a reserved
> set of characters that we specifically do not want to allow (which
> today would already need to include "{}=:-" amongst possibly other
> ones) so we could exclude all the things like "!@#$%^&*()+[]:;'\"<>/?"
> and such.

Ouch - we heavily use dashes in metrics now, so it would be a major pain if the new version outlawed them.

Surely there are other solutions for allowing arbitrary expressions on metrics, such as requiring spaces around operators, writing it as sum(m1, m2, ...), etc.
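
To illustrate the "spaces around operators" idea, here is a small sketch in which an operator only counts as one when it stands alone between whitespace, so a dash inside a metric name stays part of the name. The expression syntax is purely hypothetical (OpenTSDB has no such query language in this thread), as are the names `SpacedOperatorSketch` and `classify`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression syntax, used only to illustrate the idea.
final class SpacedOperatorSketch {
  /** Labels each whitespace-separated token as an operator or a metric name. */
  static List<String> classify(final String expression) {
    final List<String> out = new ArrayList<String>();
    for (final String token : expression.trim().split("\\s+")) {
      // Only a stand-alone "+" or "-" is treated as an operator.
      final boolean isOperator = token.equals("+") || token.equals("-");
      out.add((isOperator ? "OP:" : "METRIC:") + token);
    }
    return out;
  }

  public static void main(final String[] args) {
    System.out.println(classify("app-server.requests + app-server.errors"));
  }
}
```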

--
John Tamplin

tsuna

Jul 13, 2013, 2:26:51 PM
to John A. Tamplin, open...@googlegroups.com, ManOLamancha
On Sat, Jul 13, 2013 at 6:49 AM, John A. Tamplin <j...@jaet.org> wrote:
> Ouch - we heavily use dashes in metrics now, so it would be a major pain if
> the new version outlawed them.

Sorry I forgot that dash was currently allowed. We obviously don't
wanna "outlaw" anything that is currently legal. See also the discussion at
https://github.com/OpenTSDB/opentsdb/pull/205#issuecomment-20923563

--
Benoit "tsuna" Sigoure

ManOLamancha

Jul 22, 2013, 10:08:43 PM
to open...@googlegroups.com, ManOLamancha
On Thursday, July 11, 2013 8:27:27 AM UTC-4, Vasiliy Kiryanov wrote:
> As you can see, we use a white-list filter that allows only Unicode letters, digits, and - _ . / (not all Unicode symbols).
>
> So we just need to remove + - / from the white-list to disallow them. As for the backslash (\), we allowed it recently as it is useful for disk names.

Thanks Vasiliy! Patch pulled into the "next" branch so Unicode is supported (the dash remains).

Shrek Fan

Jul 23, 2013, 2:03:03 AM
to open...@googlegroups.com, ManOLamancha
Hi, tsuna

Can I use Chinese for the metric name?

tsuna

Jul 23, 2013, 3:35:08 AM
to Shrek Fan, open...@googlegroups.com, ManOLamancha
On Mon, Jul 22, 2013 at 11:03 PM, Shrek Fan <fanb...@gmail.com> wrote:
> Can I use Chinese for the metric name?

Yes.

--
Benoit "tsuna" Sigoure

Jerome Wu

Mar 1, 2016, 6:10:34 AM
to OpenTSDB, fanb...@gmail.com, clars...@gmail.com
I understand your solution, thanks.

On Tuesday, July 23, 2013 at 3:35:08 PM UTC+8, tsuna wrote:

redpep...@gmail.com

Sep 12, 2017, 5:02:12 AM
to OpenTSDB
Hello Friend,

I am facing an issue while inserting special characters (#, %, °C) into OpenTSDB. I am using OpenTSDB 2.2.0 and the import command on the CLI to insert data from a .gz file.

The .gz file contains data like the following:

matric_name 1356998400 42.5 host=webserver01 special=%/H

Thanks in advance

ManOLamancha

Jan 29, 2018, 8:27:38 PM
to OpenTSDB
On Tuesday, September 12, 2017 at 2:02:12 AM UTC-7, redpep...@gmail.com wrote:
> Hello Friend,
>
> I am facing an issue while inserting special characters (#, %, °C) into OpenTSDB. I am using OpenTSDB 2.2.0 and the import command on the CLI to insert data from a .gz file.

Unfortunately, we have a built-in limit, so you'd have to modify the source code to allow those characters. I thought there was public code that allowed an extra whitelist, but it doesn't look to be present. Please file a feature request on GitHub. Thanks.
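
To see which character in an import line trips the limit, a small diagnostic like the one below can help before running the import. It assumes the rule is the Unicode white-list discussed earlier in this thread (letters, digits, "-", "_", "." and "/"); whether 2.2.0 applies exactly that rule is an assumption, and the names `ImportLineCheck` and `firstIllegalIndex` are made up for the example.

```java
// Diagnostic sketch; assumes the white-list discussed earlier in this thread
// (letters, digits, '-', '_', '.', '/') is the rule being applied.
final class ImportLineCheck {
  /** Index of the first disallowed character, or -1 if the string passes. */
  static int firstIllegalIndex(final String s) {
    for (int i = 0; i < s.length(); i++) {
      final char c = s.charAt(i);
      if (!(Character.isLetterOrDigit(c)
          || c == '-' || c == '_' || c == '.' || c == '/')) {
        return i;
      }
    }
    return -1;
  }

  public static void main(final String[] args) {
    final String line = "matric_name 1356998400 42.5 host=webserver01 special=%/H";
    final String[] fields = line.split(" ");
    // fields[0] is the metric, fields[1] the timestamp, fields[2] the value;
    // the remaining fields are tagk=tagv pairs.
    for (int i = 3; i < fields.length; i++) {
      final String[] kv = fields[i].split("=", 2);
      if (kv.length < 2) {
        continue;  // not a tagk=tagv pair
      }
      final int bad = firstIllegalIndex(kv[1]);
      if (bad >= 0) {
        System.out.println("tag " + kv[0] + ": illegal character '"
            + kv[1].charAt(bad) + "' at index " + bad);
      }
    }
  }
}
```

For the sample line, the check flags the "%" at the start of the `special` tag value; "#" and "°" would be flagged the same way because they are neither letters nor digits.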