grouping in metric calculation?

optimusprime

Jan 20, 2012, 7:44:32 PM

to cube-user

What exactly is "grouping" when calculating metrics? When will an
expression have an associated group?

Mike Bostock

Jan 21, 2012, 1:59:51 PM

to cube...@googlegroups.com

It's the same concept as MySQL's GROUP BY. If you took the example
query from the Cube page and added a group clause:

{
  "expression": "sum(request).group(host)",
  "start": "2011-09-10T12:37:12Z",
  "stop": "2011-09-13T04:00:02Z",
  "step": 300000
}

Then you'll get back individual results for each host + time, rather
than just a single value:

{"time": "2011-09-10T12:40:00Z", "group": "web11", "value": 5023}
{"time": "2011-09-10T12:40:00Z", "group": "web12", "value": 492}
{"time": "2011-09-10T12:40:00Z", "group": "web13", "value": 1401}

I think in a subsequent release, we may need to combine the results
for each time, so that you can tell when all the results have been
returned and when results are missing. That might look like this:

{"time": "2011-09-10T12:40:00Z", "value": {"web11": 5023, "web12":
492, "web13": 1401}}
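
As a rough client-side sketch of that combined shape (not something Cube
does itself), the per-group rows can be folded into one object per time.
The helper name combine_by_time is made up for illustration; the row
fields are the ones in the sample results above.

```python
from collections import defaultdict


def combine_by_time(rows):
    """Fold per-group rows into one object per time, keyed by group name."""
    combined = defaultdict(dict)
    for row in rows:
        combined[row["time"]][row["group"]] = row["value"]
    # ISO 8601 strings in the same format sort chronologically.
    return [{"time": t, "value": v} for t, v in sorted(combined.items())]


rows = [
    {"time": "2011-09-10T12:40:00Z", "group": "web11", "value": 5023},
    {"time": "2011-09-10T12:40:00Z", "group": "web12", "value": 492},
    {"time": "2011-09-10T12:40:00Z", "group": "web13", "value": 1401},
]
combined = combine_by_time(rows)
```

With every group for a given time in one object, a missing host shows up
as a missing key rather than as a row that never arrives.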

This might not perform as well with very high-cardinality groups,
though; I'd like to add a top N or bottom N to the group-by as well to
solve that problem.
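
Until something like that exists, a top N or bottom N could be
approximated on the client. This is only a sketch over an already
combined value object; top_n_groups and bottom_n_groups are hypothetical
helpers, not Cube functions.

```python
import heapq


def top_n_groups(values, n):
    """Keep the n groups with the largest values."""
    return dict(heapq.nlargest(n, values.items(), key=lambda kv: kv[1]))


def bottom_n_groups(values, n):
    """Keep the n groups with the smallest values."""
    return dict(heapq.nsmallest(n, values.items(), key=lambda kv: kv[1]))


values = {"web11": 5023, "web12": 492, "web13": 1401}
busiest = top_n_groups(values, 2)
quietest = bottom_n_groups(values, 1)
```

Filtering on the client still transfers every group, so it only helps
with display, not with the query cost on the server.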

Mike

optimusprime

Jan 21, 2012, 7:54:50 PM

to cube-user

This sounds pretty nice. I guess I'll have to do some work on the
front-end to add new visualizations that support grouping. Another
question: is it possible to save more than one time stamp in an event
and then plot by either of the times? Say I have a task-started and a
task-completed event, and I just want one collection with the data
containing a start and an end time stamp. Otherwise it seems like I'll
have to replicate all the task metadata for each event. I know it's not
currently supported in Cube, but how complicated would it be to make
this change if I wanted to?

Mike Bostock

Jan 21, 2012, 8:01:46 PM

to cube...@googlegroups.com

> is it possible to save more than one time stamp in
> an event and then plot by either of the times.

If you had multiple times, they'd have to be indexed separately, so
there would be some overhead regardless.

At any rate, no, Cube does not currently support indexing by multiple
times; you'd need to send those as separate events. In general, you're
expected to denormalize data when you send events to Cube (since the
query language, like that of most NoSQL stores, doesn't support joins).
So duplicating that data for a start and end event isn't typically
considered expensive relative to all the other denormalization.
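
A minimal sketch of that denormalization, assuming the usual event
envelope of type, time and data. The event type names, the task fields
and the task_events helper are all hypothetical.

```python
def task_events(task):
    """Split one task record into denormalized start and end events."""
    # Everything except the two time stamps is copied into both events.
    metadata = {k: v for k, v in task.items() if k not in ("start", "end")}
    return [
        {"type": "task_started", "time": task["start"], "data": dict(metadata)},
        {"type": "task_completed", "time": task["end"], "data": dict(metadata)},
    ]


task = {
    "start": "2012-01-21T10:00:00Z",
    "end": "2012-01-21T10:05:00Z",
    "id": 42,
    "owner": "optimusprime",
}
events = task_events(task)
```

Each event then has a single time to index by, and either one can be
queried without needing a join.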

It would be possible to override the name of the time attribute to
allow you to index by multiple time fields, although that would
complicate the code somewhat.

Mike

Chris Bond

Aug 30, 2012, 5:18:59 PM

to cube...@googlegroups.com

Optimus,

Can you store the duration? E.g. (from the Cube examples):

  "data": {
    "duration_ms": 241
  }
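
For instance, a single completion event could carry the duration
computed from the two time stamps. This is only a sketch: the
duration_ms helper and the task_completed type are assumptions, and the
time stamps are assumed to be UTC strings like the ones earlier in the
thread.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def duration_ms(start, end):
    """Milliseconds elapsed between two UTC time stamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds() * 1000)


start = "2012-08-30T17:18:59Z"
end = "2012-08-30T17:19:00Z"
event = {
    "type": "task_completed",  # hypothetical event type
    "time": end,
    "data": {"duration_ms": duration_ms(start, end)},
}
```

The start time can still be recovered as time minus duration_ms, but
the event is only indexed by its completion time.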