Sorting in a groupBy query with timestamp

502 views
Skip to first unread message

Kyro Zetera

unread,
Mar 31, 2016, 6:54:54 PM3/31/16
to Druid Development
Is it possible to sort a query by a metric or dimension, over timestamp? Or does a groupBy query just always sort by timestamp first?

I am running the following query on my data:

{
  "queryType": "groupBy",
  "dataSource": "reporting",
  "granularity": {"type": "period", "period": "P1W", "origin": "2015-12-27"},
  "limitSpec": {"type":"default","limit":1000,"columns":[
      {"dimension": "revenue", "direction": "ASCENDING"}
  ]}, 
  "dimensions": ["hotel_id", "placement"],
  "aggregations": [
    {"type": "count", "name": "count"},
    { "type": "doubleSum", "name": "billing_cost", "fieldName": "billing_cost" },
    { "type": "doubleSum", "name": "transactions", "fieldName": "transactions" },
    { "type": "doubleSum", "name": "revenue", "fieldName": "revenue" }
  ],  
  "intervals": [ "2016-01-01T00:00:00.000/2016-03-01T00:00:00.000" ],
  "having": {
      "type": "and",
      "havingSpecs": [{
          "type": "greaterThan",
          "aggregation": "billing_cost",
          "value": 0
      }]  
  }
}

I can use `granularity: all` to just sort by my given metric in some cases, but it would be nice to get the top performing weeks/days/months over a set of dimensions.

Fangjin Yang

unread,
Apr 1, 2016, 6:19:06 PM4/1/16
to Druid Development
Kyro, take a look at http://druid.io/docs/latest/querying/granularities.html#period-granularities

Alternatively, you can try https://github.com/implydata/plyql and just write SQL queries. PlyQL also acts as a pretty advanced query planner for Druid.

Kyro Zetera

unread,
Apr 3, 2016, 11:56:14 PM4/3/16
to Druid Development
Fangjin, thanks for the reply, but I think my question was not clear. I'm pretty clear on how the granularities themselves work, but what I'm wondering is if there's a way to sort the query by another column with sort priority over the timestamp. I don't see this information in the doc for any query type, and I'm guessing it's not currently supported? 

For example, a query like above returns the following result(using just the hotel_id dimension to simplify a bit):

[ {
  "version" : "v1",
  "timestamp" : "2016-01-03T00:00:00.000Z",
  "event" : {
    "hotel_id" : 1,
    "revenue" : 123.12,
    "language" : "en"
  }
}, {
"version" : "v1",
  "timestamp" : "2016-01-03T00:00:00.000Z",
  "event" : {
    "hotel_id" : 2,
    "revenue" : 456.12,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2016-01-10T00:00:00.000Z",
  "event" : {
    "hotel_id" : 1,
    "revenue" : 234.12,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2016-01-10T00:00:00.000Z",
  "event" : {
    "hotel_id" : 2,
    "revenue" : 567.12,
    "language" : "en"
  }
} ]

But is it possible to get this output:

[ {
  "version" : "v1",
  "timestamp" : "2016-01-03T00:00:00.000Z",
  "event" : {
    "hotel_id" : 1,
    "revenue" : 123.12,
    "language" : "en"
  }
}, {
"version" : "v1",
  "timestamp" : "2016-01-10T00:00:00.000Z",
  "event" : {
    "hotel_id" : 1,
    "revenue" : 234.12,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2016-01-03T00:00:00.000Z",
  "event" : {
    "hotel_id" : 2,
    "revenue" : 456.12,
    "language" : "en"
  }
}, {
  "version" : "v1",
  "timestamp" : "2016-01-10T00:00:00.000Z",
  "event" : {
    "hotel_id" : 2,
    "revenue" : 567.12,
    "language" : "en"
  }
} ]

nav...@gmail.com

unread,
Apr 4, 2016, 10:22:40 PM4/4/16
to Druid Development
Could you check https://github.com/druid-io/druid/pull/2032 is the one you are finding?

Kyro Zetera

unread,
Apr 6, 2016, 12:29:58 AM4/6/16
to Druid Development
nav, I think that might be what I'm looking for.  
My understanding of the `limitGranularity` would make me think you would need to set that to `all` to sort over the entire set rather than the granularityt buckets, is that right?

Thanks for pointing me to that. I was getting nowhere searching for it.
Reply all
Reply to author
Forward
0 new messages