How to store and query array of key value pairs?

225 views
Skip to first unread message

Ankit Nayan

unread,
Dec 14, 2020, 5:58:35 AM12/14/20
to Druid User

My data format is like:

{
 timestamp string,
 name string,
 kind int,
 status_code int,
 ...
 tags: 
 {
   key1: value1,
   key2: value2,
   ...
  }
}

The list of keys in tags is varying in size for each row and the keys are also not defined (it can be anything string).

I want to query dataset by tags, i.e., return all rows which have given key: value pair.

In Pinot, we can achieve this by storing all keys in a multi-dimension column TAGS__KEYS and all values in another multi-dimension column TAGS__VALUES and then query using MAP_VALUE function.

Sample query:

SELECT * FROM xyzView WHERE ( tags__KEYS = 'key1' AND tags__VALUES = 'value1' AND mapValue(tags__KEYS,'key1',tags__VALUES) = 'value1' )

How can we achieve the same in Druid? Druid also has a multi-dimension column but does not have mapValue functionality. Or can we achieve this using any other way in Druid?

Ankit Nayan

unread,
Mar 8, 2021, 6:27:50 AM3/8/21
to Druid User
Any inputs on this?

ClickHouse also has similar support using IndexOf in array. 

Ben Krug

unread,
Mar 8, 2021, 10:11:02 PM3/8/21
to druid...@googlegroups.com
I don't know that that functionality is built-in.  There are lookup tables and now "experimental" indexed tables, but they're for smaller lookup type of functionality.  Probably not good for arbitrary keys, as you describe.

You might be able to make something like it using multivalued dimensions and MVD functions like MV_OFFSET_OF and MV_OFFSET to extract the value that follows the key.  (If so, though, be aware of the behaviors of MVDs in GROUP BY queries, it may be counterintuitive in some cases.)

I hope that's helpful.

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/b0e2443b-098c-428b-98a4-29a9a779a3d4n%40googlegroups.com.

Ankit Nayan

unread,
Mar 9, 2021, 12:54:41 AM3/9/21
to Druid User
Thanks, Ben for taking out some time. I think that's the right set of functions I was looking for. These functions don't seem to be available in 0.19.0, are they available in 0.20.0? 

Ben Krug

unread,
Mar 9, 2021, 2:06:10 AM3/9/21
to druid...@googlegroups.com
I thought they'd been around for a while.  Did you capitalize?  It has to be, eg, MV_OFFSET, not mv_offset.  I don't have 0.19 running right now, but I could check tomorrow if you still have problems.

Ankit Nayan

unread,
Mar 9, 2021, 12:24:57 PM3/9/21
to Druid User
Not able to get this function in druid 0.19.0

OFFSET_DRUID_QUERY.png

Ben Krug

unread,
Mar 9, 2021, 1:17:37 PM3/9/21
to druid...@googlegroups.com
It doesn't autocomplete, but it will work.  I should maybe have suggested MV_OFFSET_OF.  (Same thing, won't autocomplete, but will work.  Eg, I tried 
SELECT MV_OFFSET_OF(d6_mvd,'ro') FROM dataTest
on a dataset with 0.19, and it worked.  (Note it starts counting with 0, and also, if the string isn't there, will return -1.)

Ankit Nayan

unread,
Mar 9, 2021, 2:58:08 PM3/9/21
to Druid User
You are right..it works. Thanks a lot for guiding me there. Appreciate your help

Ben Krug

unread,
Mar 9, 2021, 3:23:39 PM3/9/21
to druid...@googlegroups.com
You're welcome!  I hope it'll be useful!

Reply all
Reply to author
Forward
0 new messages