Dynamic indexing for flattening JSON spec

502 views
Skip to first unread message

Shubham Goel

unread,
Oct 22, 2018, 1:49:11 AM10/22/18
to Druid User
Hello,
     I was trying to flatten the nested json objects coming in the druid cluster. I have a nested json object of the type.
{
 "abc" : "def" ,
 "xyz": [
        {
           "id" : 123,
           "pqr": "ghi"
        },
        {
           "id" : 456,
           "pqr": "jkl"
        }
       ]
}
Now the issue here is i have a json array "xyz" which many have any number of objects which i don't know initially. Is there any possible method to create automatic indexing while flattening the nested json such that if xyz[0].id is mapped to xyz_0_id then if in future a third object comes then that xyz[2].id is automatically mapped to xyz_2_id. Is there any alternate method to achieve the intended result somehow as the number of objects present in the object array is not known beforehand and may vary from time to time.
Thanks for your help,
Shubham

Shubham Goel

unread,
Oct 25, 2018, 10:21:08 AM10/25/18
to Druid User
here can i somehow use javascript parse spec such that i run a loop over the xyz array till it's length and then flatten all the json which come dynamically and then ingest it.

clint...@imply.io

unread,
Oct 25, 2018, 8:49:07 PM10/25/18
to Druid User
Hi Shubham,

Yeah, this is definitely not supported with flattening via the flattenSpec as described here http://druid.io/docs/latest/ingestion/flatten-json. If you know the upper limit of occurrences, a poor workaround would be to add a json path expression for each '$.xyz[n].id' mapping to 'xyz_n_id' up to that limit and have null values for rows which do not contain these values.

I've been investigating enhancing the flattening abilities of Druid ingestion when dealing with nested lists to be able to 'explode' out one row with nested list data into multiple rows, distributing list elements one per row to flatten, but this type of flattening lists into a single row also seems potentially useful so I'll take it into consideration. I unfortunately don't have a timeline on doing this work, it's just something I'd like to take a look at in the near future, so I can't make any promises as to when such functionality will exist.

As for the javascript parse spec (which I forgot even exists!) I looked through the code and it would appear that if you can write a function that will take a json string, parse the json string into an object inside of this function, and then transform that object to your desired flattened output map, it should work. Disclaimer: I haven't used this functionality personally so I'm unsure if this will actually work, and even if it does I would expect some amount of performance hit for the extra overhead of running a javascript function on every row. For further information, the details of using javascript parse spec is described here http://druid.io/docs/latest/ingestion/data-formats.html#javascript and general usage of javascript with Druid here http://druid.io/docs/latest/development/javascript.html.

Good luck!
Clint

Can Elmas

unread,
Apr 22, 2019, 7:54:14 AM4/22/19
to Druid User
Hi Clint,

I was considering your workaround, "a poor workaround would be to add a json path expression for each '$.xyz[n].id' mapping to 'xyz_n_id' up to that limit and have null values for rows which do not contain these values." ;

Is this doable with druid at the ingestion step? 

"path" or "jq" expressions can be given with json flatten spec in order to access a specific field in a json object. But is it possible to map these values and use dynamically constructed column names like xyz_0_id, xyz_1_pqr etc. with similar approach?

clint...@imply.io

unread,
Apr 22, 2019, 6:02:18 PM4/22/19
to Druid User
Hi Can,

I do not think there exists a mechanism today to dynamically create column names like that, I think it would have to be explicitly defined manually for each item of the array to extract, which is why I suggested it was a "poor" workaround. I'll poke around and update if I find otherwise though.

prasanna lakshmi

unread,
Aug 8, 2019, 6:08:28 AM8/8/19
to Druid User
Hi Shubham,

I am also facing the same issue with kafka json packets.Can you share me the javascript code ,whatever you are used for parsing the dynamic  nested json sample like below and then ingest,

"xyz": [
        {
           "id" : 123,
           "pqr": "ghi"
        },
        {
           "id" : 456,
           "pqr": "jkl"
        }
       ]

Prasanna.P

Bin Li

unread,
Aug 26, 2019, 3:53:21 AM8/26/19
to Druid User
Similar issue, but the object to flatten is map<string, int> which size of paris varied and new pair will be added in future. Finally, I use the "poor" solution in ingest spec,

{
  "type": "path",
  "name": "abtest_test1",
  "expr": "$.abtest.test1"
},
{
  "type": "path",
  "name": "abtest_test2",
  "expr": "$.abtest.test2"
},
...

It's somehow ugly even though it works.

在 2018年10月22日星期一 UTC+8下午1:49:11,Shubham Goel写道:
Reply all
Reply to author
Forward
0 new messages