Extract data from a column with JSON string

328 views
Skip to first unread message

Arpit Jain

unread,
Dec 9, 2021, 6:32:45 AM12/9/21
to Druid User
Hi,

I have a String column with JSON data. I want to extract individual key/value from this JSON string e.g. I would like to extract "type" from below JSON stored as String

{type=MBS, pool={MaturityDate=20211216, Key=US01F0124C76, Key1=IR.MBS.USD, Yield=100.30859375}}

I have already extracted/flattened the data while ingestion but I would like to see if I can extract with query.

Is there a way to do this ?

Regards,

Peter Marshall

unread,
Dec 9, 2021, 7:07:35 AM12/9/21
to Druid User
Hey Arpit;

Two places to go hunting:

First, the standard SQL expressions - there may be something here: https://druid.apache.org/docs/latest/querying/sql.html#scalar-functions

Then there's Native query expressions in case you find anything so you can use a Native versus SQL query – I took a quick look but couldn't see any like Xpath things specifically - but I may not have had my eyes open !!! :D  (I believe actually that SQL expressions are translated into Native expressions by Calcite so if you can't find it here, there may not be something in SQL at all --- but I have known to be wrong!)
https://druid.apache.org/docs/latest/misc/math-expr.html

Then I wondered – could you form a REGEX to pull it out?  Or maybe some other kind of string manipulation?

There's also the option of using Javascript functions – you can read more about that here:

This does sound like a good idea to develop the available query-time functions...

Arpit Jain

unread,
Dec 10, 2021, 7:14:11 AM12/10/21
to Druid User
Thanks for the help.
I managed to extract a key/value using REGEXP scalar function. For example, to get type=MBS from the example I quoted above, I wrote below REGEX
REGEXP_EXTRACT(<columnname>, 'type=[a-zA-Z]+')
Reply all
Reply to author
Forward
0 new messages