How to store json without knowing the schema

2,353 views
Skip to first unread message

Chris Roebuck

unread,
Jun 17, 2016, 4:50:44 AM6/17/16
to ClickHouse
Is there a recommended way to store arbitrarily nested, user-defined json objects efficiently in clickhouse so that any individual property value of the json can be queried and analysed?

The challenge is that the schema of the json object is not known upfront by clickhouse so how to design a table schema for any json object and secondly how to know which table engine to use for this purpose?

man...@gmail.com

unread,
Jun 17, 2016, 5:16:31 PM6/17/16
to ClickHouse
First, extract all common fields and store them in separate columns.
Even "sparse" fields better to store in separate columns.

Columns with values, presented in 1% of rows are Ok.
For example, when storing web-analytics logs, you could have such rare columns as "PushAPIStatus" in main table.
Up to few hundreds of columns in table is fine.

But after all, your JSON could have fields that couln't be known in advance. Such as user-provided attributes.
There are several ways to deal with them:

1. Key and Value columns of Array(String) type.
Could be written as Attributes Nested(Key String, Value String).
Example: https://gist.github.com/alexey-milovidov/d6ffc9e0bc0bc72dd7bca90e76e3b83b

2. For limited number of user-provided attributes, you could create columns Attr1 String, Attr2 String, ... Attr10 String.
This is rather rigid.

3. Store whole JSON in String field and access fields with visitParamExtract family of functions.

How to choose storage engine:
For most cases, use MergeTree or ReplicatedMergeTree.
Reply all
Reply to author
Forward
0 new messages