Schema for user metrics

35 views

Skip to first unread message

pablo platt

unread,

Apr 10, 2021, 2:49:05 AM4/10/21

to ClickHouse

Hello,

I have the following tables in PostgreSQL and I wonder what is the best schema in ClickHouse.

I have chat rooms and users.

A chat room can start on a different server each time. I track it with room_sessions.

A user can join a chat room session. I track it with user_sessions.

Each user sends 3 metrics every 10 seconds. I track metrics in the user_metrics table.

In postgres I can normalize the data and later join user_metrics with user_sessions and room_sessions. I can plot a user metric as a function of time and I can aggregate a metric for a user, room or server.

PostgreSQL tables:

rooms - room_id(bigint), name(text)

users - user_id(bigint), name(text)

room_sessions - room_session_id(bigint), room_id(bigint), server(int), started(timestamptz), ended(timestamptz)

user_sessions - user_session_id(bigint), room_session_id(bigint), user_id(bigint), started(timestamptz), ended(timestamptz)

user_metrics - user_session_id(bigint), metric1(int), metric2(int), metric3(int), created(timestamptz)

1. In ClickHouse, should I put all columns in the same table to avoid joins?

2. What should I choose for primary key and order by?

3. Should I add secondary indexes for room_session_id and server?

4. Will columns compress well if I put the same server value on each user metric belonging to the same user session?

I would like to be able to:

1. Select metric1 of a specific user-session and plot it as a function of time.

2. Aggregate metric1 of all users belonging to a room-session and plot the avg as a function of time.

3. Aggregate metric1 of all users on the same server and plot the avg as a function of time.

CREATE TABLE user_metrics (
user_session_id UInt64,

user_id UInt64,

room_session_id UInt64,

room_id UInt64,

server UInt32,

metric1 UInt32,

metric2 UInt32,

metric3 UInt32,

started DateTime,

ended DateTime

) ENGINE = MergeTree()
PRIMARY KEY (user_session_id, started),
ORDER BY (user_session_id, started);