I have the following tables in PostgreSQL and I wonder what is the best schema in ClickHouse.
I have chat rooms and users.
A chat room can start on a different server each time. I track it with room_sessions.
A user can join a chat room session. I track it with user_sessions.
Each user sends 3 metrics every 10 seconds. I track metrics in the user_metrics table.
In postgres I can normalize the data and later join user_metrics with user_sessions and room_sessions. I can plot a user metric as a function of time and I can aggregate a metric for a user, room or server.
rooms - room_id(bigint), name(text)
users - user_id(bigint), name(text)
room_sessions - room_session_id(bigint), room_id(bigint), server(int), started(timestamptz), ended(timestamptz)
user_sessions - user_session_id(bigint), room_session_id(bigint), user_id(bigint), started(timestamptz), ended(timestamptz)
user_metrics - user_session_id(bigint), metric1(int), metric2(int), metric3(int), created(timestamptz)
1. In ClickHouse, should I put all columns in the same table to avoid joins?
2. What should I choose for primary key and order by?
3. Should I add secondary indexes for room_session_id and server?
4. Will columns compress well if I put the same server value on each user metric belonging to the same user session?
I would like to be able to:
1. Select metric1 of a specific user-session and plot it as a function of time.
2. Aggregate metric1 of all users belonging to a room-session and plot the avg as a function of time.
3. Aggregate metric1 of all users on the same server and plot the avg as a function of time.
CREATE TABLE user_metrics (
) ENGINE = MergeTree()
PRIMARY KEY (user_session_id, started),
ORDER BY (user_session_id, started);