I tend to do a JOIN query speed, by comparing between distributed database and non-distributed database.
2 tables are being used. There are 500k row data in each tables.
The following commands are used to create distributed table.
CREATE TABLE events(
id bigint NOT NULL,
data jsonb,
insert_timestamp timestamp default now()
);
CREATE TABLE actions(
id bigint NOT NULL,
data jsonb,
insert_timestamp timestamp default now()
);
SELECT master_create_distributed_table('events', 'id');
SELECT master_create_worker_shards('events', 16, 2);
SELECT master_create_distributed_table('actions', 'id');
SELECT master_create_worker_shards('actions', 16, 2);
This is the result I'm getting
(non distributed. tables were created without invoking master_create_distributed_table & master_create_worker_shards)
select * from events e inner join actions a on e.id = a.id limit 10000;
788.425 ms
(distributed. tables were created followed by master_create_distributed_table & master_create_worker_shards)
select * from events e inner join actions a on e.id = a.id limit 10000;
8245.667 ms
Again, I create index by
CREATE INDEX e_id ON events (id);
CREATE INDEX a_id ON actions (id);
The outcome is still the same.
It seems that for distributed database, the it slow down by 10x. May I know, is this an expected behaviour. Is there anything we can do to speed up the query?