comparison of JOIN query speed

Yan Cheng Cheok

unread,

Feb 24, 2016, 12:34:36 AM2/24/16

to pg_shard users

I tend to do a JOIN query speed, by comparing between distributed database and non-distributed database.

2 tables are being used. There are 500k row data in each tables.

The following commands are used to create distributed table.

CREATE TABLE events(
    id bigint NOT NULL,
    data jsonb,
    insert_timestamp timestamp default now()
);

CREATE TABLE actions(
    id bigint NOT NULL,
    data jsonb,
    insert_timestamp timestamp default now()
);

SELECT master_create_distributed_table('events', 'id');
SELECT master_create_worker_shards('events', 16, 2);
SELECT master_create_distributed_table('actions', 'id');
SELECT master_create_worker_shards('actions', 16, 2);

This is the result I'm getting

(non distributed. tables were created without invoking master_create_distributed_table & master_create_worker_shards)
select * from events e inner join actions a on e.id = a.id limit 10000;
788.425 ms

(distributed. tables were created followed by master_create_distributed_table & master_create_worker_shards)
select * from events e inner join actions a on e.id = a.id limit 10000; 
8245.667 ms

Again, I create index by

CREATE INDEX e_id ON events (id);
CREATE INDEX a_id ON actions (id);

The outcome is still the same.

It seems that for distributed database, the it slow down by 10x. May I know, is this an expected behaviour. Is there anything we can do to speed up the query?

Onder Kalaci

unread,

Feb 26, 2016, 6:21:56 PM2/26/16

to pg_shard users

Hi Yan,

No, it is not the expected behaviour.

Are the ids on both tables the same? This may lead to fetching 10,000 rows from each shard join, which might end up lots of network traffic. The network latencies could be the dominating factor of the query execution times in your case.

To test that, could you share the execution times for the following query "SELECT count(*) FROM events e inner join actions a on e.id = a.id"

Thanks,

Onder

James Wang

unread,

Feb 29, 2016, 11:40:28 AM2/29/16

to pg_shard users

What is your query explain (I am from MySQL) results please?

Reply all

Reply to author

Forward