At what data size does hash distribution become efficient?


Mario Sanz Rodrigo

May 12, 2016, 11:24:09 AM
to citus-users
Hi,

I have set up an example to test how Citus works.

I have the following table:

CREATE TABLE test (
    id int,
    value int,
    PRIMARY KEY (id)
);

I have generated two tables (test1 and test2):
test1 -> 100,000 rows
test2 -> 1,000,000 rows

I compared query times for a normal table and a distributed table. For example, when I run "SELECT * FROM test1 WHERE id = 4000", the lookup is faster on the normal table than on the distributed table. But when I run the same query against the table with 1,000,000 rows, the distributed table is faster.

Is there a minimum number of rows above which hash distribution becomes efficient?

Regards,

Mario.


Sai Srirampur

May 12, 2016, 5:28:14 PM
to Mario Sanz Rodrigo, citus-users
Hi Mario,

Thank you for contacting us.

There isn't an exact answer to how many rows it takes for hash partitioning to show a significant difference between a distributed setup and a non-distributed one for lookup queries. It depends on whether the data fits in RAM, the schema, and the indexes, among other factors.

Citus scales memory (RAM) linearly with the number of workers, so more data can be cached in memory than on a single-node Postgres, which improves performance.

For single-row lookups on the distribution column, if the data is small enough to fit in memory, you can expect latency to increase (by a few to several milliseconds, depending on the network) because of the additional network hop inherent in a distributed system.
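To make the single-row case concrete, here is a minimal Python sketch of how a hash-distributed lookup can be routed to exactly one shard. It assumes the 32-bit hash space is split into equal ranges, one per shard, and that shards are placed on workers round-robin. The function names, the CRC32 stand-in hash, and the placement rule are illustrative assumptions, not Citus internals; the shard and worker counts are the ones mentioned in this thread.

```python
import zlib

SHARD_COUNT = 128    # shard count mentioned in this thread
WORKER_COUNT = 10    # worker count mentioned in this thread

HASH_MIN = -2**31    # lower bound of the signed 32-bit hash space
HASH_SPACE = 2**32   # number of distinct 32-bit hash values


def stand_in_hash(key: int) -> int:
    """Stand-in hash (CRC32), NOT PostgreSQL's integer hash function.

    Returns a signed 32-bit value so it covers the same range as the
    hash space being partitioned.
    """
    unsigned = zlib.crc32(str(key).encode("ascii"))
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


def shard_index(hash_value: int, shard_count: int = SHARD_COUNT) -> int:
    """Map a signed 32-bit hash value to the shard whose range contains it."""
    width = HASH_SPACE // shard_count
    # min() guards the last shard when shard_count does not divide 2**32.
    return min((hash_value - HASH_MIN) // width, shard_count - 1)


def worker_for_shard(shard: int, worker_count: int = WORKER_COUNT) -> int:
    """Hypothetical round-robin placement of shards on workers."""
    return shard % worker_count


def route(key: int) -> tuple:
    """Return (shard, worker) that a lookup on the distribution column hits."""
    shard = shard_index(stand_in_hash(key))
    return shard, worker_for_shard(shard)


if __name__ == "__main__":
    shard, worker = route(4000)
    print(f"id = 4000 -> shard {shard} on worker {worker}")
```

The point of the sketch is that a lookup such as WHERE id = 4000 touches one shard on one worker, so the work done is about the same as on a single node, plus the round trip from the master to that worker.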


For larger select queries such as aggregates, Citus parallelizes the queries across different cores leading to significant improvement in query times.
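As a rough illustration of why aggregates benefit, the sketch below computes an average by having each shard produce a partial (sum, count) and then combining the partials in one place. The function names and the sample rows are made up for the example, and Python threads only model the shape of the computation; they are not how Citus executes queries.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Per-shard step: reduce (id, value) rows to a (sum, count) pair."""
    values = [value for _id, value in rows]
    return sum(values), len(values)


def distributed_avg(shards, max_workers=4):
    """Combine step: merge per-shard partials into a single average.

    Each shard is aggregated independently, so the per-shard work can
    run at the same time on different cores or machines.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical sample data: three shards of (id, value) rows.
    shards = [[(1, 10), (2, 20)], [(3, 30)], [(4, 40), (5, 50), (6, 60)]]
    print(distributed_avg(shards))
```

Only the small (sum, count) pairs travel back to the combining node, which is why the extra network hop matters much less here than it does for a single-row lookup.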

To understand your setup better, I have a few questions:
1. Are you using the Citus 5.0 release?
2. Does your data fit in memory?
3. What distribution column did you use?

The reason I ask the first question is that Citus 5.1 will automatically switch to a more efficient executor for this type of query to improve query times.

Sai


Mario Sanz Rodrigo

May 13, 2016, 6:07:59 AM
to citus-users
Hello,

I am using the docker-compose.yml file; the images are:

- citusdata/citus:5.0.1
- citusdata/workerlist-gen:0.9.0

The distribution column I used is 'id'. For testing, I created docker-machine VMs in VirtualBox, all with the same specifications: 512 MB RAM and 1 CPU.

For the normal table I created one docker-machine, and for Citus I created one docker-machine for the master and 10 docker-machines for the workers. The distributed table has 128 shards.

I loaded the data into the distributed table with the following commands:

mkdir chunks
split -n l/64 data.csv chunks/

export PGDATABASE=postgres
find chunks/ -type f | xargs -n 1 -P 64 sh -c 'echo $0 `copy_to_distributed_table -C $0 test2`'

Regards.

Sai Srirampur

May 13, 2016, 8:13:51 PM
to Mario Sanz Rodrigo, citus-users
Hi Mario,

Thanks for your reply.

With Citus 5.0, you could try the following to check whether performance improves:

There is a setting, citus.remote_task_check_interval, which you could set to a lower value, for example 1ms, and see whether performance improves.

    SET citus.remote_task_check_interval TO '1ms';

You could read more about remote task check interval in the following link.

Also, as I mentioned in my previous email, Citus 5.1 will automatically switch to a more efficient executor for this type of query to improve query times. We plan to release Citus 5.1 next week.

Sai
