Is it possible to perform copy_to_distributed_table or stage directly from memory data?

Skip to first unread message

Yan Cheng Cheok

Feb 22, 2016, 5:33:45 AM2/22/16
to pg_shard users
Previously, without using any distribution table, I can perform a very fast data insertion, by COPY data from memory (instead of disk file)

// In PHP
pg_query($con, "copy events(id, data) from stdin");
for ($i=0; $i<$count; $i++) {
= sprintf($data_template, $i);
= $i . "\t" . pg_escape_string($data) . "\n";
($con, $query);
($con, "\\.\n");

However, when I enable worker shards by

    SELECT master_create_worker_shards('events', 16, 2);

I will get the following error

    Query failed: ERROR:  COPY commands on distributed tables are unsupported

Soon, I realize I need to use either
[ ] or stage command [ ]

However, either copy_to_distributed_table or stage command, require the data to be in CSV disk file.

Is there any way we can transfer the data from memory directly to postgres, by having equivalent speed as COPY's ?


Onder Kalaci

Feb 23, 2016, 10:28:20 PM2/23/16
to pg_shard users
Hi Yen, 

Unfortunealty current implementation of copy_to_distributed_table expects to read the data from a file. Since the script basically triggers a copy command, it might be possible to allow reading from the stdin with some changes in the script. 

However, there is an open pull request for implementing native COPY command for distributed tables on github. Basically, we plan to direct our users to use native COPY when it is checked-in. Our initial tests show that its ingest rate will be similar to what you get with postgres' COPY.

Again, it'd be useful to learn your target use case for better understanding.

Reply all
Reply to author
0 new messages