Currently, we want to:
- Perform real-time data insertion
- Distribute data among servers by day. For instance, on 16 May 2016, all data should go to server A; on 17 May 2016, all data should go to server B, and so on.
I have been testing with the following table, which has an additional column yymmdd (two-digit year, month, day):
CREATE TABLE cheok_ddmmyy
(
id bigint,
data text,
yymmdd text
);
SELECT master_create_distributed_table('cheok_ddmmyy', 'yymmdd');
SELECT master_create_worker_shards('cheok_ddmmyy', 16, 1);
insert into cheok_ddmmyy values (1, 'hello 1', '160318');
insert into cheok_ddmmyy values (2, 'hello 2', '160318');
insert into cheok_ddmmyy values (3, 'hello 3', '160318');
insert into cheok_ddmmyy values (4, 'hello 4', '160318');
insert into cheok_ddmmyy values (5, 'hello 5', '160319');
insert into cheok_ddmmyy values (6, 'hello 6', '160319');
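The yymmdd values above are typed by hand. In real use I would presumably derive them from each row's timestamp. A minimal sketch using PostgreSQL's to_char, where the sample timestamp is made up purely for illustration:

```sql
-- Derive the yymmdd key from a timestamp (the sample value is illustrative only).
SELECT to_char(timestamp '2016-03-18 10:00:00', 'YYMMDD') AS yymmdd;
```

For this sample timestamp the result is 160318, which matches the format used in the inserts.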
I can confirm that the data is distributed by day, so this seems to meet both of my requirements.
My question is: do we need to create an index on the yymmdd column, or is that unnecessary?
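For concreteness, the index I have in mind would look roughly like the following. The index name is a placeholder of my own, and I am not sure whether the statement has to run before or after master_create_worker_shards for the shards to pick it up:

```sql
-- Hypothetical index on the distribution column; the index name is a placeholder.
CREATE INDEX cheok_ddmmyy_yymmdd_idx ON cheok_ddmmyy (yymmdd);
```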
I would also like to understand DISTRIBUTE BY APPEND.
I ran:
CREATE TABLE cheok_timestamp
(
rockman_id bigint,
data text,
time timestamp
) DISTRIBUTE BY APPEND (time);
But when I then run the following, I get an error:
SELECT master_create_worker_shards('cheok_timestamp', 16, 1);
ERROR: unsupported table partition type: a
Is this because a table created using DISTRIBUTE BY APPEND is *not* suitable for inserting data in real time? From the example given at https://www.citusdata.com/documentation/citusdb-documentation/user_guide/append_data_loading.html , it seems to be meant for bulk insertion by loading CSV data from disk. Am I right?