Data expiry


M Murphy

May 19, 2016, 7:20:45 AM
to cstore users
Dear cstore users,

How do you normally delete old data in cstore? I see that cstore has a truncate function, but as I understand it it's a standard truncate in that it deletes all the data in a table. If you only want to delete an initial section of the table, how would you do it?

I can, presumably, make a dozen tables and use inheritance to make them appear as a single table. That is what I already do for some time series at the moment (https://github.com/bitdivine/pg_day_partitions), but before I charge off with the hammer I'm familiar with, it would be good to know whether there is a better alternative.
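
The pattern boils down to something like this (a rough sketch with made-up table names, not the exact code in that repo):

  -- Parent table that queries treat as one continuous time series.
  CREATE TABLE metrics (tim timestamptz, dat jsonb);

  -- One child table per day, attached via inheritance, with a CHECK
  -- constraint so constraint exclusion can skip irrelevant days.
  CREATE TABLE metrics_2016_05_19 (
      CHECK (tim >= '2016-05-19' AND tim < '2016-05-20')
  ) INHERITS (metrics);

  -- "Expiry" is then just dropping the oldest child table.
  DROP TABLE metrics_2016_05_12;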

Best wishes, Max

Murat Tuncer

May 23, 2016, 2:41:58 AM
to cstore users
Hi Max;

Your method is currently the best one for removing data from a cstore_fdw table, as we do not support the delete operation yet.

Please see the thread https://groups.google.com/forum/#!searchin/cstore-users/inherited/cstore-users/dusFfdah9Ac/5YaZbGPCBgAJ on how to make it work with cstore_fdw tables.

When you do that, please continue loading data in chunks (via COPY) directly into the inherited child table. Using an insert trigger on the parent table would not work for cstore_fdw tables.
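
Concretely, the loading step looks something like this (table and file names here are just placeholders):

  -- Load each batch straight into the cstore_fdw child table, not the parent;
  -- an insert trigger on the parent would not route rows into a foreign child.
  COPY metrics_2016_05_23 FROM '/path/to/batch.csv' WITH (FORMAT csv);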

An alternative method would be to create a new cstore_fdw table containing only the required data. Be careful with this if you plan to insert new data while that operation is running.
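
For that rebuild approach, roughly (a sketch only; the table names and retention window are made up):

  CREATE FOREIGN TABLE events_new (tim timestamptz, dat jsonb)
      SERVER cstore_server
      OPTIONS (compression 'pglz');

  -- Copy across only the rows you still need (assumes your cstore_fdw
  -- version supports INSERT INTO ... SELECT; otherwise go through COPY).
  INSERT INTO events_new
      SELECT * FROM events WHERE tim >= now() - interval '30 days';

  -- Swap the tables once the copy is complete.
  DROP FOREIGN TABLE events;
  ALTER FOREIGN TABLE events_new RENAME TO events;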


Please let me know if you need further assistance.
Murat

M Murphy

May 23, 2016, 6:52:55 AM
to cstore users

Thanks Murat, good to know.

I have the data inheritance and expiry working. One thing that is worrisome, though, is that the database was growing more rapidly than when poking the JSON blobs into a naive Postgres jsonb column. I will have to soak test this to make sure that cstore reduces the size and increases the speed as we hoped it would.

Best wishes, Max

Murat Tuncer

May 23, 2016, 8:08:57 AM
to M Murphy, cstore users
Hi Max,

Compression is not enabled by default. You need to explicitly state that you want compression, something like below:

CREATE FOREIGN TABLE cstore_table
(
 your columns here ...
)
SERVER cstore_server
OPTIONS(compression 'pglz');


Performance-wise, cstore_fdw works best if you are accessing a small number of columns. I suggest extracting fields that appear frequently in WHERE clauses and storing them as separate columns. This could reduce the amount of data you transfer and improve your query performance.
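
As a sketch (table and column names here are hypothetical, not your schema):

  -- Pull the hot field out of the JSON into its own column so that
  -- queries filtering on it only have to read that column.
  CREATE FOREIGN TABLE readings (
      tim     timestamp,
      device  text,    -- frequently filtered field, extracted from the blob
      dat     jsonb
  )
  SERVER cstore_server
  OPTIONS (compression 'pglz');

  -- Only the tim and device columns are read for a query like this.
  SELECT tim, device FROM readings WHERE device = 'sensor-17';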




--
Murat Tuncer
Software Engineer | Citus Data
mtu...@citusdata.com

M Murphy

May 23, 2016, 9:27:34 AM
to cstore users

I have compression enabled :-) I have about 10G of raw data per hour as JSON blobs, including long string keys. There are about 4000 "columns", so the high data usage might be due to some per-column overhead. Most columns are long lived, so if that is the problem, disk usage should flatten out quite soon. At the moment cstore is taking up about 1G every four minutes. If each string key is stored only once, that number should become much lower.

M Murphy

May 23, 2016, 10:05:17 AM
to cstore users
Thanks for the hint though.  For interest:

The table schema is:
   CREATE FOREIGN TABLE speedo_cstore (tim timestamp, dat jsonb);

An entry in the dat column is of the form: { key1:123, key2:456, ..., key4000:999 }  (I thought I would start simple)

Looking at $pgdir/data/cstore_fdw/16384/978777 with hexdump, the data has long sequences of zeros or uncompressed data. The size of the cdata file is 5G; if I compress it, it drops to just under 500M. Does cstore compress data as it is written to the file, or retrospectively?

Best wishes, Max

M Murphy

May 23, 2016, 10:06:57 AM
to cstore users

Sorry, I dropped the second half of the table schema.  It is:

  CREATE FOREIGN TABLE speedo_cstore (tim timestamp, dat jsonb) SERVER cstore_server OPTIONS(compression 'pglz');

Murat Tuncer

May 24, 2016, 1:44:17 AM
to cstore users
cstore compresses data on a per-column basis. It collects up to a predefined number of rows (a block) for each column, serializes them, then compresses.

hexdump does not always show compressed sections; there are parts related to metadata that are not compressed.
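
For reference, the block and stripe sizes are per-table options, so you can tune how much data goes into each compressed section. Something like this (the values here are only illustrative):

  CREATE FOREIGN TABLE example_cstore (tim timestamp, dat jsonb)
  SERVER cstore_server
  OPTIONS (compression 'pglz',
           stripe_row_count '150000',
           block_row_count '10000');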

Could you share your data generation script so that I can test it locally?

M Murphy

May 24, 2016, 5:14:02 AM
to cstore users

Hello Murat,

I am using a live data feed, so I can't share it easily; however, I can make a toy data generator later that behaves similarly.
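
Roughly, something along these lines should reproduce the shape of the feed (a sketch; the key names and row counts are invented, it is not the real generator):

  -- Toy generator: one row per second for five minutes, each row a JSON
  -- object with ~4000 integer-valued keys, written to a plain staging table.
  CREATE TABLE speedo_raw (tim timestamp, dat jsonb);

  INSERT INTO speedo_raw
  SELECT now() + s * interval '1 second',
         jsonb_object_agg('key' || k, (random() * 1000)::int)
  FROM generate_series(1, 300) AS s
  CROSS JOIN generate_series(1, 4000) AS k
  GROUP BY s;

  -- Then load it into speedo_cstore in one chunk, e.g. via COPY, or
  -- INSERT ... SELECT if the cstore_fdw version supports it.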

Thanks for looking into this.

Regarding data layout: there was initially a section that was largely zeros, then a lot of string keys from the JSON, then uncompressed numbers, then compressed chunks interspersed with more string keys. Interestingly, the strings were dotted with zero bytes at erratic spacing, roughly one zero every five or six bytes, breaking up the strings. I might try to parse the file fully if that seems the most promising approach.

Regards, Max

