How to get database size

214 views
Skip to first unread message

David Rosenstrauch

unread,
Dec 7, 2016, 2:22:46 PM12/7/16
to citus...@googlegroups.com
Does Citus have a special command for computing the size of a sharded
database? I tried using the standard PostgreSQL database size functions:

select pg_size_pretty(pg_database_size('db_name'));

But the number it's returning seems much too low. The size it tells me
is 2 orders of magnitude smaller than the size of the same data being
stored in an in-memory database. And I'm even storing the data in a
more compact format in the memory DB - storing integer user ID's rather
than full-text user names - so it doesn't seem possible that Citus could
be storing the data in so much less space.

Does that Postgres pg_database_size function calculate data size
correctly across a sharded table? Or is there some other utility I
should be using to figure out the data size?

Thanks,

DR

Sumedh Pathak

unread,
Dec 7, 2016, 4:21:41 PM12/7/16
to David Rosenstrauch, citus-users
Hi David,

This Github issue documents a workaround for getting the table size for each table: https://github.com/citusdata/citus_docs/issues/231. You could adapt that to calculate the total database size as well. We're planning to document that in Docs as well.

Thanks!
Sumedh



--
You received this message because you are subscribed to the Google Groups "citus-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to citus-users+unsubscribe@googlegroups.com.
To post to this group, send email to citus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/citus-users/86c89b44-725d-8c8c-d3b4-4473818493c3%40darose.net.
For more options, visit https://groups.google.com/d/optout.



--

David Rosenstrauch

unread,
Dec 7, 2016, 4:25:46 PM12/7/16
to citus...@googlegroups.com
Thanks much for sending this along, Sumedh.

I think there might be an issue with this function though. I tried
running it against my database. It worked on 2 of the tables, but it
returned a size of 0 for the largest table in my DB. Any idea what the
issue might be?

Thanks,

DR
>> email to citus-users...@googlegroups.com.

Metin Doslu

unread,
Dec 8, 2016, 1:59:15 AM12/8/16
to David Rosenstrauch, citus-users
Hi David,

Which version of Citus (\dx) do you use?

Could you share results of the slightly modified version of the second query:

SELECT *, pg_size_pretty(citus_table_size(logicalrelid)) AS Size FROM pg_dist_partition;

Best,
Metin


To post to this group, send email to citus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/ms
gid/citus-users/86c89b44-725d-8c8c-d3b4-4473818493c3%40darose.net.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "citus-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to citus-users+unsubscribe@googlegroups.com.

To post to this group, send email to citus...@googlegroups.com.

Brian Cloutier

unread,
Dec 8, 2016, 8:45:08 AM12/8/16
to Metin Doslu, David Rosenstrauch, citus-users
To give you some intuition, tables which are managed by citus (we call them distributed tables) are also regular postgres tables, but citus intercepts queries and forwards them to the workers to be run on the tables' shards. However, it doesn't intercept everything yet.

When you run `pg_database_size` citus never gets a chance to intercept so Postgres reports that those tables have size 0. To put it another way: `pg_database_size` is correctly reporting the size of your local database and not counting any data stored on other machines.

`citus_table_size`, from the above link, has the opposite behavior. It reads some metadata which citus stores on the distributed tables but that metadata isn't maintained for local tables. Meaning, on any tables which aren't managed by citus, `citus_table_size` will report a size of 0, since for those tables no data is being stored remotely!

It worked on 2 of the tables, but it returned a size of 0 for the largest table in my DB.

Is it possible that the table `citus_table_size` returns 0 for is a local table? (one you didn't call master_create_distributed_table on)

David Rosenstrauch

unread,
Dec 8, 2016, 9:28:33 AM12/8/16
to citus...@googlegroups.com
On 12/08/2016 01:59 AM, Metin Doslu wrote:
> Hi David,
>
> Which version of Citus (\dx) do you use?

6.0-18

> Could you share results of the slightly modified version of the second
> query:
>
> SELECT *, pg_size_pretty(citus_table_size(logicalrelid)) AS Size FROM
> pg_dist_partition;
>
> Best,
> Metin

Same thing. Large table shows as 0 bytes:

darose=# SELECT *, pg_size_pretty(citus_table_size(logicalrelid)) AS
Size FROM
darose-# pg_dist_partition;
logicalrelid | partmethod |
partkey
| colocationid | repmodel | size
------------------+------------+-----------------------------------------------------------------------------------------------------------------------------+--------------+----------+---------
profiles | h | {VAR :varno 1 :varattno 1 :vartype
1043 :vartypmod 132 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno
1 :location -1} | 14 | c | 25 MB
segments | h | {VAR :varno 1 :varattno 1 :vartype
1043 :vartypmod 259 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno
1 :location -1} | 14 | c | 3072 kB
profile_segments | h | {VAR :varno 1 :varattno 1 :vartype
1043 :vartypmod 132 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno
1 :location -1} | 14 | c | 0 bytes
(3 rows)

Thanks,

DR
>>>> email to citus-users...@googlegroups.com.
>>>> To post to this group, send email to citus...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/citus-users/86c89b44-725d-8c8c-d3b4-4473818493c3%40darose.net.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>
>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "citus-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to citus-users...@googlegroups.com.
>> To post to this group, send email to citus...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/ms
>> gid/citus-users/7d395e72-94ec-ca55-a20d-a5a9a4d1b4be%40darose.net.

David Rosenstrauch

unread,
Dec 8, 2016, 9:32:50 AM12/8/16
to citus...@googlegroups.com
On 12/08/2016 08:45 AM, Brian Cloutier wrote:
> Is it possible that the table `citus_table_size` returns 0 for is a local
> table? (one you didn't call master_create_distributed_table on)

Hmmm ... possible. Let me check that.

Is there a query to run to find out which tables are known to Citus as
distributed and which are not?

Thanks,

DR

David Rosenstrauch

unread,
Dec 8, 2016, 9:37:37 AM12/8/16
to citus...@googlegroups.com
Looking at master_get_table_metadata, it looks like my table is defined
as distributed. But it still reports a size of zero.

darose=# select * from master_get_table_metadata('profile_segments');
logical_relid | part_storage_type | part_method | part_key |
part_replica_count | part_max_size | part_placement_policy
---------------+-------------------+-------------+----------+--------------------+---------------+-----------------------
16945 | t | h | user_id |
2 | 1073741824 | 2
(1 row)


DR

Metin Doslu

unread,
Dec 8, 2016, 9:39:07 AM12/8/16
to David Rosenstrauch, citus-users
Hi David,

Could you update citus_table_size() to use max() instead of min() like below, and try again:

CREATE OR REPLACE FUNCTION citus_table_size(table_name regclass)
RETURNS bigint LANGUAGE plpgsql
AS $function$
DECLARE
        table_size bigint;
BEGIN
        PERFORM master_update_shard_statistics(shardid)
        FROM pg_dist_shard
        WHERE logicalrelid = table_name;

        SELECT sum(shard_size) INTO table_size
        FROM (
                SELECT max(shardlength) AS shard_size
                FROM pg_dist_shard JOIN pg_dist_shard_placement USING (shardid)
                WHERE logicalrelid = table_name
                GROUP BY shardid
        ) shard_sizes;

        RETURN table_size;
END;
$function$;




DR

--
You received this message because you are subscribed to the Google Groups "citus-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to citus-users+unsubscribe@googlegroups.com.
To post to this group, send email to citus...@googlegroups.com.

David Rosenstrauch

unread,
Dec 8, 2016, 10:08:43 AM12/8/16
to citus...@googlegroups.com
There we go!

darose=# SELECT *, pg_size_pretty(citus_table_size(logicalrelid)) AS
Size FROM
pg_dist_partition;
logicalrelid | partmethod |
partkey
| colocationid | repmodel | size
------------------+------------+-----------------------------------------------------------------------------------------------------------------------------+--------------+----------+---------
profiles | h | {VAR :varno 1 :varattno 1 :vartype
1043 :vartypmod 132 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno
1 :location -1} | 14 | c | 25 MB
segments | h | {VAR :varno 1 :varattno 1 :vartype
1043 :vartypmod 259 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno
1 :location -1} | 14 | c | 3072 kB
profile_segments | h | {VAR :varno 1 :varattno 1 :vartype
1043 :vartypmod 132 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno
1 :location -1} | 14 | c | 391 MB
(3 rows)

Thanks!

DR

On 12/08/2016 09:39 AM, Metin Doslu wrote:
> Hi David,
>
> Could you update citus_table_size() to use max() instead of min() like
> below, and try again:
>
> CREATE OR REPLACE FUNCTION citus_table_size(table_name regclass)
> RETURNS bigint LANGUAGE plpgsqlAS $function$
>> email to citus-users...@googlegroups.com.

Metin Doslu

unread,
Dec 8, 2016, 10:11:13 AM12/8/16
to David Rosenstrauch, citus-users
Great!

I suspect that you have invalid shards. Could you run this line?

SELECT * FROM pg_dist_shard_placement WHERE shardstate != 1;

Did you get errors while inserting data?

Best,
Metin


To post to this group, send email to citus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/ms
gid/citus-users/a1dbb468-93fb-f37a-5471-e8e6a0618d09%40darose.net.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "citus-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to citus-users+unsubscribe@googlegroups.com.

To post to this group, send email to citus...@googlegroups.com.

David Rosenstrauch

unread,
Dec 8, 2016, 10:17:01 AM12/8/16
to citus...@googlegroups.com
Hmmm ... yes. Looks like I do have a few shards like that. (32 of
them.) I had an issue a while back where dhcp changed the IP address of
one of my nodes while I was loading data.

What should be done to fix this? The data is all POC/test, so I can
drop and re-create the tables if needed. Or is there some other fix I
should apply?

Thanks,

DR
>>>> email to citus-users...@googlegroups.com.
>>>> To post to this group, send email to citus...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/citus-users/a1dbb468-93fb-f37a-5471-e8e6a0618d09%40darose.net.
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "citus-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to citus-users...@googlegroups.com.
>> To post to this group, send email to citus...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/ms
>> gid/citus-users/38b0ddfc-2d48-d11b-7bab-a0dca909981a%40darose.net.

Metin Doslu

unread,
Dec 8, 2016, 10:27:32 AM12/8/16
to David Rosenstrauch, citus-users
Hi David,

Yes, dropping and recreating the table will work.

For a production cluster;

You can use rebalance_table_shards() in enterprise version;

Or you can manually repair inactive shards by using use master_copy_shard_placement() in community edition;

Best,
Metin


To post to this group, send email to citus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/ms
gid/citus-users/a1dbb468-93fb-f37a-5471-e8e6a0618d09%40darose.net.

For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups
"citus-users" group.
To unsubscribe from this group and stop receiving emails from it, send an

To post to this group, send email to citus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/ms
gid/citus-users/38b0ddfc-2d48-d11b-7bab-a0dca909981a%40darose.net.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "citus-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to citus-users+unsubscribe@googlegroups.com.

To post to this group, send email to citus...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages