Hi ,
We noticed that gpdb5/6 does not clear out the disk space for a table that is created and truncated+dropped iteratively in a plpython function.
Setup
Local mac with gpdb5/gpdb6 compiled from source (also tested this on a gcp instance with gppkg)
1. create table foo as select 1 as i;
2. create table bar(i int, k bytea);
3. insert into bar values(1, <insert a big bytea string ~ 600 MB>); -- in our code, we have a plpython function that compiles a keras/tensorflow model, serializes it and then writes it to the table as bytea
Repro
create or replace function run() returns void as
$$
create_table_query="create table baaz as select * from foo join bar using (i);"
for i in range(1,10):
plpy.execute(create_table_query)
## In our production code, this table baaz is used to run some sql queries but for the repro we don't need to do anything with it.
plpy.execute('truncate baaz')
plpy.execute('drop table baaz')
plpy.info('done with i = {}'.format(i))
$$ language plpythonu;
select run();
Contrary to our assumption, the truncate + drop statement does not free up the disk space. We noticed that for each iteration, the disk size goes up by ~ 600MB (size of the large table baaz). This happens because truncate + drop does not clear out the disk on file and instead creates a new file in the base folder.
We then modified the run function to run all the sql statements in the same query so that they get run in the same subtransaction. This solved the issue on gpdb6 but not on gpdb5.
create or replace function run2() returns void as
$$
q="""
create table baaz as select * from foo join bar using (i);
truncate baaz;
drop table baaz;
"""
for i in range(1,10):
plpy.execute(q)
plpy.info('done with i = {}'.format(i))
$$ language plpythonu;
select run2();
We are now blocked on making it work for gpdb5, any suggestions ?
Thanks