PostgreSQl Database Cleanup Questions

169 views
Skip to first unread message

Cory Johnson

unread,
Mar 23, 2022, 12:10:58 PM3/23/22
to xnat_discussion
We are attempting to migrate our 1.6 xnat systems to 1.7 and the restores are taking weeks.

Some of the systems started as clones of each other and then new projects were added and old projects deleted.

As a result, the database dump of one system with only 6 studies is 17GB.

I queried the Postgres database for the top 10 largest tables and found xdat_user_login is by far the largest, as shown below.

Has anyone delete historical data from this tor other tables, such as the _history tables?

Any issues or gotchas to be aware of?

adir_xnat-# limit 10;
 table_schema |         table_name         | total_size | data_size | external_size
--------------+----------------------------+------------+-----------+---------------
 public       | xdat_user_login            | 17 GB      | 4980 MB   | 12 GB
 public       | xdat_user_login_meta_data  | 10057 MB   | 4476 MB   | 5582 MB
 public       | xdat_change_info           | 9388 MB    | 2702 MB   | 6686 MB
 public       | wrk_workflowdata           | 1797 MB    | 649 MB    | 1148 MB
 public       | wrk_workflowdata_meta_data | 1566 MB    | 595 MB    | 970 MB
 public       | xnat_resource_history      | 636 MB     | 589 MB    | 48 MB
 public       | xnat_imagescandata_history | 550 MB     | 502 MB    | 48 MB
 public       | xnat_mrscandata_history    | 409 MB     | 361 MB    | 48 MB
 xdat_search  | xs_item_access             | 391 MB     | 391 MB    | 136 kB
 public       | adir_mrscan_history        | 358 MB     | 310 MB    | 48 MB

Herrick, Rick

unread,
Mar 23, 2022, 3:00:13 PM3/23/22
to xnat_di...@googlegroups.com

I don’t see any reason you couldn’t clear the xdat_user_login, xdat_change_info, xdat_search.xdat_change_info, and *_history tables, although you might want to dump their contents and keep it around somewhere just in case.

 

The wrk_workflowdata table is a bit iffier if you want to maintain any sort of provenance for your data, but if that’s not something you need you could clear that as well (again, dumping the contents for posterity).

 

You should be careful when clearing the metadata tables: you don’t want to delete rows there for rows that are still in the corresponding primary tables. Once you clear out a primary table, you can delete any orphaned metadata rows with a query something like this:

 

DELETE FROM xdat_user_login_meta_data WHERE meta_data_id NOT IN (SELECT user_login_info FROM xdat_user_login);

 

Of course if you delete everything in the primary table there won’t be corresponding rows there so you can just delete all of that as well.

 

Another table you can clear without concern is xs_item_cache, which can swell up to a lot of rows and the content of a single row can be really large.

 

Of course, when doing this always make sure you have safe and secure backups of your original data so you can roll back in case anything goes wrong! It’s not a bad idea to take snapshots along the way as well.

 

-- 

Rick Herrick

Sr. Programmer/Analyst

Neuroinformatics Research Group

Washington University School of Medicine

Phone: +1 (314) 273-1645

 

From: 'Cory Johnson' via xnat_discussion <xnat_di...@googlegroups.com>
Date: Wednesday, March 23, 2022 at 11:11 AM
To: xnat_discussion <xnat_di...@googlegroups.com>
Subject: [XNAT Discussion] PostgreSQl Database Cleanup Questions

* External Email - Caution *

--
You received this message because you are subscribed to the Google Groups "xnat_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xnat_discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xnat_discussion/50846828-1c79-4987-a50a-404c91f984c0n%40googlegroups.com.

 


The materials in this message are private and may contain Protected Healthcare Information or other information of a sensitive nature. If you are not the intended recipient, be advised that any unauthorized use, disclosure, copying or the taking of any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error, please immediately notify the sender via telephone or return mail.

Cory Johnson

unread,
Mar 23, 2022, 3:29:42 PM3/23/22
to xnat_discussion
Thank you Rick! Helpful as always!

The reason for the cleanup is many of our xnat systems have been in service since 2012\ 2013 and are still version 1.6 and we need to migrate but the pg_restore attempts have executed for weeks without completion. 

As shown in the following image, the dumps are rather large. I believe the deletions I have done so far will have dropped the size of the last server in table to about half. There is no guarantee this will increase the speed of the restore.

I noted Jenny, in the following discussion, modified some indexes for performance (not restore related); do you have any indexing suggestions that may increase restore speed?

{061D4C0C-0596-441D-8C5C-0A16D16ED0C8}.png

Cory Johnson

unread,
Mar 24, 2022, 11:54:24 AM3/24/22
to xnat_discussion
Hello Rick,

I attempted the following delete command on a couple systems yesterday, letting the command execute for about 16 hours before killing it on each instance:
  • DELETE FROM xdat_user_login_meta_data WHERE meta_data_id NOT IN (SELECT user_login_info FROM xdat_user_login);
I thought I may have to add an index for the two columns in the two tables but it appears they are already indexed.

I looked at some of the records in following tables; are they really just table for logging that the data could be completely deleted, with the exception of xnat instances used for FDA storage? or would doing so break mapping to experiments, scan sessions, etc.?
  • xdat_user_login
  • xdat_user_login_meta_data
  • xdat_user_login
  • xdat_user_login_meta_data
Is there an xnat data model available or documentation on the data structure?

Regards, 

Cory

Herrick, Rick

unread,
Mar 24, 2022, 2:36:39 PM3/24/22
to xnat_di...@googlegroups.com

Yeah, that table only contains info on when users logged in and out as well as the IP address from which they logged in, the ID of their HTTP session, and the ID of the node they actually logged in on (only recorded in multinode configurations). You can find the data model definition for xdat_user_login in the xdat XML schema definition. You can find the corresponding data models for other tables/data types in other XSDs. The various prefixes and XSDs include:

 

 

Most things you’re interested in are going to be in the xdat_* and xnat_* XSDs.

 

-- 

Rick Herrick

XNAT Architect/Developer

Computational Imaging Laboratory

Washington University School of Medicine

 

 

From: 'Cory Johnson' via xnat_discussion <xnat_di...@googlegroups.com>
Date: Thursday, March 24, 2022 at 10:54 AM
To: xnat_discussion <xnat_di...@googlegroups.com>
Subject: Re: [XNAT Discussion] PostgreSQl Database Cleanup Questions

* External Email - Caution *

Hello Rick,

 

I attempted the following delete command on a couple systems yesterday, letting the command execute for about 16 hours before killing it on each instance:

  • DELETE FROM xdat_user_login_meta_data WHERE meta_data_id NOT IN (SELECT user_login_info FROM xdat_user_login);

I thought I may have to add an index for the two columns in the two tables but it appears they are already indexed.

 

I looked at some of the records in following tables; are they really just table for logging that the data could be completely deleted, with the exception of xnat instances used for FDA storage? or would doing so break mapping to experiments, scan sessions, etc.?

  • xdat_user_login
  • xdat_user_login_meta_data
  • xdat_user_login
  • xdat_user_login_meta_data

Is there an xnat data model available or documentation on the data structure?

 

Regards, 

 

Cory

On Wednesday, March 23, 2022 at 2:29:42 PM UTC-5 Cory Johnson wrote:

Thank you Rick! Helpful as always!

 

The reason for the cleanup is many of our xnat systems have been in service since 2012\ 2013 and are still version 1.6 and we need to migrate but the pg_restore attempts have executed for weeks without completion. 

 

As shown in the following image, the dumps are rather large. I believe the deletions I have done so far will have dropped the size of the last server in table to about half. There is no guarantee this will increase the speed of the restore.

 

I noted Jenny, in the following discussion, modified some indexes for performance (not restore related); do you have any indexing suggestions that may increase restore speed?


Reply all
Reply to author
Forward
0 new messages