If this were me, I would:
- Stop both the Bareos and pgsql processes on the server.
- If you can, copy the root '/' to a removable drive for safety.
- Prune any cruft from the root volume to make space, for example
prune the systemd journal (journalctl --vacuum-size=4M), old syslog
files, cached rpm packages, etc. -- anything else that can be easily
recreated or re-downloaded. Use "sudo du" to discover the disk space
used by parts of the system, e.g. "sudo du -sm /var/*" will show the
total megabytes used by every directory under /var. Check the tape
spool area too, if you are using it.
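The pruning steps above can be sketched as follows (the package-cache
command assumes a dnf-based system; use "apt clean" on Debian/Ubuntu):

```shell
# Megabytes used by each directory under /var, largest first
sudo du -sm /var/* 2>/dev/null | sort -rn | head
# See how much the systemd journal holds, then shrink it
journalctl --disk-usage
sudo journalctl --vacuum-size=4M
# Drop cached packages (dnf shown here)
sudo dnf clean all
```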
If in pruning cruft you can save 1 to 2GB, then try to get pgsql running again. Note that 'big' databases generally don't release disk space as soon as you delete a row -- they just mark the space as reusable. In MySQL you would run "OPTIMIZE TABLE" to ask the db engine to compact the space and free up what can be freed; in PostgreSQL the equivalent is VACUUM: a plain VACUUM only marks dead space for reuse within the files, while VACUUM FULL rewrites the tables and actually returns space to the OS (though it needs free space to hold the temporary copy, which is a problem on a full disk).
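A minimal sketch of the two vacuum variants; the catalogue database name
'bareos' is an assumption (check with \l inside psql):

```shell
# Plain VACUUM marks dead rows' space as reusable but does not shrink files
sudo -u postgres psql bareos -c 'VACUUM VERBOSE;'
# VACUUM FULL rewrites tables and returns space to the OS, but it needs
# room for a temporary copy of each table -- run it only once some space is free
sudo -u postgres psql bareos -c 'VACUUM FULL;'
```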
It might be simpler, assuming pgsql runs, to take a SQL-format backup of the db, which can then be reloaded into a new pgsql instance and hence consume only the space actually needed. I would imagine you already have such a SQL-format db backup, but ...
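That dump-and-reload could look like this; the database name 'bareos' and
the /mnt/usb mount point are placeholders for your setup:

```shell
# Custom format (-Fc) is compressed and restorable with pg_restore
sudo -u postgres pg_dump -Fc bareos -f /mnt/usb/bareos.dump
# Later, into a fresh cluster:
sudo -u postgres createdb bareos
sudo -u postgres pg_restore -d bareos /mnt/usb/bareos.dump
```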
If you can't prune enough to get pgsql to work properly again, I think the best option is to move the pgsql data files onto a new drive -- either install a third SSD or use a removable drive; it doesn't have to be fast or fancy. Once that is online, delete the pgsql files from /var/.../pgsql and mount the new drive "on top" of the /var/.../pgsql directory [sorry, can't recall the exact dir name]. Having done that, you should be able to bring up pgsql again properly.
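One way the relocation could look. The paths and device name are
assumptions: RHEL-family systems default to /var/lib/pgsql and Debian to
/var/lib/postgresql, and /dev/sdc1 stands in for the new drive:

```shell
sudo systemctl stop postgresql
sudo mkfs.ext4 /dev/sdc1                    # format the new drive
sudo mkdir -p /mnt/newpg && sudo mount /dev/sdc1 /mnt/newpg
sudo rsync -a /var/lib/pgsql/ /mnt/newpg/   # copy, preserving ownership/perms
sudo rm -rf /var/lib/pgsql/*                # free the root volume
sudo umount /mnt/newpg
sudo mount /dev/sdc1 /var/lib/pgsql         # mount the new drive "on top"
# Add a matching /etc/fstab entry so it survives a reboot, then:
sudo systemctl start postgresql
```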
Get pgsql to check its database integrity. It looks like
pg_checksums is the tool.
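A sketch of running it; the data directory path is an assumption as above:

```shell
# The cluster must be cleanly shut down first, and this only works if
# data checksums were enabled when the cluster was initialised
sudo systemctl stop postgresql
sudo -u postgres pg_checksums --check -D /var/lib/pgsql/data
```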
If that all pans out, start Bareos again and see if it's happy. If so (it probably is), delete the last job, as it's incomplete, and see if you can manually restart it. Hopefully you can, and all will be well.
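In bconsole that might look like the following sketch -- the jobid and job
name are placeholders to be read off the listing first:

```shell
sudo bconsole
*list jobs                  # identify the incomplete job and its jobid
*delete jobid=1234          # remove the partial job record
*run job=YourBackupJob      # start the job again by hand
```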
Once the initial recovery is complete:
I suggest ordering a couple of 1TB SSDs and using them (mirrored)
as a dedicated pgsql drive, migrating the whole DB to them, and
reserving the 380GB drives (mirrored) for system use. Consider
using a data-checksumming filesystem such as btrfs or zfs, so you
can discover when the drives start to fail.
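A ZFS sketch of that layout; pool, dataset, and device names are
placeholders (btrfs can do the same with "mkfs.btrfs -d raid1 -m raid1"):

```shell
sudo zpool create pgpool mirror /dev/sdd /dev/sde
sudo zfs create -o mountpoint=/var/lib/pgsql pgpool/pgsql
# A periodic scrub reads every block and verifies its checksum,
# surfacing silent corruption before it bites
sudo zpool scrub pgpool
```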
If you have been using the tape spool function, I strongly suggest using a _separate_ physical drive (or drives) for that purpose: the spool writes a lot of data, so SSD wear will be significant, and the spool area can accumulate spooled data from failed jobs, which can very quickly fill a system disk.
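Pointing the spool at the dedicated drive is a matter of the storage
daemon's Device resource; /mnt/spool and the size limit here are examples,
not recommendations:

```
# In the tape Device resource (bareos-sd.d/device/*.conf)
Device {
  # ... existing directives unchanged ...
  Spool Directory = /mnt/spool
  Maximum Spool Size = 200 G
}
```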
Hope this all helps,
Ruth