Bareos database filled up root partition; backups stopped - what now?

Riot Nrrrd

Jun 20, 2025, 4:02:35 PM
to bareos-users
At my work we have a Bareos setup with about 70 or so clients.

It wasn't set up by me originally; I inherited it.  It (Bareos 22.1.4) was set up on a RHEL system with dual SSDs for the root volume and the total disk space on "/" is around 380 GB.  Well, due to 'slow creep' eventually the Bareos PostgreSQL database filled up the partition to 100% and now backups have stopped.

The last job before they stopped failed with

--
18-Jun 21:06 bareos-sd JobId 93060: Releasing device "Disk2" (/export/bareos/storage2).
18-Jun 21:06 bareos-sd JobId 93060: Elapsed time=01:06:55, Transfer rate=12.86 M Bytes/second
18-Jun 21:06 bareos-dir JobId 93060: Insert of attributes batch table with 475489 entries start
18-Jun 21:07 bareos-dir JobId 93060: Fatal error: cats/sql_create.cc:815 Fill File table Query failed: INSERT INTO File (FileIndex, JobId, PathId, Name, LStat, MD5, DeltaSeq, Fhinfo, Fhnode) SELECT batch.FileIndex, batch.JobId, Path.PathId, batch.Name, batch.LStat, batch.MD5, batch.DeltaSeq, batch.Fhinfo, batch.Fhnode FROM batch JOIN Path ON (batch.Path = Path.Path) : ERR=ERROR:  relation "batch" does not exist
LINE 1: ..., batch.DeltaSeq, batch.Fhinfo, batch.Fhnode FROM batch JOIN...
--

I tried using the bconsole 'prune' command to prune back the jobs, hoping it might result in a database shrinkage.  Instead it just kept getting larger. :-(  I tried asking ChatGPT for suggestions and it just returned a bunch of pgsql commands that I don't really understand (not that I'd trust ChatGPT anyway).

Does anyone have any "ELI5" suggestions on what to do?

I suppose I could shut down Bareos and the database and move /var/lib/pgsql to one of the data (backups) volumes and out of the root partition, but I was hoping I could solve this without having to move it by getting the database to shrink.  Is that a possibility?

Ruth Ivimey-Cook

Jun 22, 2025, 8:30:04 PM
to bareos-users

If this were me, I would:

- Stop both Bareos and pgsql processes on the server;

- If you can, copy the root '/' to a removable drive for safety.

- Prune any cruft from the root volume to make space, for example prune journalctl (--vacuum-size 4M), syslog, cached rpm packages, etc. Anything else that can be recreated/downloaded easily. Use "sudo du -s" to discover disk space used by parts of the system, e.g. "sudo du -sm /var/*" will show the total megabytes used by every directory under /var. Check the tape spool area too, if you are using it.
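
The space hunt in the last bullet could be scripted roughly as follows. This is only a sketch: `biggest_dirs` is a made-up helper name, it assumes a `du` that supports `-x` and `-m` (GNU coreutils does), and the cleanup commands are left commented out because they delete data and need root. `dnf` assumes RHEL 8 or later.

```shell
#!/bin/sh
# biggest_dirs DIR [N]
# List the N largest entries directly under DIR in megabytes, largest first.
# -x keeps du on one filesystem, so other mounted volumes are not counted.
biggest_dirs() {
    dir=${1:-/var}
    count=${2:-10}
    du -xsm "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example (run as root to see everything):
#   biggest_dirs /var 15
#
# Typical cleanups once the culprits are known (destructive, so commented out):
#   journalctl --vacuum-size=4M    # shrink the systemd journal
#   dnf clean all                  # drop cached rpm packages and metadata
```

Entries whose names start with a dot are skipped by the `*` glob, so check those by hand if the totals do not add up.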

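If in pruning cruft you can save 1 to 2 GB, then try to get pgsql running again. I don't know pgsql well, but 'big' databases generally don't release disk space as soon as you delete a row -- they just mark the space as unused. In MySQL you would run "OPTIMIZE TABLE" to ask the db engine to compact the space and free up what can be freed; as far as I can tell the PostgreSQL counterpart is VACUUM, where a plain VACUUM only makes the dead space reusable and VACUUM FULL rewrites the table to hand space back to the filesystem (and needs enough free space for the rewritten copy while it runs).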
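
To see where the space went and to try reclaiming it, something like the following might do. It is a sketch, not a tested procedure: `catalog_sql` is a hypothetical helper that only prints SQL, the database name `bareos` in the usage line is the Bareos default and may differ on your system, and the queries assume a running PostgreSQL that you can reach as the `postgres` OS user.

```shell
#!/bin/sh
# catalog_sql: print SQL that reports catalog size and runs a plain VACUUM.
# Nothing is executed here; pipe the output into psql when you are ready.
catalog_sql() {
    cat <<'SQL'
-- Total on-disk size of the database we are connected to.
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Ten largest tables, counting their indexes and TOAST data.
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;

-- Plain VACUUM: marks dead rows as reusable, usually does not shrink files.
VACUUM (VERBOSE, ANALYZE);

-- VACUUM FULL rewrites a table and returns the space to the filesystem, but
-- it locks the table and temporarily needs room for the new copy.
-- Uncomment only when there is enough free disk:
-- VACUUM (FULL, VERBOSE) file;
SQL
}

# Usage (assumes the default database name "bareos"):
#   catalog_sql | sudo -u postgres psql -d bareos
```

The commented-out last statement targets the File table because that is the one named in the failed INSERT above; the size query will show whether it really is the largest.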

It might be simpler, assuming pgsql runs, to take a sql-format backup of the db, which can then be reloaded into a new pgsql instance and hence only consume the space actually needed. I would imagine you already have such a sql-format db backup, but ...
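
A sql-format dump written straight to another volume might look like this. Treat it as a sketch under assumptions: the database name `bareos` is the Bareos default, `/export/bareos` is only a guess at a data volume with room (taken from the storage path in the log above), the 10 GB threshold is arbitrary, and `avail_kb` and `dump_catalog` are names invented here. It would need to run as a user allowed to read the database, typically `postgres`.

```shell
#!/bin/sh
DB=${DB:-bareos}                # assumption: default Bareos catalog name
DEST=${DEST:-/export/bareos}    # assumption: a data volume with free space

# Free space, in KB, on the filesystem that holds directory $1.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# dump_catalog [MIN_KB]
# Write a gzip-compressed plain-SQL dump of $DB into $DEST, refusing to start
# when less than MIN_KB is free there. Prints the dump's path on success.
dump_catalog() {
    min_kb=${1:-10485760}       # 10 GB by default (arbitrary threshold)
    free=$(avail_kb "$DEST")
    if [ -z "$free" ] || [ "$free" -lt "$min_kb" ]; then
        echo "dump_catalog: not enough free space under $DEST" >&2
        return 1
    fi
    out="$DEST/$DB-$(date +%Y%m%d).sql.gz"
    # For plain-format output, -Z makes pg_dump gzip the whole file itself.
    pg_dump -Z 6 -f "$out" "$DB" && echo "$out"
}

# Usage:
#   dump_catalog            # writes e.g. /export/bareos/bareos-YYYYMMDD.sql.gz
```

Letting `pg_dump` do the compression avoids a shell pipeline, so a failed dump shows up in the function's exit status.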

If you can't prune enough to get pgsql to work properly again, I think the best option is to move the pgsql data files onto a new drive -- either install a third SSD or use a removable drive. It doesn't have to be fast or fancy. Once that is online, copy the pgsql files onto it, then delete them from /var/.../pgsql and mount the new drive "on top" of the /var/.../pgsql directory [sorry, can't recall exact dir name]. Having done that you should be able to bring up pgsql again properly.
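
The move could go roughly like the plan printed below. Everything specific in it is an assumption to adapt: `/var/lib/pgsql` is the directory mentioned in the original post, `/dev/sdX1` and `/mnt/newpg` are placeholders, the new drive is assumed to be already partitioned and formatted, and the service names are the ones used elsewhere in this thread. `pg_move_plan` is a made-up helper that only prints commands so they can be reviewed first.

```shell
#!/bin/sh
# pg_move_plan DEVICE [PGDIR] [STAGE]
# Print (not run) the steps for moving PGDIR onto DEVICE and mounting the
# device in its place.
pg_move_plan() {
    dev=${1:?usage: pg_move_plan DEVICE [PGDIR] [STAGE]}
    pgdir=${2:-/var/lib/pgsql}   # from the original post
    stage=${3:-/mnt/newpg}       # hypothetical temporary mount point
    cat <<EOF
systemctl stop bareos-dir bareos-sd bareos-fd postgresql
mkdir -p $stage
mount $dev $stage
rsync -aHAX $pgdir/ $stage/
umount $stage
mv $pgdir $pgdir.old
mkdir $pgdir
mount $dev $pgdir
restorecon -R $pgdir
systemctl start postgresql
EOF
}

# Review the plan, then run the lines by hand as root:
#   pg_move_plan /dev/sdX1
```

The plan renames the old directory instead of deleting it, so the space on "/" is only freed once `/var/lib/pgsql.old` is removed after PostgreSQL has started cleanly. An `/etc/fstab` entry is still needed for the mount to survive a reboot, and `restorecon` only matters if SELinux is enabled.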

Get pgsql to check its database integrity. It looks like pg_checksums is the tool, though it only works with the server stopped and on a cluster that has data checksums enabled.
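
Whether pg_checksums can verify anything depends on data checksums having been enabled for the cluster, which `pg_controldata` reports. A small sketch of that check follows; `checksums_enabled` is a hypothetical helper, and `/var/lib/pgsql/data` is assumed to be the data directory (the usual RHEL default, so adjust as needed). Both tools are run as the `postgres` user with the server stopped.

```shell
#!/bin/sh
PGDATA=${PGDATA:-/var/lib/pgsql/data}   # assumption: RHEL default location

# checksums_enabled: read `pg_controldata` output on stdin and succeed only
# if the "Data page checksum version" field is non-zero.
checksums_enabled() {
    awk -F': *' '
        /^Data page checksum version/ { v = $2 }
        END { if (v + 0 > 0) exit 0; exit 1 }
    '
}

# Usage (server stopped, as the postgres user):
#   if pg_controldata "$PGDATA" | checksums_enabled; then
#       pg_checksums --check -D "$PGDATA"
#   else
#       echo "data checksums are off; pg_checksums cannot verify this cluster"
#   fi
```

If checksums turn out to be disabled, a full sql-format dump that completes without errors is a reasonable substitute test, since it reads every table.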

If that all pans out, start bareos again and see if it's happy. If so (probably is) delete the last job, as it's incomplete, and see if you can manually restart it. Hopefully you can do so and all will be well.

Once initial recovery is complete:

I suggest ordering a couple of 1 TB SSDs and using them (mirrored) as a dedicated pgsql drive, migrating the whole DB to them, and reserving the 380 GB drives (mirrored) for system use. Consider using a data-checksumming filesystem such as btrfs or zfs, so you can discover if the drives start returning bad data.

If you have been using the tape spool function, I strongly suggest using a _separate_ physical drive/drives for that purpose as the spool is writing a lot of data and so ssd wear will be significant. Also, the spool area has the chance of accumulating spooled data from failed jobs, which can very quickly fill a system disk.


Hope this all helps,

Ruth


Birgit Ducarroz

Jun 27, 2025, 11:55:55 AM
to bareos-users
Hi,

In case the server is "unbootable" because the root partition is full, you could try booting from a USB stick and mounting your root partition from there.
You could then rsync or move the Bareos database to a mount on external storage.
Once the rsync has been successful, delete the database on / and restart your server.
Once the system boots from the root partition again, make sure that PostgreSQL will find the new database location. You can either adapt it in the configuration file
vi /etc/postgresql/<database>/main/postgresql.conf - search for "data_directory =" (that path is the Debian/Ubuntu layout; on RHEL PostgreSQL lives under /var/lib/pgsql), or you can create a symbolic link inside /var/lib (something like postgresql -> /your/mount/path/to-the-new-database-location).
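
The symbolic-link variant can be sketched as a small function. Assumptions to note: `relocate_dir` is a name invented here, `cp -a` stands in for the rsync mentioned above (either should work), PostgreSQL must be stopped first, and the paths in the usage line are placeholders. On an SELinux-enabled RHEL system the new location probably also needs a matching file context before PostgreSQL will start.

```shell
#!/bin/sh
# relocate_dir OLD NEW
# Copy directory OLD to NEW, park the original as OLD.old, and leave a
# symlink at OLD that points to NEW.
relocate_dir() {
    old=$1
    new=$2
    if [ ! -d "$old" ] || [ -L "$old" ]; then
        echo "relocate_dir: $old is not a real directory" >&2
        return 1
    fi
    mkdir -p "$new" || return 1
    # -a preserves owner, mode and timestamps of the copied files.
    cp -a "$old"/. "$new"/ || return 1
    # Renaming within one filesystem needs no extra space.
    mv "$old" "$old.old" || return 1
    ln -s "$new" "$old"
}

# Usage (as root, database stopped; the target path is a placeholder):
#   relocate_dir /var/lib/pgsql /your/mount/path/pgsql
#   rm -rf /var/lib/pgsql.old    # only after PostgreSQL starts cleanly
```

Keeping the old copy until the database has been started from the new location costs nothing extra on the full disk and leaves a way back if the copy turns out to be incomplete.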

Then start the database and check that it is running.
systemctl start postgresql

To test the configuration of the Bareos services, use
/usr/sbin/bareos-sd -t
/usr/sbin/bareos-fd -t
/usr/sbin/bareos-dir -t

If everything is okay, you should be able to start the Bareos services:
/bin/systemctl start bareos-dir bareos-sd bareos-fd
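
The test-then-start sequence above can be chained so the services only start when all three configuration tests pass. A minimal sketch, assuming the daemons live in /usr/sbin as shown; `bareos_config_ok` and the `SBIN_DIR` variable are names made up for this example.

```shell
#!/bin/sh
SBIN_DIR=${SBIN_DIR:-/usr/sbin}   # where the Bareos daemons are installed

# bareos_config_ok: run each daemon's configuration test (-t) in turn and
# stop at the first one that fails.
bareos_config_ok() {
    for daemon in bareos-sd bareos-fd bareos-dir; do
        if ! "$SBIN_DIR/$daemon" -t; then
            echo "config test failed: $daemon" >&2
            return 1
        fi
    done
}

# Usage:
#   bareos_config_ok && systemctl start bareos-dir bareos-sd bareos-fd
```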

Hoping that might help you.
Birgit