I have some backups to an LTO-8 pool
I have some backups to an LTO-9 pool (that I'm migrating to from LTO-8)
I have 2 LTO-8 drives in a changer
I have 2 LTO-9 drives in a changer
I'm doing VirtualFull backups with a destination of an offsite LTO-9 pool.
I'm finding that Bareos starts 2 VirtualFull backups at the same time, and they appear to be deadlocked waiting for drives. I expected Bareos to reserve one drive for reading and one for writing, and then block other jobs.
Things that I've tried changing to get this down to a single job running at a time:
- Director -> Director -> Maximum Concurrent Jobs = 1
- Director -> Client (bareos-fd) -> Maximum Concurrent Jobs = 1
- Director -> Client (client1-fd) -> Maximum Concurrent Jobs = 1
- Director -> Storage (LTO-8) -> Maximum Concurrent Jobs = 1
- Director -> Storage (LTO-9) -> Maximum Concurrent Jobs = 1
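As a rough sketch of what two of those changes look like in the Director configuration (the file paths and the Director name are placeholders and assumptions, not copied from the live config, and all other directives are left out):

```
# bareos-dir.d/director/bareos-dir.conf (hypothetical path)
Director {
  Name = bareos-dir                 # placeholder name
  Maximum Concurrent Jobs = 1
  # other required directives omitted from this sketch
}

# bareos-dir.d/storage/LTO-9.conf (hypothetical path)
Storage {
  Name = LTO-9
  Maximum Concurrent Jobs = 1
  # Address, Password, Device, Media Type etc. omitted from this sketch
}
```

The Client and LTO-8 Storage resources were changed the same way.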
I'm doing a reload after making each change and I have not undone any of the changes.
After I reload, I cancel one of the running jobs and add it back to the queue so that it gets picked up later.
I'm still seeing Bareos execute 2 jobs, and neither is making any progress.
Output of one of the jobs:
2025-04-17 13:06:02 bareos-dir JobId 20624: Version: 24.0.3~pre0.54685a85d (27 March 2025) Red Hat Enterprise Linux release 9.5 (Plow)
2025-04-17 13:06:02 bareos-dir JobId 20624: Start Virtual Backup JobId 20624, Job=client1-job1-offsite.2025-04-12_00.01.01_17
2025-04-17 13:06:02 bareos-dir JobId 20624: Bootstrap records written to /var/lib/bareos/bareos-dir.restore.100.bsr
2025-04-17 13:06:02 bareos-dir JobId 20624: Consolidating JobIds 20078,20239,20393,20543 containing 49 files
2025-04-17 13:06:02 bareos-dir JobId 20624: Connected Storage daemon at bareos.mgmt.bbn.com:9103, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
2025-04-17 13:06:02 bareos-dir JobId 20624: Encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
2025-04-17 13:06:03 bareos-dir JobId 20624: Using Device "LTO-9_drive1" to read.
2025-04-17 13:06:03 bareos-sd JobId 20624: Using just in time reservation for job 20624
2025-04-17 13:06:03 bareos-dir JobId 20624: Using Device "JustInTime Device" to write.
LTO-9 storage status:
JobId=20624 Level=Virtual Full Type=Backup Name=client1-job1-offsite Status=Created
Reading: Volume=""
pool="onsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
Writing: Volume=""
pool="offsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
spooling=0 despooling=0 despool_wait=0
Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
FDSocket closed
JobId=20647 Level=Virtual Full Type=Backup Name=client1-job2-offsite Status=Created
Reading: Volume=""
pool="onsite-LTO-9" device="LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst)
Writing: Volume=""
pool="offsite-LTO-9" device="LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
spooling=0 despooling=0 despool_wait=0
Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
FDSocket closed
====
Jobs waiting to reserve a drive:
3603 JobId=20624 device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst) is busy reading.
3609 JobId=20624 Max concurrent jobs exceeded on drive "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst).
3603 JobId=20647 device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst) is busy reading.
3609 JobId=20647 Max concurrent jobs exceeded on drive "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst).
...
Used Volume status:
ANJ645L9 on device "LTO-9_drive0" (/dev/tape/by-id/scsi-35000e111ca01f0c9-nst)
Reader=1 writers=0 reserves=1 volinuse=0
ANJ646L9 on device "LTO-9_drive1" (/dev/tape/by-id/scsi-35000e111ca01f0d3-nst)
Reader=0 writers=0 reserves=1 volinuse=0
Read Volume: 003048L8 no device. volinuse= 0
Read Volume: 003041L8 no device. volinuse= 0
Read Volume: 003048L8 no device. volinuse= 0
Read Volume: ANJ621L9 no device. volinuse= 0
Read Volume: ANJ651L9 no device. volinuse= 0
The status of the LTO-8 storage only shows me the LTO-9 information, nothing about LTO-8 drives in use.
How do I get Bareos unstuck?
Jon