One of our Virtual Full jobs has been failing every day during Always Incremental consolidation, and I'm having trouble figuring out why.
*list joblog jobid=18065
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
2024-07-23 09:01:22 bareos-dir-prod JobId 18065: Start Virtual Backup JobId 18065, Job=node0057-AI.2024-07-23_09.00.04_41
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Bootstrap records written to /var/lib/bareos/bareos-dir-prod.restore.45.bsr
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Consolidating JobIds 17646,13356,13422,13488,13556,13625,13694 containing 2025684 files
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Connected Storage daemon at bareos-sd-t1-prod.foo.bar.edu:9103, encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device "FileStorage5" to read.
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Max configured use duration=72,000 sec. exceeded. Marking Volume "node0057-AI-Consolidated-12250" as Used.
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Created new Volume "node0057-AI-Consolidated-12276" in catalog.
2024-07-23 09:01:41 bareos-dir-prod JobId 18065: Using Device "FileStorageConsolidated5" to write.
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Labeled new Volume "node0057-AI-Consolidated-12276" on device "FileStorageConsolidated5" (/var/lib/bareos/storage).
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Wrote label to prelabeled Volume "node0057-AI-Consolidated-12276" on device "FileStorageConsolidated5" (/var/lib/bareos/storage)
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Ready to read from volume "node0057-AI-Consolidated-6560" on device "FileStorage5" (/var/lib/bareos/storage).
2024-07-23 09:01:41 bareos-sd-t1-prod JobId 18065: Forward spacing Volume "node0057-AI-Consolidated-6560" to file:block 0:274.
2024-07-23 09:02:15 bareos-dir-prod JobId 18065: Insert of attributes batch table with 800001 entries start
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes batch table done
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Fatal error: Director's comm line to SD dropped.
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes batch table with 3303 entries start
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Insert of attributes batch table done
2024-07-23 09:02:49 bareos-dir-prod JobId 18065: Replicating deleted files from jobids 17646,13356,13422,13488,13556,13625,13694 to jobid 18065
2024-07-23 09:03:03 bareos-dir-prod JobId 18065: Error: Bareos bareos-dir-prod 22.1.4 (28Feb24):
Build OS: Red Hat Enterprise Linux release 9.1 (Plow)
JobId: 18065
Job: node0057-AI.2024-07-23_09.00.04_41
Backup Level: Virtual Full
Client: "node0057.foo.bar.edu-fd" 22.1.5 (04Jun24) Red Hat Enterprise Linux Server release 7.9 (Maipo),redhat
FileSet: "LinuxAll" 2023-10-22 13:39:16
Pool: "node0057-AI-Consolidated" (From Job Pool's NextPool resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "bareos-sd-t1-prod-Consolidated" (From Storage from Pool's NextPool resource)
Scheduled time: 23-Jul-2024 09:00:04
Start time: 21-May-2024 02:00:01
End time: 21-May-2024 02:30:31
Elapsed time: 30 mins 30 secs
Priority: 10
SD Files Written: 0
SD Bytes Written: 110,846,184 (110.8 MB)
Rate: 60.6 KB/s
Volume name(s): node0057-AI-Consolidated-12276
Volume Session Id: 19
Volume Session Time: 1721645047
Last Volume Bytes: 275 (275 B)
SD Errors: 0
SD termination status: Error
Accurate: yes
Bareos binary info: Bareos subscription release
Job triggered by: User
Termination: *** Backup Error ***
*
Director and SD are separate hosts, and this issue seems to persist only with jobs from this client, node0057. I enabled debug tracing on the SD but haven't seen anything that makes sense to me.
...
bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 FI=804499 SessId=19 Strm=1998 len=65
bareos-sd-t1-prod (200): stored/mac.cc:195-18065 before write JobId=18065 FI=804499 SessId=19 Strm=MD5 len=16
bareos-sd-t1-prod (100): stored/mac.cc:655-18065 ok=0
bareos-sd-t1-prod (130): stored/label.cc:627-18065 session_label record=fc052df8
bareos-sd-t1-prod (150): stored/label.cc:652-18065 Write sesson_label record JobId=18065 FI=EOS_LABEL SessId=19 Strm=18065 len=234 remainder=0
bareos-sd-t1-prod (150): stored/label.cc:660-18065 Leave WriteSessionLabel Block=390293886d File=0d
bareos-sd-t1-prod (100): stored/block.cc:567-18065 return WriteBlockToDev, job is canceled
bareos-sd-t1-prod (100): stored/mac.cc:684-18065 Set ok=FALSE after WriteBlockToDevice.
bareos-sd-t1-prod (200): stored/mac.cc:687-18065 Flush block to device pos 0:390293886
bareos-sd-t1-prod (100): stored/acquire.cc:538-18065 releasing device "FileStorageConsolidated5" (/var/lib/bareos/storage)
bareos-sd-t1-prod (100): stored/acquire.cc:560-18065 There are 0 writers in ReleaseDevice
bareos-sd-t1-prod (50): stored/askdir.cc:366-18065 >dird UpdCat Job=node0057-AI.2024-07-23_09.00.04_41 FileAttributes bareos-sd-t1-prod (50): stored/askdir.cc:369-18065 create_jobmedia error BnetRecv
bareos-sd-t1-prod (200): stored/mac.cc:229-18062 bareos-sd-t1-prod (200): stored/acquire.cc:568-18065 ===== Wrote block new pos 2:4028935146
bareos-sd-t1-prod (50): stored/askdir.cc:298-18065 Update cat VolBytes=390293887
bareos-sd-t1-prod (50): stored/askdir.cc:317-18065 >dird bareos-sd-t1-prod (200): stored/acquire.cc:587-18065 dir_update_vol_info. Release vol=node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
bareos-sd-t1-prod (150): stored/vol_mgr.cc:695-18065 === clear in_use vol=node0057-AI-Consolidated-12276
bareos-sd-t1-prod (150): stored/vol_mgr.cc:712-18065 === set not reserved vol=node0057-AI-Consolidated-12276 num_writers=0 dev_reserved=0 dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
bareos-sd-t1-prod (150): stored/vol_mgr.cc:740-18065 === clear in_use vol=node0057-AI-Consolidated-12276
bareos-sd-t1-prod (150): stored/vol_mgr.cc:751-18065 === remove volume node0057-AI-Consolidated-12276 dev="FileStorageConsolidated5" (/var/lib/bareos/storage)
...
I can provide client/job/pool/storage configurations if they seem relevant here, and am continuing to poke at this myself. Thanks for any thoughts on troubleshooting.
Josh