Large backup jobs fail for an unknown reason


Łukasz Szczepanik

Sep 9, 2024, 2:30:01 AM
to bareos-users
Hi,

After the last Full backup session, I noticed that all backup jobs with a fairly large number of files (500k to 2M) terminated with Error status. What is strange is that the job log contains no information about why this happened.

The file daemon sees the job as successful:
################################################################
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
======================================================================
 61135  Full   2,110,919    985.3 G  OK       08-Sep-24 04:37 client1-backup-job
################################################################

The director, however, marked the same job as an error (status E):
################################################################
| 61135 | client1-backup-job           | client1           | 2024-09-07 20:00:01 | 08:37:49 | B    | F     | 2,110,919 |   985,395,488,102 | E         |
################################################################

The job log doesn't tell me anything either:
################################################################
*list joblog jobid=61135
 2024-09-07 20:00:00 bareos-director JobId 61135: Start Backup JobId 61135, Job=client1-backup-job.2024-09-07_20.00.00_27
 2024-09-07 20:00:00 bareos-director JobId 61135: Connected Storage daemon at bareos-director:9103, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
 2024-09-07 20:00:00 bareos-director JobId 61135:  Encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
 2024-09-07 20:00:00 bareos-director JobId 61135: Using Client Initiated Connection (client1).
 2024-09-07 20:00:00 bareos-director JobId 61135:  Handshake: Immediate TLS
 2024-09-07 20:00:00 bareos-director JobId 61135:  Encryption: ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2
 2024-09-07 20:00:01 bareos-director JobId 61135: Volume "bareos-client1-full-pool-0" has Volume Retention of 18144000 sec. and has 1 jobs that will be pruned
 2024-09-07 20:00:01 bareos-director JobId 61135: Purging the following 1 JobIds: 40170
 2024-09-07 20:00:05 bareos-director JobId 61135: There are no more Jobs associated with Volume "bareos-client1-full-pool-0". Marking it purged.
 2024-09-07 20:00:05 bareos-director JobId 61135: All records pruned from Volume "bareos-client1-full-pool-0"; marking it "Purged"
 2024-09-07 20:00:05 bareos-director JobId 61135: Recycled volume "bareos-client1-full-pool-0"
 2024-09-07 20:00:05 bareos-director JobId 61135: Using Device "bareos-client1-s3bucket" to write.
 2024-09-07 20:12:54 bareos_sd JobId 61135: Recycled volume "bareos-client1-full-pool-0" on device "bareos-client1-s3bucket" (S3), all previous data lost.
 2024-09-07 23:00:17 bareos-director JobId 61135: Insert of attributes batch table with 800001 entries start
 2024-09-07 23:00:29 bareos-director JobId 61135: Insert of attributes batch table done
 2024-09-08 02:10:31 bareos-director JobId 61135: Insert of attributes batch table with 800001 entries start
 2024-09-08 02:10:43 bareos-director JobId 61135: Insert of attributes batch table done
 2024-09-08 04:37:12 bareos_sd JobId 61135: Releasing device "bareos-client1-s3bucket" (S3).
 2024-09-08 04:37:42 bareos-director JobId 61135: Insert of attributes batch table with 510916 entries start
 2024-09-08 04:37:50 bareos-director JobId 61135: Insert of attributes batch table done
 2024-09-08 04:37:50 bareos-director JobId 61135: Error: Bareos bareos-director 23.0.3~pre47.36e516c0b (19Mar24):
  Build OS:               Debian GNU/Linux 12 (bookworm)
  JobId:                  61135
  Job:                    client1-backup-job.2024-09-07_20.00.00_27
  Backup Level:           Full
  Client:                 "client1" 21.0.0 (21Dec21) Debian GNU/Linux 9.13 (stretch),debian
  FileSet:                "client1-fileset" 2022-07-18 12:56:14
  Pool:                   "client1-full-pool" (From Job FullPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "client1-s3storage" (From Job resource)
  Scheduled time:         07-Sep-2024 20:00:00
  Start time:             07-Sep-2024 20:00:01
  End time:               08-Sep-2024 04:37:50
  Elapsed time:           8 hours 37 mins 49 secs
  Priority:               10
  Allow Mixed Priority:   no
  FD Files Written:       2,110,919
  SD Files Written:       0
  FD Bytes Written:       985,395,488,102 (985.3 GB)
  SD Bytes Written:       378,118,458 (378.1 MB)
  Rate:                   31716.4 KB/s
  Software Compression:   16.2 % (lzo)
  VSS:                    no
  Encryption:             yes
  Accurate:               no
  Volume name(s):         bareos-client1-full-pool-0
  Volume Session Id:      1098
  Volume Session Time:    1724320276
  Last Volume Bytes:      986,302,118,563 (986.3 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  Running
  Bareos binary info:     Bareos community build (UNSUPPORTED): Get professional support from https://www.bareos.com
  Job triggered by:       Scheduler
  Termination:            *** Backup Error ***
################################################################
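One way to spot the inconsistency in a report like this is to compare the FD-side and SD-side fields: here the FD reports OK with 2,110,919 files, while the SD side still shows "Running" with 0 files written. Below is a minimal Python sketch, assuming the summary has been copied out of the `list joblog` output as plain text. `parse_report`, `leading_int` and `find_mismatches` are hypothetical helpers written for this example, not part of Bareos, and the assumption that FD and SD file counts should match in a clean run is ours.

```python
import re

# Excerpt of the job summary from the post; field names are copied verbatim.
REPORT = """\
  FD Files Written:       2,110,919
  SD Files Written:       0
  FD Bytes Written:       985,395,488,102 (985.3 GB)
  SD Bytes Written:       378,118,458 (378.1 MB)
  FD termination status:  OK
  SD termination status:  Running
  Termination:            *** Backup Error ***
"""

# "Key:   value" -> key is the text before the first colon followed by whitespace.
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?):\s+(.*\S)\s*$")


def parse_report(text: str) -> dict[str, str]:
    """Turn the 'Key:   value' lines of a job summary into a dict."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def leading_int(value: str) -> int:
    """Parse the leading number, e.g. '985,395,488,102 (985.3 GB)' -> 985395488102."""
    return int(value.split()[0].replace(",", ""))


def find_mismatches(fields: dict[str, str]) -> list[str]:
    """Flag FD/SD disagreements (assumes both file-count fields are present)."""
    problems = []
    fd = fields.get("FD termination status")
    sd = fields.get("SD termination status")
    if fd == "OK" and sd != "OK":
        problems.append(f"FD finished OK but SD status is {sd!r}")
    if leading_int(fields["FD Files Written"]) != leading_int(fields["SD Files Written"]):
        problems.append("FD and SD file counts differ")
    return problems


if __name__ == "__main__":
    for problem in find_mismatches(parse_report(REPORT)):
        print(problem)
```

On the excerpt above this prints two lines: the SD status mismatch and the file-count difference. It only highlights where the two daemons disagree; it does not explain why the SD never reported a final status.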

I also increased the debug level and still don't see any useful information. What I think is important is that Incremental jobs for these clients work fine!
One thing that changed is that I upgraded Bareos from version 22 to version 23.0.3~pre47.36e516c0b (19Mar24).

Thank you

Łukasz Szczepanik

Sep 11, 2024, 12:38:13 AM
to bareos-users
It seems that updating from 23.0.3 to 23.0.4 solved this strange problem.