Bareos-fd is being killed with signal 11 on restore from tape

Andreas R

May 26, 2025, 6:23:28 AM
to bareos-users
Hi,

I'm having trouble restoring from tape. Jobs start as expected, but at some point during the restore the file daemon is killed with signal 11 (segmentation violation).

*restore jobid=213438 client=prestore01-fd all done yes

May 23 05:16:57 prestore01 bareos-fd[30717]: bareos-fd, prestore01-fd got signal 11 - Segmentation violation. Attempting traceback.
May 23 05:16:57 prestore01 bareos-fd[30717]: exepath=/usr/sbin/
May 23 05:16:57 prestore01 bareos-fd[30717]: BAREOS interrupted by signal 11: Segmentation violation
May 23 05:16:57 prestore01 bareos-fd[30917]: Calling: /usr/sbin/btraceback /usr/sbin/bareos-fd 30717 /var/lib/bareos
May 23 05:16:57 prestore01 bareos-fd[30924]: bsmtp: tools/bsmtp.cc:455-0 Failed to connect to mailhost localhost
May 23 05:16:57 prestore01 bareos-fd[30717]: The btraceback call returned 1
May 23 05:16:57 prestore01 bareos-fd[30717]: Dumping: /var/lib/bareos/prestore01-fd.30717.bactrace

cat /var/lib/bareos/prestore01-fd.30717.bactrace
Attempt to dump current JCRs. njcrs=1
threadid=0x00007f399fdfe6c0 JobId=213439 JobStatus=R jcr=0x7f3998047ec0 name=RestoreFiles.2025-05-23_10.16.37_28
threadid=0x00007f399fdfe6c0 killable=1 JobId=213439 JobStatus=R jcr=0x7f3998047ec0 name=RestoreFiles.2025-05-23_10.16.37_28
       UseCount=1
       JobType=R JobLevel=
       sched_time=23-May-2025 05:16 start_time=23-May-2025 05:16
       end_time=31-Dec-1969 18:00 wait_time=31-Dec-1969 18:00
       db=(nil) db_batch=(nil) batch_started=0

Steps to reproduce:
1. Full backup to disk
2. Copy to tape via next pool
3. Restore from disk is ok
4. Restore from tape is not ok

What I tried without success so far:
- Deleted the jobs from tape and copied them again
  The error occurred after the same number of restored files
- Tried a different tape
- Tried other fd versions: 22 (Debian), 23 (SUSE) and 24 (SUSE)
- Changed the block size to 512 in the SD
- Disabled compression and reran everything

Client {
 Name = prestore01-fd
 #Maximum Concurrent Jobs = 20
 FDport = 9102
 PKI Signatures = Yes
 PKI Encryption = Yes
 PKI Keypair = "/etc/bareos/master.pem"
 PKI Master Key = "/etc/bareos/prestore01.cert"
 PkiCipher = AES256  
}

Pool {
 Name = Full
 Pool Type = Backup
 Recycle = Yes
 Volume Retention = 12 months
 Maximum Volumes = 125
 Maximum Volume Bytes = 125G
 Next Pool = "TapeFull"
 Label Format = "Full-"
 Storage = LocalStorage
}

Pool {
 Name = TapeFull
 Pool Type = Backup
 Recycle = Yes
 Volume Retention = 13 month
 Storage = TL1000
 Cleaning Prefix = CLN
}

Job {
 Name = CopyFull2Tape
 JobDefs = "CycleJob"
 Type = Copy
 Selection Type = PoolUncopiedJobs
 Level = Full
 Pool = Full
 Messages = Standard
 Client = pbackup01-fd
 FileSet = "SuseBase"
 Storage = "LocalStorage"
 Schedule = "CopyFull2Tape"
}

System Info:
Bareos: 24.0.4~pre0.1014be830-74
OS: openSUSE Leap 15.6
Catalog: Postgresql
Tape: LTO8

Thanks in advance

Sebastian Sura

May 26, 2025, 8:38:07 AM
to bareos...@googlegroups.com

Hi Andreas,

you attached the `.bactrace` file that the fd created. It would be very helpful if you could also send us the `.traceback` file that was created during the crash, as that file contains the stack trace.
Without it we would have to guess where the problem occurred.

As this problem occurred on a restore, could you

1) check if this is reproducible, and if so,
2) send us the bootstrap record file of that restore job?

If you give the restore command the option `bootstrap=<path>`, then bareos will write the bsr file to that path and will not delete it.

Kind Regards
Sebastian Sura

On 26.05.25 at 12:23, 'Andreas R' via bareos-users wrote:
You received this message because you are subscribed to the Google Groups "bareos-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bareos-users/08776ca6-2a98-4901-a228-524922713a9en%40googlegroups.com.
-- 
 Sebastian Sura                  sebasti...@bareos.com
 Bareos GmbH & Co. KG            Phone: +49 221 630693-0
 https://www.bareos.com
 Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
 Komplementär: Bareos Verwaltungs-GmbH
 Geschäftsführer: Stephan Dühr, Jörg Steffens, Philipp Storz

A Riedl

May 26, 2025, 10:51:12 AM
to Sebastian Sura, bareos...@googlegroups.com
Hi Sebastian,

thank you for your reply.
I have attached the files.

Kind Regards,
Andreas

bootstrap.txt
bareos.6167.traceback

Andreas R

May 27, 2025, 3:49:55 AM
to bareos-users
Hi Sebastian,

thank you for your reply.
I have attached both files.

Kind Regards,
Andreas
bareos.6167.traceback
bootstrap.txt

Sebastian Sura

May 27, 2025, 4:07:07 AM
to bareos...@googlegroups.com

Thanks for the crash report.  This looks very weird.  I have not seen this kind of crash before.
Would it be possible for you to install the debug packages and recreate the crash?

See here on how to install the debug symbol packages: https://docs.bareos.org/Appendix/Debugging.html#installing-debug-symbols-packages

Kind Regards
Sebastian Sura

On 26.05.25 at 16:42, 'Andreas R' via bareos-users wrote:

Andreas R

May 27, 2025, 7:23:36 AM
to bareos-users
Thank you for looking into this matter.
Here is the debug report.

Best Regards,
Andreas
bareos.3757.traceback

Sebastian Sura

May 28, 2025, 3:45:20 AM
to bareos...@googlegroups.com

Thanks for that traceback. Something really weird is happening. It looks like the fd tries to decrypt your encrypted backup and believes it succeeded, but the decryption actually went wrong.

Could you redo the restore, but with debug tracing enabled? I.e. run

setdebug client=<clientname> level=500 trace=1

before the restore.
This command should print a filename where the debug messages will be stored.  It would be great if you could send this file to me (after the filedaemon crashed).

I created an internal issue to track this as there is clearly something going wrong here.

Kind Regards
Sebastian Sura

On 27.05.25 at 13:23, 'Andreas R' via bareos-users wrote:

Andreas R

May 30, 2025, 7:08:13 AM
to bareos-users
I have sent you the debug trace. Let me know if I can provide further information.
Kind Regards
Andreas

Sebastian Sura

Jun 2, 2025, 1:48:55 AM
to bareos...@googlegroups.com

Hi Andreas,

I want to check why the copy is not restorable. Could you do the following for me?
1) Grab the bsr of the (working) full and the (not working) copy.  You can do this via

* restore jobid=<full/copy id> bsr=/path/to/the/file.bsr all done

Bareos then writes the bsr to the given file. Let's say the bsrs are now in /tmp/full.bsr and /tmp/copy.bsr.

2) We now want to use bscan to see what data is getting sent to the fd:

$ bscan -b /path/to/the/file.bsr --list-records -c path/to/config ... <your device>

This should output a list like the following:

bscan: stored/butil.cc:327-0 Using device: "FileStorage2" for reading.
02-Jun 07:37 bscan JobId 0: Ready to read from volume "Copy-0002" on device "FileStorage2" (storage).
02-Jun 07:37 bscan JobId 0: Forward spacing Volume "Copy-0002" to file:block 0:216.
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=-4 Stream=5 len=164
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=1 len=184
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=22 len=640
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=20 len=8624
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=20 len=16
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=1998 len=81
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=19 len=322
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=40 len=16
bscan: stored/bscan.cc:501-0 Record: SessId=1 SessTim=1748841876 FileIndex=2 Stream=1 len=185
...

Could you send the two bsrs and the two lists to me?
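Comparing the two lists by hand gets tedious for large jobs. The Python script below is a minimal sketch of a hypothetical helper (not part of Bareos; the script and file names are assumptions): it extracts FileIndex, Stream and len from the `Record:` lines, ignores SessId/SessTim (which differ between a job and its copy anyway), skips records with a negative FileIndex on the assumption that those are label records, and prints every region where the two record sequences differ. It also assumes the copy keeps the original FileIndex values. difflib can be slow on very long lists, so this is meant for spot checks, not as a definitive tool.

```python
import difflib
import re
import sys

# Matches record lines printed by `bscan --list-records`, e.g.
# "... Record: SessId=1 SessTim=1748841876 FileIndex=1 Stream=20 len=8624"
RECORD_RE = re.compile(
    r"Record: SessId=\d+ SessTim=\d+ FileIndex=(-?\d+) Stream=(\d+) len=(\d+)"
)


def parse_records(lines):
    """Return (FileIndex, Stream, len) tuples found in bscan output lines."""
    records = []
    for line in lines:
        match = RECORD_RE.search(line)
        if match is None:
            continue  # device/volume messages, "...", blank lines
        file_index, stream, length = (int(g) for g in match.groups())
        if file_index < 0:
            continue  # assumption: negative FileIndex marks a label record
        records.append((file_index, stream, length))
    return records


def diff_records(full, copy):
    """Return (full_part, copy_part) list pairs for every differing region."""
    # autojunk=False: otherwise very frequent records (e.g. many full-size
    # data blocks) would be treated as junk and skew the alignment.
    matcher = difflib.SequenceMatcher(a=full, b=copy, autojunk=False)
    return [
        (full[i1:i2], copy[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def main(full_path, copy_path):
    with open(full_path) as f:
        full = parse_records(f)
    with open(copy_path) as f:
        copy = parse_records(f)
    for full_part, copy_part in diff_records(full, copy):
        for rec in full_part:
            print("-FileIndex=%d Stream=%d len=%d" % rec)
        for rec in copy_part:
            print("+FileIndex=%d Stream=%d len=%d" % rec)
        print()


# Hypothetical usage: python3 compare_bscan.py full.list copy.list
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Aligning the sequences (rather than comparing line by line) keeps one short or missing record from making every following record look different.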

Kind Regards
Sebastian Sura

On 30.05.25 at 13:08, 'Andreas R' via bareos-users wrote:

A Riedl

Jun 2, 2025, 5:27:57 AM
to Sebastian Sura, bareos...@googlegroups.com
Hi Sebastian,

thank you for your guidance. I have created the files accordingly:
https://drive.google.com/drive/folders/1neYWzoAe6cHukNXqFM1fCdimMdhmfjHO?usp=sharing

Hope this helps!
Andreas

Sebastian Sura

Jun 3, 2025, 3:45:07 AM
to bareos...@googlegroups.com

Hi Andreas,

thanks for the help!  Diffing those files yielded:

-bscan: stored/bscan.cc:496-0 Record: ... Stream=20 len=262144
+bscan: stored/bscan.cc:496-0 Record: ... Stream=20 len=209312

This is very weird: it looks like some of the data was not copied correctly. I will come back to this after my vacation.
Could you modify the copy.bsr by deleting the VolSessionId=, VolSessionTime=, FileIndex= and Count= lines and run bscan again as before?
I am wondering if some other job somehow cut off that part.
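The bsr edit requested above can be done in any text editor. For repeated runs, here is a minimal Python sketch of the same edit as a hypothetical helper (script and output file names are assumptions): it drops every line starting with one of the four keys and keeps all other lines, for example Volume=, unchanged. Presumably this lets bscan list the records of every job in the selected part of the volume, not only those of the copy job.

```python
import sys

# Keys whose lines are removed from the bsr, as requested in the thread.
DROP_PREFIXES = ("VolSessionId=", "VolSessionTime=", "FileIndex=", "Count=")


def strip_session_selectors(lines):
    """Return the bsr lines minus VolSessionId/VolSessionTime/FileIndex/Count."""
    return [line for line in lines if not line.lstrip().startswith(DROP_PREFIXES)]


# Hypothetical usage: python3 strip_bsr.py /tmp/copy.bsr /tmp/copy-all.bsr
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src:
        kept = strip_session_selectors(src)
    with open(sys.argv[2], "w") as dst:
        dst.writelines(kept)
```

Writing to a new file keeps the original copy.bsr intact for comparison.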

Kind Regards
Sebastian Sura

On 02.06.25 at 07:48, Sebastian Sura wrote:

Andreas R

Jun 3, 2025, 8:53:05 AM
to bareos-users
Hi Sebastian,

the bscan output with the modified bsr was uploaded to the shared folder.

I did some more debugging.

First I created a new storage and a new disk pool.
Then I copied the initial full job to the new disk pool (disk > disk) using
Selection Pattern = "SELECT 212964 AS jobid;"

The restore from that pool also failed. So it seems the problem is not related to tape.

With the debug traces I was able to identify affected files. There is some kind of pattern:
host1:
- /var/adm/backup/rpmdb/Packages-20250517.gz
- /var/adm/backup/rpmdb/Packages-20250520.gz
- /var/lib/ca-certificates/openssl/OISTE_WISeKey_Global_Root_GC_CA.pem
host2:
- /var/adm/backup/rpmdb/Packages-20250517.gz
- /etc/vmware-tools/vgauth/schemas/XMLSchema.xsd
host3:
- /etc/vmware-tools/vgauth/schemas/XMLSchema.xsd
host4:
- /var/lib/ca-certificates/openssl/DIGITALSIGN_GLOBAL_ROOT_ECDSA_CA.pem
- /var/lib/sss/mc/initgroups
etc.
All these jobs run simultaneously to a single pool.

Have a nice vacation,
Andreas

Andreas R

Jun 18, 2025, 12:21:10 PM
to bareos-users
Here is a little update.
I have created more devices in the SD for parallel job execution, each with "Maximum Concurrent Jobs = 1".
Previously we had a single device with 20 concurrent jobs. This change solved the problem for us, at least for new backups.

Still, something seems to be wrong with copy jobs and the crashing fd.
Let me know if I can provide any more information to get this sorted out.

I will be on vacation until end of next week.
Best wishes,
Andreas

Sebastian Sura

Jul 24, 2025, 1:50:37 AM
to bareos...@googlegroups.com

Hey Andreas,

sorry for taking so long to get back to you. Did this restore issue happen to you more than once, i.e. for more than one copy/backup?
The hint with `Maximum Concurrent Jobs = 20` is something I will look into.

Kind Regards
Sebastian Sura

On 18.06.25 at 18:21, 'Andreas R' via bareos-users wrote:

Andreas R

Aug 8, 2025, 11:14:06 AM
to bareos-users
Hi Sebastian,

yes. It happened with most of the copied backups. The clients are very similar.
In every job, only a few files were affected (see my message from June 3).
That's why we didn't notice it earlier. Older jobs are also affected, but I am not sure since when exactly.

Kind regards,
Andreas