30 minute timeout


Christian Sacks

Jun 24, 2022, 6:01:07 AM
to bareos-users
Hello there,

I am about done pulling my hair out trying to find where I need to adjust a timeout setting for a job that takes longer than 30 minutes to run.

I have a job that, before running, creates a snapshot in Ceph of a VM's disks, pulls the snapshot to the backup server, extracts it into a folder, and then backs up the contents of that folder. However, the disk it backs up is quite large, so creating/pulling/extracting the snapshot takes longer than 30 minutes.
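For context, a pre-job workflow like this is usually wired into the Job resource with a RunScript block, so Bareos itself runs the snapshot/pull/extract step before the backup. A minimal sketch — the job name and script path here are placeholders, not taken from the thread:

```
Job {
  Name = "vm-disk-backup"              # hypothetical name
  ...
  # Run the Ceph snapshot / pull / extract script on the client
  # before the backup starts; fail the job if the script fails.
  RunScript {
    RunsWhen = Before
    RunsOnClient = yes
    FailJobOnError = yes
    Command = "/usr/local/bin/pull-ceph-snapshot.sh"   # hypothetical path
  }
}
```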

I have added the "FD Connect Timeout" and "SD Connect Timeout" with a value of 14400 seconds to the following files:

# grep -ir timeout
bareos-sd.d/storage/bareos-sd.conf:  FD Connect Timeout = 14400 seconds
bareos-sd.d/storage/bareos-sd.conf:  SD Connect Timeout = 14400 seconds
bareos-dir.d/director/bareos-dir.conf:  FD Connect Timeout = 14400 seconds
bareos-dir.d/director/bareos-dir.conf:  SD Connect Timeout = 14400 seconds
bareos-fd.d/client/myself.conf:  SD Connect Timeout = 14400 seconds
bareos-fd.d/client/myself.conf:  FD Connect Timeout = 14400 seconds
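As far as I know, FD Connect Timeout and SD Connect Timeout are directives of the Director resource, so the two lines in bareos-dir.d/director/bareos-dir.conf are the ones that count — and note they only bound how long the Director waits while establishing a connection, not an already-running session going quiet. A sketch of the relevant resource:

```
# bareos-dir.d/director/bareos-dir.conf
Director {
  Name = bareos-dir
  ...
  # Only applies while the Director is opening the connection,
  # not once the backup session is already established.
  FD Connect Timeout = 14400 seconds
  SD Connect Timeout = 14400 seconds
}
```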

The job still times out after 30 minutes.

Scheduled time: 23-Jun-2022 23:00:00
Start time: 23-Jun-2022 23:00:00
End time: 23-Jun-2022 23:30:31
Elapsed time: 30 mins 31 secs
FD termination status: Fatal Error
SD termination status: Waiting on FD

I don't know where else to add an increased timeout. Can someone please tell me where I need to look to increase it?

What is interesting is that if I run the backup manually, it works fine and finishes in under 30 minutes. But because the automatic schedule starts at 11pm every night, other jobs run at the same time, so it takes longer and hits this bizarre 30 minute timeout.

Thanks in advance.

Bruno Friedmann

Jun 27, 2022, 10:40:39 AM
to bareos-users
Could you have a firewall cutting the connection because it believes the connection is hung?

Have you already tried setting up HeartbeatInterval (https://docs.bareos.org/Configuration/Director.html#config-Dir_Client_HeartbeatInterval) instead of ConnectTimeout?
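If the drop is an idle-timeout in a firewall or NAT between the daemons, a heartbeat on the Director's Client resource keeps the control connection busy during the long pre-job phase. A minimal sketch — the client name, address, and password are placeholders:

```
# bareos-dir.d/client/backup-fd.conf
Client {
  Name = backup-fd                     # placeholder
  Address = backup.example.com         # placeholder
  Password = "changeme"                # placeholder
  # Send a keep-alive to the FD every 60 seconds so stateful
  # firewalls do not drop the idle control connection.
  Heartbeat Interval = 60 seconds
}
```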

Ivan Kovach

Jul 8, 2022, 8:53:05 AM
to bareos-users
Check whether the parameter Max Run Sched Time is set in bareos-dir.d/job/your_job.conf.
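If that directive is set too low, the job is cancelled that long after its scheduled (not actual) start time, which matters when jobs queue behind each other. A sketch of where it lives — the job name is a placeholder:

```
# bareos-dir.d/job/your_job.conf
Job {
  Name = "your_job"
  ...
  # Cancel the job if it has not finished this long after the
  # time it was *scheduled* to start (queue wait counts too).
  Max Run Sched Time = 4 hours
}
```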

On Monday, June 27, 2022 at 17:40:39 UTC+3, Bruno Friedmann wrote:

Christian Sacks

Nov 10, 2022, 9:00:42 AM
to bareos-users
Hi, thought I'd check back and realised I hadn't posted my reply before:

I have these in the job's .conf file:

  MaxRunSchedTime = 2 hours
  MaxRunTime = 1 hours
  MaxWaitTime = 1 hours

Yet still as soon as it gets to 
  Elapsed time:           32 mins 20 secs
it just automatically cancels the job and fails with 
  FD termination status:  Fatal Error

I'm so stumped.
Thanks for your help.

Christian Sacks

Nov 10, 2022, 10:31:41 AM
to bareos-users
I have got it working, finally!!!

# grep -ir heartbeat|grep -v \#
bareos-sd.d/storage/bareos-sd.conf:  Heartbeat Interval = 60 seconds
bareos-dir.d/director/bareos-dir.conf:  Heartbeat Interval = 60 seconds
bareos-fd.d/client/myself.conf:  Heartbeat Interval = 60 seconds

Woohoo!!

Christian Sacks

Nov 11, 2022, 5:42:15 AM
to bareos-users
I was premature; it was a one-off fluke. Back to the drawing board.

Brock Palen

Nov 11, 2022, 9:36:09 AM
to Christian Sacks, bareos-users
I would remove those MaxRunTime / MaxWaitTime settings until you have it working. There is often confusion with those settings.

I have had jobs run for over a day no issue. 


Christian Sacks

Nov 11, 2022, 10:23:22 AM
to bareos-users
Thanks, I did just remove those from the job.
I wonder what else could be causing the error. The backup server is also the job's client: it connects to Ceph, creates a snapshot of the VM's disk, pulls the snapshot to its local file system (in /var/lib/bareos/vmname_snap), and then extracts that snapshot into /var/lib/bareos/vmname_snap_fs_0/ (which it creates using libguestfs).

The same process works for the other 150+ VMs; it's just always this one VM that causes the issue. The size of the snapshot is 58GB and the extracted snapshot filesystem is currently about 20GB. It backs up much larger snapshots fine, just not this one. It's so frustrating.
