Copy job does not copy all jobs


Rick Tuk

Feb 24, 2021, 3:57:03 AM
to bareos-users
Dear all,

I recently added a second storage daemon and a copy job to my configuration. I use Selection Type = PoolUncopiedJobs to select the jobs to copy.
I assumed that on the first run it would select all jobs in the read pool and copy them to the Next Pool.

It seems the copy job does select all jobs that still need to be copied: JobId 32583: The following 2235 JobIds were chosen to be copied
However, only about 200 jobs are actually added to the queue and completed.

Either there is a limit of 200 that I cannot find in the documentation or my config files, or only a single job per client is actually copied in each run.

Any help would be very much appreciated.

Director config:
Director {
    Name = soteria
    Dir Address = soteria
    Dir Port = 9101
    Password = "<password>"
    Query File = "/usr/lib/bareos/scripts/query.sql"
    Maximum Concurrent Jobs = 1
    Messages = Daemon
    Auditing = yes

    # Enable the Heartbeat if you experience connection losses
    # (eg. because of your router or firewall configuration).
    # Additionally the Heartbeat can be enabled in bareos-sd and bareos-fd.
    #
    # Heartbeat Interval = 1 min

    Backend Directory = /usr/lib/bareos/backends

    # remove comment from "Plugin Directory" to load plugins from specified directory.
    # if "Plugin Names" is defined, only the specified plugins will be loaded,
    # otherwise all director plugins (*-dir.so) from the "Plugin Directory".
    #
    # Plugin Directory = "/usr/lib/bareos/plugins"
    # Plugin Names = ""
}

Pool config:
Pool {
    Name = Local-Full
    Pool Type = Backup
    Recycle = yes
    AutoPrune = yes
    Storage = Local-Full
    Next Pool = Remote-Full
    File Retention = 12 months
    Job Retention = 12 months
    Volume Retention = 12 months
    Maximum Volume Bytes = 25G
    Label Format = full-
}

Pool {
    Name = Remote-Full
    Pool Type = Backup
    Recycle = yes
    AutoPrune = yes
    Storage = Remote-Full
    File Retention = 12 months
    Job Retention = 12 months
    Volume Retention = 12 months
    Maximum Volume Bytes = 25G
    Label Format = full-remote-
}

Storage config:
Storage {
    Name = Local-Full
    Address = salus
    SD Port = 9103
    Password = "<password>"
    Device = Local-Full
    Media Type = File
}

Storage {
    Name = Remote-Full
    Address = sancus
    SD Port = 9103
    Password = "<password>"
    Device = Remote-Full
    Media Type = File
}

Schedule config:
Schedule {
    Name = "Default"
    Run = Level=Full Pool=Local-Full   1st sat at 23:00
    Run = Level=Differential Pool=Local-Diff FullPool=Local-Full  2nd-5th sat at 23:00
    Run = Level=Incremental Pool=Local-Inc FullPool=Local-Full DifferentialPool=Local-Diff sun-fri at 23:00
}

Copy job:
Job {
    Name = "CopyLocalToRemote"
    Type = Copy
    Level = Incremental
    Storage = Local-Inc
    Pool = Local-Inc
    Full Backup Pool = Local-Full
    Differential Backup Pool = Local-Diff
    Incremental Backup Pool = Local-Inc
    Selection Type = PoolUncopiedJobs
    Schedule = "Default"
    Messages = "Standard"
    Priority = 14
}

SD config on salus:
Storage {
    Name = salus
    SD Address =  salus
    SD Port = 9103
    Maximum Concurrent Jobs = 20

    Backend Directory = /usr/lib/bareos/backends

    # remove comment from "Plugin Directory" to load plugins from specified directory.
    # if "Plugin Names" is defined, only the specified plugins will be loaded,
    # otherwise all storage plugins (*-sd.so) from the "Plugin Directory".
    #
    Plugin Directory = "/usr/lib/bareos/plugins"
    # Plugin Names = ""
}

SD config on sancus:
Storage {
    Name = sancus
    SD Address =  sancus
    SD Port = 9103
    Maximum Concurrent Jobs = 20

    Backend Directory = /usr/lib/bareos/backends

    # remove comment from "Plugin Directory" to load plugins from specified directory.
    # if "Plugin Names" is defined, only the specified plugins will be loaded,
    # otherwise all storage plugins (*-sd.so) from the "Plugin Directory".
    #
    Plugin Directory = "/usr/lib/bareos/plugins"
    # Plugin Names = ""
}

Device config on salus:
Device {
    Name = Local-Full
    Archive Device = /bareos/backup/full
    Device Type = File
    Media Type = File
    Label Media = yes
    Random Access = yes
    Automatic Mount = yes
    Removable Media = no
    Always Open = no
    Maximum Concurrent Jobs = 1
}

Device config on sancus:
Device {
    Name = Remote-Full
    Archive Device = /bareos/backup/full
    Device Type = File
    Media Type = File
    Label Media = yes
    Random Access = yes
    Automatic Mount = yes
    Removable Media = no
    Always Open = no
    Maximum Concurrent Jobs = 1
}

Met vriendelijke groet / With kind regards,
Rick Tuk
Senior DevOps Engineer

Brock Palen

Feb 24, 2021, 8:58:53 AM
to Rick Tuk, bareos-users
Others can confirm, but I have never seen Bareos put more than 200 jobs in the queue at a time. Let the jobs finish, then manually run the copy job again; it should grab more each time until it's caught up, after which you can run it nightly/weekly etc.
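
Re-running it by hand is easy to script from bconsole; something along these lines should work (job name taken from your config, adjust as needed):

    echo "run job=CopyLocalToRemote yes" | bconsole

The "yes" suppresses the confirmation prompt, so you can put that line in a cron job until the backlog has drained.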


Brock Palen
bro...@mlds-networks.com
www.mlds-networks.com
Websites, Linux, Hosting, Joomla, Consulting

Rick Tuk

Feb 25, 2021, 3:06:13 AM
to Brock Palen, bareos-users
I have run the copy job manually a fair few times now, and it gave me a clearer view of what exactly happens:

CopyLocalToRemote is scheduled (either manually or automatically).
The job runs for a few seconds, selecting the JobIds of all not previously copied jobs.
It then queues 2 jobs per selected JobId: one "control job" named CopyLocalToRemote with a new JobId, and one job carrying the name of the original job being copied.
It does this for exactly 100 original jobs, resulting in 200 queued jobs.
It then starts running jobs; both the CopyLocalToRemote job and the job with the original job name are started, so 2 jobs always run at the same time.
All other jobs are waiting.

Here is an overview of what "status director" says under running jobs:

Running Jobs:
Console connected at 25-Feb-21 08:25
 JobId Level   Name                       Status
======================================================================
 41809 Increme  CopyLocalToRemote.2021-02-25_08.26.49_13 is running
 41810 Full    typhon-default.2021-02-25_08.26.49_14 is running
 41811 Increme  CopyLocalToRemote.2021-02-25_08.26.49_15 is waiting on max Storage jobs
 41812 Full    thoth-stacks.2021-02-25_08.26.49_16 is waiting execution
 41813 Increme  CopyLocalToRemote.2021-02-25_08.26.49_17 is waiting on max Storage jobs
 41814 Full    worker005-default.2021-02-25_08.26.49_18 is waiting execution
 41815 Increme  CopyLocalToRemote.2021-02-25_08.26.49_19 is waiting on max Storage jobs
 41816 Full    metis-default.2021-02-25_08.26.49_20 is waiting execution
 41817 Increme  CopyLocalToRemote.2021-02-25_08.26.49_21 is waiting on max Storage jobs
 41818 Full    soter-default.2021-02-25_08.26.49_22 is waiting execution

In the logs for the initial CopyLocalToRemote job that is started it shows how many and which JobIds are being selected, this list does contain all uncopied jobs:

soteria-dir JobId 41866: The following 2253 JobIds were chosen to be copied: 27655,27656,27657,27659,27660,27661,27662,27672,27677,27679,27681,27682,27683,27686,27688,27691,27692,27693,27694,27695, etc

It then has 3 log entries for every job that is started:

soteria-dir JobId 41866: Using Catalog "Catalog"
soteria-dir JobId 41866: Automatically selected Catalog: Catalog
soteria-dir JobId 41866: Copying JobId 42067 started.

This run started at JobId 42067 and ended at 42265; each copy started via the 3 log lines above increments the JobId by 2 (one control job plus one copy job).

And again, the list of JobIds that "were chosen to be copied" is much longer than the list of jobs that are actually started.

With over 120 clients backed up every day (daily incrementals, weekly differentials and monthly fulls), the copy job will never catch up unless I keep running copy jobs manually.

Met vriendelijke groet / With kind regards,
Rick Tuk
Senior DevOps Engineer

Rick Tuk

unread,
Mar 5, 2021, 3:38:36 AM3/5/21
to bareos-users
For anyone reading this with the same issue: we added Max Concurrent Copies = 1000 to the definition of the copy job, and this has fixed the issue for us.
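
For completeness, the copy job definition now looks roughly like this (only the one directive was added; the Bareos documentation lists it as Maximum Concurrent Copies, which as far as we can tell defaults to 100, matching the batches of exactly 100 jobs we were seeing):

Job {
    Name = "CopyLocalToRemote"
    Type = Copy
    Level = Incremental
    Storage = Local-Inc
    Pool = Local-Inc
    Full Backup Pool = Local-Full
    Differential Backup Pool = Local-Diff
    Incremental Backup Pool = Local-Inc
    Selection Type = PoolUncopiedJobs
    Max Concurrent Copies = 1000
    Schedule = "Default"
    Messages = "Standard"
    Priority = 14
}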

Met vriendelijke groet / With kind regards,
Rick Tuk
Senior DevOps Engineer
