longtime archive


Frank Cherry

Jan 10, 2021, 6:48:45 AM
to bareos-users

Hi there,
I have PDF files in a folder. They should be moved to a tape for archiving. If this is done successfully, the source files will be deleted. Then, after some time, new files arrive in the source folder and should be moved to the tape, and so on. The tape should be a fast recovery solution for us. If it is not usable, we can get the files back from the client, so this solution is only a faster path for us.

My plan is to define a pool like this:

Pool {
  Name = Archive
  Pool Type = Backup
  Recycle = no                        # do not automatically recycle Volumes
  AutoPrune = no                      # do not prune expired Volumes
  Volume Retention = 20 years         # how long the Full backups are kept (#06)
  Maximum Volumes = 100               # limit the number of Volumes in the Pool
  Label Format = "Archive-"           # Volumes will be labeled "Archive-<volume-id>"
}

The job will be started manually as a full backup.
Am I right that every new job will append to the tape until it is full, and that the next empty tape will then be used via auto-labeling?

With auto-labeling, Bareos creates a new volume name by taking the prefix from the Label Format defined in the Pool settings and appending a four-digit counter (0000, 0001, ...).
In the SD, the device is defined with LabelMedia = yes;
Would it also be possible to label a tape manually and then add it to the pool, so it can be used when needed?

Thanks, Frank

Kjetil Torgrim Homme

Feb 12, 2021, 11:41:19 AM
to bareos...@googlegroups.com
"'Frank Cherry' via bareos-users" <bareos...@googlegroups.com>
writes:

> I have PDF files in a folder. They should be moved to a tape for
> archiving. If this is done successfully, the source files will be
> deleted. Then, after some time, new files arrive in the source folder
> and should be moved to the tape, and so on. The tape should be a fast
> recovery solution for us. If it is not usable, we can get the files
> back from the client, so this solution is only a faster path for us.
>
> My plan is to define a pool like this:
>
> Pool {
>   Name = Archive
>   Pool Type = Backup
>   Recycle = no                # do not automatically recycle Volumes
>   AutoPrune = no              # do not prune expired Volumes
>   Volume Retention = 20 years # how long the Full backups are kept (#06)
>   Maximum Volumes = 100       # limit the number of Volumes in the Pool
>   Label Format = "Archive-"   # Volumes will be labeled "Archive-<volume-id>"
> }

Well, Label Format does not matter for tape storage. You also need to
specify a Storage which uses a tape device.

> The job will be done manually as a full backup type.
> So I'm right, that every new job will append on the tape until it's
> full

Yes.

> and then the next empty tape will be used via auto-labeling?

No, you will have to label your tapes in advance. If you have a Scratch
pool, you can put unused tapes there, and Bareos will pull a tape from
there into whichever pool needs one. I don't really recommend it,
though; I think it makes management more difficult, i.e., you have to
check the database to see whether LB1004L7 is an Archive tape or not. I
prefer to make my own barcode labels with separate number series for
different uses.
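For reference, labeling a tape into the Scratch pool from bconsole looks roughly like this (the storage and volume names here are placeholders, not from the thread):

```
*label storage=TapeStorage volume=Archive-0001 pool=Scratch
```

With an autochanger and barcoded tapes, `label barcodes pool=Scratch` labels all unlabeled tapes in the changer into the Scratch pool in one go.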

> With auto-labeling, a new volume name is created by taking the prefix
> from the Label Format defined in the Pool settings and appending a
> four-digit counter (0000, 0001, ...). In the SD, the device is defined
> with LabelMedia = yes;

Oh, I was assuming you have a tape robot. Auto-labeling may work on a
standalone drive.

> Would it be possible to do manually label a tape, then add it to the
> pool for using if needed?

Yes, put it in the Scratch pool if you don't want to put it in the
Archive pool directly. Bareos will not start using the extra tapes
before the current tape is full anyway.

--
Kjetil T. Homme
Redpill Linpro AS

Frank Kirschner | Celebrate Records GmbH

Feb 15, 2021, 2:04:10 AM
to bareos...@googlegroups.com, kjetil...@redpill-linpro.com
Thanks Kjetil,

Now, after some months of using Bareos and understanding the system, I'm
thinking about buying a Quantum SuperLoader LTO-7 with 16 slots ;-)

Your explanation describes exactly the approach I want to take and have been
figuring out over the last few days. Planning around the autoloader, I will also
make my own barcode labels and add the tapes to a storage pool in advance.
Cumulative long-term archives will not stay permanently inside the autoloader;
I can load and unload them manually every 3 months.

Finally, can I discuss the following example:

I have to archive audio, video, and print files from 3 departments as "cold data"
stored on tape:

First, I will copy all audio files to a local hard disk on the host where the
tape drive is directly attached, because copying files over the network from a
different host is slower than writing to tape.

Second, I will create a job with "Enabled = no" so that it is only started manually
via the GUI:
- set the client to the local fd,
- set a fileset which points to the local directory where the data from step #1
  is stored,
- set the storage to the tape drive,
- define a pool in which I have pre-labeled some empty tapes.
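Put together, a minimal Job resource along those lines might look like the sketch below (the resource names LocalFd, ArchiveSet, TapeStorage, and Archive are placeholders for whatever is defined elsewhere in the configuration, not names from the thread):

```
Job {
  Name = "ArchiveAudio"
  Type = Backup
  Level = Full
  Enabled = no            # only started manually via GUI/bconsole
  Client = LocalFd        # the FD on the host with the tape drive
  FileSet = ArchiveSet    # points at the local staging directory
  Storage = TapeStorage   # the tape device
  Pool = Archive          # pool holding the pre-labeled tapes
  Messages = Standard
}
```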

Now run the job and archive the audio files.
When it finishes successfully, would it be a good idea to delete the collected
audio files from the hard disk, go ahead with copying the video files, and start
the job again? Or would it be better to create a separate job with its own pool
for each type of data (audio, video, print), with tapes named e.g.
audio1, audio2 / video1, video2 ...?

Sorry for asking, but you guys have more experience with such tape scenarios than a
greenhorn like me :-)

Thanks for your help,
Frank






Spadajspadaj

Feb 15, 2021, 3:13:50 AM
to bareos...@googlegroups.com

On 15.02.2021 08:04, 'Frank Kirschner | Celebrate Records GmbH' via
bareos-users wrote:
My first thought is that archiving is much more than just using a backup
solution to copy files. Archiving is a whole process which should be
designed with proper data security in mind (i.e. appropriate copy
redundancy and data verifiability).

Secondly, regarding "First, I will copy all audio files to a local hard
disk on the same host where the tape is directly connected, because
copying files over the network from a different host is slower than
writing to tape": not necessarily. That's what spooling is for.

Thirdly - I used to run a "copy and delete" scenario a few years ago, but
I had a slightly different setup, so my solution is not directly
copy-pasteable for you; I'd suggest you look into:

1) Dynamically create the list of files to back up (this might involve
checking the clients' files for ctime, or querying the Bareos database to
verify whether a file has already been backed up).

2) Create a post-job script which removes files that have already been
backed up properly (i.e. included in a given number of backup jobs, if
you want to keep several copies) - this definitely involves querying the
director's database.
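A minimal sketch of step 2 in Python (the row shape, the sample paths, and the 3-copy threshold are illustrative assumptions; producing the rows from the director's database is the query discussed later in the thread):

```python
#!/usr/bin/env python3
"""Post-job cleanup sketch: given (filepath, jobcount) rows from the
catalog query, delete local files already contained in enough
successful backup jobs. All names here are hypothetical."""
import os

MIN_COPIES = 3  # backup jobs a file must appear in before deletion


def files_safe_to_delete(rows, min_copies=MIN_COPIES):
    """rows: iterable of (filepath, jobcount) pairs from the catalog."""
    return [path for path, jobcount in rows if jobcount >= min_copies]


def cleanup(rows, dry_run=True):
    """Report (or actually remove) files that are safely backed up."""
    for path in files_safe_to_delete(rows):
        if dry_run:
            print(f"would delete {path}")
        elif os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    sample = [("/srv/archives/a.pdf", 3), ("/srv/archives/b.pdf", 1)]
    cleanup(sample)  # dry run: only reports /srv/archives/a.pdf
```

Keeping the threshold check in its own function makes it easy to test, and the dry-run default avoids deleting anything until the query has been verified.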

Best regards,

MK


Frank Kirschner | Celebrate Records GmbH

Feb 15, 2021, 3:27:40 AM
to bareos...@googlegroups.com
Spooling does not work for this scenario, because I have to back up
multiple clients, and the manual says: "Each Job will reference only a
single client."
So I use a "run before" script which collects the data from the 3
clients. On each client, the files are placed into an "archiving" folder
manually by the operator.
>
> Thirdly - I used to run a "copy and delete" scenario a few years ago,
> but I had a slightly different setup, so my solution is not directly
> copy-pasteable for you; I'd suggest you look into:
>
> 1) Dynamically create a list of files to backup (might involve
> checking client files for ctime or querying bareos database to verify
> if the file has already been backed up)
>
> 2) Create a post-job script which removes files that have already been
> backed up in a proper way (i.e. included in a given number of backup
> jobs if you want to have several copies) - this definitely involves
> querying director's database.
That's a good idea for my scenario. Thanks for this good hint.
>
> Best regards,
>
> MK
>
>

Spadajspadaj

Feb 15, 2021, 3:47:02 AM
to bareos...@googlegroups.com

On 15.02.2021 09:27, 'Frank Kirschner | Celebrate Records GmbH' via
bareos-users wrote:
>
>> Secondly, regarding "First, I will copy all audio files to a local
>> hard disk on the same host where the tape is directly connected,
>> because copying files over the network from a different host is
>> slower than writing to tape": not necessarily. That's what spooling
>> is for.
> Spooling does not work for this scenario, because I have to back up
> multiple clients, and the manual says: "Each Job will reference only a
> single client."
> So I use a "run before" script which collects the data from the 3
> clients. On each client, the files are placed into an "archiving"
> folder manually by the operator.

Sure. If this is the case, it sounds reasonable :-)

You might also just have three separate clients from which you back up
with spooling, but that's of course up to you. I don't know your setup
well enough to recommend one solution over the other.

>>
>> Thirdly - I used to run a "copy and delete" scenario a few years ago,
>> but I had a slightly different setup, so my solution is not directly
>> copy-pasteable for you; I'd suggest you look into:
>>
>> 1) Dynamically create a list of files to backup (might involve
>> checking client files for ctime or querying bareos database to verify
>> if the file has already been backed up)
>>
>> 2) Create a post-job script which removes files that have already
>> been backed up in a proper way (i.e. included in a given number of
>> backup jobs if you want to have several copies) - this definitely
>> involves querying director's database.
> That's a good idea for my scenario. Thanks for this good hint,


For example, my fileset included something like this:

FileSet {
    Name = "Local-archives"
    Include {
        File = "\\| find /srv/archives -type f -not -path '*backup*' -ctime +60"
    }
}

This copies onto tape only the files under /srv/archives that do not have
"backup" in their file or directory name and whose ctime is older than 60
days (roughly two months).

Then I would run a script (in my case it was run asynchronously by cron,
not from a post-job trigger, but a post-job script is just as good here)
involving a query like:

select
    concat(Path.Path, File.Name) as filepath,
    count(distinct Job.JobId) as jobcount
from
    ((Path join File on File.PathId = Path.PathId)
        join Job on File.JobId = Job.JobId)
where Job.JobStatus = 'T' and Job.Name like '%my_srv%'
group by filepath
having jobcount >= 3;


This finds the files that have already been backed up by at least 3
different successful jobs, so they can be removed from disk. Of course
you might want to extend the query to include - for example - the Media
table, to make sure the files have been copied to separate tapes.
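A sketch of that extension, assuming the standard Bareos catalog schema in which the JobMedia table links jobs to their volumes, would count distinct volumes instead of jobs:

```sql
-- sketch: require each file to exist on at least 3 different volumes
select
    concat(Path.Path, File.Name) as filepath,
    count(distinct Media.VolumeName) as tapecount
from
    (((Path join File on File.PathId = Path.PathId)
        join Job on File.JobId = Job.JobId)
        join JobMedia on Job.JobId = JobMedia.JobId)
        join Media on JobMedia.MediaId = Media.MediaId
where Job.JobStatus = 'T' and Job.Name like '%my_srv%'
group by filepath
having tapecount >= 3;
```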


