Restore error "block checksum mismatch"

Benedict Ecker (Bene)

Dec 13, 2022, 10:34:23 AM
to bareos-users

Hey everyone!

I have encountered problems when restoring from LTO tapes.
Making backups and storing them on the tapes is working absolutely fine. No problems here. Writing and reading with tar directly to and from the tapes is also working fine.

However, when I start a restore job, the operation aborts with the error message "Volume data error! Block checksum mismatch!".

I just can't work out the reason. I have experimented with different block sizes (variable and fixed), different compression settings (software compression and hardware compression), and brand new tapes for testing, but the end result is always the error above.

I realise that this was a problem with miscalculated CRC checksums in some earlier versions (Bareos Bug 0001180). However, I am using version 21.1.5 under FreeBSD 13.1 from the official package sources. The tape streamer in use is an HP LTO4 Ultrium 1760.

Does anybody have an idea, or has anybody seen this error before? Any input whatsoever is much appreciated.

Thanks in advance. Cheers!

Andreas Rogge

Dec 15, 2022, 6:30:14 AM
to bareos...@googlegroups.com
Hi Benedict,

On 13.12.22 at 16:34, Benedict Ecker (Bene) wrote:
> However when I am starting a restore job, I encounter problems. The
> operation aborts with the error message "Volume data error! Block
> checksum mismatch!".

That's bad. And, unless you wrote your tapes with a version affected by
Bug #1180, it shouldn't happen. Even if you had been hit by that bug, it
should still work if you read and write with the same version (it would
miscalculate the checksum on write and on read, so it would still match).
We currently don't do automated testing with FreeBSD on a real tape
drive, so there might still be some kind of strange bug you've
encountered here.

However, the block checksum calculation happens before the data is
handed off to the device-specific storage backend. So if there is some
kind of problem, you should also see the same behaviour when backing up
to files.
Did you already try that?

In any case, I would love to take a look at the device resource you
configured in the SD.

If you can reproduce the problem with files (which I doubt, as this is
automatically tested during CI), could you provide a sample volume file
(with just two or three files backed up)?

If you cannot reproduce the problem with files and it only happens with
tape, would you run a small job (again, just two or three files) to a
fresh or recycled tape, dump the tape-files and send them to me?

To dump the tape files, you can just mt rewind the tape and then run:

dd if=/dev/your-drive of=file-0 bs=<your-blocksize>
dd if=/dev/your-drive of=file-1 bs=<your-blocksize>
...
dd if=/dev/your-drive of=file-N bs=<your-blocksize>

until there are no more files to read (there will probably be just two:
the label block and the backup session).
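
The dd sequence above could be automated roughly as follows. This is only a sketch: the function name dump_tape_files, the max_files safety limit and the stop condition (an empty output file is taken to mean "no more tape files") are assumptions for illustration, not anything Bareos provides. It expects a non-rewinding device node and a tape that has already been rewound.

```python
import os
import subprocess


def dump_tape_files(device, block_size, out_dir=".", run=subprocess.run, max_files=100):
    """Dump consecutive tape files to file-0, file-1, ... using dd.

    Assumes `device` is a non-rewinding tape device and the tape is rewound.
    Stops at the first empty dump (assumed to mean end of data) or dd failure.
    """
    dumped = []
    for index in range(max_files):
        target = os.path.join(out_dir, f"file-{index}")
        cmd = ["dd", f"if={device}", f"of={target}", f"bs={block_size}"]
        result = run(cmd, check=False)
        size = os.path.getsize(target) if os.path.exists(target) else 0
        if size == 0:
            # Nothing was read: drop the empty file and stop.
            if os.path.exists(target):
                os.remove(target)
            break
        dumped.append(target)
        if result.returncode != 0:
            # Keep the partial dump for inspection, but do not continue.
            break
    return dumped
```

With the device from this thread, a call might look like dump_tape_files("/dev/nsa0", 1048576). The block size passed to dd should be at least as large as the largest block written to the tape.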

I really hope we can sort this one out.

Best Regards,
Andreas

--
Andreas Rogge andrea...@bareos.com
Bareos GmbH & Co. KG Phone: +49 221-630693-86
http://www.bareos.com

Registered office: Köln | Local court Köln: HRA 29646
General partner: Bareos Verwaltungs-GmbH
Managing directors: S. Dühr, M. Außendorf, J. Steffens, Philipp Storz

Benedict Ecker (Bene)

Dec 16, 2022, 8:37:44 AM
to bareos-users
Hi Andreas,

Thank you for the swift answer.

I did try backing up to file storage, both on ZFS and UFS. Restoring from this worked absolutely fine.

This is my device resource for the tape drive:
Device {
  Name = "Ultrium"
  Always Open = yes
  Archive Device = "/dev/nsa0"
  Auto Changer = no
  Automatic Mount = yes
  Block Checksum = yes
  Block Positioning = yes
  BSF at EOM = yes
  Backward Space File = yes
  Backward Space Record = yes
  Device Type = Tape
  Fast Forward Space File = yes
  Forward Space File = yes
  Forward Space Record = yes
  Hardware End of File = yes
  Hardware End of Medium = yes
  Label Media = no
  Maximum Block Size = 1048576
  Maximum File Size = 32G
  Maximum Spool Size = 256G
  Media Type = "LTO"
  Minimum Block Size = 65536
  Offline On Unmount = no
  Spool Directory = "/mnt/appdata/bareos/spool"
  TWO EOF = yes
  Use Mtiocget = yes
}

As I was not able to reproduce the problem with files, I tried to narrow it down when restoring from tape and noticed something odd. Backups that write only a relatively small amount of data to the tape (a few hundred MiB) restore just fine; I tried this at least 20 times. However, once the total backup size reaches about 900 MiB or more, I get the checksum mismatch errors described above. It does not matter whether it is a single large 900 MiB file or several smaller files: as soon as the overall number of bytes written to the tape exceeds about 900 MiB, the restore fails.
Unfortunately, this is also why I can't send you the dumped files, as I probably cannot upload a gigabyte worth of files (correct me if I am mistaken).
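
To narrow down the size threshold in a repeatable way, test files of a chosen size could be generated like this. It is a sketch under assumptions: write_test_file and its seed parameter are made-up names for illustration. The data is seeded pseudo-random, so it should compress poorly, and the same file can be regenerated on another machine from the size and seed instead of being uploaded.

```python
import hashlib
import random

MIB = 1024 * 1024


def write_test_file(path, size_mib, seed=0):
    """Write size_mib MiB of seeded pseudo-random data; return its SHA-256 hex digest."""
    rng = random.Random(seed)
    digest = hashlib.sha256()
    with open(path, "wb") as handle:
        for _ in range(size_mib):
            # One MiB of pseudo-random bytes per iteration keeps memory use flat.
            chunk = rng.getrandbits(8 * MIB).to_bytes(MIB, "little")
            handle.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

For example, write_test_file("test-900.bin", 900, seed=1) writes a 900 MiB file and returns the digest to compare against after the restore.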

I wonder if the operating system imposes any limitations that I don't know of.

I hope that we can get to the bottom of this.

Thank you very much for helping me, I appreciate it.

Best Regards,
Benedict Ecker

Andreas Rogge

Dec 19, 2022, 8:31:35 AM
to bareos...@googlegroups.com
Hi Benedict,

Looking at your configuration, there are two parameters where I don't
understand why you set them.
Do you have any special requirement for "Two EOF"? Usually that
parameter is not needed.
Why did you set Minimum Block Size to 64k? The default is 63k and
usually there is no need to configure this at all.

If it doesn't break with files and only happens after you have written
some amount of data to the tape, I would suggest that we first make sure
your tape drive works correctly.
Especially "it works until I back up more than 900 MiB" sounds like your
tape drive might have an issue.

Did you check for TapeAlert flags? If I'm not mistaken, on FreeBSD this
should be doable with "smartctl -H". Maybe your drive reported problems
that you haven't noticed yet.
Did you look at the kernel log to see if there is anything related to
your tape drive?

When you tested with tar, did you make sure the restored data was
correct? The checksums in tar only protect the headers, not the data. So
if there is a bit-flip in the data, tar won't notice that, but just
restore bogus data.
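
One way to check the restored data independently of tar is to compare SHA-256 digests of the original and restored trees. The following is a minimal sketch; the function names sha256_of, manifest and compare_trees are made up for illustration, and it only looks at file contents (not permissions, ownership or timestamps).

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = sha256_of(full)
    return result


def compare_trees(original_root, restored_root):
    """Return (missing, corrupt): files absent from the restore, and files whose digest differs."""
    before = manifest(original_root)
    after = manifest(restored_root)
    missing = sorted(p for p in before if p not in after)
    corrupt = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, corrupt
```

Calling compare_trees on the source directory and the restore directory returns two lists; both are empty if every file came back intact.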

Benedict Ecker (Bene)

Dec 22, 2022, 10:35:02 AM
to bareos-users
Hi Andreas,

I remember that the Bareos packages I installed from the official FreeBSD ports came with a note saying that I should change some default settings, or else tape positioning would not work correctly. Two EOF was one of them. I also set the minimum block size because I was experimenting with different dynamic block sizes while looking for a solution.
For the latest tests, however, I kept the config file minimal and left the values mostly at their defaults.

Thanks for the tip about the TapeAlert flags. Either there is nothing wrong with the tape drive or it does not emit TapeAlerts. I tried capturing them with tapeinfo and with smartctl; neither produced any useful output.

I also tested again with tar. This time I checked file integrity with SHA256 checksums before writing to tape and after restoring from tape. Several files came back with checksums that did not match. What's weird is that most of the restored files are fine while a few are not, and it is always the same files that are corrupt.
I also used a brand new tape again and tried everything with an LTO tape from a previous generation (an LTO-3 tape in the LTO-4 drive), but with the same results.
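
Since it is always the same files that come back corrupt, it may help to see where inside a file the damage sits. A rough sketch that compares an original and a restored file block by block follows; the name differing_blocks and the 65536-byte default are arbitrary choices for illustration and are not tied to the block size on tape.

```python
def differing_blocks(original_path, restored_path, block_size=65536):
    """Return the indices of block_size-sized blocks that differ between two files."""
    bad = []
    with open(original_path, "rb") as a, open(restored_path, "rb") as b:
        index = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                bad.append(index)
            index += 1
    return bad
```

Multiplying a returned index by block_size gives the byte offset where the mismatching block starts, which can then be compared across the corrupt files.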

One thing I came across is that when I set my tape drive to a fixed block size (e.g. 131072 bytes) with mt, I can successfully write and restore files with tar.
I tried setting the Minimum, Maximum and Label Block Size in Bareos to the same value that the tape drive is set to, and at first it seemed to improve things. However, when I run a restore or a verify job, I get an "End of Volume" error while the drive is forward spacing to the wanted block.

I must admit, I am running out of ideas as to what the problem could be.
I still have my old LTO3 tape drive lying around and I am willing to test everything with that, because I am getting the feeling that my LTO4 drive is not working properly, which would be a shame, because I just bought it.

If you have any other idea that does not involve ripping my hardware apart, let me know.
I am very grateful for any input you can provide.

Thanks again and merry Christmas!