Discover reason for "Re-read of last block: block numbers differ by more than one."

Ruth Ivimey-Cook

unread,

Aug 31, 2016, 4:02:52 PM8/31/16

to bareos-users

I have bareos backing up to two LTO tape drives, and occasionally see errors such as the following. This problem is not specific to a particular client. I don't think it's specific to a particular drive, though mostly it happens with the LTO3 rather than the LTO4. Howver The LTO3 gets more use...

I'd really like a solution because these errors require manual intervention to fix, wasting my time and also hold up the backup process for everything.

... earlier messages
2016-08-31 19:40:56 helva-sd JobId 1088: Despooling elapsed time = 00:23:49, Transfer rate = 89.57 M Bytes/second
2016-08-31 19:40:56 helva-sd JobId 1088: Fatal error: spool.c:306 Fatal append error on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a-nst): ERR=block.c:1003 Read error on fd=4 at file:blk 0:0 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a-nst). ERR=Cannot allocate memory.
2016-08-31 19:40:56 helva-sd JobId 1088: Fatal error: Re-read of last block: block numbers differ by more than one.
Probable tape misconfiguration and data loss. Read block=0 Want block=178510.

Director and Storage are on same machine. The job messages above were from a remote client but the data is spooled via SSD. Drive I/O typically achieves 90MB/s.

The error messages suggest to me that the problem is not in fact with the tape drive but something else. Given the previous append error, I suspect the attempt to read the previous block is not in fact reading the previous block but the "current" failed one, somehow.

If the message text is correct and it is a tape misconfiguration, what sorts of things might be wrong?

Of course, I'm a complete novice at this. Can anyone help identify what is really happening?

Might it be possible to modify the code to skip back several (e.g. 4) blocks and reread/verify them all, and then carry on writing (possibly on the next tape) from one after the last known 'good' block?

Regards
Ruth

Config details:

Intel i5-2500K at 3.3GHz with 8GB RAM, 16GB swap.
One HP LTO3 tape drive connected using U320-SCSI on Symbios 53c1030 PCIe interface at 160MB/s. Firmware D26D <== drive involved in error above.
One Quantum LTO4 tape drive connected using U320-SCSI on Symbios 53c1030 PCIe interface at 320MB/s. Firmware W53F <== connected to same scsi bus

Ubuntu 16.04 LTS, kernel 4.4.0-34, very lightly loaded.
Bareos installed from Bareos.org repo, version 16.3.1.1471011706.f4784a7-1304.1
Volume data spooling from Samsung SSD on SATA-II interface.

SD config for the LTO3 drive:
Device {
Name = LTO3TapeStorage
Drive Index = 1
Autochanger = no
Archive Device = /dev/tape/by-id/scsi-3500110a-nst
AutomaticMount = yes
AlwaysOpen = yes
Media Type = LTO3
RandomAccess = no
RemovableMedia = yes
Offline On Unmount = yes
Spool Directory = /media/tapespool/LTO3
Maximum Block Size = 512kb
Maximum File Size = 32gb
Maximum Network Buffer Size = 65536
Maximum Concurrent Jobs = 1
}

David Sherer

unread,

Sep 1, 2016, 12:10:37 AM9/1/16

to bareos-users

your block size is wrong. The tape block size should be 512b not 512kb. 512 is small though you might want to try 1024b

Philipp Storz

unread,

Sep 1, 2016, 4:00:51 AM9/1/16

to bareos...@googlegroups.com

Hello Ruth,

You should specify "512 k" instead of "512 kb".

To try different blocksizes with your configuration you can use "btape test".
The full test will check if the wringint and re-reading of the last block works.

I would suggest you try if the default blocksize works.(which is valid if you do not specify a
blocksize at all).

Then, you can double the blocksize to see which blocksize still works and if it gives you more speed.

For more info see:

http://www.bareos.org/en/Whitepapers/articles/Speed_Tuning_of_Tape_Drives.html

Best regards,

Philipp

--
Mit freundlichen Grüßen

Philipp Storz philip...@bareos.com
Bareos GmbH & Co. KG Phone: +49 221 63 06 93-92
http://www.bareos.com Fax: +49 221 63 06 93-10

Sitz der Gesellschaft: Köln | Amtsgericht Köln: HRA 29646
Geschäftsführer: Stephan Dühr, M. Außendorf,
J. Steffens, P. Storz, M. v. Wieringen

Ruth Ivimey-Cook

unread,

Sep 7, 2016, 11:50:12 AM9/7/16

to bareos-users

On Thursday, 1 September 2016 09:00:51 UTC+1, philipp.storz wrote:
> Hello Ruth,
>
> You should specify "512 k" instead of "512 kb".

Hi Philip,

I've taken your advice, ensuring the new block size was used on new tapes (just to be sure). It has, however, just fallen over again :(

This is the interesting part of the most recent log :

07-Sep 14:16 helva-sd JobId 1101: Spooling data again ...
07-Sep 14:27 helva-sd JobId 1101: User specified Job spool size reached: JobSpoolSize=128,000,504,532 MaxJobSpoolSize=128,000,000,000
07-Sep 14:27 helva-sd JobId 1101: Writing spooled data to Volume. Despooling 128,000,504,532 bytes ...
07-Sep 15:15 helva-sd JobId 1101: Despooling elapsed time = 00:47:09, Transfer rate = 45.24 M Bytes/second
07-Sep 15:15 helva-sd JobId 1101: Spooling data again ...
07-Sep 15:26 helva-sd JobId 1101: User specified Job spool size reached: JobSpoolSize=128,000,504,505 MaxJobSpoolSize=128,000,000,000
07-Sep 15:26 helva-sd JobId 1101: Writing spooled data to Volume. Despooling 128,000,504,505 bytes ...
07-Sep 15:37 helva-sd JobId 1101: End of Volume "Incr-12" at 21:19445 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). Write of 524288 bytes got -1.
07-Sep 15:38 helva-sd JobId 1101: Fatal error: Re-read of last block: block numbers differ by more than one.
Probable tape misconfiguration and data loss. Read block=0 Want block=73471.
07-Sep 15:38 helva-sd JobId 1101: Fatal error: spool.c:306 Fatal append error on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst): ERR=block.c:1003 Read error on fd=4 at file:blk 0:0 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). ERR=Cannot allocate memory.

07-Sep 15:38 helva-sd JobId 1101: Despooling elapsed time = 00:11:23, Transfer rate = 187.4 M Bytes/second
07-Sep 15:38 helva-sd JobId 1101: Elapsed time=03:34:48, Transfer rate=39.71 M Bytes/second
07-Sep 15:38 helva-fd JobId 1101: Error: bsock_tcp.c:405 Write error sending 65536 bytes to Storage daemon:helva.cam.ivimey.org:9103: ERR=Connection reset by peer
07-Sep 15:38 helva-fd JobId 1101: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer
07-Sep 15:38 helva-dir JobId 1101: Error: Bareos helva-dir 16.3.1 (01Jul16):
Build OS: x86_64-pc-linux-gnu ubuntu Ubuntu 16.04 LTS
JobId: 1101
Job: Helva_Home.2016-09-03_18.05.00_28
Backup Level: Incremental, since=2016-09-03 11:56:34
Client: "helva-fd" 16.3.1 (01Jul16) x86_64-pc-linux-gnu,ubuntu,Ubuntu 16.04 LTS,xUbuntu_16.04,x86_64
FileSet: "Home" 2016-01-25 21:16:12
Pool: "Incr" (From Run IncPool override)
Catalog: "MyCatalog" (From Client resource)
Storage: "LTO3TapeStorage" (From Pool resource)
Scheduled time: 03-Sep-2016 18:05:00
Start time: 07-Sep-2016 12:02:17
End time: 07-Sep-2016 15:38:17
Elapsed time: 3 hours 36 mins
Priority: 10
FD Files Written: 45,238
SD Files Written: 45,238
FD Bytes Written: 511,861,346,861 (511.8 GB)
SD Bytes Written: 511,858,442,913 (511.8 GB)
Rate: 39495.5 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: yes
Volume name(s): Incr-12
Volume Session Id: 11
Volume Session Time: 1472899129
Last Volume Bytes: 422,533,307,233 (422.5 GB)
Non-fatal FD errors: 1
SD Errors: 1
FD termination status: Fatal Error
SD termination status: Fatal Error
Termination: *** Backup Error ***

In this case, the tape being written filled up -- hence the End of Volume at 15:37, but it is not well handled for some reason and instead of just requesting a new tape as I would hope, the result is a failed backup.

Any ideas? These are the current settings for the tape.

Maximum Block Size = 512k
Maximum File Size = 32g

Maximum Network Buffer Size = 65536
Maximum Concurrent Jobs = 1

"Net" wisdom appears to be that these drive's recommended block size is 256KB, but that they work just fine with 512KB.

Out of interest, I have just looked up the log for an earlier iteration of this problem, where the block size was still set as "512kb" and the spool set as 64g:

2016-05-19 02:35:46 helva-sd JobId 695: Writing spooled data to Volume. Despooling 64,000,475,814 bytes ...
2016-05-19 02:51:03 helva-sd JobId 695: End of Volume "Incr-04" at 45:15962 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). Write of 512000 bytes got -1.
2016-05-19 02:51:21 helva-sd JobId 695: Fatal error: Re-read of last block: block numbers differ by more than one.
Probable tape misconfiguration and data loss. Read block=0 Want block=105812.
2016-05-19 02:51:21 helva-sd JobId 695: Fatal error: spool.c:306 Fatal append error on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst): ERR=block.c:1003 Read error on fd=7 at file:blk 0:0 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). ERR=Input/output error.

2016-05-19 02:51:21 helva-sd JobId 695: Despooling elapsed time = 00:15:35, Transfer rate = 68.44 M Bytes/second
2016-05-19 02:51:22 helva-sd JobId 695: Elapsed time=02:51:14, Transfer rate=43.59 M Bytes/second
2016-05-19 02:51:22 helva-fd JobId 695: Error: bsock_tcp.c:369 Write error sending 65536 bytes to Storage daemon:helva.cam.ivimey.org:9103: ERR=Connection reset by peer
2016-05-19 02:51:22 helva-fd JobId 695: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer

Note: "Write of 512000 bytes" here. The difference between 512kb and 512k appears to be that "kb" is in 1000s and "k" is in 1024s.

Ruth

Ruth Ivimey-Cook

unread,

Sep 7, 2016, 12:41:49 PM9/7/16

to bareos-users

I am attaching the output of the btape 'test' command on this drive... may be useful?

bareos-test.txt

Philipp Storz

unread,

Sep 8, 2016, 3:29:33 AM9/8/16

to bareos...@googlegroups.com

On 07.09.2016 17:50, Ruth Ivimey-Cook wrote:
> On Thursday, 1 September 2016 09:00:51 UTC+1, philipp.storz wrote:
>> Hello Ruth,
>>
>> You should specify "512 k" instead of "512 kb".
>
>

Hello Ruth,

please read my answer carefully and do what I suggested, especially the part to start with the
default blocksize.

Also, read the tape tuning whitepaper and see how the configuration was set there.

It does not make sense to try to get a bigger blocksize to work if the default does not even work.

If you need more support please consider our services at bareos.com.

Thank you,

Philipp

Reply all

Reply to author

Forward