I'd really like a solution because these errors require manual intervention to fix, wasting my time and also hold up the backup process for everything.
... earlier messages
2016-08-31 19:40:56 helva-sd JobId 1088: Despooling elapsed time = 00:23:49, Transfer rate = 89.57 M Bytes/second
2016-08-31 19:40:56 helva-sd JobId 1088: Fatal error: spool.c:306 Fatal append error on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a-nst): ERR=block.c:1003 Read error on fd=4 at file:blk 0:0 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a-nst). ERR=Cannot allocate memory.
2016-08-31 19:40:56 helva-sd JobId 1088: Fatal error: Re-read of last block: block numbers differ by more than one.
Probable tape misconfiguration and data loss. Read block=0 Want block=178510.
Director and Storage are on same machine. The job messages above were from a remote client but the data is spooled via SSD. Drive I/O typically achieves 90MB/s.
The error messages suggest to me that the problem is not in fact with the tape drive but something else. Given the previous append error, I suspect the attempt to read the previous block is not in fact reading the previous block but the "current" failed one, somehow.
If the message text is correct and it is a tape misconfiguration, what sorts of things might be wrong?
Of course, I'm a complete novice at this. Can anyone help identify what is really happening?
Might it be possible to modify the code to skip back several (e.g. 4) blocks and reread/verify them all, and then carry on writing (possibly on the next tape) from one after the last known 'good' block?
Regards
Ruth
Config details:
Intel i5-2500K at 3.3GHz with 8GB RAM, 16GB swap.
One HP LTO3 tape drive connected using U320-SCSI on Symbios 53c1030 PCIe interface at 160MB/s. Firmware D26D <== drive involved in error above.
One Quantum LTO4 tape drive connected using U320-SCSI on Symbios 53c1030 PCIe interface at 320MB/s. Firmware W53F <== connected to same scsi bus
Ubuntu 16.04 LTS, kernel 4.4.0-34, very lightly loaded.
Bareos installed from Bareos.org repo, version 16.3.1.1471011706.f4784a7-1304.1
Volume data spooling from Samsung SSD on SATA-II interface.
SD config for the LTO3 drive:
Device {
Name = LTO3TapeStorage
Drive Index = 1
Autochanger = no
Archive Device = /dev/tape/by-id/scsi-3500110a-nst
AutomaticMount = yes
AlwaysOpen = yes
Media Type = LTO3
RandomAccess = no
RemovableMedia = yes
Offline On Unmount = yes
Spool Directory = /media/tapespool/LTO3
Maximum Block Size = 512kb
Maximum File Size = 32gb
Maximum Network Buffer Size = 65536
Maximum Concurrent Jobs = 1
}
your block size is wrong. The tape block size should be 512b not 512kb. 512 is small though you might want to try 1024b
Hi Philip,
I've taken your advice, ensuring the new block size was used on new tapes (just to be sure). It has, however, just fallen over again :(
This is the interesting part of the most recent log :
07-Sep 14:16 helva-sd JobId 1101: Spooling data again ...
07-Sep 14:27 helva-sd JobId 1101: User specified Job spool size reached: JobSpoolSize=128,000,504,532 MaxJobSpoolSize=128,000,000,000
07-Sep 14:27 helva-sd JobId 1101: Writing spooled data to Volume. Despooling 128,000,504,532 bytes ...
07-Sep 15:15 helva-sd JobId 1101: Despooling elapsed time = 00:47:09, Transfer rate = 45.24 M Bytes/second
07-Sep 15:15 helva-sd JobId 1101: Spooling data again ...
07-Sep 15:26 helva-sd JobId 1101: User specified Job spool size reached: JobSpoolSize=128,000,504,505 MaxJobSpoolSize=128,000,000,000
07-Sep 15:26 helva-sd JobId 1101: Writing spooled data to Volume. Despooling 128,000,504,505 bytes ...
07-Sep 15:37 helva-sd JobId 1101: End of Volume "Incr-12" at 21:19445 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). Write of 524288 bytes got -1.
07-Sep 15:38 helva-sd JobId 1101: Fatal error: Re-read of last block: block numbers differ by more than one.
Probable tape misconfiguration and data loss. Read block=0 Want block=73471.
07-Sep 15:38 helva-sd JobId 1101: Fatal error: spool.c:306 Fatal append error on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst): ERR=block.c:1003 Read error on fd=4 at file:blk 0:0 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). ERR=Cannot allocate memory.
07-Sep 15:38 helva-sd JobId 1101: Despooling elapsed time = 00:11:23, Transfer rate = 187.4 M Bytes/second
07-Sep 15:38 helva-sd JobId 1101: Elapsed time=03:34:48, Transfer rate=39.71 M Bytes/second
07-Sep 15:38 helva-fd JobId 1101: Error: bsock_tcp.c:405 Write error sending 65536 bytes to Storage daemon:helva.cam.ivimey.org:9103: ERR=Connection reset by peer
07-Sep 15:38 helva-fd JobId 1101: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer
07-Sep 15:38 helva-dir JobId 1101: Error: Bareos helva-dir 16.3.1 (01Jul16):
Build OS: x86_64-pc-linux-gnu ubuntu Ubuntu 16.04 LTS
JobId: 1101
Job: Helva_Home.2016-09-03_18.05.00_28
Backup Level: Incremental, since=2016-09-03 11:56:34
Client: "helva-fd" 16.3.1 (01Jul16) x86_64-pc-linux-gnu,ubuntu,Ubuntu 16.04 LTS,xUbuntu_16.04,x86_64
FileSet: "Home" 2016-01-25 21:16:12
Pool: "Incr" (From Run IncPool override)
Catalog: "MyCatalog" (From Client resource)
Storage: "LTO3TapeStorage" (From Pool resource)
Scheduled time: 03-Sep-2016 18:05:00
Start time: 07-Sep-2016 12:02:17
End time: 07-Sep-2016 15:38:17
Elapsed time: 3 hours 36 mins
Priority: 10
FD Files Written: 45,238
SD Files Written: 45,238
FD Bytes Written: 511,861,346,861 (511.8 GB)
SD Bytes Written: 511,858,442,913 (511.8 GB)
Rate: 39495.5 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: yes
Volume name(s): Incr-12
Volume Session Id: 11
Volume Session Time: 1472899129
Last Volume Bytes: 422,533,307,233 (422.5 GB)
Non-fatal FD errors: 1
SD Errors: 1
FD termination status: Fatal Error
SD termination status: Fatal Error
Termination: *** Backup Error ***
In this case, the tape being written filled up -- hence the End of Volume at 15:37, but it is not well handled for some reason and instead of just requesting a new tape as I would hope, the result is a failed backup.
Any ideas? These are the current settings for the tape.
Maximum Block Size = 512k
Maximum File Size = 32g
Maximum Network Buffer Size = 65536
Maximum Concurrent Jobs = 1
"Net" wisdom appears to be that these drive's recommended block size is 256KB, but that they work just fine with 512KB.
Out of interest, I have just looked up the log for an earlier iteration of this problem, where the block size was still set as "512kb" and the spool set as 64g:
2016-05-19 02:35:46 helva-sd JobId 695: Writing spooled data to Volume. Despooling 64,000,475,814 bytes ...
2016-05-19 02:51:03 helva-sd JobId 695: End of Volume "Incr-04" at 45:15962 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). Write of 512000 bytes got -1.
2016-05-19 02:51:21 helva-sd JobId 695: Fatal error: Re-read of last block: block numbers differ by more than one.
Probable tape misconfiguration and data loss. Read block=0 Want block=105812.
2016-05-19 02:51:21 helva-sd JobId 695: Fatal error: spool.c:306 Fatal append error on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst): ERR=block.c:1003 Read error on fd=7 at file:blk 0:0 on device "LTO3TapeStorage" (/dev/tape/by-id/scsi-3500110a0013dbb32-nst). ERR=Input/output error.
2016-05-19 02:51:21 helva-sd JobId 695: Despooling elapsed time = 00:15:35, Transfer rate = 68.44 M Bytes/second
2016-05-19 02:51:22 helva-sd JobId 695: Elapsed time=02:51:14, Transfer rate=43.59 M Bytes/second
2016-05-19 02:51:22 helva-fd JobId 695: Error: bsock_tcp.c:369 Write error sending 65536 bytes to Storage daemon:helva.cam.ivimey.org:9103: ERR=Connection reset by peer
2016-05-19 02:51:22 helva-fd JobId 695: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer
Note: "Write of 512000 bytes" here. The difference between 512kb and 512k appears to be that "kb" is in 1000s and "k" is in 1024s.
Ruth