Problem with iSCSI connected LTO-2 tape drive


Dave partridge

Dec 10, 2016, 12:24:33 PM
to open-iscsi
I did:

root@Charon:/home/amonra# mt -f /dev/st0 fsf 1
mt: /dev/st0: rmtioctl failed: Input/output error
root@Charon:/home/amonra#


The rmtioctl message appeared after about 10-15 seconds, and the iSCSI target showed that the session had dropped after another 10-15 seconds. The return to the command prompt took a while longer.

When I was returned to the command line prompt, the target showed the session as connected again.

During most or all of this time, the forward-space-file operation was still running.

root@Charon:/home/amonra# iscsiadm -m node --targetname "iqn.2008-08.com.starwindsoftware:mercury-ultrium2" --portal "192.168.129.77:3260"
# BEGIN RECORD 2.0-873
node.name = iqn.2008-08.com.starwindsoftware:mercury-ultrium2
node.tpgt = -1
node.startup = manual
node.leading_login = No
iface.hwaddress = <empty>
iface.ipaddress = <empty>
iface.iscsi_ifacename = default
iface.net_ifacename = <empty>
iface.transport_name = tcp
iface.initiatorname = <empty>
iface.bootproto = <empty>
iface.subnet_mask = <empty>
iface.gateway = <empty>
iface.ipv6_autocfg = <empty>
iface.linklocal_autocfg = <empty>
iface.router_autocfg = <empty>
iface.ipv6_linklocal = <empty>
iface.ipv6_router = <empty>
iface.state = <empty>
iface.vlan_id = 0
iface.vlan_priority = 0
iface.vlan_state = <empty>
iface.iface_num = 0
iface.mtu = 0
iface.port = 0
node.discovery_address = 192.168.129.77
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 8
node.session.xmit_thread_priority = -20
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.nr_sessions = 1
node.session.auth.authmethod = None
node.session.auth.username = <empty>
node.session.auth.password = <empty>
node.session.auth.username_in = <empty>
node.session.auth.password_in = <empty>
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.129.77
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
# END RECORD
root@Charon:/home/amonra#

I'm guessing that the problem relates to iSCSI timeouts for tape devices. Please could you guide me, in baby steps, through what I need to do to resolve this problem?

Thanks
Dave



The Lee-Man

Dec 10, 2016, 5:38:43 PM
to open-iscsi
What is your setup? What OS and version are you running on, what is your transport, and what tape drive are you using?

Dave partridge

Dec 11, 2016, 6:55:15 AM
to open-iscsi
Ubuntu 16.04.1 LTS with kernel 4.8.13, connected to the target over 1 Gb Ethernet. The drive is an HP Ultrium 460 (Ultrium 2) with firmware F63D, which is the latest.

The target is served by StarWind V8 running on Windows 10 x64.

Dave

Dave partridge

Dec 12, 2016, 8:46:17 AM
to open-iscsi
On the target system, I just ran a Wireshark capture of the iSCSI session while a Windows initiator connected to the tape and then issued an FSF. I then did the same for the Ubuntu open-iscsi initiator.

The capture for the Windows initiator looks pretty much as I would expect (given my limited knowledge of the iSCSI protocols).

The Ubuntu/open-iscsi capture has all sorts of odd stuff, like logins being sent to the target every 15 seconds while the FSF is being processed. Definitely borked, I think.

Do any of the open-iscsi folk watch this forum or am I talking to myself?

Dave
Ubuntu iscsi.cap
Windows iscsi.cap

The Lee-Man

Dec 13, 2016, 1:15:12 PM
to open-iscsi
I am looking at these, but I haven't gotten very far.

I've started examining the Ubuntu capture, and it seems normal so far.

Lee Duncan

Dec 14, 2016, 2:18:16 PM
to open-...@googlegroups.com
Hi Dave:

I think you are right — the problem is the timeout.

The default timeout on many systems (like SUSE that I work on) is set to 60 seconds for a SCSI command. And it looks like the tape drive took about 82 seconds to skip forward a file on your Windows trace.

Try setting the timeout to 90 seconds? The open-iscsi README talks about how to manually set the system SCSI timeout to longer (since this isn’t an iSCSI thing).

Also, you may want to disable the Ping/NOOPs that open-iscsi is setting. This is also discussed in the README file. I’d try setting them both to 0 to get them out of the way. It looks like the tape drive does not respond to the NOOP ping when it is busy for 80+ seconds skipping forward one file.
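A minimal sketch of what "setting them both to 0" could look like for a single node record, using the `iscsiadm` update operation (`-o update -n NAME -v VALUE`) on the two `noop_out` settings from the record shown earlier. The helper name `disable_nop_out` and the `ISCSIADM` override (handy for a dry run with `ISCSIADM=echo`) are hypothetical, not part of open-iscsi. It also assumes the new values are only picked up when a session is created, hence the logout and login at the end.

```shell
#!/bin/sh
# Minimal sketch: turn off iSCSI NOP-Out pings for a single node record.
# disable_nop_out is a hypothetical helper, not part of open-iscsi.
# Set ISCSIADM=echo to print the iscsiadm commands instead of running them.

disable_nop_out() {
    target=$1   # target IQN, e.g. iqn.2008-08.com.starwindsoftware:mercury-ultrium2
    portal=$2   # address:port, e.g. 192.168.129.77:3260
    cmd=${ISCSIADM:-iscsiadm}

    # Quote the names so the shell does not treat [0] as a glob pattern.
    for name in 'node.conn[0].timeo.noop_out_interval' \
                'node.conn[0].timeo.noop_out_timeout'; do
        "$cmd" -m node -T "$target" -p "$portal" -o update -n "$name" -v 0 || return 1
    done

    # Assumption: the new values are only read when a session is created,
    # so log out (ignoring the error if no session exists) and log back in.
    "$cmd" -m node -T "$target" -p "$portal" --logout || true
    "$cmd" -m node -T "$target" -p "$portal" --login
}
```

For example, as root and with the tape idle: `disable_nop_out "iqn.2008-08.com.starwindsoftware:mercury-ultrium2" "192.168.129.77:3260"`. Changing only the tape's node record leaves the pings in place for any disc targets.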


Lee Duncan

Ulrich Windl

Dec 15, 2016, 2:11:38 AM
to open-...@googlegroups.com
>>> Lee Duncan <leeman...@gmail.com> wrote on 14.12.2016 at 20:18 in
message <8286A277-F7FE-4C7D...@gmail.com>:
> The default timeout on many systems (like SUSE that I work on) is set to 60
> seconds for a SCSI command. And it looks like the tape drive took about 82
> seconds to skip forward a file on your Windows trace.

AFAIK, 60s is the SCSI _disk_ timeout; for tapes the timeout should be
significantly longer (depending on the technology).

> Try setting the timeout to 90 seconds? The open-iscsi README talks about how
> to manually set the system SCSI timeout to longer (since this isn’t an iSCSI
> thing).

15 minutes or so?

Regards,
Ulrich



david.p...@perdrix.co.uk

Dec 15, 2016, 6:19:39 AM
to open-iscsi
Ulrich is correct: the SCSI timeouts for tape drives are a *lot* longer. The short timeout is 900 seconds and the long timeout is 14,400 seconds (4 hours).

The ioctl error appears after about 15 seconds, far short of either timeout, which I think points at the iSCSI layer.

Cheers
Dave Partridge

david.p...@perdrix.co.uk

Dec 15, 2016, 10:14:58 AM
to open-iscsi
Lee,

It would appear that the guilty party was:


node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5

I changed both of these to 0 for the tape device and the problem went away.
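To check which values a node record currently holds, before or after such a change, the two lines can be filtered out of the same `iscsiadm -m node --targetname ... --portal ...` output shown at the start of the thread. A minimal sketch, assuming the `name = value` layout of that record; `check_nop_out` is a hypothetical helper name, not an open-iscsi command.

```shell
#!/bin/sh
# Minimal sketch: print the NOP-Out settings found in a node record on stdin.
# check_nop_out is a hypothetical helper, not part of open-iscsi.
# Exit status is 1 if any noop_out value is non-zero (pings enabled), else 0.

check_nop_out() {
    awk -F ' = ' '
        # Match e.g. "node.conn[0].timeo.noop_out_interval = 5"
        $1 ~ /^node\.conn\[[0-9]+\]\.timeo\.noop_out_(interval|timeout)$/ {
            print
            if ($2 + 0 != 0) enabled = 1
        }
        END { exit (enabled ? 1 : 0) }
    '
}
```

For example: `iscsiadm -m node --targetname "iqn.2008-08.com.starwindsoftware:mercury-ultrium2" --portal "192.168.129.77:3260" | check_nop_out`. With the original record this prints both lines with a value of 5 and returns 1; once both are 0 it returns 0.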

Please note that the README.gz for open-iscsi doesn't actually say that this is what you need to do to disable NOP-Out polling, so could I suggest that this be stated explicitly?

I must admit that I find it hard to imagine that an iSCSI target would reply to a NOP-Out while it was processing a command such as a tape FSF, or even a tape erase (whose timeout is six times the 4-hour long timeout). Perhaps NOP-Out polling should be suspended while a command is being processed? Alternatively, maybe NOP-Out polling should be disabled by default, with something in the README.gz file explaining WHEN you might want it and how to enable it. It's certainly clear that (at least) the MS iSCSI initiator doesn't send NOP-Out polls.

Regards
Dave


Lee Duncan

Dec 15, 2016, 2:40:59 PM
to open-...@googlegroups.com
On Dec 15, 2016, at 7:14 AM, david.p...@perdrix.co.uk wrote:

> Lee,
>
> It would appear that the guilty party was:
>
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
>
> I changed both of these to 0 for the tape device and the problem went away.

Excellent.

> Please note that the README.gz for open-iscsi doesn't actually say that this is what you need to do to disable the NOP-out polling, so could I suggest that this be stated explicitly.

My README says, in section 8.2:

For this setup, you can turn off iSCSI pings by setting:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

> I must admit that I find it hard to imagine that an iSCSI target would reply to a NOP-out while it was processing a command such as a tape fsf or even tape erase (whose timeout is 6 * the long-timeout of 4 hours).  Should perhaps the NOP-out polling be suspended while a command is being processed?  Or alternatively maybe the NOP-out polling be completely disabled by default with something in the README.gz file that explains WHEN you might want it and how to enable it.   It's certainly clear that (at least) the MS iSCSI initiator doesn't send NOP-out polls.

open-iscsi is normally used with discs. When it’s used with tape, it’s not unusual to find bugs or design errors that we did not know were present.

Perhaps a small blurb in the README about dealing with tape, suggesting turning NOOP/ping off? Please feel free to post a pull request on GitHub or suggest a patch on this list.


-- 
Lee Duncan

"Choice means saying no to one thing so you can say yes to another." -- Dan Millman

David C. Partridge

Dec 15, 2016, 5:00:15 PM
to open-...@googlegroups.com

You’re right, it is in section 8.2.  Maybe it needs to be said in 8.1.1 as well?

 

Dave


The Lee-Man

Dec 22, 2016, 11:51:29 AM
to open-iscsi
Hi David:

I have created issue #35 for this on GitHub.


David C. Partridge

Dec 28, 2016, 7:48:34 AM
to open-...@googlegroups.com

FWIW, I still think the best solution is to suspend NOP-Out polling (if active) while a device command is being processed. That way you get the best of both worlds.

However, I do see the attraction of a documentation-only fix :)

 

Cheers

Dave

 


Lee Duncan

Dec 28, 2016, 4:39:28 PM
to open-...@googlegroups.com
On Dec 28, 2016, at 4:48 AM, David C. Partridge <david.p...@perdrix.co.uk> wrote:

> FWIW I still think the best solution is to suspend the NOP-Out polling (if active) while a device command is being processed.  This way you get the best of both worlds
>
> However I do see the attraction of a documentation only fix :)

LOL. It’s not (just) that I’m lazy. I honestly don’t think the code should change for this issue.

IMHO the NOP usage is a bad idea anyway, but it can be handy to detect a bad connection when no I/O is occurring.

The problem in this case, I think, is that open-iscsi does not treat tape and disc drives differently.

So if I send a command to a disc drive and don’t hear back for 8 minutes, I know that is not good. Also, disc drives always handle commands disconnected, i.e. the response to a read or write comes later, not when I issue it. In that case, the disc’s brains do actually respond to PINGs while an operation is going on.

If we tried to add code that said “if any commands are outstanding don’t send PINGs”, then we could not catch the case where the disc server has gone away while a command was outstanding.

So unless you can come up with an idea that addresses this tape issue and regular usage, I don’t see any way to easily fix this.

But I’m open to suggestions. :)