Drobo Pro on iscsi: Amber light/shutdown under heavy load


Andrew Noonan

Jul 12, 2010, 1:46:18 PM
to drobo-talk
Hi all,

We're running a Pro on iSCSI to a machine doing backups, and
occasionally, when we're doing large I/O requests (mainly when
restoring databases), the iSCSI LUN goes away, and you can't talk to
the Drobo at all anymore. The techs at the datacenter where this
lives report that the front panel is completely dark, except for the
power light, which shows amber. We've had this happen 5-6 times so
far, sometimes days, usually weeks in between events. We're running
Centos 5 on the accessing system, but I don't know why that would make
the Drobo basically shut itself down. Has anyone seen anything like
this or have a clue as to how to proceed? Any ideas would be helpful.
I've got another problem with it, but I'm going to open that as a
different thread.

Thanks,
Andrew

Javier Barroso

Jul 12, 2010, 2:56:48 PM
to drobo...@googlegroups.com
Hello,

We have a Drobo Elite which also shuts itself down when I run mkfs.ext3 or pvcreate against its LUNs. I'm hoping it won't happen when I use VMFS on those LUNs. It sends a RST/ACK and then shuts down.

Regards,

Robert Euston

Jul 12, 2010, 3:13:54 PM
to drobo...@googlegroups.com
Hi,

I have the same problem, which is why I don't use iSCSI and ext3 anymore.

Looking in the log, it appeared to do this when there was an error accessing the iSCSI target (a timeout, I think). I was able to reduce the incidence by upgrading the firmware on the Drobo to the latest level and by increasing the iSCSI timeouts.
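With open-iscsi, the timeouts I mean live in /etc/iscsi/iscsid.conf. As a rough sketch (the values here are only examples; tune them for your own network):

```shell
# /etc/iscsi/iscsid.conf -- example values only
# Seconds to wait before failing commands back up the stack when a
# session drops; raising this rides out slow Drobo recoveries:
node.session.timeo.replacement_timeout = 120
# iSCSI NOP-Out "ping" interval and reply timeout, in seconds:
node.conn[0].timeo.noop_out_interval = 10
node.conn[0].timeo.noop_out_timeout = 30
```

You have to log the session out and back in (or restart the initiator) for new values to take effect.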

Robert




Andrew Noonan

Jul 12, 2010, 3:43:37 PM
to drobo...@googlegroups.com
I'd already increased the iSCSI timeout, and I think we're on the
latest firmware (need to check, it's been a few months), but I'm not
sure why an iSCSI timeout would cause the Drobo itself to go offline.
I could understand the iSCSI subsystem getting confused and requiring
a restart to talk to the Drobo again, but I don't know why the Drobo
would go to Standby.

Andrew

docfruitbat

Jul 12, 2010, 6:53:45 PM
to drobo-talk
I've had this same kind of problem recently myself, though a bit
differently. I have 8 x 2 TB volumes set up. Doing heavy I/O to the
first volume (just reading, even), at some point I get the same
symptom people describe in this thread. But doing the same kinds of
I/O to any of the other volumes works just fine. It's that first
volume that's strange in some way.

I spent some time with Wireshark looking at the iSCSI flows and
determined the following:

At some point the Drobo is returning a large sequence of blocks. At
the end of that sequence it appears the host computer (the iSCSI
initiator), not the Drobo, resets the TCP session and effectively logs
off from the Drobo. The Drobo does an immediate shutdown (same as if
you'd pulled the USB cable), which people see as all the drive lights
turning off (the Drobo going to sleep, effectively).

The iSCSI initiator (if you are using open-iscsi, and probably some
others) then goes through a retry sequence of logging on again, which
times out because the Drobo thinks it's starting up fresh (a new
logon) and has to validate the volumes, and invariably there's a
timeout at the initiator because of the delay. The initiator then
drops the session again, and the Drobo gets a little confused at this
point (probably because it thinks it completed the logon but gets a
TCP reset again) and is in a funny state after that... and so is the
iSCSI initiator, for that matter. I have to stop and restart iscsid
too to get it all cleared.
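For reference, the clear-out I end up doing is roughly this (the target IQN is a placeholder for whatever "iscsiadm -m session" reports, and the init script name varies by distro):

```shell
# Log the wedged session out, restart the daemon, then log back in.
iscsiadm -m node -T <target-iqn> -p 192.168.66.81 --logout
/etc/init.d/open-iscsi restart
iscsiadm -m node -T <target-iqn> -p 192.168.66.81 --login
```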
If I attach the Drobo via USB, the same kinds of heavy I/O work just
fine without any problems or corruption. Since I can make it happen
just by reading, I've been mounting the filesystems read-only so I
don't actually corrupt anything (safer that way :-).
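One caveat worth noting: a plain read-only mount of ext3 can still write to the disk to replay the journal, so to be truly read-only you also want the noload option. Something like this (the device name is just an example):

```shell
# "ro" alone may replay the ext3 journal at mount time; "noload"
# skips journal replay so the mount issues no writes at all.
mount -o ro,noload /dev/sdc1 /mnt/drobo
```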
I've yet to understand what it is about that last block sequence
transmission that the host dislikes (something the iSCSI initiator
doesn't like?), and I haven't had time to do any further testing. I'm
using open-iscsi-2.0-871. I understand there's a newer release
(872-rc2) out, but I have yet to try it.

For anyone else experiencing this: what iSCSI software are you using,
and what platform is it on?

Javier Barroso

Jul 13, 2010, 2:25:36 AM
to drobo...@googlegroups.com
Hi,

On Tue, Jul 13, 2010 at 12:53 AM, docfruitbat <doc...@fruitbat.org> wrote:
> At some point the drobo is returning a large sequence of blocks.  At
> the end of that sequence it appears the host computer (iSCSI
> initiator), not the drobo, resets the TCP session and effective logs
> off from the drobo.  The drobo, does an immediate shutdown (same as if
> you'd pulled the USB cable), which people see as all the drive lights
> turning off (drobo going to sleep, effectively).
How do you determine that it is the iSCSI initiator, and not the Drobo, that is issuing the reset?

Running tshark over all my capture files, I can only see the Drobo as the source of packets where "tcp.flags.reset == 1".
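For reference, the command I'm running is something like the following (the capture file name is a placeholder; newer tshark releases take the display filter with -Y rather than -R):

```shell
# List source and destination of every TCP reset in the capture,
# to see which side is tearing the session down.
tshark -r capture.pcap -R 'tcp.flags.reset == 1' -T fields -e ip.src -e ip.dst
```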

I'm using open-iscsi from lenny (2.0.870~rc3-0.4). I guess I have to test my Drobo Elite with parameters similar to the DroboPro ones (http://www.thirdmartini.com/index.php/DroboPro%2C_iSCSI_and_Linux)

Thanks for the info

docfruitbat

Jul 14, 2010, 1:31:42 AM
to drobo-talk
Unless I'm reading the trace incorrectly (always a possibility :-), I
see the following sequence pretty clearly:
No.   Time      Source         Destination    Proto  Info
1273  4.190290  192.168.66.81  192.168.66.5   TCP    iscsi-target > 37427 [ACK] Seq=806113 Ack=5089 Win=215 Len=1448 TSV=82131 TSER=435486
1274  4.190402  192.168.66.81  192.168.66.5   TCP    iscsi-target > 37427 [ACK] Seq=807561 Ack=5089 Win=215 Len=1448 TSV=82131 TSER=435486
1275  4.190525  192.168.66.81  192.168.66.5   TCP    iscsi-target > 37427 [ACK] Seq=809009 Ack=5089 Win=215 Len=1448 TSV=82131 TSER=435486
1276  4.190648  192.168.66.81  192.168.66.5   TCP    iscsi-target > 37427 [ACK] Seq=810457 Ack=5089 Win=215 Len=1448 TSV=82131 TSER=435486
1277  4.190656  192.168.66.81  192.168.66.5   TCP    iscsi-target > 37427 [PSH, ACK] Seq=811905 Ack=5089 Win=215 Len=144 TSV=82131 TSER=435486
1278  4.229051  192.168.66.5   192.168.66.81  TCP    37427 > iscsi-target [ACK] Seq=5089 Ack=812049 Win=2921 Len=0 TSV=435535 TSER=82131
1279  4.441067  192.168.66.5   192.168.66.81  TCP    37427 > iscsi-target [RST, ACK] Seq=5089 Ack=812049 Win=3050 Len=0 TSV=435747 TSER=82131
1281  7.954735  192.168.66.5   192.168.66.81  TCP    37428 > iscsi-target [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=439261 TSER=0 WS=6
1282  7.955086  192.168.66.81  192.168.66.5   TCP    iscsi-target > 37428 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=82508 TSER=439261 WS=5

(I don't know if the above will format in any way that's visible.)
192.168.66.5 is my computer (initiator)
192.168.66.81 is the drobopro (target)
The above shows the Drobo sending a long sequence of ACK packets
followed by a PSH,ACK, which I interpret to be the last in the
sequence of requested blocks (.81 -> .5 on port 37427).
The initiator sends an ACK (.5 -> .81), then sends a RST,ACK, which
ends the session.
The initiator then starts a new logon session (port 37428).

I see this sequence very consistently and interpret it as the
initiator dropping the session, not the Drobo.

I've looked at the raw packet contents, but I'd have to analyse the
iSCSI block layer to determine what it was sending, and I haven't had
time to do that.

Javier Barroso

Jul 16, 2010, 6:41:27 AM
to drobo...@googlegroups.com
Ok, I think your issue was different from mine, then. In my case, the RST,ACK was sent by the Drobo.

My problem was resolved by reconfiguring the paths in the switches and the Drobo network, and by using the latest open-iscsi and a filesystem supported by the Drobo company.

Thank you very much

Andrew Noonan

Jul 19, 2010, 10:19:42 AM
to drobo...@googlegroups.com
Unfortunately for me, I'm not using LACP (just active-backup bonding),
and I'm already using a supported filesystem (ext3) and the latest
open-iscsi. For me it's reading, not writing, that seems to be the
problem (I can do backups without a serious problem for weeks, but one
big restore and things go south). I'd probably be OK going back to
USB for now, but my LUN size is 4 TB, and running CentOS 5, it seems
like the usb-storage driver doesn't support LUNs over 2 TB, as I get
"READ CAPACITY(16) failed" errors when plugging in over USB. Anyone
have any ideas? Has anyone talked to Data Robotics about this? I
know, beta support and all that, but just wondering.
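For what it's worth, you can see what the kernel actually made of the USB disk without any Drobo tools; a quick sanity check (the device name here is a guess, check the kernel log for the right one):

```shell
# Recent kernel messages show the READ CAPACITY failure, if any:
dmesg | tail -n 30
# Size the kernel settled on for the device, in 512-byte sectors:
cat /sys/block/sdb/size
```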

Thanks,
Andrew

