On Fri, Jun 6, 2014 at 12:17 PM, <webm...@openhome.org> wrote:
> Hi All,
>
> First of all - great project!
Hi,
Thank you.
>
> I am currently prototyping a new SAN storage platform built upon ESOS which
> will be used by 4 VMware hosts and
> looking for some input/sanity check -
>
> 1. Is anyone else doing this with a Vsphere 5.5 cluster which has DR/HA
> enabled? (Auto failover of VMs etc and heart beating on data stores)
>Yes. We have (12) ESXi 5.5 hosts (4 per cluster) attached to (4) ESOS
>all-SSD disk arrays that are running in production. We use
>datastore heartbeating, and vSphere HA / DRS work correctly (Fibre
>Channel SAN).
>
> 2. I note that the latest SCST has VAAI support / Atomic Test and Set and
> may not need SCSI Reservations. Has anyone else tested this yet?
> 3. The docs suggest that Multipath I/O with Round robin is a bad idea. Is
> this actually an issue if all paths are in standby mode to the secondary
> storage array with ALUA? (i.e. Active/Standby). Not sure what I am missing
> here.
>
>I have a couple of new boxes set up in a cluster (ESOS) and I'm about
>to start testing in the next day or two. I recently updated SCST (a
>few days ago), which has the COMPARE AND WRITE functionality in the
>vdisk_* handlers (for VAAI ATS). I can let you know in a few days, but
>I don't expect any problems -- I haven't seen any recent complaints on
>the scst-devel list about it.
> 4. How to avoid data corruption when iSCSI Reservation is not shared across
> both primary/secondary node? is single node active enough? what about
> failover?
>
>Don't round-robin the paths; you only fail over to the other node if
>you have to (one active path set to a target host).
>Yes, that sounds like it would work, assuming that the VMware round
>robin path policy only selects paths that are "optimized". Hopefully it
>would not select paths that are in the "non-optimized" ALUA state (the
>ones on your other node). Probably something you'll want to test to
>confirm, and if it doesn't work as expected, you could always
>experiment with the other target states.
>
> worst case scenario if VMware tried to send I/O to a 'standby' path the I/O
> will get blocked.
>With SCST that's not really true for ALUA -- SCST will still process
>I/O if the initiators send it to a target that isn't "active".
I don't know if that is the case when the path is marked as 'offline' or
'standby' by ALUA, though. In my testing I took both primary paths down with
iptables filtering and the LUN blocked traffic: the secondary storage array
still had two paths connected, but ALUA was blocking I/O
(offline/standby state, and ESXi marked paths as 'dead').
>I've
>been using this SCST ALUA setup in ESOS for over a year in production
>without any issues. I assume you've already seen this article, but if
>not, here is what I wrote up for that setup:
>http://marcitland.blogspot.com/2013/04/building-using-highly-available-esos.html
I have seen the article - it's a great write-up. I'm just not sure how to apply it to my situation,
as I need more than a single path active to my primary target for extra bandwidth (no budget for FC or 10G Ethernet).
Perhaps once I have completed some more thorough testing I could contribute
a write-up using iSCSI with standard 1G Ethernet for anyone else in my situation.
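
To make the round-robin question concrete, here is a toy Python model of the behaviour discussed above. It is only a sketch under stated assumptions, not VMware's or SCST's actual logic: it assumes the path policy rotates over "optimized" paths, falls back to "non-optimized" paths only when no optimized path is left, and never sends I/O to standby or dead paths. The path names are hypothetical (two 1G links per node), and the real behaviour still needs to be confirmed by testing on the hosts.

```python
from itertools import cycle

# ALUA access states used in this toy model (names follow the thread).
OPTIMIZED = "active/optimized"
NON_OPTIMIZED = "active/non-optimized"
STANDBY = "standby"
DEAD = "dead"


def usable_paths(paths):
    """Return the paths a round-robin policy would rotate over.

    Assumption: optimized paths are preferred; non-optimized paths are
    used only when no optimized path is left; standby and dead paths
    never receive I/O.
    """
    optimized = [name for name, state in paths.items() if state == OPTIMIZED]
    if optimized:
        return optimized
    return [name for name, state in paths.items() if state == NON_OPTIMIZED]


def round_robin(paths, io_count):
    """Pick a path for each of io_count I/Os, or None if I/O would block."""
    candidates = usable_paths(paths)
    if not candidates:
        return [None] * io_count
    rotation = cycle(candidates)
    return [next(rotation) for _ in range(io_count)]


# Hypothetical path names: two 1G links to each storage node.
paths = {
    "primary-a": OPTIMIZED,
    "primary-b": OPTIMIZED,
    "secondary-a": NON_OPTIMIZED,
    "secondary-b": NON_OPTIMIZED,
}
normal = round_robin(paths, 4)

# Both primary links filtered, as in the iptables test.
paths.update({"primary-a": DEAD, "primary-b": DEAD})
failed_over = round_robin(paths, 4)

# Same failure, but the secondary node reports standby instead.
paths.update({"secondary-a": STANDBY, "secondary-b": STANDBY})
blocked = round_robin(paths, 2)

print(normal, failed_over, blocked)
```

In this model both primary links share the load in normal operation, the secondary links take over only once both primaries are gone, and a secondary left in standby blocks I/O, which matches what I saw in the iptables test.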