Trying to replace Openfiler, but having a problem with Brocade


jjman...@3defilms.com

Dec 28, 2012, 5:50:54 PM
to esos-...@googlegroups.com
Hi
We are currently trying your system to replace Openfiler on our next SAN, which will be 72 TB shared by 4 Windows 7 stations linked with QLE2460 HBAs through a Brocade 200E.
The Brocade already has 2 SANs running Openfiler on it.
We use this system to share big video files for film editing.

For our test we used 2 PCs, both with 16 GB of memory and a QLE2460.
One has an LSI MegaRAID board, and we added 2 SSDs in RAID 0 on it.

We succeeded in building the SAN thanks to your wiki: we built it with the MegaRAID CLI option and exported the RAID 0 volume in block I/O mode.
The second PC sees the SAN, so all is fine there, but we are now facing a problem: when we try to connect another PC to the SAN, the disk is seen but access to it is forbidden.
Any reason for that?
On the other PC we have MetaSAN installed. Could it come from that?

Any idea?

We will continue our experiments over the next few days.

Anyway thanks a lot for your effort on that topic, SCST is not something very easy to build up.

Thanks

J.J.




Steve Jones

Dec 29, 2012, 11:41:49 AM
to esos-...@googlegroups.com
I've had a similar problem, but haven't had time to look at it and figure out where I might be going wrong..  In my case, I don't have a supported controller, so I've been creating my LUNs using SCSTADMIN commands rather than the TUI, and I have found that the first ESX machine that connects to the LUNs is fine, but the second and third see the LUNs but can't access them..

Like I said, I haven't had time to investigate even enough for me to bring it up myself, but since you're reporting this, I thought I'd add a "me too" and see if there is something from my system that might help diagnose..
-Steve

Marcin Czupryniak

Dec 29, 2012, 5:23:47 PM
to esos-...@googlegroups.com
Well, the first thing that comes to my mind is quite a basic topic on FC networks: zoning.
Have you done it properly on your switches? And have you configured the initiator WWNs in the same group on ESOS?

If you're not sure, just post the /etc/scst.conf file along with your zoning settings and I will have a look at them.

Martin

Steve Jones

Dec 29, 2012, 5:58:36 PM
to esos-...@googlegroups.com
Yeah..  The likelihood of it being something unrelated to ESOS is what stopped me from asking about it before, but it was too much of a coincidence to not at least say "me too" just in case there's something to it..
I will say that in my case, I used to have a hacked up old OpenFiler system with the FC target stuff enabled, and I was able to get my three ESX machines to talk to it, so I don't think it's my switches or masking on the switches..
 
I'm a long-time Cisco network geek, so I can talk all day about Ethernet, token ring, routing protocols, etc, but I am admittedly a novice at best at the low-level workings of the FC network, so I'll describe what I have, and what my theories of how it was supposed to work, and maybe there will be something fundamental that you can just point to and say "no - it's not gonna work like that"
 
My config consists of two 8-port 2gig switches I bought from Ebay for about $50 each, so not the highest end equipment here..  I bought a big box of 2g QLOGIC cards from ebay, and thought I'd put two ports (either one 2-port card, or two, 1-port cards) into each of my ESX servers, and into the Openfiler / ESOS target, then connect each machine's "port 1" to switch 1, and each port-2 to switch 2.  That way,  I'd have multiple paths to the storage from each host.  This part worked pretty well under Openfiler, although I had occasional disconnects/freezeups with no diagnostic information, which is what made me go looking for another solution and led me to ESOS.  Along the way, I started suspecting one of my switches was bad, but then when I'd just leave that switch off, I'd find that I had the same trouble with only the other one, and vice versa, so I don't THINK I have a hardware problem.  The only additional thing I wanted to do with the physical config is to connect a port from switch1 directly to switch2, to hopefully give me more possible paths  (ie: card1 on the ESOS goes down, and card 2 on an ESX host goes down, but I can still get connectivity through the link between switches.)  I never did this though, because I figured it just added more complexity to a system I was already having trouble diagnosing. 
 
I only mention it here, in case there's some assumption made by the drivers that all WWNs can use any path to any other WWN..  I don't think that's the case, based on some strange masking configurations our SAN guys do at work, but at work, we have MUCH better hardware, so it might not be an apples to apples compare.
 
ANYWAY, the other reason I went into that description is to explain why I have so many initiators listed in my SCST.CONF..   Each of my 3 ESX servers have 2 ports, and on the ESX screen where it identifies the ports, it shows 2 WWNs, and I've never been clear on which one to use, so to be on the safe side, I'm including all four per server.  In this config, I had reduced it to only 2 hosts, in hopes of simplifying the world until I get it all working, and then add the other initiator(s)
 
So - Here's my config..  If you have any ideas/hints/recommendations, I'm all ears and grateful for any help!  I've got a manually configured /dev/md0 which I'm just trying to share out as a big multi-access LUN.  I've also got one of the ESOS cards disabled in this config, for testing
 
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2012.12.29 17:34:06 =~=~=~=~=~=~=~=~=~=~=~=
root@san39 ~ #
root@san39 ~ # cat /etc/scst.conf
# Automatically generated by SCST Configurator v3.0.0-pre1.
# Non-key attributes
max_tasklet_cmd 10
setup_id 0x0
threads 4
HANDLER vdisk_blockio {
        DEVICE disk_2t_1 {
                filename /dev/md0
                # Non-key attributes
                blocksize 512
                nv_cache 0
                read_only 0
                removable 0
                rotational 1
                t10_dev_id 42b6e98-disk_2t_1
                thin_provisioned 0
                threads_num 1
                threads_pool_type per_initiator
                usn 42b6e98
                write_through 0
        }
}
TARGET_DRIVER iscsi {
        enabled 0
}
TARGET_DRIVER qla2x00t {
        TARGET 21:00:00:e0:8b:86:97:d6 {
                HW_TARGET
                enabled 1
                rel_tgt_id 1
                # Non-key attributes
                addr_method PERIPHERAL
                cpu_mask ffffffff,ffffffff
                explicit_confirmation 0
                io_grouping_type auto
                node_name 20:00:00:e0:8b:86:97:d6
                GROUP myserver {
                        LUN 0 disk_2t_1 {
                                # Non-key attributes
                                read_only 0
                        }
                        INITIATOR 10:00:00:00:c9:34:78:7c
                        INITIATOR 10:00:00:00:c9:34:79:8b
                        INITIATOR 10:00:00:00:c9:3a:47:f8
                        INITIATOR 10:00:00:00:c9:40:b1:e4
                        INITIATOR 20:00:00:00:c9:34:78:7c
                        INITIATOR 20:00:00:00:c9:34:79:8b
                        INITIATOR 20:00:00:00:c9:3a:47:f8
                        INITIATOR 20:00:00:00:c9:40:b1:e4
                        INITIATOR 20:00:00:e0:8b:08:51:e9
                        INITIATOR 20:01:00:e0:8b:28:51:e9
                        INITIATOR 21:00:00:e0:8b:08:51:e9
                        INITIATOR 21:01:00:e0:8b:28:51:e9
                        # Non-key attributes
                        addr_method PERIPHERAL
                        cpu_mask ffffffff,ffffffff
                        io_grouping_type auto
                }
        }
        TARGET 21:01:00:e0:8b:a6:97:d6 {
                HW_TARGET
                enabled 0
                # Non-key attributes
                addr_method PERIPHERAL
                cpu_mask ffffffff,ffffffff
                explicit_confirmation 0
                io_grouping_type auto
                node_name 20:01:00:e0:8b:a6:97:d6
                rel_tgt_id 0
        }
}
root@san39 ~ #  

Marcin Czupryniak

Dec 29, 2012, 6:48:16 PM
to esos-...@googlegroups.com
Your configuration does not reflect what you're trying to achieve and is very dangerous :)

- You have only 1 host group under one target, which makes it impossible for MPIO to work properly.
- Your initiator WWNs are wrong: you should list only the port name WWNs instead of both the node names and the port names.

Basically, when a system has 2 or more FC HBAs they can share the same node name, so the only way to identify a specific port is its port name.
For example:
In this case you have an HBA connected and you declared both its node name and its port name (nWWN and pWWN respectively)
INITIATOR 20:01:00:e0:8b:28:51:e9 - this is a node name
INITIATOR 21:01:00:e0:8b:28:51:e9 - this is a port name
This is wrong, as the FC protocol mandates (and SCST enforces) that the port name WWN be used for storage connections and zoning.
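To illustrate the node name / port name distinction, here is a minimal Python sketch that filters an initiator list down to port names. The prefix rule is only a heuristic that matches the Emulex and QLogic WWNs posted in this thread (port names starting with 10 or 21, node names starting with 20); it is an assumption, not a general FC rule, and the function names are made up for the example. The authoritative source remains what the switch or `esxcfg-mpath -l` reports.

```python
# Heuristic only: these prefixes match the Emulex and QLogic HBAs seen in
# this thread. Other vendors may follow different conventions.
PORT_PREFIXES = ("10", "21")   # first byte of the pWWNs in the posted config
NODE_PREFIXES = ("20",)        # first byte of the nWWNs in the posted config


def classify_wwn(wwn: str) -> str:
    """Return 'port', 'node', or 'unknown' based on the first byte."""
    first_byte = wwn.strip().lower().split(":")[0]
    if first_byte in PORT_PREFIXES:
        return "port"
    if first_byte in NODE_PREFIXES:
        return "node"
    return "unknown"


def port_names_only(wwns):
    """Keep only the WWNs classified as port names, preserving order."""
    return [w for w in wwns if classify_wwn(w) == "port"]


# A few of the INITIATOR entries from the posted scst.conf.
initiators = [
    "10:00:00:00:c9:34:78:7c",
    "20:00:00:00:c9:34:78:7c",
    "20:01:00:e0:8b:28:51:e9",
    "21:01:00:e0:8b:28:51:e9",
]
print(port_names_only(initiators))
```

Anything classified as 'unknown' should be checked by hand rather than dropped silently.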

So the configuration you're trying to achieve can be done in these easy steps:

1- Divide and write down the pWWNs you're connecting to each switch
2- Enable target mode on both target WWNs on ESOS
3- Fix your zoning settings on the switches to reflect the proper configuration (use only pWWNs in the zoning); check the link at the bottom for more info.
4- In the zoning settings, follow the best practice of 1 initiator and multiple targets per zone (this amounts to 1 zone per HBA)
5- In ESOS declare the pWWN of each initiator under the proper target WWN (group)
6- Export the storage and it should work out of the box with multipath (in ESX it will be using VMW_SATP_DEFAULT_AA, which means that you can set round robin balancing across the paths).
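Step 5 can be sketched as follows: a small Python helper that renders one TARGET stanza per target port, each with its own host group that lists only the pWWNs reachable through that port. The target WWNs and device name come from the config posted above; which initiator sits on which switch, the group names, and the helper itself are assumptions for illustration. The output is a fragment meant to sit inside `TARGET_DRIVER qla2x00t { ... }`, not a complete scst.conf.

```python
def render_target(target_pwwn, group, device, initiator_pwwns, lun=0):
    """Render one TARGET stanza with a single host group as scst.conf text."""
    indent = " " * 8
    lines = [
        f"TARGET {target_pwwn} {{",
        f"{indent}enabled 1",
        f"{indent}GROUP {group} {{",
        f"{indent * 2}LUN {lun} {device}",
    ]
    lines += [f"{indent * 2}INITIATOR {wwn}" for wwn in initiator_pwwns]
    lines += [f"{indent}}}", "}"]
    return "\n".join(lines)


# Hypothetical split: one group per target port, one target port per switch.
layout = {
    "21:00:00:e0:8b:86:97:d6": ("Card1", ["10:00:00:00:c9:3a:47:f8",
                                          "10:00:00:00:c9:40:b1:e4"]),
    "21:01:00:e0:8b:a6:97:d6": ("Card2", ["10:00:00:00:c9:34:78:7c",
                                          "10:00:00:00:c9:34:79:8b"]),
}

for target, (group, initiators) in layout.items():
    print(render_target(target, group, "disk_2t_1", initiators))
```

Generating the stanzas from one table makes it harder to list an initiator under the wrong target by accident.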

The storage might hang sometimes because of old firmware or improper configuration; judging from the SCST settings, I vote for improper configuration.
I think you have to correct the zoning and list only the pWWNs, see the attached image.

http://www.brocade.com/downloads/documents/white_papers/Zoning_Best_Practices_WP-00.pdf


zone.png

Steve Jones

Dec 30, 2012, 11:41:37 AM
to esos-...@googlegroups.com
Wow..  Thanks - That's a lot of great information..
 
How did you know from my list of various WWNs which ones were node, and which were port?  I didn't understand the difference (and still not sure I know 100%, but I have removed the node ones!)
I found a  command "esxcfg-mpath -l" that distinguished the nWWN from the pWWNs and you were right, but I am curious from looking at them, how you could tell the difference?
 
I don't have time right now, but tonight, or maybe tomorrow morning, I'm going to pull all my FC cables and start from scratch again given this new info you've given me, and hopefully get it all working..

One question for clarification - For further redundancy, is it OK to connect the switches to each other, and if so, doesn't it make sense to have the six possible initiator pWWN numbers listed in each of the two targets? 
 
In the meantime, I've at least removed the nWWNs from the config file and restarted ESOS to see what happens..
 
I'll let you know..
 
Thanks so much for the help!!
-Steve
 
 
 

Steve Jones

Dec 30, 2012, 12:16:26 PM
to esos-...@googlegroups.com
One other thing..
 
When I boot the ESOS machine, I notice that in the POST, I get a line in the QLOGIC initialization that says "Checking Adapter 0 Loop ID 125"
 
I can't find anything on what a loop ID really is, where it's set, if I even need to care, etc.. but this doesn't sound like a "default" value so I'm wondering if there might be something in my switch that I have to change?

Marcin Czupryniak

Jan 2, 2013, 10:49:00 AM
to esos-...@googlegroups.com
OK, so let's cover the basics of FC.

FC can be deployed in different topologies, and frames reach their destination in different ways depending on the topology.
The most basic operational mode of an FC connection is FC-P2P (point-to-point), where two ports are connected directly.
Connecting multiple hosts without a switch requires arbitration, which is built into the protocol (this is FC-AL, arbitrated loop) and works with loop and port IDs: if you have one ring you start with port ID 0 and increment it by one for each device connected to the ring.
FC-AL is rarely used any more: just as switches took over from hubs in Ethernet, switches are now used in FC to build fabrics.
FC is a smart protocol, and within a fabric there are specifications for natively using multiple paths and for providing switch and path redundancy for fail-over.
In a fabric a port is set to FC-SW mode, and in the Brocade web UI it is shown with an "F" (which means fabric); this is the most used operational mode for FC networks today.
A fabric can exist even with a single switch, and linking your 2 switches together makes no sense in your case: it only increases complexity and the number of unused zones in your configuration. Normally a fabric is built from 3 or more switches, where the increased complexity pays off in the number of paths to a host or storage node (storage nodes normally have 4 or more different paths within a fabric built from 4 switches).
In your scenario, 2 separate switches that do not share the same domain is the most commonly deployed config (within a physical fabric the domain ID is what lets multiple fabrics coexist on the same physical domain), as it will tolerate the failure of a switch or an HBA.
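The redundancy argument for two separate switches can be checked with a toy model. The Python sketch below treats each switch as its own fabric, derives the initiator-to-target paths each host has, and tests whether every host keeps at least one path when any single switch fails. All host, port, and switch names are hypothetical, and the model ignores zoning details and HBA failures.

```python
# Toy model: every name below is hypothetical.
FABRICS = {
    "switch1": {"targets": ["esos-port1"],
                "initiators": {"esx1": "esx1-hba1", "esx2": "esx2-hba1"}},
    "switch2": {"targets": ["esos-port2"],
                "initiators": {"esx1": "esx1-hba2", "esx2": "esx2-hba2"}},
}


def paths(fabrics, failed_switches=frozenset()):
    """Map each host to its (initiator port, target port) paths
    through the switches that are still working."""
    result = {}
    for switch, members in fabrics.items():
        for host, port in members["initiators"].items():
            result.setdefault(host, [])
            if switch in failed_switches:
                continue
            for target in members["targets"]:
                result[host].append((port, target))
    return result


def survives_single_switch_failure(fabrics):
    """True if every host keeps at least one path whichever switch fails."""
    return all(
        all(host_paths for host_paths in paths(fabrics, {switch}).values())
        for switch in fabrics
    )


print(paths(FABRICS))
print(survives_single_switch_failure(FABRICS))
```

With each host cabled to both switches the check passes; removing one switch from the dictionary makes it fail, which is the single point of failure the dual-switch layout avoids.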



    When I boot the ESOS machine, I notice that in the POST, I get a line in the QLOGIC initialization that says "Checking Adapter 0 Loop ID 125"
    I can't find anything on what a loop ID really is, where it's set, if I even need to care, etc.. but this doesn't sound like a "default" value so I'm wondering if there might be something in my switch that I have to change?

This is normal; it cannot be changed during the POST (unless you enter the HBA BIOS and set a different ID). Anyway, in a switched environment it doesn't matter: as soon as you connect the port to a fabric, the port automatically changes from FC-AL to FC-SW mode, where the loop ID is not used at all.

As for the WWN questions: part of each WWN identifies the manufacturer of the HBA. For example, the port names in your config that begin with 21:00 belong to QLogic HBAs and those that begin with 10:00 to Emulex HBAs.
To distinguish a pWWN from an nWWN, browse the WWNs in the switch as I showed you in the last picture: the first WWN is the node name, the second is the port name, and the third entry is the firmware currently loaded in the HBA (yes, the firmware is loaded by the driver!).
Once you work with FC on a daily basis you can tell them apart just by looking at the first numbers (I can do it for most of the HBAs on the market).
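A more systematic way to identify the manufacturer is the IEEE OUI embedded in the WWN. The Python sketch below assumes the common NAA type 1 and type 2 layouts, where the OUI occupies the third to fifth bytes; other NAA types place it elsewhere and are reported as unknown here. The two-entry vendor table covers only the OUIs that appear in this thread, and the function names are made up for the example.

```python
# OUIs seen in this thread; a real lookup would use the IEEE registry.
KNOWN_OUIS = {
    "00:00:c9": "Emulex",
    "00:e0:8b": "QLogic",
}


def wwn_oui(wwn):
    """Return the OUI of a colon-separated WWN for NAA types 1 and 2,
    or None for layouts this sketch does not handle."""
    octets = wwn.strip().lower().split(":")
    if len(octets) != 8:
        raise ValueError(f"expected 8 bytes, got {len(octets)}: {wwn!r}")
    naa = octets[0][0]          # high nibble of the first byte
    if naa not in ("1", "2"):
        return None
    return ":".join(octets[2:5])


def wwn_vendor(wwn):
    """Look up the vendor name for a WWN, or 'unknown'."""
    return KNOWN_OUIS.get(wwn_oui(wwn), "unknown")


for wwn in ("21:00:00:e0:8b:86:97:d6", "10:00:00:00:c9:34:78:7c"):
    print(wwn, wwn_vendor(wwn))
```

Note that the OUI says who made the HBA, not whether a given WWN is a node name or a port name.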

Anyway, for more info I suggest you read the Wikipedia page: http://en.wikipedia.org/wiki/Fibre_Channel

If you have other questions just ask.

Martin

Steve Jones

Jan 3, 2013, 7:21:55 AM
to esos-...@googlegroups.com
You've been SUPER helpful..  Every time I learn something about FC, I find out 10 other things I didn't know existed, but that's kind of the whole point of me doing this..  At my job, I'm more of a server / VMWare guy, and I have to interact with the SAN guys, and I wanted to know more about their world so I could be better about knowing what they can, and can't do for me, and be better able to ask for what I need..  This is a GREAT help, not to mention the benefit I'll have of my lab at home!!

SO..  After what I've learned, I've reloaded from scratch, created a /dev/md0 as a mirror with a hotspare (3x 1tb) and then done everything else through the TUI, and it seems to be working.  The one thing is that I STILL have one ESX machine that doesn't see the LUN, but it's my newly installed box and may not have EVER seen one, so I might have a card problem there...  The switch does see the WWN on the port.. 

Anyway, does this look better?  As opposed to the other one, this one is totally setup through the TUI:
# Automatically generated by SCST Configurator v3.0.0-pre1.

# Non-key attributes
max_tasklet_cmd 10
setup_id 0x0
threads 4

HANDLER vdisk_blockio {
        DEVICE 1TB-MirrorOnSAN {

                filename /dev/md0

                # Non-key attributes
                blocksize 512
                nv_cache 0
                read_only 0
                removable 0
                rotational 1
                t10_dev_id 3ca005f3-1TB-MirrorOnSAN

                thin_provisioned 0
                threads_num 1
                threads_pool_type per_initiator
                usn 3ca005f3

                write_through 0
        }
}

TARGET_DRIVER iscsi {
        enabled 0
}

TARGET_DRIVER qla2x00t {
        TARGET 21:00:00:e0:8b:86:97:d6 {
                HW_TARGET

                enabled 1
                rel_tgt_id 2


                # Non-key attributes
                addr_method PERIPHERAL
                cpu_mask ffffffff,ffffffff
                explicit_confirmation 0
                io_grouping_type auto
                node_name 20:00:00:e0:8b:86:97:d6

                GROUP Card1 {
                        LUN 0 1TB-MirrorOnSAN {

                                # Non-key attributes
                                read_only 0
                        }

                        INITIATOR 10:00:00:00:c9:3a:47:f8

                        INITIATOR 10:00:00:00:c9:40:b1:e4

                        INITIATOR 21:01:00:e0:8b:28:51:e9

                        # Non-key attributes
                        addr_method PERIPHERAL
                        cpu_mask ffffffff,ffffffff
                        io_grouping_type auto
                }
        }

        TARGET 21:01:00:e0:8b:a6:97:d6 {
                HW_TARGET

                enabled 1
                rel_tgt_id 1

                # Non-key attributes
                addr_method PERIPHERAL
                cpu_mask ffffffff,ffffffff
                explicit_confirmation 0
                io_grouping_type auto
                node_name 20:01:00:e0:8b:a6:97:d6

                GROUP Card2 {
                        LUN 0 1TB-MirrorOnSAN {

                                # Non-key attributes
                                read_only 0
                        }

                        INITIATOR 10:00:00:00:c9:34:78:7c

                        INITIATOR 10:00:00:00:c9:34:79:8b

                        INITIATOR 21:00:00:e0:8b:08:51:e9

Marcin Czupryniak

Jan 3, 2013, 7:40:03 AM
to esos-...@googlegroups.com
That is much better now :)

For the server that is not working yet, maybe it's just a zoning issue (are you 100% sure that the initiator is listed under the right target in ESOS and that it's on the same switch?). Keep in mind that all of the nodes connecting to the storage node must have the same link speed.
Or you simply forgot to declare it in ESOS (do you have 3 or 4 boxes in total?)





jjman...@3defilms.com

Feb 6, 2013, 6:39:28 PM
to esos-...@googlegroups.com
I'm back.... sorry I didn't follow up on my post.

Everything is working fine here now.
I have 2 Openfiler 2.99 systems and one ESOS working together with a Brocade 200E and 4 workstations sharing all of them.
All workstations are Windows 7 based and we are using MetaSAN to get file-level sharing. ... all is working perfectly so far.

I may change the two Openfiler systems to ESOS, but I'm just not sure I can do that without losing the data on my disks....
Is there any way to keep the existing hard drives and use ESOS?

Marc Smith

Feb 6, 2013, 8:31:22 PM
to esos-...@googlegroups.com
What SCST I/O mode are you using? If you are using vdisk_blockio
you'll need to make sure the block size setting (SCST setting) is the
same when you set up the ESOS host.


--Marc

jjman...@3defilms.com

Feb 8, 2013, 3:28:56 AM
to esos-...@googlegroups.com
Thanks Marc

I am using blockio but don't know the block size.... There is certainly a way to find it...
I'll check.

Marc Smith

Feb 8, 2013, 9:43:05 AM
to esos-...@googlegroups.com
On your OpenFiler host, you can check the current vdisk_blockio device
block size with this command:
cat /sys/kernel/scst_tgt/devices/DEVICE/blocksize

Where DEVICE is the name of your SCST device.
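If there are several SCST devices, the same check can be scripted. This is a minimal Python sketch that reads the `blocksize` attribute of every device under the sysfs directory from the command above; the function name is made up for the example. It reads only the first line of each attribute, on the assumption that some SCST versions append a marker line such as `[key]` after the value.

```python
from pathlib import Path

# Directory taken from the cat command above.
SCST_DEVICES = Path("/sys/kernel/scst_tgt/devices")


def read_block_sizes(devices_dir=SCST_DEVICES):
    """Return {device name: block size in bytes} for every SCST device
    that exposes a 'blocksize' attribute. Empty if the directory is absent."""
    sizes = {}
    devices_dir = Path(devices_dir)
    if not devices_dir.is_dir():
        return sizes
    for dev in sorted(devices_dir.iterdir()):
        attr = dev / "blocksize"
        if attr.is_file():
            lines = attr.read_text().splitlines()
            if lines:
                # First line holds the value; later lines may be markers.
                sizes[dev.name] = int(lines[0].strip())
    return sizes


for name, size in read_block_sizes().items():
    print(f"{name}: {size} bytes")
```

On a host without SCST loaded the loop prints nothing, because the sysfs directory does not exist.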


--Marc

jjman...@3defilms.com

Feb 8, 2013, 11:10:32 AM
to esos-...@googlegroups.com
OK, my block size is 512 ... not even sure that is the best...... but this is the way it is.
In fact we mainly store big video files on our SAN, from 1 GB to more than 50 GB. Our FC is 4 Gb FC....

Marc Smith

Feb 8, 2013, 10:30:19 PM
to esos-...@googlegroups.com
I've seen other users post about using block sizes other than (greater
than) 512, but if I recall correctly, there wasn't much of a
difference in performance numbers if any.

--Marc

Marcin Czupryniak

Feb 9, 2013, 7:08:34 AM
to esos-...@googlegroups.com
I agree with that; what makes a difference here is the underlying storage
block/stripe size and RAID type.

Martino

jjman...@3defilms.com

Feb 9, 2013, 6:33:53 PM
to esos-...@googlegroups.com
Thanks!!

I'm a little bit lost with that.... is there an easy way to decide the best block size?
I have read so many different things on the web...





jjman...@3defilms.com

Feb 10, 2013, 7:53:10 AM
to esos-...@googlegroups.com
Thanks Marc.

Switching from Openfiler to ESOS while keeping the hard drives went smoothly and easily.

You really made great software that is so easy to use.

Congratulations!!!

