[Rocks-Discuss] Rocks NAS appliance vs. roll-your-own


Larry Baker

Aug 10, 2010, 5:15:25 PM
to Discussion of Rocks Clusters
We are upgrading our cluster. The new one will include a separate 8
TB NAS node (hardware RAID6). This storage will be in addition to 1
TB (software RAID1) for user /home directories on the front end. I
have to decide whether to configure it as a Rocks NAS appliance, or
configure it separately from Rocks and integrate it manually. It will
be configured like a front end. That is, it will be on both the
public network and the private network (both Ethernet and Infiniband,
which will be new for us). Our recent experience with CentOS' poor
performance with SATA RAID controllers tempers my preference to
configure the NAS node as a Rocks NAS appliance. The first thing I
will do is run I/O performance tests with CentOS and Ubuntu loaded to
see whether Rocks is a viable option. I cannot find a discussion of
the properties of the different appliance types available on Rocks. I
assume a Rocks NAS appliance gets its name and IP address distributed
to the entire cluster, and is included in 411 updates of user account
files. I don't know if Rocks assumes /home is on the Rocks NAS
appliance, if Rocks specifies the partitions/volumes, or if Rocks
makes fstab entries or automounter entries for a Rocks NAS appliance.
Those of you that have experience with separate NAS boxes, what are
the pros and cons of a Rocks NAS appliance vs. roll-your-own?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Jon Forrest

Aug 10, 2010, 5:35:00 PM
to npaci-rocks...@sdsc.edu
On 8/10/2010 2:15 PM, Larry Baker wrote:
> I cannot find a discussion of the properties of the different appliance
> types available on Rocks. I assume a Rocks NAS appliance gets its name
> and IP address distributed to the entire cluster, and is included in 411
> updates of user account files. I don't know if Rocks assumes /home is on
> the Rocks NAS appliance, if Rocks specifies the partitions/volumes, or
> if Rocks makes fstab entries or automounter entries for a Rocks NAS
> appliance. Those of you that have experience with separate NAS boxes,
> what are the pros and cons of a Rocks NAS appliance vs. roll-your-own?

I suggest you read the document I sent to this list
several weeks ago in which I talked about exactly
what needs to be done to add a non-Rocks NFS server
to a Rocks cluster. This is based on my experience
when I added a Sun 7310 to one of my Rocks clusters.

All the NFS servers I've used have some kind of hardware
RAID, so I haven't experienced the poor I/O performance
with software RAID recently mentioned on this list.
If it's true that an Ubuntu-based server has better
performance than a CentOS server, then this would
be a fantastic reason to use an Ubuntu machine, especially
since you're going to use both hardware and software RAID.

An NFS server doesn't need to get the updates that
411 provides. In particular, it doesn't need to know
anything about user accounts or automounter configurations.
My site is an extreme example because I use
a non-Rocks NFS server, and I've also turned off
the automounter. So far it's working fine.
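
Just to illustrate (the paths, hostname, and subnet below are
placeholders, not my actual setup), the whole static arrangement only
takes a couple of lines:

# On the NFS server -- exports are granted per host/subnet, never per
# user, so the server needs none of the 411 account data:
# /etc/exports
/export/data 10.1.1.0/255.255.255.0(rw,no_root_squash,async)
exportfs -ra

# On each client -- one static /etc/fstab entry instead of the automounter:
nfsserver.local:/export/data  /data  nfs  rw,hard,intr  0 0

Then a one-time "mkdir -p /data; mount /data" on each client (or a
reboot) and you're done.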

Feel free to contact me directly if you'd like
to discuss this more.

Cordially,

--
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlfo...@berkeley.edu

Joe Landman

Aug 10, 2010, 5:36:26 PM
to Discussion of Rocks Clusters
Larry Baker wrote:
> We are upgrading our cluster. The new one will include a separate 8 TB
> NAS node (hardware RAID6). This storage will be in addition to 1 TB
> (software RAID1) for user /home directories on the front end. I have to
> decide whether to configure it as a Rocks NAS appliance, or configure it
> separately from Rocks and integrate it manually. It will be configured
> like a front end. That is, it will be on both the public network and
> the private network (both Ethernet and Infiniband, which will be new for
> us). Our recent experience with CentOS' poor performance with SATA RAID

This is traditionally how we have configured boxen for storage on
clusters (multiple nets). Public interface so that the head/login node
isn't bogged down with this work. Wide private interface so you can
hammer on it.

Unless you do next to zero IO in your jobs, a single gigabit connection
won't cut it.

> controllers tempers my preference to configure the NAS node as a Rocks
> NAS appliance. The first thing I will do is run I/O performance tests
> with CentOS and Ubuntu loaded to see whether Rocks is a viable option.
> I cannot find a discussion of the properties of the different appliance
> types available on Rocks. I assume a Rocks NAS appliance gets its name
> and IP address distributed to the entire cluster, and is included in 411
> updates of user account files. I don't know if Rocks assumes /home is
> on the Rocks NAS appliance, if Rocks specifies the partitions/volumes,

Not without a bit of configuration.

> or if Rocks makes fstab entries or automounter entries for a Rocks NAS
> appliance. Those of you that have experience with separate NAS boxes,

Again, you need to do some extend-compute magic to make this automatic.

> what are the pros and cons of a Rocks NAS appliance vs. roll-your-own?

Performance ... a purpose-built system will usually (significantly)
outperform a roll-your-own box.

I am biased (obviously, based upon what we do). Integrating external
NAS devices into Rocks isn't that hard; it's actually pretty easy. Minor
changes are needed in extend-compute.xml, and on the head node, to use
it by default.

You need to make sure that you don't have a single gigabit in/out as
your only network connection if you expect to be doing any sort of
significant IO work. You should plan on using the xfs file system on the
NAS. If you want to use a parallel file system, I'd suggest talking with
the folks who've done this. In that case, I'd suggest several NAS boxen.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: lan...@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615

Larry Baker

Aug 10, 2010, 5:50:21 PM
to Jon Forrest, Discussion of Rocks Clusters
Jon,

I assume you are talking about your e-mail from July 25, 2010, which I
saved:

Using a Non-Rocks NFS Server in a Rocks Cluster

Jon Forrest (jlfo...@berkeley.edu)

7/25/2010
version .1

This is certainly what I would use as a guide if I choose to roll my
own. I am not at that point yet. I can run our new NAS as a Rocks
NAS appliance or not. (Unlike many of the NAS's discussed in your
article.) Assuming Rocks/CentOS has acceptable performance (may or
may not have to do with software RAID -- I'm suspicious of hardware
drivers), what are the advantages and disadvantages of the Rocks NAS
approach?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov


Jon Forrest

Aug 10, 2010, 6:25:46 PM
to Larry Baker, Discussion of Rocks Clusters
On 8/10/2010 2:50 PM, Larry Baker wrote:
> Jon,
>
> I assume you are talking about your e-mail from July 25, 2010, which I
> saved:
>
> Using a Non-Rocks NFS Server in a Rocks Cluster

That's the one.

> This is certainly what I would use as a guide if I choose to roll my
> own. I am not at that point yet. I can run our new NAS as a Rocks NAS
> appliance or not. (Unlike many of the NAS's discussed in your article.)
> Assuming Rocks/CentOS has acceptable performance (may or may not have to
> do with software RAID -- I'm suspicious of hardware drivers), what are
> the advantages and disadvantages of the Rocks NAS approach?

I've never used a separate Rocks NAS box so I can't
say anything concrete. However, since an NFS server doesn't
benefit from all the wonderful Rocks juju I don't see
any reason to feel any pressure to stay confined to
the Rocks world for file service. Knock yourself
out!

Philip Papadopoulos

Aug 10, 2010, 6:49:36 PM
to Discussion of Rocks Clusters, Larry Baker
We do support Solaris for this type of thing. If you've never looked at ZFS,
you should.
Volume (AKA pool) creation is instantaneous. It has end-to-end data
integrity, data verification, snapshots, etc. etc. We run our home areas
for Triton (300 nodes, 10GbE) on ZFS and generally create a filesystem for
each user so that they can have easy access to their data snapshots. The
Linux entrant in the same style of file system is BTRFS -- I have never
used it, but ZFS has at least a half-decade head start on "battle hardening".

You don't have to purchase Sun hardware: we've used a couple of Tier 2
vendors and then installed Solaris (via Rocks) on significantly less
expensive HW. On Sun hardware, we see delivered performance of
600-700 MB/sec (over 10GbE); on the Tier 2 HW, 400-500 MB/sec.
There is more than a 50% cost increase to purchase Sun (aka Oracle) HW.

For Triton, we have 192 TB of raw disk across 4 servers. We snapshot and
replicate users' file systems nightly. After raidz2 (~ raid6) we have
about 80 TB of usable space (1st copy + 2nd copy). Also, we run our
storage cluster separately from our compute cluster and treat the
storage boxes as "external" NASes.
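
To make that concrete, the per-user layout boils down to roughly the
following (pool, disk, and user names here are placeholders, not our
actual configuration):

# One raidz2 (~ raid6) pool built from whole disks -- no HW RAID layer:
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
zfs create tank/home
# A file system per user, so each user gets their own quota and snapshots:
zfs create tank/home/alice
zfs set quota=500G tank/home/alice
# Nightly: snapshot, then ship the increment to the replica box:
zfs snapshot tank/home/alice@2010-08-12
zfs send -i tank/home/alice@2010-08-11 tank/home/alice@2010-08-12 | \
    ssh replica zfs receive tank/home/alice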

-P


--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)



Bart Brashers

Aug 10, 2010, 7:35:55 PM
to Discussion of Rocks Clusters
> Assuming Rocks/CentOS has acceptable performance (may or
> may not have to do with software RAID -- I'm suspicious of hardware
> drivers), what are the advantages and disadvantages of the Rocks NAS
> approach?
...snip...

> > An NFS server doesn't need to get the updates that
> > 411 provides. In particular, it doesn't need to know
> > anything about user accounts or automounter configurations.
> > My site is an extreme example because I use
> > a non-Rocks NFS server, and I've also turned off
> > the automounter. So far it's working fine.

This discussion opens the door to the question about auto-mounting, and
whether it's still needed. Certainly a few (many?) years back, many
suffered from stale NFS mounts and autofs was implemented in Rocks to
solve those problems (and others). Perhaps the NFS portions of
RHEL/CentOS have gotten better, and auto-mounting is no longer needed --
it seems to be working for you, Jon. Whether Larry wants to take that
risk is another question.

I believe the reason your NFS server doesn't need 411 is because you use
static mounts. Were you using autofs, then your NFS server would need
to know about the users who are requesting the mounts. Am I right?

I have been using NAS appliances for several years now, mostly with
hardware RAID (and only recently, as a stop-gap measure, with
mis-behaving software RAID). It's pretty easy, once you get a few
pointers. There is, unfortunately, not much by way of documentation on
NAS appliances, so you have to learn by asking this list.

I always have a separate disk in the NAS for the OS. This allows me to
skip the hassles of supporting the latest-n-greatest RAID card in the
install kernel. It also helps separate user data (which lasts forever)
from operating systems, which come and go. When (re-) installing a NAS
node, you are forced to use manual partitioning. This helps prevent the
accidental formatting of user-data partitions (your RAID array). I
generally just use a swap partition and a single "/" partition using up
the rest of the OS disk.

I have an extend-nas.xml file that does things like this (in addition
to installing the xfs RPMs):

<!-- Change default runlevel from 5 to 3, so we don't start X-windows -->
sed -i "s/id:5:initdefault/id:3:initdefault/" /etc/inittab
<!-- remove /etc/cron.d/sysstat, since I never look at it -->
rm /etc/cron.d/sysstat
<!-- set up the auto-mounting of /usr/local -->
/bin/rm -r /usr/local
/bin/ln -s /share/apps /usr/local
<!-- Mount the Areca card RAID array -->
mkdir -p /state/partition1
<file name="/etc/fstab" mode="append">
/dev/sda1 /state/partition1 auto defaults 0 2
</file>
<file name="/etc/exports">
/state/partition1 10.0.0.0/255.255.0.0(rw,no_root_squash,async)
</file>

After it installs for the first time, I log in and do whatever I need to
do to get the RAID array working. Sometimes that means installing a
driver RPM (and adding it to extend-nas.xml for next re-install) and
creating/initializing the array. Sometimes that means running parted
and mdadm to set up a software RAID partition. I never bother with
trying to script this portion of the install in extend-nas.xml because I
only want to do it once (and because I have a separate OS disk).
There's no point in automating things you only have to do once in a
while! Once the RAID array is initialized, you just "mkfs.xfs /dev/sda1
; mount -a" and you're ready to go.

Using it is as simple as creating new directories (e.g.
nas-0-0:/state/partition1/data), editing /etc/auto.home, and running
"rocks sync users". I have also added lines to /etc/auto.master to
bring new directories under autofs' control, e.g. /data (specified
in /etc/auto.data).
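
For example (the /data tree and map key below are hypothetical, just to
show the shape of the two files):

# /etc/auto.master -- one extra line puts /data under autofs' control:
/data /etc/auto.data --timeout=1200

# /etc/auto.data -- each key becomes a directory under /data:
modeloutput -rw,soft,intr,rsize=32768,wsize=32768 nas-0-0.local:/state/partition1/data/&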

As far as having it appear on the local public LAN, you can use the
usual Rocks tools to configure a 2nd NIC, add a line to /etc/exports,
maybe add <package>samba</package> to extend-compute.xml, and so on.

So I guess the question boils down to whether you want to stick with the
Rocks way of using auto-mounting, or go with static mounts. If you want
to use autofs, I think you'll want to use the NAS appliance type in
Rocks. If you don't, then you can go with Jon's method of using an
external NAS.

You had some specific questions:

> The new one will include a separate 8 TB NAS node (hardware RAID6).

You'll probably want to use xfs on this, as I think it's too big for
ext3, and xfs is much faster.

> I assume a Rocks NAS appliance gets its name and IP address distributed
> to the entire cluster, and is included in 411 updates of user account
> files.

Yes, correct.

> I don't know if Rocks assumes /home is on the Rocks NAS
> appliance,

No, not by default. You can mount dirs from nas-0-0 under /home if you
want, as specified by /etc/auto.home. For example, you might want to
place a whole lot of data in nas-0-0:/state/partition1/data, and have it
mounted at /home/data. You could accomplish that via a line in
/etc/auto.home like this:

data -rw,soft,intr,rsize=32768,wsize=32768 nas-0-0.local:/state/partition1/&

> if Rocks specifies the partitions/volumes, or if Rocks
> makes fstab entries or automounter entries for a Rocks NAS appliance.

No, you have to specify the mount point (mkdir, add a line to
/etc/fstab) as in my example xml above. I used "/state/partition1" just
to be Rocksy, but you could use anything.

Bart



Jon Forrest

Aug 10, 2010, 8:28:26 PM
to npaci-rocks...@sdsc.edu
On 8/10/2010 4:35 PM, Bart Brashers wrote:

> This discussion opens the door to the question about auto-mounting, and
> whether it's still needed. Certainly a few (many?) years back, many
> suffered from stale NFS mounts and autofs was implemented in Rocks to
> solve those problems (and others). Perhaps the NFS portions of
> RHEL/CentOS have gotten better, and auto-mounting is no longer needed --
> it seems to be working for you, Jon.

I initially started fooling around with turning off the automounter
because I didn't understand why it was necessary in principle
on the private network in a Rocks cluster. Other situations
would be different. Then, ironically, I started having bizarre
problems that were causing long-running batch jobs to die because
they couldn't write output files due to automounter problems.
So, what started out as a philosophical exercise became a practical
necessity.

> I believe the reason your NFS server doesn't need 411 is because you use
> static mounts. Were you using autofs, then your NFS server would need
> to know about the users who are requesting the mounts. Am I right?

I don't think so. The automounter is only present on an NFS client.
The NFS server just sees mount requests. Whether
these mount requests are the result of the automounter or someone
typing 'mount' makes no difference to the NFS server.

On an NFS client either you statically mount a file system or you
run the automounter to mount it on demand. One client can use
both techniques as long as only one technique is used per mount point.

Larry Baker

Aug 12, 2010, 1:39:26 PM
to Philip Papadopoulos, Discussion of Rocks Clusters
Phil (or anyone else),

1. Our new cluster NAS node will have two quad-core Nehalem processors
and a hardware RAID controller. For ZFS, do you recommend using the
hardware RAID to create a single RAID group, so ZFS sees one big
disk? Or, do you recommend setting up the hardware RAID as JBOD,
letting ZFS take advantage of the 8 cores to parallelize the disk I/O?
Expansion is not an issue -- we will fully populate all the hard
drive slots in the chassis.

2. Do your clients see the 4 ZFS servers as a single storage pool, or
as 4 separate pools? Does this require more than just Solaris/ZFS?

3. From what I can tell, OpenSolaris 2009.06 has Infiniband support and
talks about integration with Linux boxes on the same Infiniband. Do
you know if IP-over-Infiniband between x86-64 Rocks clients and an
OpenSolaris 2009.06 ZFS NAS works? That would enable us to avoid
purchasing Ethernet switches with 10GbE ports.

Thanks,

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov


Laotsao

Aug 12, 2010, 4:55:21 PM
to Discussion of Rocks Clusters, Philip Papadopoulos

Sent from my iPad

On Aug 12, 2010, at 1:39 PM, Larry Baker <ba...@usgs.gov> wrote:

> Phil (or anyone else),
>
> 1. Our new cluster NAS node will have two quad-core Nehalem processors
> and a RAID hardware controller. For ZFS, do you recommend using the
> hardware RAID to create a single RAID group, so ZFS sees one big
> disk? Or, do you recommend setting up the hardware RAID as JBOD,
> letting ZFS take advantage of the 8 cores to parallelize the disk I/O?
> Expansion is not an issue -- we will fully populate all the hard
> drive slots in the chassis.

ZFS likes JBODs; you want ZFS to do the RAID.


>
> 2. Do your clients see the 4 ZFS servers as a single storage pool, or
> as 4 separate pools? Does this require more than just Solaris/ZFS?

You could use OpenSolaris and ZFS, then use iSCSI over IB to connect to one server; then you can see all the zpools under one mount point.


>
> 3. From what I can tell, OpenSolaris 2009.6 has Infiniband support and
> talks about integration with Linux boxes on the same Infiniband. Do
> you know if IP-over-Infiniband between x86-64 Rocks clients and an
> OpenSolaris 2009.6 ZFS NAS works? That would enable us to avoid
> purchasing Ethernet switches with 10GbE ports.
>

Should just work

Philip Papadopoulos

Aug 12, 2010, 5:49:48 PM
to Larry Baker, Discussion of Rocks Clusters
On Thu, Aug 12, 2010 at 10:39 AM, Larry Baker <ba...@usgs.gov> wrote:

> Phil (or anyone else),
>
> 1. Our new cluster NAS node will have two quad-core Nehalem processors and
> a RAID hardware controller. For ZFS, do you recommend using the hardware
> RAID to create a single RAID group, so ZFS sees one big disk? Or, do you
> recommend setting up the hardware RAID as JBOD, letting ZFS take advantage
> of the 8 cores to parallelize the disk I/O? Expansion is not an issue -- we
> will fully populate all the hard drive slots in the chassis.
>

I -MUCH- prefer software RAID for one very, very important reason: hardware
RAID uses a proprietary on-disk format, software RAID does not. If (as we
have seen) your HW RAID controller fails 3 years out, the chances that you
have lost all your data are very high. With S/W RAID, you can move all the
disks and rebuild the array on new hardware. This is true irrespective of
Solaris or Linux. I personally ditched HW RAID years ago.
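
On the Linux side, the same portability argument looks roughly like
this with md RAID (device names and mount point are placeholders):

# Build the array once; the RAID metadata lives on the member disks:
mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]
mkfs.xfs /dev/md0
# Years later, when the controller or motherboard dies, move all the
# disks to a new box and reassemble from that on-disk metadata:
mdadm --assemble --scan
mkdir -p /export/data && mount /dev/md0 /export/data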

>
> 2. Do your clients see the 4 ZFS servers as a single storage pool, or as 4
> separate pools? Does this require more than just Solaris/ZFS?
>

We run them as Primary + Replica storage pairs. (Not hot failover, just
simple, reliable disk-to-disk backup.) They are separate file systems. For
home areas this is not a problem unless a particular user needs more than
one storage server's worth of capacity. We build a file system for every
user (yes, on Triton that is 100s of file systems).

>
> 3. From what I can tell, OpenSolaris 2009.6 has Infiniband support and
> talks about integration with Linux boxes on the same Infiniband. Do you
> know if IP-over-Infiniband between x86-64 Rocks clients and an OpenSolaris
> 2009.6 ZFS NAS works? That would enable us to avoid purchasing Ethernet
> switches with 10GbE ports.
>

Maybe. We've tried this on a different project, but later found out that we
had the wrong IB card for the Solaris side (it was a mem-free Mellanox
instead of mem-full). We had intermittent connectivity. We just received a
replacement card and will know in a week or so if it works (we need a QDR
to CX4 cable because we are connecting to an older Cisco IB switch). We do
all of our Solaris work on Solaris 10, not OpenSolaris (though we may have
to move to OpenSolaris).

I would be reluctant to say IB Linux to IB Solaris will "just work". IB is
really not a network. It's a cluster fabric where small differences in
client-side software can mean the difference between working and not working.

-P

--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)


laotsao 老曹

Aug 12, 2010, 6:42:50 PM
to npaci-rocks...@sdsc.edu
For IB and other networking features of OpenSolaris,
please check this link:
http://www.opensolaris.com/learn/features/networking/networkall/


Jon Forrest

Aug 13, 2010, 9:06:07 PM
to npaci-rocks...@sdsc.edu
On 8/12/2010 2:49 PM, Philip Papadopoulos wrote:

> We do all of our Solaris work on Solaris 10,
> not OpenSolaris (though we may have to move to OpenSolaris).

I don't think so. OpenSolaris R.I.P.
http://opensolaris.org/jive/thread.jspa?messageID=496203&tstart=0

Jon

LaoTsao 老曹

Aug 14, 2010, 9:07:33 AM
to npaci-rocks...@sdsc.edu
Reading the link, it seems that OpenSolaris as one knows it is dead: no
more 2010.05 binary release, and no more nightly build releases (onnv);
the last was build 134 and the 2010.03 preview. But there will be Solaris
11 Express, free for developers, by the end of this year. Some form of the
source (not all) will be available, though it is not clear whether that
happens after Solaris 11 (next year) or after Solaris 11 Express (later
this year). Some may know that OpenSolaris is the OS for the ZFS storage
appliances (7000 Unified Storage), and could be for Lustre 2.0.
regards


Philip Papadopoulos

Aug 14, 2010, 2:13:49 PM
to Discussion of Rocks Clusters
On Sat, Aug 14, 2010 at 6:07 AM, LaoTsao 老曹 <lao...@gmail.com> wrote:

> By reading the link:
> it seems that opensolaris as one knows is dead, no more 2010.05 binary
> release
> no more nightly build release (onnv) , the last was 134 and 2010.03 preview
> But there will be Solaris 11 express , free for developer by the end of
> this year
> Some form of the source (not all) will be available, not sure after
> Solaris 11 (next year) or Solaris 11 express (later this year).
> Some may know that opensolaris is the OS for ZFS storage (7000 unified
> storage), could be for Lustre 2.0
> regards
>
>
>
> On 8/13/2010 9:06 PM, Jon Forrest wrote:
> > On 8/12/2010 2:49 PM, Philip Papadopoulos wrote:
> >
> >> We do all of our Solaris work on Solaris 10,
> >> not OpenSolaris (though we may have to move to OpenSolaris).
> >
> > I don't think so. OpenSolaris R.I.P.
> > http://opensolaris.org/jive/thread.jspa?messageID=496203&tstart=0
> >
>

Well, I guess that means that our decision not to go OpenSolaris a few
years ago was the right one. The uncertainty is the cost of Solaris OS
support (we have quotes that are quite reasonable), but Oracle won't sell
us multi-year support contracts (actually I just want right to use and
right to posted updates, like RHEL licenses); they will only sell us a
year at a time. Solaris is quite robust, but eventually I think the Linux
community will have a file system that is competitive with ZFS in terms
of both features and stability.

Jon, thanks for the quick repost of OpenSolaris going the way of the dodo.

Why is it that after 1.5 decades of Linux, storage is still "bad"?

-P

> Jon

--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)

Jon Forrest

Aug 14, 2010, 3:21:22 PM
to npaci-rocks...@sdsc.edu
On 8/14/2010 11:13 AM, Philip Papadopoulos wrote:

> Why is it after 1.5 decades of Linux that storage is still "bad".

It depends on what aspect you're claiming is
bad.

Compared to the very early days of Unix, file systems
are much more reliable. It's rare for a file system
to become corrupt due to bugs in file system implementation.
This is good.

For most desktop uses, Linux storage is fine.
Even ext3 isn't a bottleneck for most desktop
users who spend most of their time in web browsers,
email programs, or picking their nose.

But, us HPC guys require more storage than
the average bear, so we see things and do
things that most people don't. That's why from
our perspective things look so bad.

Jon

Larry Baker

Sep 10, 2010, 7:52:05 PM
to Discussion of Rocks Clusters
I am configuring a test Rocks 5.3 cluster with a roll-your-own NFS
server that I want to incorporate into the rest of the Rocks cluster.
I have read Jon Forrest's e-mail from July 25, Using a Non-Rocks NFS
Server in a Rocks Cluster, which is similar to what I want to do.
Except, I don't want user home directories there. From the point of
view of the cluster, I want it to look like a Rocks NAS appliance.
From the point of view of our LAN, I want it to look like a normal
NFS file server. The private LAN for the cluster is 10.170.47.128/26;
the front end is 10.170.47.129. At this point I have created the NAS
appliance and I have created the Rocks front end. On the Rocks front
end, I typed:

> # rocks add host nas-0-0
> # rocks set host interface ip nas-0-0 eth0 10.170.47.130
> # rocks set host interface ip nas-0-0 eth1 <public IP address>
> # rocks set host boot nas-0-0 action=os
> # rocks set host attr nas-0-0 managed false

I thought "rocks sync dns" would update /etc/hosts, but it didn't:

> # cat /etc/hosts
> 127.0.0.1 localhost.localdomain localhost
> 10.170.47.129 thera.local thera # originally frontend-0-0
> <public IP address> thera.wr.usgs.gov
>
> # rocks sync dns
>
> # cat /etc/hosts
> 127.0.0.1 localhost.localdomain localhost
> 10.170.47.129 thera.local thera # originally frontend-0-0
> <public IP address> thera.wr.usgs.gov

So, I did a "rocks sync config". That didn't help either:

> # rocks sync config
>
> # cat /etc/hosts
> 127.0.0.1 localhost.localdomain localhost
> 10.170.47.129 thera.local thera # originally frontend-0-0
> <public IP address> thera.wr.usgs.gov

What else do I need to do for Rocks to set up the DNS for nas-0-0 as a
host in the cluster (one whose IP address is no longer available for
insert-ethers to assign)? Should I have left nas-0-0 as "managed"? I
don't know exactly what that means for a roll-your-own NAS
"appliance". I didn't want Rocks to possibly mess with anything on
nas-0-0, e.g., respond to a DHCP boot request by reinstalling Rocks
over the CentOS roll-your-own NAS I want.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Jon Forrest

Sep 10, 2010, 8:59:20 PM
to npaci-rocks...@sdsc.edu
On 9/10/2010 4:52 PM, Larry Baker wrote:

> I don't want user home directories there.

What are you going to put there, data?

> From the point of view
> of the cluster, I want it to look like a Rocks NAS appliance.

From what you describe down below, the only aspect of
a Rocks NAS appliance you want is for your appliance
to have a Rocks-assigned IP address and DNS name.

> What else do I need to do for Rocks to set up the DNS for nas-0-0 as a
> host in the cluster (also, whose IP address is no longer available for
> insert-ethers to assign)? Should I have left nas-0-0 as "managed"? I
> don't know exactly what that means for a roll-your-own NAS "appliance".
> I didn't want Rocks to possible mess with anything on nas-0-0, e.g.,
> respond to a DHCP boot request by reinstalled Rocks over the CentOS
> roll-your-own NAS I want.

One way to do what you want would be for you to set up
your NAS appliance to PXE boot. Start up 'insert-ethers'
and pick the NAS appliance option. Do the boot but as soon as
you see that 'insert-ethers' has assigned the IP address,
then reboot your appliance from the local disk(s). If you're
super paranoid then you can remove your disks when you
do the PXE boot. Once you've booted from the local disk
then configure your appliance to use the IP address assigned
by insert-ethers as its static IP address. You'd still use nas-0-0
as its DNS name. You can remove the "managed" attribute from
nas-0-0 so it won't participate in the Rocks management
commands.

I haven't tried this but I think it will work.

Cordially,

Jon Forrest

Sep 12, 2010, 12:25:32 PM
to npaci-rocks...@sdsc.edu
I had said:

"One way to do what you want would be for you to set up
your NAS appliance to PXE boot. Start up 'insert-ethers'
and pick the NAS appliance option. Do the boot but as soon as
you see that 'insert-ethers' has assigned the IP address,
then reboot your appliance from the local disk(s). If you're
super paranoid then you can remove your disks when you
do the PXE boot."

I woke up in the middle of the night realizing
that doing a PXE boot was probably excessive.
I suspect you could accomplish the same
thing by just setting your appliance
to do a DHCP boot. This way you won't
have to worry about your appliance actually
loading a Rocks kernel. Since all you
want is an assigned name and address this
should work fine.

Jon

Mike Hanby

Sep 13, 2010, 11:25:13 AM
to npaci-rocks...@sdsc.edu
For that to work, you'd also have to add the MAC address for eth0 to the database, something like "rocks set host interface mac nas-0-1 eth0 aa:bb:cc..."

Don't quote me on the syntax :-)

Larry Baker

Sep 13, 2010, 1:43:52 PM
to Discussion of Rocks Clusters

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Sep 10, 2010, at 5:59 PM, Jon Forrest wrote:

> On 9/10/2010 4:52 PM, Larry Baker wrote:
>
>> I don't want user home directories there.
>
> What are you going to put there, data?

Yes.

>
>> From the point of view
>> of the cluster, I want it to look like a Rocks NAS appliance.
>
> From what you describe down below, the only aspect of
> a Rocks NAS appliance you want is for your appliance
> to have a Rocks-assigned IP address and DNS name.

On the private subnet, yes. I want the IP address to be marked
unavailable for any compute nodes on the private subnet. The name/IP
address should appear in /etc/hosts on the cluster nodes. I have
already assigned a public DNS name on our network (thera-nas), as is
also done for the cluster front end (thera).
