Marc Langlois.
Key Seismic Solutions, Calgary, AB, Canada.
> We are experiencing an intermittent NFS performance slowdown on a
> Solaris 8 Ultra-80 when writing to filesystems mounted from a Redhat 8
> NFS server. The write performance is good initially, then after some
> period of time (10's of minutes), write performance slows by at least
> a factor of 10 and the Solaris system CPU usage jumps to > 40%.
> Strangely, if any access to the target filesystem on the NFS server is
> done (e.g. ls -l), the client system CPU drops, and the NFS write
> performance returns to normal. But, usually after another 10 - 20
> minutes, the slowdown re-occurs, requiring another "ls -l" to speed it
> up again. The output files are fairly large (> 6 GBytes) and the
> writes are purely sequential.
>
> We spent a fair amount of time getting Solaris NFS clients to work
> properly with a Linux server, so I'm quite familiar with most of the
> config options, but this one has me stumped.
Hmm. Is current Linux NFS still broken? If so, there might not be
much you can do - apart from using a working NFS server, like Solaris.
--
Rich Teer, SCNA, SCSA
President,
Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-online.net
Are there a lot of files in the directories that you run the ls -l on?
(i.e. > 4096 files per directory).
How much memory does the Linux server have? I am seeing contention between
the file system cache and the memory profile of the nfsd processes for the
available RAM. It's gotten to the point where I think 1 GB of RAM is kind of
a minimum for a heavily used Linux NFS server.
Can you do anything to reduce the memory usage on the Linux server such as
not running X, cutting out some of the unneeded daemons, etc?
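If you want to see where the RAM is going before you start cutting, a
quick look like this on the Red Hat box should tell you (the service
names below are only examples; yours will differ):

    # free -m                         # RAM in use vs. sitting in page cache
    # ps aux --sort=-rss | head -15   # biggest memory consumers
    # chkconfig --list | grep ':on'   # daemons started at boot
    # chkconfig sendmail off          # e.g. disable an unneeded service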
Got FreeBSD? :-)
Good luck!
Later
Mark Hittinger
bu...@pu.net
I've seen this problem on 2 hosts that did not know each other: the NFS
mount was done via IP, and the relevant host on the other end was not in
/etc/hosts (actually, /etc/hosts contained only loopback and localhost).
Once we entered the hosts in their /etc/hosts files (on both client and
server), everything worked after mounting NFS via hostname.
Might be your problem if you're not running NIS or something like that.
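For example, entries along these lines in /etc/hosts on both machines
(addresses and names made up, obviously), then mount by name:

    192.168.1.10    nfsserver
    192.168.1.20    nfsclient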
HTH
/tony
Marc Langlois wrote:
> We are experiencing an intermittent NFS performance slowdown on a
> Solaris 8 Ultra-80 when writing to filesystems mounted from a Redhat 8
> NFS server. The write performance is good initially, then after some
> period of time (10's of minutes), write performance slows by at least
> a factor of 10 and the Solaris system CPU usage jumps to > 40%.
Use lockstat(1M) kernel profiling and we should get a feel for where
the Solaris client is spending its time.
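Something like the following, run as root on the client while it is in
the slow state, should show where the kernel time is going (flags from
memory, so check the man page):

    # lockstat -kIW -D 20 sleep 30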
> Strangely, if any access to the target filesystem on the NFS server is
> done (e.g. ls -l), the client system CPU drops, and the NFS write
> performance returns to normal. But, usually after another 10 - 20
> minutes, the slowdown re-occurs, requiring another "ls -l" to speed it
> up again. The output files are fairly large (> 6 GBytes) and the
> writes are purely sequential.
Weird. Does an 'ls' alone kick it out of the stupor? I ask because
an 'ls -l' may require an attribute lookup (to get the date, size, etc.)
whereas ls can just use cached filenames and may not perform any/much
over-the-wire activity.
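A quick way to confirm would be to compare the client-side getattr
counter before and after (mount point is just an example):

    # nfsstat -cn                     # note the getattr count
    # ls -l /mnt/data > /dev/null
    # nfsstat -cn                     # did getattr jump?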
> We spent a fair amount of time getting Solaris NFS clients to work
> properly with a Linux server, so I'm quite familiar with most of the
> config options, but this one has me stumped.
>
> Has anyone seen this behaviour, or have some tips on where to start
> looking to try to fix it? The same problem occurred consistently
> before applying the Solaris 108528-27 kernel upgrade, but it's still
> there, although it's now much harder to reproduce.
> System details are:
>
> Solaris 8 NFS client:
> Ultra-80, Kernel version: SunOS 5.8 Generic 108528-27 Nov 2003
> mount options: vers=3,proto=udp,sec=none,hard,intr,link,symlink,acl,
> rsize=8192,wsize=8192,retrans=5,timeo=11
> Gigabit Ethernet controller.
Why proto=udp (unless the server does not support TCP transport for NFS)?
You should snoop activity between client and server while the problem is
being exhibited. You can also watch nfsstat -c on the Solaris client
which may be counting bad rpc calls etc.
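For example (interface and host names are placeholders), capture while
it is slow and again while it is fast, then compare:

    # snoop -d ge0 -o /tmp/nfs-slow.cap host linuxserver
    # snoop -i /tmp/nfs-slow.cap | grep NFS | more
    # nfsstat -c      # watch retrans/badcalls/timeouts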
It would also help to characterize the failure. E.g., does this happen if
the client is left idle? Does it happen only after large file writes or reads?
Can you force it to happen by running some I/O benchmark on the NFS-mounted
filesystem?
> Redhat 8 NFS server:
> Dual Pentium Xeon 2.8 GHz, Redhat 8, kernel: 2.4.20-20.8smp
> nfs-utils-1.0.5-1
> export options: rw,insecure,sync,no_subtree_check,insecure_locks
> Intel Pro/1000 Gigabit Ethernet, e1000 driver vers. 5.2.20.
>
Gavin
I have. Linux is doing it. It's flooding the Solaris machine with UDP
packets, until it eventually brings it to its knees. If you run
`snoop` on the Solaris machine, you'll quickly figure out what is going on.
Do yourself a favor: don't use Linux for NFS until they implement it
properly. Up to this point Linux NFS sucks dead bunnies through a bent
straw.
The only NFS client/server that Linux works well with is - Linux. Does that
remind anyone of Microsoft Windows?
> Got FreeBSD? :-)
No (:-) We want stuff working yesterday instead of two centuries from now,
without having to build it ourselves. Got Solaris, or even better, got
IRIX? (;-)
Thanks for the responses. Here's some additional info:
I've dumped some packets with snoop in both the "slow" and "fast"
modes. The NFS requests are definitely different between the two (I
didn't realize that both read and write requests were being done).
From a sample of 10,000 packet headers, I found the following
distribution:
"fast" mode:
NFS READ3: 4259 packets
NFS WRITE3: 0
NFS GETATTR3: 0
NFS COMMIT3: 0
UDP: 5741
RPC: 0
"slow" mode:
NFS READ3: 7039 packets
NFS WRITE3: 66
NFS GETATTR3: 772
NFS COMMIT3: 211
UDP: 1900
RPC: 12
I think this shows that the "slow" phase results from the additional
NFS GETATTR requests. Possibly, the "ls -l" on the client host forces
an attribute lookup that refreshes the NFS client attribute cache, so
the attributes don't have to be fetched from the server machine, which
speeds things up. But why does it revert to slow mode? I'll try
increasing the acregmin and acregmax mount options to see if that
changes anything.
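For the record, the kind of remount I have in mind is below (server
name, path and values are just placeholders; the options are described
in mount_nfs(1M)):

    # umount /data1
    # mount -F nfs -o vers=3,proto=udp,acregmin=30,acregmax=120 \
          linuxserver:/export/data1 /data1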
> > We are experiencing an intermittent NFS performance slowdown on a
> > Solaris 8 Ultra-80 when writing to filesystems mounted from a Redhat 8
> > NFS server. The write performance is good initially, then after some
> > period of time (10's of minutes), write performance slows by at least
> > a factor of 10 and the Solaris system CPU usage jumps to > 40%.
>
> Use lockstat(1M) kernel profiling and we should get a feel for where
> the Solaris client is spending its time.
The Solaris client seems to be behaving properly, i.e. it's making
normal read(), write(), and lseek() calls.
>
> > Strangely, if any access to the target filesystem on the NFS server is
> > done (e.g. ls -l), the client system CPU drops, and the NFS write
> > performance returns to normal. But, usually after another 10 - 20
> > minutes, the slowdown re-occurs, requiring another "ls -l" to speed it
> > up again. The output files are fairly large (> 6 GBytes) and the
> > writes are purely sequential.
>
> Weird. Does an 'ls' alone kick it out of the stupor? I ask because
> an 'ls -l' may require an attribute lookup (to get the date, size, etc.)
> whereas ls can just use cached filenames and may not perform any/much
> over-the-wire activity.
No, a plain "ls" has no effect. I agree, it looks like an attribute
lookup is required to get things going. I've also noticed that an "ls
-l" can toggle things from "fast" to "slow" mode as well.
[ snip...]
> > System details are:
> >
> > Solaris 8 NFS client:
> > Ultra-80, Kernel version: SunOS 5.8 Generic 108528-27 Nov 2003
> > mount options: vers=3,proto=udp,sec=none,hard,intr,link,symlink,acl,
> > rsize=8192,wsize=8192,retrans=5,timeo=11
> > Gigabit Ethernet controller.
>
> Why proto=udp (unless the server does not support TCP transport for NFS)?
>
The NFS server in the Linux 2.4.20 kernel doesn't support TCP yet. I
think it's in the latest 2.6 kernel, but we can't upgrade due to the
risk of breaking 3rd party apps.
> You should snoop activity between client and server while the problem is
> being exhibited. You can also watch nfsstat -c on the Solaris client
> which may be counting bad rpc calls etc.
>
See above. There aren't any bad RPC calls. The only change is the
increased number of GETATTR requests while in slow mode.
> Can you force it to happen by running some I/O benchmark on the NFS-mounted
> filesystem?
I haven't been able to reproduce the problem outside of one of our
(critical) 3rd party apps. I've tried bonnie and my own test copy
program, but both show fast performance.
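For reference, the sort of simple sequential-write test I mean is
something like this (path and size made up):

    # dd if=/dev/zero of=/mnt/lindata/bigtest bs=8k count=1000000   # ~8 GB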
Marc.
I would have agreed with you a year ago, but the latest Linux NFS
implementation has improved a lot. I'm still not sure if the problem
lies in the Solaris client or Linux server, or both. Running snoop on
the Solaris box shows normal UDP rates (see my other post on this
thread for details).
Marc.
Very true. I just replaced an E5500 (Sol7) with an inexpensive
Linux-based NFS server. We're getting anywhere from 20-60% better
(yes... better) performance.
Actually, TCP support for NFSv3 has been in since kernel 2.4.19.
You probably meant NFSv4... that's in 2.6.
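If you want to check whether a particular server is actually offering
NFS over TCP, rpcinfo will tell you (hostname is a placeholder; output
will look something like this):

    $ rpcinfo -p linuxserver | grep nfs
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    3   tcp   2049  nfs    <- only present if TCP is enabled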
> Very true. I just replaced an E5500 (Sol7) with an inexpensive
> Linux-based NFS server. We're getting anywhere from 20-60% better
> (yes... better) performance.
And what about a modern version of Solaris on that same x86
hardware?
> And what about a modern version of Solaris on that same x86
> hardware?
I doubt you'll read anything from him on that. Anyone with a mind to
replace Solaris with Linux for an NFS server can't be taken seriously
anyway.
If it's possible and you're willing, could you please list what parts
you used to build the inexpensive Linux NFS server? I'm considering
switching from a Sun-centric CAD system (which I'm extremely used to)
to PC hardware/Linux, which I know diddly about.
Thanks
> switching from a Sun-centric CAD system (which I'm extremely used to)
> to PC hardware/Linux, which I know diddly about.
So why not use a PeeCee/Solaris x86 solution? At least you'll know
the OS.
The traditional reason for this (may have changed in more recent Linux
impls?) is that the Linux NFS server would acknowledge writes as committed
to stable storage when they weren't. It's like using 'fastfs' on Solaris
UFS - fast but unsafe. Mentioning this always stirs a religious storm,
but Solaris chooses to stick to the protocol - data integrity is
paramount, way above performance.
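On the Linux side this maps to the sync/async options in /etc/exports;
as I understand it, current nfs-utils defaults to sync and you have to
ask for the unsafe behaviour explicitly (client name made up):

    /export/data   sunclient(rw,sync,no_subtree_check)    # honours COMMIT, safe
    /export/data   sunclient(rw,async,no_subtree_check)   # acks before data is on disk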
I think Solaris 7 NFS over high-bandwidth links (e.g. gigabit, ATM, even
100 Mbit in some cases) required some tuning to do well. First off you
had to fiddle with the TCP windows, and that alone gave a huge boost.
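From memory, the sort of thing that helped (for proto=tcp mounts; the
values are examples only, check your own setup):

    # ndd -set /dev/tcp tcp_max_buf 4194304
    # ndd -set /dev/tcp tcp_xmit_hiwat 262144
    # ndd -set /dev/tcp tcp_recv_hiwat 262144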
Gavin
Close-minded people can't be taken seriously at all.
We're running a Compaq DL380 with dual 2.4G Xeons and 4G memory.
The disk is a Nexsan 4TB unit SCSI attached. We are running
SUSE 8.2 on that server.
We actually have two of these setups. One is for disk cached
backups (we snapshot incrementals and back up to a fairly
fast LTO 72 tape stacker with 2 drives) and the other actually
has 2 DL380's connected to one 4TB unit each viewing different
LUNs and it is used for our home dirs across the net. We can fail
over to one unit if needed which allows us to do radical
upgrades when necessary.
The main reason for ditching the E5500 (though the performance
and capacity weren't too great either) is that the machine
is leased. Also, because the original implementation (before my time)
did not include any reasonable fail over mechanism, it was a
critical single point of failure and upgrades were way too
risky... just too many variables... AND at $5K/mo... just not
worth it... the parts are going to start failing very soon.
The DL380's are 2U units... we have a PCI-X fibre NIC (similar
to what we had on the E5500.. except a WHOLE lot faster btw).
Now the obvious question is why didn't you spend $$$$$ on upgrades
to the E5500... well...duh... why should I buy parts at 4x
normal cost when I can get better-supported parts for the
DL380? I know a lot of people like to talk about how
great Sun quality is... but I just don't see it. The only
Seagate drives we've ever lost have been on Sun boxes (well
we did lose an IBM drive... no surprise there). Can't
tell you the number of bad memory boards we've had to
replace (on Sun) over time. The price of the x86 architecture allows
you to VERY easily rip up the whole thing after a couple
of years (to do radical upgrades)... where with Sun.. it's a
fairly expensive proposition (thus with Sun we're left with
less than up-to-date HW/SW configs most of the time).
Now.. if money is infinite... you are welcome to implement
Sun, IBM, HP (PA or Itanium2)... but that's not always
the best solution. All depends on your budget and requirements.
We have about 70 Solaris hosts still... you just have to
use the right solution for the right tasks. We run everything
from Solaris 2.5.1 (yep) to Solaris 9 (no 10 yet). I
think there's even a SunOS box out there.. and a few
Solaris on Intel that we manage. So we are a pro Sun shop...
but we're not blind zealots who don't use anything else.
We've got about 20 HPUX and 20 AIX (and a few NCR, Tru64, Sequent
and odds and ends).. the Linux side is
growing very fast... we even have about 30 Linux DESKTOPs
deployed at this site (bye bye Windows). I look for that number to
increase over the coming year. These are just the machines
my team of 3 manages... at just ONE of our sites.
> Close-minded people can't be taken seriously at all.
If you're alluding to me, you just picked the wrong person to label. I've
worked on various versions of Linux on various platforms, and professionally
to boot.
> We're running a Compaq DL380 with dual 2.4G Xeons and 4G memory.
> The disk is a Nexsan 4TB unit SCSI attached. We are running
> SUSE 8.2 on that server.
>
> We actually have two of these setups. One is for disk cached
> backups (we snapshot incrementals and back up to a fairly
> fast LTO 72 tape stacker with 2 drives) and the other actually
> has 2 DL380's connected to one 4TB unit each viewing different
> LUNs and it is used for our home dirs across the net. We can fail
> over to one unit if needed which allows us to do radical
> upgrades when necessary.
>
> The main reason for ditching the E5500 (though the performance
> and capacity weren't too great either) is that the machine
> is leased. Also, because the original implementation (before my time)
> did not include any reasonable fail over mechanism, it was a
> critical single point of failure and upgrades were way too
> risky... just too many variables... AND at $5K/mo... just not
> worth it... the parts are going to start failing very soon.
Well, DUH. It's time to upgrade to a newer Sun platform, not replace it
with PC-bucket hardware.
> The DL380's are 2U units... we have a PCI-X fibre NIC (similar
> to what we had on the E5500.. except a WHOLE lot faster btw).
> Now the obvious question is why didn't you spend $$$$$ on upgrades
> to the E5500... well...duh... why should I buy parts at 4x
> normal cost when I can get better-supported parts for the
> DL380? I know a lot of people like to talk about how
> great Sun quality is... but I just don't see it. The only
> Seagate drives we've ever lost have been on Sun boxes (well
> we did lose an IBM drive... no surprise there). Can't
> tell you the number of bad memory boards we've had to
> replace (on Sun) over time. The price of the x86 architecture allows
> you to VERY easily rip up the whole thing after a couple
> of years (to do radical upgrades)... where with Sun.. it's a
> fairly expensive proposition (thus with Sun we're left with
> less than up-to-date HW/SW configs most of the time).
>
> Now.. if money is infinite... you are welcome to implement
> Sun, IBM, HP (PA or Itanium2)... but that's not always
> the best solution. All depends on your budget and requirements.
One of the questions I always ask at interviews is what their yearly
budget for IT is. If it's under 1,000,000 Euro or USD, it's very likely a
company I don't want to be working for, because it means I'll be stuck
supporting/buying PC-bucket crap hardware like in the above scenario.
And if I had a sysadmin that wasn't capable of procuring modern Sun hardware
at reasonable or below PC-bucket prices, I'd fire him because of his
incompetence, no questions asked.
>> Now.. if money is infinite... you are welcome to implement
>> Sun, IBM, HP (PA or Itanium2)... but that's not always
>> the best solution. All depends on your budget and requirements.
>
> One of the questions I always ask at interviews is what their yearly
> budget for IT is. If it's under 1,000,000 Euro or USD, it's very likely a
> company I don't want to be working for, because it means I'll be stuck
> supporting/buying PC-bucket crap hardware like in the above scenario.
>
Unfortunately, some of us definitely do not have a 1M budget for hardware.
We are forced to buy the most bang for the buck for a handful of seats.
Where I'm planning for a startup, whose life expectancy (good or bad) is
a few years, a lower-cost, faster PC (shudder!) is looking more like the
correct choice. I've worked with Suns for over 10 years. That's my
comfort zone. It's just that we need the fastest machines that we can
afford. If they die in 3 years, OK. Electronic design automation needs
grow so fast that the Sun machines that were bought for the last startup
3.5 years ago (Ultra 10s, 60s, 450) should not be used for our current
designs.
> And if I had a sysadmin that wasn't capable of procuring modern Sun hardware
> at reasonable or below PC-bucket prices, I'd fire him because of his
> incompetence, no questions asked.
When we have bought Sun workstations in the numbers that I'm used
to (up to 20, maybe 5 at a time), we never got deals like that.
What sort of total dollar amounts enabled the discounts that you
refer to?
> Unfortunately, some of us definitely do not have a 1M budget for hardware.
> We are forced to buy the most bang for the buck for a handfull of seats.
> Where I'm planning for a startup, whose life expectancy (good or bad) is
> a few years, a lower cost, faster, PC ( shudder!) is looking more like the
> correct choice. I've worked with Suns for over 10 years. That's my
> comfort zone. Its just that we need the fastest machines that we can
> afford. If they die in 3 years, OK. Electronic design automation needs
> grow so fast that the sun machines that where bought for the last startup
> 3.5 years ago (Ultra 10s, 60s, 450) should not be used for our current
> designs.
Perhaps so, but if you're running a startup, then surely you have heard of
these things called "tax writeoff" and "depreciation".
If you're looking for cheap number crunching, it's the PC-bucket. If you're
looking for overall throughput and you bought a bunch of PC-buckets to do
the job, you just blew your money on a bunch of cheapass hardware. And as an
added "bonus", you should be able to write off LESS tax on your accounting
ledgers. Now that's what I'd call doing yourself a "favor".
> When we have bought Sun workstations in the numbers that I'm used
> to (up to 20, maybe 5 at a time), we never got deals like that.
> What sort of total dollar amounts enabled the discounts that you
> refer to?
You shouldn't have any workstations. "The network is the computer", did you
forget your history? A couple of 24/32-bit X-capable terminals can do the
job. And in this day and age, any machine can be an X terminal. So again, you
wasted precious funding for your startup by putting computing resources at
the desktop, when they should reside on the server(s). A 100mbps/copper
gigabit switch, several 19" 1U clustered SPARC servers and a few $600
PC-buckets with Linux as the X server/AutoClient would have worked quite
well. It's called cost consolidation, and maximizing your return on
investment. You didn't do either, and I have to wonder why, when there are
so many obvious options right in front of your nose. If you're starting a
company, you need every Dollar/Quid/Euro you can spare. From what you
wrote, it looks like you did the exact opposite. Thinking you were saving
money, you spent more than you needed to. (And no, I'm not affiliated
with Sun, unfortunately.)
Now for the options. You can get SPARC-based servers for as low as $995!
Cluster a few of those together and you have a small supercomputing node. For
~$2500 you can get even bigger and better servers. Now $2500 is a pretty
damn good deal considering that a) it's a Sun server-class machine and b)
you're not gonna touch a serious PC-bucket server for under $3500, unless
you're building 1-3U form factor PC-buckets with redundant power supplies,
disks and the like. And I'd love to see someone do that on their own for
under $3000 USD, let alone $2500 or even $995.
Next option: register as a developer with Sun, you should be able to get
aggressive discounts.
Yet another option: hire a good UNIX system administrator with logistics and
network design experience who is capable of doing the things I described
above.
If your startup ever takes off and starts making money, switch to SGI
clustered Origin servers, since it looks like you need number crunching and
visualization capabilities.
Just thought I'd let you all know, as parts of this thread have degraded
into a (IMHO, uninformed and close-minded) Linux/PC hardware-bashing rant.
Thanks for the useful input,
Marc.
Gavin Maltby <G_a_v_i_n....@sun.com> wrote in message news:<c29rtl$5oj$1...@new-usenet.uk.sun.com>...
Welcome to comp.unix.solaris, where 90% of the posts are
Linux/PC hardware-bashing rants!! (didn't you know this
is really comp.unix.solaris.advocacy.everything.else.sucks?)
... well, now you know.
Figures it was a Solaris bug.. and thanks for posting the patch info!
I'm going to check my Solaris clients for this.
>
> Thanks for the useful input,
You found the best solution .. comp.protocols.nfs (if you want
useful replies).