
Re: Slow file transfers on network

Message has been deleted

Kalyan Manchikanti

Apr 6, 2007, 9:40:37 PM
On Apr 6, 8:10 pm, l...@the.net wrote:
> Okie, here's what I have hardware wise...
>
> Netra T 1405 (similar to Ultra 80 and 420R).
> four 440MHz processors.
> 4 Gigs RAM.
> Onboard NIC.
> LAN is 100mbit- HP Procurve switched, not hubbed.
> LAN has local DHCP provided by router.
> 6 computers on LAN, 5 windows, 1 Solaris 10. All are current on
> updates.
>
> What Happens:
>
> Using FTP to get into the Netra, max file transfer rate is about
> 3mb/s. Using Samba and Windows networking it's still about 3mb/s. This
> is true for traffic going in or out of the Netra.
> On a very good day, it's 6mb/s but that never lasts more than a minute
> or so, then it's back to 2 or 3mb/s.
>
> Transfer rates in a windows-to-windows connection on this network
> exceed 60mb/s.
>
> Connecting to the internet is no problem, and I can max out my cable
> modem downloading from the internet to the Netra, but my cable modem
> is limited at 3mb/s.
>
> Why is the Netra so slow on in-network transfers??
> I've swapped ports on the switch, changed all the cables, even plugged
> the Netra into a 'known good' connection where a windows box gets
> 60mb/s.
>
> Server load is less than 2%. (there ain't nuthin running on it yet..)
>
> No other network traffic during tests. The Solaris install is fresh.
>
> I get no errors in long ping sessions with any of the other computers,
> even when I send large ping packets (16384 bytes).
>
> Any ideas from the gurus here?
>
> Thanks!


What onboard NIC is this? Did you check the duplex settings on the
server? I do a lot of Solaris-to-Windows transfers on a LAN and I
usually top out at 6 or 7 mbps as well.

Message has been deleted

Tim Bradshaw

Apr 7, 2007, 4:10:03 AM
On 2007-04-07 02:10:30 +0100, lo...@the.net said:
> LAN is 100mbit- HP Procurve switched, not hubbed.
> [...]

> Transfer rates in a windows-to-windows connection on this network
> exceed 60mb/s.

Do you mean 60 M*bits*/sec or 60 Mbytes/sec? If, as I suspect, the
latter, then there is something a bit wrong with your measurement
technique, since a 100Mbit LAN has an optimistic peak performance of
10Mbytes/sec.

--tim

Kalyan Manchikanti

Apr 7, 2007, 10:50:10 AM
On Apr 6, 10:55 pm, l...@the.net wrote:
> On 6 Apr 2007 18:40:37 -0700, "Kalyan Manchikanti"

>
>
>
> <kalyan.manchika...@gmail.com> wrote:
> >On Apr 6, 8:10 pm, l...@the.net wrote:
> >> Okie, here's what I have hardware wise...
>
> >> Netra T 1405 (similar to Ultra 80 and 420R).
> >> four 440MHz processors.
> >> 4 Gigs RAM.
> >> Onboard NIC.
> >> LAN is 100mbit- HP Procurve switched, not hubbed.
> >> LAN has local DHCP provided by router.
> The onboard NIC is the single 100mb port embedded on the mainboard of
> the computer. The network interface was recognized as 100mb and set
> up by the OS during the install.(and the "100" light on the network
> switch is lit) How would I check/change the duplex setting? I'm
> kinda new to SPARC and Solaris.

I meant the "type" of NIC. Do a "ifconfig -a" and it will show you
what type of interface you have. If it is hme0, then "ndd -get /dev/
hme0 link_speed and ndd -get /dev/hme0 link_duplex" should show 100
and 1 respectively..


> (BTW- the Netra is using an 18Gig 10,000 RPM SCSI drive as boot device
> and extended storage is 3 more 9Gig 10,000 RPM drives (all drives are
> Sun parts) so I don't think there's a bottleneck in disk I/O.. I get
> the same slow speeds transferring files to/from either the boot disk or
> the extended storage.)
>
> One rumor I've heard is that setting the MTU higher than ~1500 can help
> improve transfer speeds, but again, for me the newbie, I'm still
> digging through man pages trying to figure out how and whether this
> would even help.
>
> I just don't see how a machine designed as a server has such low
> network I/O capacity- a consumer grade windows box is 10 times
> faster... How could this Sun box survive in a real world situation
> serving up web pages and hosting a FTP site for a dozen clients?


I'd revisit the measurements here. I am just not sure about the
60 Mb/s Windows transfer. As Tim suggested, is it Mbits/sec or
Mbytes/sec? 6 Mbytes/sec, though not stellar, is a very acceptable
transfer speed on a 100 Mbps LAN.


> What am I doing wrong?

Message has been deleted
Message has been deleted

Tim Bradshaw

Apr 7, 2007, 4:55:41 PM
On 2007-04-07 19:48:03 +0100, lo...@the.net said:
> Further experimentation done:
> I built a small lan, with the Netra, and a windows box and a windows
> laptop on a Netgear router/switch. No internet connection, just the 3
> computers. Windows-Windows transfers remain at ~60mb/s. Windows-Sun
> and Sun-Windows transfers are still <6mb/s.

You have a duplex issue, almost certainly, or possibly a broken
NIC/switch/switch port/cable. But look at duplex first.

Dirk Munk

Apr 7, 2007, 5:19:54 PM
lo...@the.net wrote:
> bits. sorry- sometimes i'm sloppy on caps, but this time it is bits.
> I've been considering upgrading to GigaBIT lan, but now I'm
> questioning if it will be worth it if performance of the Netra is
> still only 1/10th the speed of a windows box.

>
> Further experimentation done:
> I built a small lan, with the Netra, and a windows box and a windows
> laptop on a Netgear router/switch. No internet connection, just the 3
> computers. Windows-Windows transfers remain at ~60mb/s. Windows-Sun
> and Sun-Windows transfers are still <6mb/s.
>

The Netgear switch is an unmanaged switch, I suppose. If so, then keep
the following in mind:

These unmanaged SoHo switches are meant to be used in combination with
NICs that are set to autonegotiate. If that is not the case, the
switch port will *always* fall back to half-duplex mode. So if you set
your NIC to full-duplex, the switch port will be set to half-duplex.
Result: packet loss etc. and a lousy transfer rate.

So with an unmanaged switch, you have two possibilities for the NIC
setting on your host: autonegotiation or half-duplex. But *never*
full-duplex.

The speed setting (autosensing) is never a problem.
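
For what it's worth, on the Solaris side you can check and re-enable
autonegotiation with ndd. A minimal sketch, assuming the interface is hme
instance 0 (parameter names differ for other drivers):

# ndd -set /dev/hme instance 0
# ndd -get /dev/hme adv_autoneg_cap
(1 means autonegotiation is advertised)
# ndd -set /dev/hme adv_autoneg_cap 1
(turns autonegotiation back on; takes effect immediately but does not
survive a reboot)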

Dan Foster

Apr 7, 2007, 5:32:19 PM
In article <rspf13t58bhbq4ki9...@4ax.com>, lo...@the.net <lo...@the.net> wrote:
>>
>>I meant the "type" of NIC. Do an "ifconfig -a" and it will show you
>>what type of interface you have. If it is hme0, then "ndd -get
>>/dev/hme0 link_speed" and "ndd -get /dev/hme0 link_duplex" should show
>>100 and 1 respectively.

The person who suggested these commands made slight typos. The correct
commands would be:

# ndd -set /dev/hme instance 0
# ndd -get /dev/hme link_speed
# ndd -get /dev/hme link_mode
# ndd -get /dev/hme link_status

Here's what these values mean:

link_speed: 0 = 10 Mbit, 1 = 100 Mbit
link_mode: 0 = half duplex, 1 = full duplex
link_status: 0 = link down, 1 = link up

I would normally expect all three to return '1'. If any says 0, you have
a problem that could easily explain the performance hit.

---------------------------------------------------------------------------

Are you autonegotiating the speed and duplex setting on both sides?
Or are you hardcoding it on both sides?

You can tell if you have hardcoded speed/duplex on the Solaris side by:

# grep hme /etc/system
set hme:hme_adv_autoneg_cap = 0
set hme:hme_adv_100fdx_cap = 1

If you see these lines, it's hardcoded. If you don't see any mention of
hme_adv_ stuff, it's autonegotiating.

For this stuff to work, you need BOTH sides of the Ethernet cable to be
hardcoded... OR... BOTH sides to be autonegotiating.
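
For completeness, a fuller hardcoding block usually also turns off the
other advertised capabilities. A sketch for the hme driver (only do this
if the switch port is hardcoded to 100/full as well; it takes effect at
the next boot):

set hme:hme_adv_autoneg_cap = 0
set hme:hme_adv_100fdx_cap = 1
set hme:hme_adv_100hdx_cap = 0
set hme:hme_adv_10fdx_cap = 0
set hme:hme_adv_10hdx_cap = 0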

Is the HP Procurve switch managed or unmanaged? Can you check the switch
side to see if the port is set to autoneg or fixed speed and duplex?

---------------------------------------------------------------------------

In case you were wondering, I can easily max out the 1400/1405 at ~60-70
Mbit/sec via a switch. So the hardware is certainly capable of it. You
just appear to have some sort of issue preventing it. I'm confident it
can be found by working through various commands.

You may also want to note your current TCP settings then increase
certain parameters relating to TCP sliding windows:

# ndd -get /dev/tcp tcp_max_buf
# ndd -get /dev/tcp tcp_xmit_hiwat
# ndd -get /dev/tcp tcp_recv_hiwat

# ndd -set /dev/tcp tcp_max_buf 2097152
# ndd -set /dev/tcp tcp_xmit_hiwat 2097152
# ndd -set /dev/tcp tcp_recv_hiwat 2097152

# ndd -set /dev/tcp tcp_wscale_always 1
# ndd -set /dev/tcp tcp_tstamp_if_wscale 1

---------------------------------------------------------------------------

You also want to take disk I/O out of the picture when measuring network
throughput.

An easy way to do that is to use a network performance test tool such
as ttcp (which requires running it on both ends; ttcp also exists for
Windows).

These types of tool do not do any disk I/O at all, unlike FTP or scp. So
they give a more accurate picture of the network's capabilities,
unencumbered by higher-level protocols.
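
A rough sketch of a ttcp run (exact flags vary a little between ttcp
versions, so check its usage output first):

On the receiving box:   $ ttcp -r -s
On the sending box:     $ ttcp -t -s <name-of-receiving-box>

The -s tells ttcp to sink/source a test pattern instead of using
stdin/stdout; both ends print the throughput they measured when the
test finishes.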

-Dan

Message has been deleted
Message has been deleted

Dan Foster

Apr 8, 2007, 5:39:17 AM
In article <hkjg13lbemi2fnu4j...@4ax.com>, lo...@the.net <lo...@the.net> wrote:
>>
>># ndd -get /dev/tcp tcp_max_buf
> This comes back as "1048576" (setup default?)

>># ndd -get /dev/tcp tcp_xmit_hiwat
> Comes back as "49152" (setup default?)

>># ndd -get /dev/tcp tcp_recv_hiwat
> Also "49152" (setup default?)

Ouch!!! No wonder. I do all my Solaris installs with a tuned
Jumpstart setup so I'd long since forgotten what the OS defaults were. :-)

> _____Here is where the magic worked:______

Great! That's good news; thanks for letting us know.

# ndd -set /dev/tcp tcp_max_buf 2097152

# ndd -set /dev/tcp tcp_xmit_hiwat 1048576
# ndd -set /dev/tcp tcp_recv_hiwat 1048576


# ndd -set /dev/tcp tcp_wscale_always 1
# ndd -set /dev/tcp tcp_tstamp_if_wscale 1

Going to demystify the above a bit -- I don't know what you know
or don't know about TCP/IP tuning, so I'm going to assume you don't know
anything -- please don't take it the wrong way (it will help others, too).

What you were doing with these commands was to basically enable
and then tune something called TCP sliding windows. It's a valuable
thing to tune for your machine; getting this right will basically mean
that packets can move at full speed without needing to stop often to
wait for other cars in a convoy to catch up (so to speak).

The last two commands involving wscale (window scaling) are
necessary to enable TCP/IP sliding windows feature. I don't recall if
Solaris 10 has it set by default or not.

How did I pick the number for the hiwat (high watermark) stuff?
Well, I ran a lottery number generator program... ;) No, there's method
to the madness:

<number of bits per second> * <roundtrip latency in seconds> * 8 [bits/byte]

Why do we convert from bits to bytes? Because the buffer is
sized in bytes.

In this case, you were optimizing for a 100 Mbit/sec network. I
took a guess the latency would be 0.9ms on a small switched Fast
Ethernet network. Just ping the Windows box and see what the latency is.
Let's do the math:

100000000 * 0.0009 * 8

Why 0.0009? 0.9ms is just under 1 ms, or 1/1000th of a second.
1/1000 would be 0.0010, right? (You know: tenths, hundredths,
thousandths.) But since this is slightly less, that's 0.0009.

Answer: 720,000 bytes needed. I must've goofed with my earlier
calculations. It's better to round up the buffer size to the next
nearest power of 2. From experience, I know which power of 2 this is,
but if you didn't, then:

$ bc
2^16
(returns 65536; too low, keep trying)
2^24
(returns 16777216; too high, keep trying)
2^20
(returns 1048576; maybe? try one less)
2^19
(returns 524288; too low. So 2^20 is the winner)
<ctrl-d on a blank line>

Well, looks like 1048576 bytes for the buffer won. So you then
do 'ndd -set /dev/tcp tcp_xmit_hiwat 1048576' and stuff like that.

If you oversize these buffers, the Solaris kernel will reserve
that amount of memory for the buffers and make it unavailable for reuse
by other applications. So you are decreasing the amount of available memory.

On top of that, this buffer applies on a per-socket basis -- so
if you have a lot of TCP connections, it can be consumed even more.
Meaning: you want to get this sized right! It's also better to use a
little too much than too little.

The buffer required for a LAN may be different from the buffer
required for a WAN (e.g. New York City to Los Angeles TCP/IP traffic)
because the network circuits have different bandwidth and latency.

You should tune this stuff to match your normal setup, which is
probably mostly LAN-based traffic and not worry about the WAN traffic
unless you have a very large Internet network pipe (T-3, OC-3, etc).

If you had a T-1 (DS1) circuit to the Internet and mostly did
WAN traffic between NYC and LA instead of LAN traffic, then you'd
calculate the required buffer size as:

1536000 * 0.065 * 8

Why 1536000? T-1 is 1.544 Mbps but 0.008 Mbps is used for
framing, so 1.536 Mbps is the theoretical maximum available data bandwidth.
And why 65ms? My ISP's normal NYC-LA round trip time is 65ms over a high
speed fiber optic network. You can't really go much faster than that due
to that danged pesky speed of light limit getting in the way. Darn that
Einstein and his e=mc^2 ;)

That works out to 798720 bytes. You could redo all these
calculations with the bc utility but we already know that 1048576 is the
next nearest power of 2.

I'm a little unsure about the most optimal setting for
tcp_max_buf, but a Sun document says tcp_max_buf should be GREATER than
either tcp_xmit_hiwat or tcp_recv_hiwat. So I made tcp_max_buf match
the combined value of tcp_xmit_hiwat and tcp_recv_hiwat.

http://developers.sun.com/solaris/articles/tuning_for_streaming.html

Sliding windows buffering provides reasonable insurance
against worst-case situations where things get slow at times or there
might be a 'dirty' link somewhere, and it keeps traffic
moving smoothly without giving up reliability and delivery features.

Solaris also has an advanced feature where you can set these
buffers on a per-IP basis with ndd, but I'm not going into that here.

You're NOT done just yet! Now you have to make these changes
stick the next time you boot. ndd changes are NOT permanent. So you probably
need to make a custom startup script to make these changes and have them
take effect at boot time. Easy, though:

# vi /etc/init.d/tuning
#!/sbin/sh

case "$1" in
'start')
        if [ -x /usr/sbin/ndd ]; then
                echo "Setting up TCP sliding windows..."
                /usr/sbin/ndd -set /dev/tcp tcp_max_buf 2097152
                /usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 1048576
                /usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 1048576
                /usr/sbin/ndd -set /dev/tcp tcp_wscale_always 1
                /usr/sbin/ndd -set /dev/tcp tcp_tstamp_if_wscale 1
        fi
        ;;
esac

exit 0

Save and exit vi. (:wq!) Or use your favorite editor (emacs,
pico, nano, joe, sed, whatever floats your boat) instead of vi if you
like. Then do a few more steps:

# chmod a+rx /etc/init.d/tuning
# ln -s /etc/init.d/tuning /etc/rc3.d/S10tuning

Test it to make sure there are no syntax errors:

# /etc/init.d/tuning

(It shouldn't complain about anything.)

If you REALLY want to make sure, you could reboot and then do
the ndd -get checks. But this is UNIX, not Windows, and the thought of
rebooting normally makes most UNIX admins cringe. :-) Besides, I'm
confident this script will work just fine since I use the same thing on
my own Solaris machines.
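
Short of a reboot, a quick spot-check after running the script would be
to read the values back (they should match what was set above):

# ndd -get /dev/tcp tcp_xmit_hiwat
1048576
# ndd -get /dev/tcp tcp_recv_hiwat
1048576
# ndd -get /dev/tcp tcp_max_buf
2097152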

The reason performance suffered with the undersized buffers
originally was that either sliding windows wasn't enabled or the
buffers were set too small. Either way, TCP/IP had to pause for a moment
to wait for an acknowledgement before sending the next packet. This can
be a major performance killer.

It's the equivalent of being forced to run in an obstacle course
at full speed, hopping in and out of tightly spaced tires/tyres. You
aren't going to navigate that obstacle course as fast as you could if
you could run unfettered with large spacing between each contact with
the ground.

NOW you're done! Enjoy the bubbly. ;)

-Dan

P.S. In Solaris 10, I actually have a homemade SMF script do this stuff,
but explaining how to set up an SMF script is more involved than I want
to go into right now. It's not hard, but has a slight learning curve and
may appear a little overwhelming to someone who is not used to Solaris
system administration yet.

Liam Greenwood

Apr 8, 2007, 9:57:21 AM
On Sun, 08 Apr 2007 04:39:17 -0500, Dan Foster <use...@evilphb.org> wrote:
> P.S. In Solaris 10, I actually have a homemade SMF script do this stuff,
> but explaining how to set up an SMF script is more involved than I want
> to go into right now. It's not hard, but has a slight learning curve and
> may appear a little overwhelming to someone who is not used to Solaris
> system administration yet.
>
Hi Dan

Is there any chance of you making the SMF script available?

Cheers, Liam

Tim Bradshaw

Apr 8, 2007, 11:40:27 AM
On 2007-04-08 10:39:17 +0100, Dan Foster <use...@evilphb.org> said:

>
> Great! That's good news; thanks for letting us know.


[Elided a lot of interesting and useful stuff about TCP tuning.]

I still don't understand what the original issue was. Obviously tuning
is important when you want optimal throughput on machines, but he was
getting something like 3% of the available bandwidth on a 100Mbit LAN
(so presumably very small latency), which is awful. I'm quite sure
that untuned Solaris TCP stacks normally do much, much better than that!

--tim

Tim Bradshaw

Apr 8, 2007, 2:00:35 PM
On 2007-04-08 16:40:27 +0100, Tim Bradshaw <t...@tfeb.org> said:

> I still don't understand what the original issue was. Obviously tuning
> is important when you want optimal throughput on machines, but he was
> getting something like 3% of the available bandwidth on a 100Mbit LAN
> (so presumably very small latency), which is awful. I'm quite sure
> that untuned Solaris TCP stacks normally do much, much better than that!

As a trivial data point, on a 100Mb switched ethernet I can get nearly
4MB/sec from a Blade 100, *including* ssh overhead (which I imagine is
the bottleneck!) but with no disk access and no tuning at all --
basically the dd-over-ssh pipeline shown below, with the source machine
being a Mac.
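
Spelled out, that pipeline is just ('target' standing in for whichever
host you're measuring against):

$ dd if=/dev/zero bs=1k count=100000 | ssh target dd of=/dev/null

i.e. roughly 100MB of zeros pushed across the wire with no disk involved
on either end.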

Message has been deleted
Message has been deleted

Dan Foster

Apr 8, 2007, 3:53:02 PM
In article <slrnf1ht6...@nessie.xinqu.net>, Liam Greenwood <li...@nessie.xinqu.net> wrote:
> On Sun, 08 Apr 2007 04:39:17 -0500, Dan Foster <use...@evilphb.org> wrote:
>> P.S. In Solaris 10, I actually have a homemade SMF script do this stuff,
>> but explaining how to set up an SMF script is more involved than I want
>> to go into right now. It's not hard, but has a slight learning curve and
>> may appear a little overwhelming to someone who is not used to Solaris
>> system administration yet.
>>
>
> Is there any chance of you making the SMF script available?

Sure. It's nothing fancy but works. You need two files: one is the
actual SMF script (which SMF calls at boot time), and the other is an XML
file that defines the service in SMF and is only used by svccfg once.

Filename: my-tuning

------------------------------------------------------------------------
#!/sbin/sh

# Set various TCP network options here to change default settings.
#
# Runs once, at boot time, after network is up, but prior to completion
# of multiuser state initialization.
#
# Please update this on the jumpstart install servers, then distribute
# the updated script to target hosts running Solaris 10 (or later) at:
#
# /lib/svc/method/my-tuning
#
# They will take effect at next system boot, or you can do the ndd
# commands manually to take effect immediately if needed.
#
# There are no properties to edit for this service due to its generalized
# yet highly-specific nature that makes named properties useless.

# The name of our service (aka 'FMRI')
FMRI=svc:/network/my-tuning

# We are *required* to run this first before doing any other processing!
# This defines the various SMF_EXIT values, amongst other things.

. /lib/svc/share/smf_include.sh

# Be paranoid. Does ndd exist? If so, go to work.

if [ -x /usr/sbin/ndd ]; then
        # sliding windows to 256K
        /usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 262144
        /usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 262144

        # Max buffer size for application-controlled setsockopt = 80MB
        /usr/sbin/ndd -set /dev/tcp tcp_max_buf 83886080
fi

# We are *required* to return SOMETHING. Since we are doing this as a
# best effort service that should never fail, we can safely assume we
# will exit successfully.

exit $SMF_EXIT_OK
------------------------------------------------------------------------

Filename: my-tuning.xml

------------------------------------------------------------------------
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM
    "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='MY:tuning'>

  <service
      name='network/my-tuning'
      type='service'
      version='1'>

    <create_default_instance enabled='true' />

    <single_instance />

    <dependency name='network'
        grouping='require_all'
        restart_on='none'
        type='service'>
      <service_fmri value='svc:/network/service' />
    </dependency>

    <exec_method
        type='method'
        name='start'
        exec='/lib/svc/method/my-tuning'
        timeout_seconds='5'>
      <method_context>
        <method_credential
            user='root'
            group='root'
            privileges='all'
        />
      </method_context>
    </exec_method>

    <exec_method
        type='method'
        name='stop'
        exec=':true'
        timeout_seconds='5' />

    <property_group name='startd'
        type='framework'>
      <propval name='duration'
          type='astring' value='transient' />
    </property_group>

    <stability value='Unstable' />

    <template>
      <common_name>
        <loctext xml:lang='C'>
          ndd
        </loctext>
      </common_name>
      <description>
        <loctext xml:lang='C'>
          ndd - set network driver config parameters
        </loctext>
      </description>
      <documentation>
        <manpage
            title='ndd'
            section='1M'
            manpath='/usr/share/man' />
      </documentation>
    </template>
  </service>

</service_bundle>
------------------------------------------------------------------------

I like to view XML files with vim (with colours and syntax highlighting
enabled) since it once saved me much bafflement after I found it was off
by a single character somewhere. xmllint wasn't much help as it told me
there was an error but not where it was. :P vim told me where.

For the two files, omit the -----'d lines. They're there only to make it
easier to visually tell where they start and where they end in the news
reader. (File attachments don't always make it through in USENET, so I
do short text files in-line.)

Then all you need to do is:

# cp my-tuning /lib/svc/method/
# chmod a+rx /lib/svc/method/my-tuning
# svccfg import /path/to/my-tuning.xml

You can verify it was imported and then brought online by doing:

# svcs my-tuning

...and it should subsequently take effect at boot time. You can then
rm my-tuning.xml if you want, or move it somewhere else in case you
later need to delete and then re-import the service.

Another key SMF command: svcadm. Has a man page.
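
For example (using the short name; the full FMRI is svc:/network/my-tuning):

# svcs -l my-tuning
(full status, including the log file location)
# svcadm restart my-tuning
(re-run the start method without a reboot)
# svcadm disable my-tuning
(stop it from being run at boot)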

Why did I name it 'my-tuning'? I didn't want to name it 'tuning' in case
Sun later releases a service by that name.

The chmod might not be necessary, but I like to be able to run my
scripts from the CLI as needed for debugging so I do it.

If you ever want to adjust the SMF service properties after importing,
use 'svccfg -s my-tuning'. Then 'listprop' to see settings. 'help' to
see svccfg commands. 'quit' to exit svccfg.

If you want to add more ndd tweaks or adjust existing code, just edit
/lib/svc/method/my-tuning.

Unlike the Sun-provided SMF scripts, this is safe to edit because Sun
will never overwrite well-named user-supplied scripts. I do wish Sun had
chosen a prefix (e.g. sun-) for their SMF services or script names, to
reduce chances of a potential namespace collision in the future.

This is a very simple SMF script because it was my first SMF service
done way back when Solaris 10 was a few days old at FCS (First Customer
Ship[ment]) and I was setting up our Jumpstart server to Solaris 10-ize
the various finish scripts for our first S10 jumpstart client. I later
made more 'involved' SMF services, but this one is nice and simple, and
illustrates the general aspects well. Heh... you're lucky because
there's lots of docs and sample scripts/services now and people with
more SMF experience. I had to poke around myself since docs weren't
quite ready at FCS but figured out this stuff.

Sun knew there would be quite a few init scripts that weren't SMF-ized
and that people would need time to make the transition for their own
scripts. (And that some would never bother.) So they made rc?.d scripts
still work, but you lose almost all of the benefits you'd have gotten by
using SMF. If in a hurry or when explaining Solaris init scripts to
novices, I tend to tell them the traditional way. If I had an
opportunity to talk solely about SMF, I cover the SMF way -- like here.

Something to keep in mind: SMF scripts don't actually have to be sh
scripts. They could be compiled binaries. The key thing is that they have
to hook into the SMF header files to determine what return values to use
and other things of that nature. Since this was trivial, I used sh here.

Some additional links for more information:

http://en.wikipedia.org/wiki/Service_Management_Facility
http://www.sun.com/bigadmin/content/selfheal/smf-quickstart.html
http://www.petertribble.co.uk/Solaris/smf.html
http://www.oreillynet.com/pub/a/sysadmin/2006/04/13/using-solaris-smf.html

Enjoy; SMF's a lot of fun.

Cheers,

-Dan

Tim Bradshaw

Apr 8, 2007, 4:00:43 PM
On 2007-04-08 19:53:24 +0100, lo...@the.net said:

> First guess: the 'default' numbers are "worst case" settings designed
> to maintain the connection under otherwise intolerable circumstances
> rather than optimized for speed.

I don't think that can be right, or lots of people would be
complaining! (Also, see my other post of just now, I can get 10x that
performance on a slow machine with no tuning and non-trivial
computation (ssh) in the way).

>
> Second guess: Could it not be possible that since I'm not using what
> seems to be the normal /hme0/ network interface, but just /hme/
> (that's how it was detected during the Solaris install), something
> may have been tripped up in the install process??

Dunno. Something is odd, anyway. Still, if it works now...

Dan Foster

Apr 8, 2007, 4:11:14 PM
In article <mf7i13lurraqlskib...@4ax.com>, lo...@the.net <lo...@the.net> wrote:
>
> THANK YOU DAN!

You're welcome; my pleasure.

I see Tim's posts and have read them, but unfortunately I have no new
insight, as my only two 1400/1405 boxes are running Solaris 9 and in full
production, and I have no spares to tinker with. We may have them retired
by the end of this year; if so, I'd be happy to revisit at that time.

Every single Solaris box I install is done via standardized Jumpstart
network-based installations with our customizations, so a well-optimized
system comes out of the box, every single time. I don't think I've
seen a default Solaris installation in about 10 years. :) So I am at a
loss to explain Tim's observations without having identical hardware to
yours to tinker with.

> The wife and kids will appreciate this too. Many nights the wife will
> be watching a movie, and the kids and I will be watching different
> time-shifted TV shows, and we won't have to worry about pixelated
> video or slide projector video frame rates or int-err-up-up-up-up-ted
> sound because of a computer that wasn't really meant to be a server in
> the first place.. The server I'm replacing is a 2.6GHz P4 homebuilt
> box. It's got a big 7200 RPM SATA drive in it, an expensive HP network
> adapter, 4 Gigs of RAM, but the thing just can't keep up with 4 users
> switching programs, fast forwarding, rewinding, and a FTP transfer
> running in the background.

:-)

> Trash windows and intel, bring out the REAL hardware and UNIX.

Fer sure! ;)

> Hey Dan!
> When is your next class on Sun and Solaris start?
> I want to enroll!

You can sign up for the 'Learning Basics of SMF' right here in c.u.s. ;)

SMF is one of the things that brings a more modern flavor to Solaris,
and is kind of like driving a Ferrari instead of a VW Beetle. Both get
you where you're going, but one has so many more capabilities.

-Dan

P.S. Sun engineers regularly drive Ferraris. Here's proof:

http://regmedia.co.uk/2004/08/18/acer_ferrari_1.jpg

Joke aside, that's one of the most popular laptops at Sun, I heard.

Liam Greenwood

Apr 8, 2007, 5:15:46 PM
On Sun, 08 Apr 2007 14:53:02 -0500, Dan Foster <use...@evilphb.org> wrote:
> In article <slrnf1ht6...@nessie.xinqu.net>, Liam Greenwood <li...@nessie.xinqu.net> wrote:
>> Is there any chance of you making the SMF script available?
>
> Sure. It's nothing fancy but works. You need two files: one is the
>
Cool, thank you!

> Unlike the Sun-provided SMF scripts, this is safe to edit because Sun
> will never overwrite well-named user-supplied scripts. I do wish Sun had
> chosen a prefix (e.g. sun-) for their SMF services or script names, to
> reduce chances of a potential namespace collision in the future.

Sun did, but kind of the opposite to the way you are looking at it. They
provide a namespace called "site" which is specifically for scripts
like yours. Sun won't put anything in as svc:/site/name, so that
you can have sensible names.
>
Cheers, Liam

Dan Foster

Apr 8, 2007, 5:21:21 PM
In article <slrnf1ims...@nessie.xinqu.net>, Liam Greenwood <li...@nessie.xinqu.net> wrote:
>
> Cool, thank you!

You're welcome.

>> Unlike the Sun-provided SMF scripts, this is safe to edit because Sun
>> will never overwrite well-named user-supplied scripts. I do wish Sun had
>> chosen a prefix (e.g. sun-) for their SMF services or script names, to
>> reduce chances of a potential namespace collision in the future.
>
> Sun did, but kind of the opposite to the way you are looking at it. They
> provide a namespace called "site" which is specifically for scripts
> like yours. Sun won't put anything in as svc:/site/name, so that
> you can have sensible names.

Ahh, I see. Very nice; thank you for pointing it out. I'll have to
adjust that.

I still think you can have a namespace collision even within 'site', such
as if you get SMF-ized apps from vendors doing common functionality. But
that can be worked around by still tacking on a prefix within 'site'.

E.g.: svc:/site/my-jabberd, svc:/site/my-tuning, etc.

Cheers,

-Dan

Liam Greenwood

Apr 8, 2007, 5:21:23 PM
On Sun, 08 Apr 2007 14:53:02 -0500, Dan Foster <use...@evilphb.org> wrote:
> if [ -x /usr/sbin/ndd ]; then
> # sliding windows to 256K
> /usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 262144
> /usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 262144
>
> # Max buffer size for application controlled setsockopt = 80Mb
> /usr/sbin/ndd -set /dev/tcp tcp_max_buf 83886080
> fi
>
> # We are *required* to return SOMETHING. Since we are doing this as a
> # best effort service that should never fail, we can safely assume we
> # will exit successfully.
>
> exit $SMF_EXIT_OK

In a perfect world :-) a service isn't supposed to report that it's online
unless it's sure it is - so my reading of the docs says that we should
really do ndd gets and check the values. If the ones we set are there,
then we return the exit ok, but if the defaults (or other values) are there,
then we should really have the service come up in degraded mode.

In the real world I would have done it exactly as you did for something like
this :-). Nice example, thank you.

Cheers, Liam

Thomas H Jones II

Apr 8, 2007, 6:13:03 PM
In article <evb2db$irn$1$8300...@news.demon.co.uk>,

Tim Bradshaw <t...@tfeb.org> wrote:
>I still don't understand what the original issue was. Obviously tuning
>is important when you want optimal throughput on machines, but he was
>getting something like 3% of the available bandwidth on a 100Mbit LAN
>(so presumably very small latency), which is awful. I'm quite sure
>that untuned Solaris TCP stacks normally do much, much better than that!

Having worked extensively in mixed Windows/Sun environments, I can give
you what my observations were. Sun systems are default tuned to work with
other Sun systems. Windows systems are default tuned to work with other
Windows systems. Sun-to-Sun communications will tend to outperform the
equivalent Windows-to-Windows communications. That said, Sun-to-Sun and
Windows-to-Windows communications will tend to *crush* the performance
numbers of Windows-to-Sun or Sun-to-Windows communications. Using tools
like `netperf` was always a good way of demonstrating this. Typically,
one had to tune the Sun systems to "play nice" with the Windows systems,
but it also tended to cost you on the "Sun-to-Sun" maximum speeds. Now,
theoretically, you could also tune the Windows systems, but, at least
prior to Windows 2003, changing the registry values for the networking
parameters didn't always result in the values actually being or staying
changed.

Overall, I would have been curious to see what kind of performance the
OP would have had on Sun-to-Sun transfers prior to tuning. However,
it doesn't sound like he has a multi-Sun environment to test with.

-tom

--

"You can only be -so- accurate with a claw-hammer." --me

Dan Foster

Apr 8, 2007, 6:21:19 PM
In article <slrnf1in6...@nessie.xinqu.net>, Liam Greenwood

<li...@nessie.xinqu.net> wrote:
>
> In a perfect world :-) a service isn't supposed to report that it's
> online unless it's sure it is - so my reading of the docs says that we
> should really do ndd gets and check the values. If the ones we set are
> there then we return the exit ok, but if the defaults (or other
> values) are there then we should really have the service come up in
> degraded mode.

No disagreement! :) That is probably the only time I've ever told a
little white lie in scripts in the last 15 years. My scripts normally
check everything -- presence of variables, of files, of directories, of
username, uid, or group, of return codes, of input -- and report accordingly.

> In the real world I would have done it exactly as you did for
> something like this :-). Nice example, thank you.

:)

I don't recall all of the factors into that, but probably involved:

- Probably afraid of the then-unfamiliar-with-S10 junior admins
running into issues with clearing service state if
a machine crashed and tried to come up not too well.

- Probably also didn't want my script to be the one service
that prevented a machine from finishing its boot cleanly,
especially when it was an utterly trivial bit of functionality.

- Figured that if it failed, the admin would be fixing the
machine anyhow -- it would be VERY obvious since the rest of the
services weren't likely to start at all or very cleanly.

I'll probably go back and adjust the script to tell the truth since
admins seem to be more familiar with S10 (and dealing with failed SMF
services) now.
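
A minimal sketch of what the truth-telling version might look like in the
method script (reusing the 262144 value from the script above; this fails
into maintenance rather than Liam's degraded state, but at least it stops
lying):

actual=`/usr/sbin/ndd -get /dev/tcp tcp_xmit_hiwat`
if [ "$actual" -ne 262144 ]; then
        echo "tcp_xmit_hiwat is $actual, expected 262144" >&2
        exit $SMF_EXIT_ERR_FATAL
fi
exit $SMF_EXIT_OK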

I didn't find the S10 learning curve to be that large, but it was a bit
more intimidating for the junior admins who needed more time to
acclimatise back then.

Cheers,

-Dan

Kalyan Manchikanti

Apr 8, 2007, 7:37:33 PM
On Apr 8, 3:00 pm, Tim Bradshaw <t...@tfeb.org> wrote:

> On 2007-04-08 19:53:24 +0100, l...@the.net said:
>
> > First guess: the 'default' numbers are "worst case" settings designed
> > to maintain the connection under otherwise intolerable circumstances
> > rather than optimized for speed.
>
> I don't think that can be right, or lots of people would be
> complaining! (Also, see my other post of just now, I can get 10x that
> performance on a slow machine with no tuning and non-trivial
> computation (ssh) in the way).


My thoughts exactly, until I looked up the Solaris 10 tunable
parameters reference guide at http://docs.sun.com/app/docs/doc/817-0404 .
Of particular interest is the tcp_cwnd_max parameter, the default of
which is 1048576 bytes. The "When to change" section mentions that
tcp_max_buf should be greater than the tcp_cwnd_max parameter
(which ironically is also set at 1048576 bytes). I believe this is
particularly true in the case of a Windows-to-Solaris transfer or vice
versa, and hence the marked improvement in the OP's FTP speeds
after changing the tcp_max_buf size to at least twice the cwnd_max
(congestion window). The tcp_xmit_hiwat (send window size) and the
tcp_recv_hiwat (receive window size), I don't think, made any
difference in this case.
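
Checking and bumping it is the same ndd dance as before (the 2097152 below
is just an example mirroring the tcp_max_buf value used earlier in the
thread):

# ndd -get /dev/tcp tcp_cwnd_max
1048576
# ndd -set /dev/tcp tcp_max_buf 2097152
(keeps tcp_max_buf comfortably above tcp_cwnd_max)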

A similar issue and discussion is at
http://www.sunmanagers.org/pipermail/summaries/2001-December/000510.html
. The only difference being that in that case the OP's environment was a
mainframe-to-Solaris FTP.


Tim Bradshaw

Apr 8, 2007, 8:48:16 PM
On 2007-04-08 23:13:03 +0100, fer...@xanthia.com (Thomas H Jones II) said:

> Having worked extensively in mixed Windows/Sun environments, I can give
> you what my observations were. Sun systems are default tuned to work with
> other Sun systems. Windows systems are default tuned to work with other
> Windows systems.

I agree with this in theory, and I am sure there is some practical
effect, but like (I guess) lots of people I often work with Windows
desktops and Solaris servers, and I get very acceptable performance
between them with no tuning of either stack (not just for toy
transfers, for things like shipping DVD images around). I am sure it
could be improved, but typically not by a factor of 2, let alone the
factor of 30 the OP had room for...

The places I have seen extensive room for network stack tuning have
typically been things like web servers with a load of concurrent
connections over flaky, high-latency networks.

Thomas H Jones II

Apr 8, 2007, 10:44:32 PM
In article <evc2gg$488$1$8302...@news.demon.co.uk>,

Tim Bradshaw <t...@tfeb.org> wrote:
>On 2007-04-08 23:13:03 +0100, fer...@xanthia.com (Thomas H Jones II) said:
>
>> Having worked extensively in mixed Windows/Sun environments, I can give
>> you what my observations were. Sun systems are default tuned to work with
>> other Sun systems. Windows systems are default tuned to work with other
>> Windows systems.
>
>I agree with this in theory, and I am sure there is some practical
>effect, but like (I guess) lots of people I often work with Windows
>desktops and Solaris servers, and I get very acceptable performance
>between them with no tuning of either stack (not just for toy
>transfers, for things like shipping DVD images around). I am sure it
>could be improved, but typically not by a factor of 2, let alone the
>factor of 30 the OP had room for...

Yeah, should have been clearer (and qualified when I last had to deal
with Windows-to-Sun performance). It wasn't orders of magnitude
but it was definitely noticeable. One specific example that comes to mind
was GigE performance for one customer's application. We saw about 90% of
theoretical max on Sun-to-Sun, 75% theoretical on Windows-to-Windows and
50% theoretical on Windows/Sun (either direction).

Message has been deleted

Rick Jones

Apr 10, 2007, 1:39:01 PM
Dan Foster <use...@evilphb.org> wrote:
> Easy way to do that would be use of a network performance test tool
> such as ttcp (which requires running it on both ends; ttcp also
> exists for Windows).

Strictly speaking, ttcp can be pointed at a 'discard' port on a remote
and not require a ttcp at each end. Or at least I was told as much by
the folks who convinced me to put similar functionality into netperf -
the ability to point a netperf process at [discard|chargen|echo] on a
remote system (depending on the type of netperf test).

Duplex mismatch is indicated by FCS errors being reported on the side
in full-duplex and _late_ collisions being reported by the side in
half-duplex. If there is a duplex mismatch, then for an FTP transfer
that should also show up in netstat's TCP statistics as either
segments retransmitted on the sending side, or out-of-order segments
being received on the receiving side.
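
On the Solaris side, something like the following would show those
symptoms (the exact counter names are from memory, so treat this as a
rough guide):

$ netstat -i
(Ierrs/Oerrs/Collis per interface)
$ netstat -s -P tcp | egrep 'RetransSegs|InUnorderSegs'
(retransmitted and out-of-order segment counts)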

rick jones
mr netperf
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Rick Jones

Apr 10, 2007, 1:48:26 PM
Dan Foster <use...@evilphb.org> wrote:
> In article <hkjg13lbemi2fnu4j...@4ax.com>, lo...@the.net <lo...@the.net> wrote:
> >>
> >># ndd -get /dev/tcp tcp_max_buf
> > This comes back as "1048576" (setup default?)
> >># ndd -get /dev/tcp tcp_xmit_hiwat
> > Comes back as "49152" (setup default?)
> >># ndd -get /dev/tcp tcp_recv_hiwat
> > Also "49152" (setup default?)

> Ouch!!! No wonder. I do all my Solaris installs with a tuned
> Jumpstart setup so I'd long since forgotten what the OS defaults were. :-)

> > _____Here is where the magic worked:______

> Great! That's good news; thanks for letting us know.

> # ndd -set /dev/tcp tcp_max_buf 2097152
> # ndd -set /dev/tcp tcp_xmit_hiwat 1048576
> # ndd -set /dev/tcp tcp_recv_hiwat 1048576
> # ndd -set /dev/tcp tcp_wscale_always 1
> # ndd -set /dev/tcp tcp_tstamp_if_wscale 1

> Going to demystify the above a bit -- I don't know what you know
> or don't know about TCP/IP tuning, so I'm going to assume you don't know
> anything -- please don't take it the wrong way (it will help others, too).

> What you were doing with these commands was to basically
> enable and then tune something called TCP sliding windows.

Perhaps you were trying to simplify, but TCP already does "sliding
windows" even without window scaling.

> It's a valuable thing to tune for your machine; getting this right
> will basically mean that packets can move at full speed without
> needing to stop often to wait for other cars in a convoy to catch up
> (so to speak).

> The last two commands involving wscale (window scaling) are
> necessary to enable TCP/IP sliding windows feature. I don't recall if
> Solaris 10 has it set by default or not.

Well, that nixes the "trying to simplify" idea.

> If you oversize these buffers, the Solaris kernel will
> reserve the amount of memory for these buffers and make it
> unavailable for reuse by other applications. So you are decreasing
> amount of available memory.

Does it? I thought that those were limits, not preallocations.

> The reason why performance suffered with the undersized
> buffers originally was because either sliding windows wasn't enabled
> or the buffers were set too small. Either way, TCP/IP had to pause
> for a moment to wait for an acknowledgement before sending the next
> packet. This can be a major performance killer.

The TCP sender is waiting for a window update, which happens
(generally) to be piggy-backed on an ACK. TCP may send no more than
the receiver's advertised window's worth of data before it must stop
and wait for a window update. An ACK without a window update is
insufficient. That leads to the variation on the bandwidth-delay
product calculation you had:

Tput <= Window/RTT

Now, there are two other limits to what can be sent which only
await an ACK. One is the "congestion window", which is the sending
TCP's attempt to guess how much data it can have outstanding at one
time without overloading the network. The second is the "SO_SNDBUF"
(with a little handwaving in a Streams environment), which sets
the limit on the quantity of data the sending TCP can track while
waiting for an ACK from the remote. So the "effective window" in that
equation above is the minimum of the receiver's advertised window, the
sender's calculated congestion window (aka cwnd) and the sender's
SO_SNDBUF size.
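
To make that concrete with round numbers (not the OP's, just an
illustration): a 65536-byte effective window over a 100 ms round trip caps
throughput at 65536 / 0.1 = 655,360 bytes/sec (about 5 Mbit/sec) no matter
how fast the link is, while the same window over a 1 ms LAN round trip
allows up to about 65 Mbytes/sec, at which point something else becomes
the limit.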

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
