OneFS 8.0.0.4 upgrade: nfs3 very unstable


Jean-Baptiste Denis

Jun 6, 2017, 12:58:32 PM
to isilon-u...@googlegroups.com
Hello everybody,

we upgraded our Isilon cluster (32 nodes) from 7.1.1.5 to 8.0.0.4 with our DSE and the help of the
DellEMC support team a few weeks ago.

Since the upgrade, the nfs3 service has been extremely unstable (NFS production is down for the most
part). On the Linux NFS client side, dmesg shows a lot of "nfs server not responding, still trying"
followed by "OK" a few minutes later.

We have captured a *lot* of pcaps (client and server side) requested by support over the last two
weeks (plus the pcaps we did on our own trying to figure it out) and uploaded hundreds (!) of
gigabytes of logs. The support team is completely clueless at the moment.

No complaints on the cifs side.

We spotted that the flexnet config is changing continuously on every node:

2017-06-06T18:40:04+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
2017-06-06T18:40:05+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750207 to local
(old rev:1750206)
2017-06-06T18:40:10+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
2017-06-06T18:40:10+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750208 to local
(old rev:1750207)
2017-06-06T18:40:15+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
2017-06-06T18:40:15+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750209 to local
(old rev:1750208)
2017-06-06T18:40:20+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
2017-06-06T18:40:20+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750210 to local
(old rev:1750209)
2017-06-06T18:40:25+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
2017-06-06T18:40:26+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750211 to local
(old rev:1750210)
2017-06-06T18:40:31+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
2017-06-06T18:40:31+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750212 to local
(old rev:1750211)

I've tried comparing the various network-related XML files I could find to spot the difference, but
I didn't find anything (only the "revision" attribute changes).

Another bunch of /var/log files is also being updated continuously:

=======================

# tail /var/log/hardening_engine.log
2017-06-06 18:47:49,775 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:47:55,308 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:01,129 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:05,984 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:11,192 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:16,368 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:21,983 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:27,810 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:32,093 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98
2017-06-06 18:48:37,969 - INFO - Hardening not enabled: Not reconfiguring Network Interface
Hardening items. - hardening_net_reconfig.py - 98

# tail /var/log/isi_smartconnect
2017-06-06T18:50:15+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Rebalance delay for 4 more
seconds: flx_changed=0, grp_changed=0, sem_agg_signaled=0
2017-06-06T18:50:20+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: On-event rebalance: flx_changed=0,
grp_changed=0, willmove=0 coalesced_rebalance: 18, node_unsuspended: 0, sem_agg_signaled: 0
2017-06-06T18:50:20+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Loaded flx_config correctly
2017-06-06T18:50:20+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Rebalance delay for 9 more
seconds: flx_changed=1, grp_changed=0, sem_agg_signaled=0
2017-06-06T18:50:20+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Rebalance delay for 9 more
seconds: flx_changed=0, grp_changed=0, sem_agg_signaled=0
2017-06-06T18:50:25+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Rebalance delay for 4 more
seconds: flx_changed=0, grp_changed=0, sem_agg_signaled=0
2017-06-06T18:50:25+02:00 <3.6> ATLAS-24 last message repeated 7 times
2017-06-06T18:50:25+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Loaded flx_config correctly
2017-06-06T18:50:25+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Rebalance delay for 4 more
seconds: flx_changed=1, grp_changed=0, sem_agg_signaled=0
2017-06-06T18:50:25+02:00 <3.6> ATLAS-24 isi_smartconnect[75689]: Rebalance delay for 4 more
seconds: flx_changed=0, grp_changed=0, sem_agg_signaled=0

# tail /var/log/isi_cbind_d.log
2017-06-06T18:50:27+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Received HUP signal: 1
2017-06-06T18:50:27+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Loading configuration
2017-06-06T18:50:32+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Received HUP signal: 1
2017-06-06T18:50:32+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Loading configuration
2017-06-06T18:50:38+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Received HUP signal: 1
2017-06-06T18:50:38+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Loading configuration
2017-06-06T18:50:43+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Received HUP signal: 1
2017-06-06T18:50:43+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Loading configuration
2017-06-06T18:50:49+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Received HUP signal: 1
2017-06-06T18:50:49+02:00 <3.6> ATLAS-24 isi_cbind_d[4795]: [0x800704400]bind: Loading configuration

# tail /var/log/isi_mcp
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[3875]: Executing 'hardening-netlisten-reconfig'
actions for FILEGROUP /etc/mcp/sys/files/hardening-netlisten-reconfig.
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[3694]: Started execution of action
'/usr/bin/isi_hardening/hardening_net_reconfig.py' (id=790301, pid=52863)
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[52863]: Executing
'/usr/bin/isi_hardening/hardening_net_reconfig.py' command.
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[3875]: Action list 'hardening-netlisten-reconfig':
action 1/1 FILEGROUP /etc/mcp/sys/files/hardening-netlisten-reconfig completed (id=790301)
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[3875]: Action list 'hardening-netlisten-reconfig'
has completed. Releasing shared lock 0x800713280 (id=790301)
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[3875]: Executing 'ntpd-reconfig' actions for
FILEGROUP /etc/mcp/sys/files/ntpd-reconfig.
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[3694]: Started execution of action
'/etc/mcp/scripts/ntpd.py /etc/mcp/templates/ntp.conf /etc/ntp.conf' (id=790302, pid=52865)
2017-06-06T18:51:20+02:00 <3.6> ATLAS-24 isi_mcp[52865]: Executing '/etc/mcp/scripts/ntpd.py
/etc/mcp/templates/ntp.conf /etc/ntp.conf' command.
2017-06-06T18:51:21+02:00 <3.6> ATLAS-24 isi_mcp[3875]: Action list 'ntpd-reconfig': action 1/1
FILEGROUP /etc/mcp/sys/files/ntpd-reconfig completed (id=790302)
2017-06-06T18:51:21+02:00 <3.6> ATLAS-24 isi_mcp[3875]: Action list 'ntpd-reconfig' has completed.
Releasing shared lock 0x800713280 (id=790302)

=======================

Support hasn't given us an explanation yet.

Any ideas?

Thank you!

Jean-Baptiste

Steve Bogdanski

Jun 7, 2017, 10:07:36 PM
to Isilon Technical User Group, jbd...@pasteur.fr
What are your settings for your SC pools used for NFS? i.e. Static/Dynamic, Round-Robin Load Balancing or other, Manual/Automatic failback?

Jean-Baptiste Denis

Jun 8, 2017, 1:58:44 AM
to Steve Bogdanski, Isilon Technical User Group
On 06/08/2017 04:07 AM, Steve Bogdanski wrote:
> What are your settings for your SC pools used for NFS? i.e. Static/Dynamic, Round-Robin Load
> Balancing or other, Manual/Automatic failback?

Dynamic SmartConnect pool, round-robin, and automatic failback.

Yesterday, the (US) technical support found something in the NFS debug logs:

"[nfs] NSM Notify call from client xxx.xxx.xxx.pasteur.fr failed with status
0xc00000cc(STATUS_BAD_NETWORK_NAME)"

Apparently, there is an issue "whereby NSM hostnames that exceed a certain length (i.e.
domain.subdomain.deeper.etc) can no longer be interpreted. This issue manifests itself in a manner
where clients can mount but see periodic disruption when attempting to lock/access a file they could
previously access (pre-upgrade)."

This is known as internal DellEMC bug 199068.

It will be fixed in 8.0.0.6. There is a workaround:

# isi_gconfig registry.Services.lwio.Parameters.Drivers.nfs.nsm.AllowNonstandardHostnames=1
# isi_for_array -s /usr/likewise/bin/lwsm restart onefs_nfs

We implemented this 16 hours ago; let's see what happens.
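
For anyone applying the same workaround: if I'm not mistaken, isi_gconfig prints the current value
when you pass the key without "=", so you can check on every node that the setting took:

# isi_for_array -s "isi_gconfig registry.Services.lwio.Parameters.Drivers.nfs.nsm.AllowNonstandardHostnames"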

The flexnet log (and the others) is still rotating quickly, but for the first time in weeks we have
been told that "it is not normal". We are investigating this issue in parallel.

I'll keep you posted!

Jean-Baptiste






Jean-Baptiste Denis

Jun 18, 2017, 9:21:21 AM
to isilon-u...@googlegroups.com
> ...
> We spotted that the flexnet config is changing continuously on every node:
>
> 2017-06-06T18:40:04+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Master FlexNet config changed
> 2017-06-06T18:40:05+02:00 <3.6> ATLAS-26 isi_flexnet_d[35417]: Pushing revision 1750207 to local
> (old rev:1750206)
> ...

This is just one part of the story, but the constantly changing flexnet configuration has been
identified as a bug caused by some dynamic pools having fewer IPs than associated nodes. If I
understood correctly, one of the consequences is that SmartConnect keeps triggering flexnet
configuration changes.

We have one of those pools for accessing the Isilon API using cookie authentication (the requests
have to go to the same node that did the initial authentication). It is a valid configuration and
shouldn't have led to our problems.

We temporarily removed nodes from those pools: no more log spamming about configuration changes and
no more "nfs server not responding, still trying" since then. Connectivity is more or less OK now
(depending on the client workflows).
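
If you want to check whether you are exposed to the same bug, compare the number of IPs in each
dynamic pool's ranges with the number of member interfaces. On OneFS 8.x something like this should
show both (command and field names from memory, adjust to your version):

# isi network pools list
# isi network pools view groupnet0.subnet0.pool0

Any dynamic pool whose ranges contain fewer addresses than it has member interfaces would be a
candidate.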

On 06/08/2017 04:07 AM, Steve Bogdanski wrote:
> What are your settings for your SC pools used for NFS? i.e. Static/Dynamic, Round-Robin Load
> Balancing or other, Manual/Automatic failback?

Steve, is that what you had in mind?

We still have some issues on nodes with only gateway-less subnets, and a nasty one regarding NFS
locks. The impact of this last issue is quite heavy for some applications. It is still under
investigation by EMC.

I'll keep you posted.

Jean-Baptiste

Jean-Baptiste Denis

Jul 4, 2017, 6:17:07 AM
to isilon-u...@googlegroups.com
> We still have some issues on nodes with only gateway-less subnets, and a nasty one regarding NFS
> locks. The impact of this last issue is quite heavy for some applications. It is still under
> investigation by EMC.

The flexnet reloading issue was not the root cause of our "NFS server not responding" deluge. The
problem happened again.

Further investigation showed that some of our nodes have no gateway and cannot reach our DNS
servers. We have been in this configuration since day 1 (4.5 years ago).

From what I understand, every node in an Isilon cluster refreshes the NFS exports every once in a
while. When an export has an FQDN in one of its client lists, a name resolution takes place. The
kernel implementation of NFS didn't seem to care if the resolution failed. The userland
implementation seems to need it to complete smoothly; otherwise NFS threads stay busy waiting for
resolution, hence unavailable to handle NFS traffic.
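
If you want to see whether your own exports can trigger this, look at the client lists of your
exports: anything that is not a plain IP address or CIDR has to be resolved by every node during
the refresh. Something like this gives a quick overview (the -v flag is from memory, adjust to your
OneFS version):

# isi nfs exports list -v | grep -i client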

Name resolution is handled locally on each node (see /etc/resolv.conf) by the isi_cbind_d resolver
before falling back to the servers defined in the groupnet configuration. While investigating a
network trace, we saw that isi_cbind_d asks some InfiniBand node peers for an answer (we guessed
this, because it was nothing Wireshark would recognize); it was confirmed by EMC support. If the
request was sent to a node without a gateway, isi_cbind_d never received an answer.

It has been identified as a bug and will be corrected. In the meantime, we changed the configuration
of our nodes so they can reach our DNS.

Again, I'm not 100% certain of the exact behaviour of all that, but I think the big picture is correct.

"""
It's not DNS
There's no way it's DNS
It was DNS
"""

We still have other ongoing NFS issues regarding network link aggregation. A suggested workaround
from support is to implement what we call the "cron of shame", running every 4 minutes
(flock /mnt/nlm/lckfile -c "sleep 10") to keep the connection open.
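
For the record, the "cron of shame" is literally a client-side crontab entry of this shape (the
lock-file path is just ours; any file on the affected mount should do):

*/4 * * * * flock /mnt/nlm/lckfile -c "sleep 10"

Every 4 minutes it takes an NLM lock on the file and holds it for 10 seconds, which is enough to
keep the lock traffic flowing over the aggregated link.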

Jean-Baptiste


Josh

Jul 5, 2017, 12:21:58 PM
to Isilon Technical User Group, jbd...@pasteur.fr
When you say some of your nodes do not have a gateway, do you mean that they are "archival" nodes and have no 10/1gig connectivity?  I'm a little confused as to your network configuration there.

Jean-Baptiste Denis

Jul 5, 2017, 12:40:59 PM
to isilon-u...@googlegroups.com
On 07/05/2017 06:21 PM, Josh wrote:
> When you say some of your nodes do not have a gateway, do you mean that they are "archival" nodes
> and have no 10/1gig connectivity? I'm a little confused as to your network configuration there.

We have 3 types of nodes:

1. archival node (IB only)
2. nodes with only one subnet (with MTU 9000) without a gateway
3. nodes with a subnet with a gateway

The problem was on type 2 nodes.

Jean-Baptiste

Jean-Baptiste Denis

Jul 5, 2017, 12:51:01 PM
to isilon-u...@googlegroups.com
> We have 3 types of nodes:
>
> 1. archival node (IB only)
> 2. nodes with only one subnet (with MTU 9000) without a gateway
> 3. nodes with a subnet with a gateway
>
> The problem was on type 2 nodes.

The point is that you should be able to run this command on your cluster, whatever type of node you
have:

# isi_for_array "dig +short +noedns @127.42.0.1 www.google.fr A"

When our problem occurred, that was not the case.
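
(On a healthy node, dig prints the A records straight away; on a node that cannot reach a resolver,
it eventually gives up with ";; connection timed out; no servers could be reached".)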

Jean-Baptiste

Alistair Stewart

Jul 5, 2017, 1:39:58 PM
to isilon-u...@googlegroups.com
Have you considered installing Patch-191603 which addresses multiple issues with the NFS and SmartConnect services?

It's always worth checking the document Current Isilon OneFS Patches to see what's available.

You might also want to consider setting this sysctl, which reverts change-notify to pre-OneFS 8 behaviour and reduces the chances of experiencing a cluster deadlock.

isi_sysctl_cluster efs.bam.rename_event_coherency=0
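
You can read the value back on every node first to see what you are changing (sysctl with just the
key prints the current value):

# isi_for_array -s sysctl efs.bam.rename_event_coherency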

Finally, you could also consider upgrading to OneFS 8.0.0.5, which has all the fixes in patch-191603 and more, and doesn't currently require any patches. That means you could leave the upgrade uncommitted and roll back if you hit any new issues.

Al...





Josh

Jul 6, 2017, 2:46:27 PM
to Isilon Technical User Group, jbd...@pasteur.fr
I'm curious what your workflow is that requires you to have a jumbo frame subnet with no routing capability.

Chris Pepper

Jul 6, 2017, 3:01:34 PM
to isilon-u...@googlegroups.com, jbd...@pasteur.fr
We did that for years for an HPC network. Login nodes were dual-homed on the campus network (routed but non-jumbo). We have since connected several single-subnet cluster networks, so we have static routes on the HPC network, but we still have one cluster with no routing.

Chris

Jean-Baptiste Denis

Jul 6, 2017, 4:03:22 PM
to isilon-u...@googlegroups.com
On 07/06/2017 08:46 PM, Josh wrote:
> I'm curious what your workflow is that requires you to have a jumbo frame subnet with no routing
> capability.

There is a 350-node HPC cluster on this subnet. No external access required. Why jumbo frames? We
don't have any benchmark to justify it, but it looks like a best practice. It is mentioned in a
whitepaper and also in the documentation:

"Although OneFS supports both 1500 MTU and 9000 MTU, using a larger frame size for network traffic
permits more efficient communication on the external network between clients and cluster nodes. For
example, if a subnet is connected through a 10 GbE interface and NIC aggregation is configured for
IP address pools in the subnet, it is recommended you set the MTU to 9000. To benefit from using
jumbo frames, all devices in the network path must be configured to use jumbo frames."

The "EMC ISILON GUIDELINES FOR LARGE WORKLOADS" gives some number :

10GbE: 3 Gb/sec with MTU 1500
10GbE aggregate: 6 Gb/sec with MTU 9000
10GbE aggregate: 6 Gb/sec
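
As an aside, a quick way to check that jumbo frames actually survive the whole path from a Linux
client is a don't-fragment ping (8972 = 9000 minus 28 bytes of IP + ICMP headers; the hostname is
just a placeholder):

# ping -M do -s 8972 atlas-pool.example.com

If anything on the path is limited to 1500, the pings fail: either an explicit "frag needed" from a
router, or plain silence from a layer-2 device.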

And we didn't want any friction with support regarding performance problems, so we did everything
by the book when we set everything up (except the gateway, you could say, but we hadn't had any red
warning in 4.5 years...).

Jean-Baptiste

Jean-Baptiste Denis

Jul 6, 2017, 4:15:01 PM
to isilon-u...@googlegroups.com
On 07/05/2017 07:39 PM, Alistair Stewart wrote:
> Have you considered installing Patch-191603
> <https://download.emc.com/downloads/DL83387_Isilon_OneFS_Patch-191603.tgz?source=EMAIL> which
> addresses multiple issues with the NFS and SmartConnect services?
>
> It's always worth checking the document Current Isilon OneFS Patches
> <https://support.emc.com/docu50781> to see what's available.

Our DSE tried a *lot* of things the first week, including installing different patches and setting
sysctls, without luck. I'm not the one who managed the various WebEx sessions and patching parties,
so I can't be sure of what has been tested or not. But I'm sure nothing worked =)

> You might also want to consider setting this sysctl which reverts change-notify to pre-OneFS 8
> behaviour and reduces the chances of experiencing a cluster deadlock.
>
> isi_sysctl_cluster efs.bam.rename_event_coherency=0

What does it do exactly?

> Finally, you could also consider upgrading to OneFS 8.0.0.5 which has all the fixes in patch-191603
> and more and doesn't currently require any patches which means that you could leave the upgrade
> uncommitted and roll-back if you suffer any new issues.

The engineering team discovered multiple bugs on our 8.0.0.4 cluster affecting all OneFS versions
and suggested different workarounds. I guess there will be some patches incoming...

Thank you for your input.

Jean-Baptiste

Dan Pritts

Jul 7, 2017, 5:30:01 PM
to isilon-u...@googlegroups.com
In general, a 9000 MTU means roughly 6x fewer packets, hence less processing and fewer interrupts.  Interrupt coalescing, LRO (large receive offload) and TSO (TCP segmentation offload) on the ethernet cards help get around these issues.  But LRO and TSO are notoriously buggy, and it is very common for vendors to tell you to disable them (e.g., I just had a VMware ESXi crashing bug that was fixed by disabling TSO).
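
On a Linux client these offloads are per-NIC toggles, so checking and turning them off looks
something like this (device name is just an example):

# ethtool -k eth0 | grep -E "segmentation-offload|large-receive-offload"
# ethtool -K eth0 tso off lro off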

Worse, over WANs, the number of TCP packets you can have in flight is limited by the product of bandwidth and latency ("BDP", bandwidth-delay product).   Therefore, over a long-distance (high-latency) connection, you can get 6x more throughput with a 9k MTU on a single TCP stream.   See http://fasterdata.es.net/ for much more on the topic.
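
The back-of-the-envelope version of that (Mathis et al.) for a loss-limited TCP stream is:

throughput ≈ (MSS / RTT) × (1 / sqrt(loss))

i.e. throughput scales linearly with segment size, so at the same RTT and loss rate a ~9000-byte
MSS buys you roughly 6x what a ~1500-byte MSS does.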

Bottom line: for high-performance applications, LAN *or* WAN, you want jumbo frames.  It is of course difficult or impossible to get jumbo frames on the commercial internet, but big research & education networks all support them.

Personally, I don't have a high-performance shop, so we don't run jumbos.  Keep it Simple, Stupid.

--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan
