ESXi NFSv4.1 client id is nasty

Rick Macklem

Jun 17, 2018, 8:39:17 AM
Hi,

Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in ESXi 6.5u1
(VMware) against the FreeBSD server. I have given him a bunch of hackish patches
to try and some of them do help. However not all issues are resolved.
The problem is that these hacks pretty obviously violate the NFSv4.1 RFC (5661).
(Details on these come later, for those interested in such things.)

I can think of three ways to deal with this:
1 - Just leave the server as is and point people to the issues that should be addressed
in the ESXi client.
2 - Put the hacks in, but only enable them based on a sysctl not enabled by default.
(The main problem with this is when the server also has non-ESXi mounts.)
3 - Enable the hacks for ESXi client mounts only, using the implementation ID
it presents at mount time in its ExchangeID arguments.
- This is my preferred solution, but the RFC says:
An example use for implementation identifiers would be diagnostic
software that extracts this information in an attempt to identify
interoperability problems, performance workload behaviors, or general
usage statistics. Since the intent of having access to this
information is for planning or general diagnosis only, the client and
server MUST NOT interpret this implementation identity information in
a way that affects interoperational behavior of the implementation.
The reason is that if clients and servers did such a thing, they
might use fewer capabilities of the protocol than the peer can
support, or the client and server might refuse to interoperate.

Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that, since the
hacks violate the RFC, then why not enable them in a way that violates the RFC.

Anyhow, I would like to hear from others w.r.t. how they think this should be handled.

Here are details on the breakage and workarounds for those interested, from looking
at packet traces in wireshark:
Fairly benign ones:
- The client does a ReclaimComplete with one_fs == false and then does a
ReclaimComplete with one_fs == true. The server returns
NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client
doesn't like.
Workaround: Don't return an error for the one_fs == true case and just treat
it the same as "one_fs == false".
There is also a case where the client only does the
ReclaimComplete with one_fs == true. Since FreeBSD exports a hierarchy of
file systems, this doesn't indicate to the server that all reclaims are done.
(Other extant clients never do the "one_fs == true" variant of
ReclaimComplete.)
This case of just doing the "one_fs == true" variant is actually a limitation
of the server which I don't know how to fix. However the same workaround
as listed above gets around it.

- The client puts random garbage in the delegate_type argument for
Open/ClaimPrevious.
Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG, it doesn't
want a delegation, so assume OPEN_DELEGATE_NONE or OPEN_DELEGATE_NONE_EXT
instead of garbage. (Not sure which of the two values makes it happier.)
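
As a rough illustration of the ReclaimComplete workaround in the first item
above, the server-side decision could look something like the sketch below.
The struct, the function name and the esxi_hack flag are made up for this
example and are not the FreeBSD server's actual code. The non-hack path only
models the behavior as described in this message. The numeric value is the one
RFC 5661 assigns to NFS4ERR_COMPLETE_ALREADY.

```c
#include <stdbool.h>

enum { NFS4_OK = 0, NFS4ERR_COMPLETE_ALREADY = 10054 };

/* Hypothetical per-client state; not the real FreeBSD structure. */
struct client_state {
	bool reclaim_done;	/* reclaim has been marked complete */
};

/*
 * Decide the reply to ReclaimComplete.
 * With the hack enabled, one_fs == true is treated like one_fs == false
 * and never draws NFS4ERR_COMPLETE_ALREADY.
 */
int
reclaim_complete(struct client_state *clp, bool one_fs, bool esxi_hack)
{
	if (one_fs && esxi_hack) {
		clp->reclaim_done = true;
		return (NFS4_OK);
	}
	if (clp->reclaim_done)
		return (NFS4ERR_COMPLETE_ALREADY);
	/* Without the hack, only one_fs == false marks reclaim as done. */
	if (!one_fs)
		clp->reclaim_done = true;
	return (NFS4_OK);
}
```

With the hack off, the false-then-true sequence ESXi sends gets
NFS4ERR_COMPLETE_ALREADY on the second call. With it on, both calls return
NFS4_OK, and a lone one_fs == true call also marks reclaim as done.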

Serious ones:
- The client does an OpenDowngrade with arguments set to OPEN_SHARE_ACCESS_BOTH
and OPEN_SHARE_DENY_BOTH.
Since OpenDowngrade is supposed to decrease share_access and share_deny,
the server returns NFS4ERR_INVAL. OpenDowngrade is not supposed to ever
conflict with another Open, so it is not expected to fail
with NFS4ERR_SHARE_DENIED. (A conflict happens when another Open has
set an OPEN_SHARE_DENY that denies the result of the OpenDowngrade.)
I believe this one is done by the client for something it calls a
"device lock", and it really doesn't like this failing.
Workaround: All I can think of is ignore the check for new bits not being set
and reply NFS_OK, when no conflicting Open exists.
When there is a conflicting Open, returning NFS4ERR_INVAL seems to be the
only option, since NFS4ERR_SHARE_DENIED isn't listed for OpenDowngrade.

- When a server reboots, client does not serialize ExchangeID/CreateSession.
When the server reboots, a client needs to do a serialized set of RPCs
with ExchangeID followed by CreateSession to confirm it. The reply to
ExchangeID has a sequence number (eir_sequenceid) in it and the
CreateSession needs to have the same value in its csa_sequence argument
to confirm the clientid issued by the ExchangeID.
The client sends many ExchangeIDs and CreateSessions, so they end up failing
many times due to the sequence number not matching the last ExchangeID.
(This might only happen in the trunked case.)
Workaround: Nothing that I can think of.

- ExchangeID sometimes sends eia_clientowner.co_verifier argument as all zeros.
Sometimes the client bogusly fills in the eia_clientowner.co_verifier
argument to ExchangeID with all 0s instead of the correct value.
This indicates to the server that the client has rebooted (it has not)
and results in the server discarding any state for the client and
re-initializing the clientid.
Workaround: The server can ignore the verifier changing and make the recovery
work better. This clearly violates RFC5661 and can only be done for
ESXi clients, since ignoring this breaks a Linux client hard reboot.

- The client doesn't seem to handle NFS4ERR_GRACE errors correctly.
These occur when any non-reclaim operations are done during the grace
period after a server boot.
(A client needs to delay a while and then retry the operation, repeating
for as long as NFS4ERR_GRACE is received from the server. This client
does not do this.)
Workaround: Nothing that I can think of.
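
For comparison, the retry behavior a client is expected to have for
NFS4ERR_GRACE, as described in the last item, amounts to a loop like the one
below. This is only a sketch: retry_during_grace, the callback type, the fixed
delay and the fake_op stand-in are all invented here, and a real client would
choose its own delay and upper bound.

```c
#include <unistd.h>

enum { NFS4_OK = 0, NFS4ERR_GRACE = 10013 };

/* Hypothetical callback that sends one non-reclaim operation. */
typedef int (*nfs_op_fn)(void *arg);

/*
 * Re-send the operation for as long as the server answers
 * NFS4ERR_GRACE, sleeping delay_sec seconds between attempts.
 * Any other status (success or error) is returned to the caller.
 */
int
retry_during_grace(nfs_op_fn op, void *arg, unsigned int delay_sec)
{
	int status;

	while ((status = op(arg)) == NFS4ERR_GRACE)
		sleep(delay_sec);
	return (status);
}

/*
 * Stand-in for a server still in its grace period: answers
 * NFS4ERR_GRACE until the counter that arg points at reaches zero.
 */
int
fake_op(void *arg)
{
	int *grace_replies_left = arg;

	if (*grace_replies_left > 0) {
		(*grace_replies_left)--;
		return (NFS4ERR_GRACE);
	}
	return (NFS4_OK);
}
```

Calling retry_during_grace(fake_op, &n, 0) with n set to 2 sends the operation
three times and returns NFS4_OK once the simulated grace period is over.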

Thanks in advance for any comments, rick

Benjamin Kaduk

Jun 17, 2018, 9:52:24 PM
None of these are great options, but adding in behavior
dependency on the implementation ID feels really bad for the
ecosystem, and I would be unhappy if it was enabled by default.
Is it feasible to do one sysctl per workaround and have the sysctl
set the implementation ID(s) to which to apply?

-Ben

P.S. I feel like the nfsv4 WG list should probably hear about this
sort of issue, in addition to here.

Steve Wills

Jun 18, 2018, 5:25:13 PM
Would it be possible or reasonable to use the client ID to log a message
telling the admin to enable a sysctl to enable the hacks?

Steve

Warner Losh

Jun 18, 2018, 5:35:45 PM
My thoughts on this are mixed.

You need certain workarounds, but they sound like they need to be on a
per-client-type basis.
On the one hand, you don't want to chat with different clients differently,
but on the other you want it to work.

I'd suggest a two-tiered approach.

First, have a sysctl per workaround that's a list of client types to apply
the workaround to. Have these default to ESX client, but allow for others.

Second, have a master sysctl to turn on/off per-client workarounds. Have
this default to off.

And finally, see if you can get ESXi to fix their flaws. This is by far the
best solution. The above should really only be a stop-gap, but would be
extensible should this sort of thing become more of the norm than is
desired.
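
A minimal sketch of the two-tiered gating described above, assuming each
workaround keeps a comma-separated list of client implementation names and a
single master switch guards them all. The variable names, the list format and
the "ESX" entry are placeholders for illustration (the string an ESXi client
actually presents in ExchangeID is not given in this thread), and a real server
would expose these through sysctl rather than plain globals.

```c
#include <stdbool.h>
#include <string.h>

/* Master switch for all per-client workarounds; off by default. */
bool workarounds_enabled = false;

/* One list per workaround (hypothetical names and contents). */
const char *reclaimcomplete_clients = "ESX";
const char *opendowngrade_clients = "ESX";

/* Return true if name equals one of the comma-separated entries in list. */
bool
client_in_list(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *p = list;

	if (len == 0)
		return (false);
	while (*p != '\0') {
		size_t n = strcspn(p, ",");	/* length of this entry */

		if (n == len && strncmp(p, name, len) == 0)
			return (true);
		p += n;
		if (*p == ',')
			p++;
	}
	return (false);
}

/* A workaround applies only if the master switch is on and the client is listed. */
bool
workaround_applies(const char *list, const char *impl_name)
{
	return (workarounds_enabled && client_in_list(list, impl_name));
}
```

With the master switch off, workaround_applies() is false for every client.
Turning it on enables each workaround only for the names in that workaround's
own list, so non-ESXi mounts on the same server keep the strict behavior.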

Warner

Rick Macklem

Jun 18, 2018, 5:46:31 PM
Steve Wills wrote:
>Would it be possible or reasonable to use the client ID to log a message
>telling the admin to enable a sysctl to enable the hacks?
Yes. However, this client implementation id is only seen by the server
when the client makes a mount attempt.

I suppose it could log the message and fail the mount, if the "hack" sysctl isn't
set?

rick
[stuff snipped]


Steve Wills

Jun 18, 2018, 5:59:19 PM
Hi,

On 06/18/18 17:42, Rick Macklem wrote:
> Steve Wills wrote:
>> Would it be possible or reasonable to use the client ID to log a message
>> telling the admin to enable a sysctl to enable the hacks?
> Yes. However, this client implementation id is only seen by the server
> when the client makes a mount attempt.
>
> I suppose it could log the message and fail the mount, if the "hack" sysctl isn't
> set?

I hadn't thought of failing the mount, just defaulting not enabling the
hacks unless the admin chooses to enable them. But at the same time
being proactive about telling the admin to enable them.

I.E. keep the implementation RFC compliant since we wouldn't be changing
the behavior based on the implementation ID, only based upon the admin
setting the sysctl, which we told them to do based on the implementation ID.

Just an idea, maybe Warner's suggestion is a better one.

Steve

Rick Macklem

Jun 19, 2018, 7:15:18 AM
Steve Wills wrote:
On 06/18/18 17:42, Rick Macklem wrote:
>> Steve Wills wrote:
>>> Would it be possible or reasonable to use the client ID to log a message
>>> telling the admin to enable a sysctl to enable the hacks?
>> Yes. However, this client implementation id is only seen by the server
>> when the client makes a mount attempt.
>>
>> I suppose it could log the message and fail the mount, if the "hack" sysctl isn't
>> set?
>
>I hadn't thought of failing the mount, just defaulting not enabling the
>hacks unless the admin chooses to enable them. But at the same time
>being proactive about telling the admin to enable them.
>
>I.E. keep the implementation RFC compliant since we wouldn't be changing
>the behavior based on the implementation ID, only based upon the admin
>setting the sysctl, which we told them to do based on the implementation ID.
Well, without one of the hacks (as head currently is) the mounts always fail,
so ESXi mounts failing is a feature of the "unhacked" server.
(The ReclaimComplete failure fails the mount.)

>Just an idea, maybe Warner's suggestion is a better one.
Yes, I think Warner has the right idea, although logging a message w.r.t. the
ReclaimComplete failure (which fails these mounts) when the hacks are turned
off sounds like a good one to me.

>Steve

rick

Warner Losh

Jun 19, 2018, 10:20:41 AM
I think so too, rate limited, with an invitation to turn on the hack :)
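
One simple way to rate limit such a message, sketched under the assumption
that at most one message per fixed interval is enough. The function names, the
60 second interval and the message wording are illustrative only. A kernel
implementation would use whatever rate-limiting helper the kernel already
provides.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/*
 * Return true if a message may be logged at time now, allowing at most
 * one message per interval seconds.  *last holds the time of the
 * previous message, or 0 if none has been logged yet.
 */
bool
log_allowed(time_t *last, time_t now, time_t interval)
{
	if (*last != 0 && now - *last < interval)
		return (false);
	*last = now;
	return (true);
}

/* Hypothetical caller, invoked when a ReclaimComplete from ESXi fails. */
void
note_reclaimcomplete_failure(time_t *last, time_t now)
{
	if (log_allowed(last, now, 60))
		fprintf(stderr, "nfsd: ReclaimComplete from an ESXi client "
		    "failed; consider enabling the ESXi workarounds\n");
}
```

Repeated failures inside the interval are dropped silently, so a client that
keeps retrying the mount cannot flood the log.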

Warner