Thank you, it is really great how fast you adapt the source and make patches for this. I saw so many posts where people did not get NFS41 working with ESXi and FreeBSD, and now I already have it running with your changes.
I have now compiled the kernel with all 4 patches, and it works now.
Some problems are still left:
- the "Server returned improper reason for no delegation: 2" warnings are still in the vmkernel.log.
2018-03-08T11:41:20.290Z cpu0:68011 opID=488969b0)WARNING: NFS41: NFS41ValidateDelegation:608: Server returned improper reason for no delegation: 2
- can't delete a folder with the VMware host client datastore browser:
2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy
- after a reboot of the FreeBSD machine, ESXi does not restore the NFS datastore and logs the following warning (just disconnecting the links is fine):
2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
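To get an overview of which warnings dominate a vmkernel.log like the excerpts above, the lines can be grouped by module and call site. This is only a minimal sketch: the function names are made up for illustration, the parsing assumes the `)WARNING: module: site: message` layout seen in the lines quoted here, and the reason-code table is the `why_no_delegation4` enum as I recall it from RFC 5661.

```python
from collections import Counter

# why_no_delegation4 values as listed in RFC 5661 (NFSv4.1).
WHY_NO_DELEGATION = {
    0: "WND4_NOT_WANTED",
    1: "WND4_CONTENTION",
    2: "WND4_RESOURCE",
    3: "WND4_NOT_SUPP_FTYPE",
    4: "WND4_WRITE_DELEG_NOT_SUPP_FTYPE",
    5: "WND4_NOT_SUPP_UPGRADE",
    6: "WND4_NOT_SUPP_DOWNGRADE",
    7: "WND4_CANCELLED",
    8: "WND4_IS_DIR",
}

MARKER = ")WARNING: "
DELEG_PREFIX = "Server returned improper reason for no delegation: "


def parse_warning(line):
    """Return (module, site, message) for a WARNING line, else None."""
    _, found, rest = line.partition(MARKER)
    if not found:
        return None
    parts = rest.strip().split(": ", 2)
    return tuple(parts) if len(parts) == 3 else None


def summarize(lines):
    """Count warnings per (module, site) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_warning(line)
        if parsed:
            counts[parsed[:2]] += 1
    return counts


def no_delegation_reason(message):
    """Name the numeric code in a 'no delegation' warning message."""
    if not message.startswith(DELEG_PREFIX):
        return None
    code = int(message[len(DELEG_PREFIX):])
    return WHY_NO_DELEGATION.get(code, "unknown")
```

Feeding the file line by line into `summarize()` and printing `most_common()` shows at a glance whether the READDIR retries or the delegation warnings make up most of the noise. By that table, the reason code 2 in the first warning would correspond to `WND4_RESOURCE`.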
So far I have only run some quick ATTO benchmarks in a Windows VM whose vmdk is on the NFS41 datastore, which is mounted over two 1GB links in different subnets.
Read throughput is nearly double that of a single connection, and write is only slightly faster. I don't know whether write speed can be improved; the share is currently UFS on a HW RAID controller with local write speeds of about 500MB/s.
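As a rough sanity check on these numbers, the theoretical ceiling of the links can be worked out. This is a back-of-the-envelope sketch that assumes "1GB links" means 1 Gbit/s Ethernet and ignores all protocol overhead; the function name is hypothetical.

```python
def link_ceiling_mb_per_s(links, gbit_per_link=1.0):
    """Upper bound on throughput in MB/s for a number of links.

    Assumes decimal units (1 Gbit/s = 1000 Mbit/s) and no
    TCP/RPC/NFS overhead, so real results will be lower.
    """
    mbit_total = links * gbit_per_link * 1000
    return mbit_total / 8  # 8 bits per byte


single = link_ceiling_mb_per_s(1)  # 125.0
double = link_ceiling_mb_per_s(2)  # 250.0
```

Under that assumption, even both links together top out well below the roughly 500MB/s the array can write locally, so the network would be the first limit to rule out.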
The link below has the vmkernel.log covering mounting the NFS share, attaching a vmdk from the share to a Windows VM, running the ATTO benchmark on it, disconnecting/reconnecting the network, and the BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP problem after the reboot.
Up to the reboot I also captured a trace on one of the two links. (nfs41_trace_before_reboot.pcap and nfs41_trace_after_reboot.pcap)
https://files.fm/u/wvybmdmc
>Attached the trace. If I read it correctly, it uses FORE_OR_BOTH.
>(bctsa_dir: CDFC4_FORE_OR_BOTH (0x00000003))
Yes. The scary part is the ExchangeID before the BindConnectiontoSession.
(Normally that is only done at the beginning of a new mount to get a ClientID, followed immediately by a CreateSession. I don't know why it would do this.)
The attached patch might get BindConnectiontoSession to work. I have no way to test it beyond seeing it compile. Hopefully it will apply cleanly.
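For reference, the `bctsa_dir` value seen in the trace can be decoded against the channel-direction constants. This is a small sketch based on my reading of the BIND_CONN_TO_SESSION section of RFC 5661; the helper name and the table of acceptable replies are illustrative and say nothing about what the FreeBSD server or the patch actually does.

```python
# channel_dir_from_client4: direction requested by the client (RFC 5661).
CDFC4 = {
    0x1: "CDFC4_FORE",
    0x2: "CDFC4_BACK",
    0x3: "CDFC4_FORE_OR_BOTH",
    0x7: "CDFC4_BACK_OR_BOTH",
}

# channel_dir_from_server4: direction the server reports it bound.
CDFS4 = {
    0x1: "CDFS4_FORE",
    0x2: "CDFS4_BACK",
    0x3: "CDFS4_BOTH",
}

# Replies a client asking for each direction should be prepared to see,
# as I understand the RFC text (treat as an assumption).
ACCEPTABLE = {
    0x1: (0x1,),
    0x2: (0x2,),
    0x3: (0x1, 0x3),
    0x7: (0x2, 0x3),
}


def describe_bctsa_dir(value):
    """Return the request name and the names of acceptable server replies."""
    name = CDFC4.get(value, "unknown")
    replies = [CDFS4[r] for r in ACCEPTABLE.get(value, ())]
    return name, replies
```

For the traced value 0x00000003 this gives `CDFC4_FORE_OR_BOTH`, with `CDFS4_FORE` or `CDFS4_BOTH` as the replies the client would expect instead of NFS4ERR_NOTSUPP.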
>The trace is only with the first patch; I have not compiled the wantdeleg patches so far.
That's fine. I don't think that matters much.
>I think this is related to the BIND_CONN_TO_SESSION; after a disconnect, ESXi cannot reconnect to the NFS server, also with this warning:
>2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
If the attached patch works, you'll find out what it fixes.
>Another thing I noticed today is that it is not possible to delete a folder with the ESXi datastore browser on the NFS mount. Maybe it is a VMware bug, but with NFS3 it works.
>
>Here is the vmkernel.log with only one connection; it covers mounting, trying to delete a folder, and disconnecting:
>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID c55dbe59 maps to vmkernel opID 55bea165
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user: , options: <none>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD Handle Created with lock[StorageApd-0x43046e4c6d70]
>2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 [7]
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3792: Max read xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3793: Max write xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3794: Max file size: 0x800000000000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3795: Max file name: 255
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger than that of FSS (128)
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/")
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435: nfsds1 mounted successfully
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID c55dbe91 maps to vmkernel opID e47706ec
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6
I have no idea whether getting BindConnectiontoSession working will fix this or not.