Samba3 (FHGFS patched) Still Seems Very Slow

David .

Dec 5, 2013, 10:15:41 PM
to fhgfs...@googlegroups.com
G'day Guys,

Our environment is FHGFS 2012.10.r9-el6 on CentOS 6.5 (x86_64).

I've set up clients with a patched Samba 3.6.21. The following is the patch in source3/smbd/reply.c:

                if(fsp_stat(fsp) == -1) {
                        reply_nterror(req, map_nt_error_from_unix(errno));
                        goto strict_unlock;
                }
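                /* the fsp_stat() above refreshes fsp->fsp_name->st, so the
                 * S_ISREG / size checks below operate on current values
                 * rather than possibly stale cached ones */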

                if (!S_ISREG(fsp->fsp_name->st.st_ex_mode) ||
                    (startpos > fsp->fsp_name->st.st_ex_size) ||
                    (smb_maxcnt > (fsp->fsp_name->st.st_ex_size - startpos))) {
                        /*
                         * We already know that we would do a short read, so don't
                         * try the sendfile() path.
                         */
                        goto nosendfile_read;
                }

                /*
                 * Set up the packet header before send. We
                 * assume here the sendfile will work (get the

The patched version is faster than the default CentOS 6 version for file reads and writes. However, when I do a "Get Info" on a Mac, or "Properties" in Windows 7, on one of my large folders, the result comes back after 17 minutes.

When I do the same on our standard ext4/samba file server, on a copy of the same folder, it finishes in 40 seconds.

When I "du -sh Folder" on the command line of the same fhgfs client it takes 20 seconds.

Any ideas?

Cheers.

Pete Sero

Dec 6, 2013, 3:21:00 AM
to fhgfs...@googlegroups.com
You would need to analyse the SMB traffic to see
whether your Mac (or W7) client

- does a different sequence of operations
  when talking to the shared FhGFS and to the regular server

- or does the same sequence, but "slower"

In the first case, compare the SMB server configurations:
some difference must exist that explains the 
different use of the SMB protocol (request sizes, 
overlapping requests, etc)

(Note that OS X 10.9 is much smarter in using SMB than 
10.8 and below.)

In the second case, there might be too many roundtrips:

SMB client <-> SMB server/FhGFS client <-> FhGFS servers

Say a full roundtrip would take 10ms,
and if one roundtrip is taken per file(!), 
then we have 17 minutes / 10ms = 102 thousand files.

Does this (roughly) match the actual number of files in the test?

Cheers

-- Peter

Bernd Schubert

Dec 9, 2013, 4:44:40 AM
to fhgfs...@googlegroups.com, David .
David, Peter,

another choice is to use

fhgfs-ctl --clientstats --nodetype=meta
fhgfs-ctl --clientstats --nodetype=storage

to see which requests are being sent to the servers.
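
For example, one could capture both while the "Get Info" test runs (just a sketch; the output file names and the backgrounding are only one way to do it):

fhgfs-ctl --clientstats --nodetype=meta    > /tmp/meta_stats &
fhgfs-ctl --clientstats --nodetype=storage > /tmp/storage_stats &
# ... run the "Get Info" / Properties test on the SMB client ...
kill %1 %2    # stop both captures afterwards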


Best regards,
Bernd

David Minard

Dec 11, 2013, 6:20:47 AM
to fhgfs...@googlegroups.com
G'day Peter and Bernd,

I ran the tests Bernd suggested and have attached the results. They are for a "Get Info" on an OS X 10.6 box, via samba3 on an FHGFS client.

meta_stats
storage_stats

Pete Sero

Dec 11, 2013, 7:45:00 AM
to fhgfs...@googlegroups.com
So this is what takes 17 minutes on the Mac?
Most of the show is over after just 10s, some aftermath takes
another 60s, so WTH is going on at the SMB level?

Cheers

Peter
> If that doesn't show anything up, I'll tackle the SMB traffic analysis.
>
> Cheers,
>
> David Minard.
> Ph: 0247 360 155
> Fax: 0247 360 770
>
> School of Computing, Engineering, and Mathematics
> Building Y - Penrith Campus (Kingswood)
> Locked bag 1797
> Penrith South DC
> NSW 1797
>
> [Sometimes waking up just isn't worth the insult of the day to come.]
>

Bernd Schubert

Dec 11, 2013, 8:24:31 AM
to fhgfs...@googlegroups.com
Hello Peter, hello David,

the values shown at 0s are the sum of all operations since the
meta/storage server restart. The values are raw counts, so you need to
divide them by the given interval (10 s here).

Is 10.0.0.147 the Mac OS X smb-client? We can see 100 to 250 readdir
requests/s from that client, with a maximum of 100 dir-entries per
request-reply, and 200 to 300 stat requests/s. So for 100000 files you
would need about 300 to 500 s (5 to 8 min). That is already rather slow,
but not the reported 17 min.
Unfortunately I don't know why it is that slow or where the difference
between the calculated and the real time comes from.

Hmm, maybe you run into the infamous 64-bit seekdir/telldir bug?

https://bugzilla.samba.org/show_bug.cgi?id=2662
https://bugzilla.redhat.com/show_bug.cgi?id=843765

(I have to admit that it was I who brought this issue to the surface
when I fixed an ext3/4 readdir issue in the kernel, which introduced
64-bit telldir/seekdir offsets. That fixed the issue for FhGFS, NFS and
other network file systems, but made it worse for non-POSIX-compliant
implementations such as samba and gluster, which expect only 32-bit
offsets; I think gluster has fixed it in the meantime.)

If you indeed run into this bug, the only practical way out would be
to fix samba or to use an underlying metadata file system that is
unlikely to use 64-bit directory offsets, e.g. xfs.
The impractical (in other words, very slow) way would be to disable the
ext3/ext4 directory indexes (htree).
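
For reference, checking and clearing that feature would look roughly like this (untested sketch; the device path is a placeholder):

tune2fs -l /dev/<meta-device> | grep -i features   # is dir_index listed?
tune2fs -O ^dir_index /dev/<meta-device>           # clear the htree feature flag
# existing directories keep their indexes until rebuilt,
# e.g. with e2fsck -fD on the unmounted file system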


Best regards,
Bernd

Pete Sero

Dec 11, 2013, 8:58:20 AM
to fhgfs...@googlegroups.com

On Wed, 11 Dec 2013, at 21:24, Bernd Schubert <bernd.s...@itwm.fraunhofer.de> wrote:

> Hello Peter, hello David,
>
> the values shown at 0s are the sum of all operations since the meta/storage server restart. The values are raw counts, so you need to divide them by the given interval (10 s here).

I see...

>
> Is 10.0.0.147 the MacOSX smb-client? We can see 100 to 250 read-dir requests/s from that client, with a max of 100 dir-entries per request-reply and 200 to 300 stat requests/s.

You mean the smb-server, which is passing traffic over to the actual Mac client?


> So for 100000 files you would need about 300 to 500s (5 to 8min). So already rather slow, but not the reported 17 min.


The readdirs are over after 1040 s, or 17 min, so that now matches the reported time.
100000 was my wild guess earlier, never mentioned by David.
Now we see it's probably around 300000 files, at least within one order of
magnitude of my guess ;-)


Possible next steps:
- 64-bit seekdir/telldir bug?
- check out Mac OS X 10.9 / enable SMB2
- back to SMB packet tracing (a rough capture sketch below)…
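
A minimal capture sketch for that, run on the samba/fhgfs-client box (the interface name and the Mac client's IP are placeholders):

tcpdump -i eth0 -s 0 -w /tmp/smb_getinfo.pcap 'host <mac-client-ip> and (port 445 or port 139)'
# then inspect the request/response pattern, e.g. with wireshark's smb/smb2 display filters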

Cheers

— Peter

Bernd Schubert

Dec 11, 2013, 9:08:57 AM
to fhgfs...@googlegroups.com
On 12/11/2013 02:58 PM, Pete Sero wrote:
>
> On Wed, 11 Dec 2013, at 21:24, Bernd Schubert <bernd.s...@itwm.fraunhofer.de> wrote:
>
>> Hello Peter, hello David,
>>
>> the values shown at 0s are the sum of all operations since the meta/storage server restart. The values are raw counts, so you need to divide them by the given interval (10 s here).
>
> I see...
>
>>
>> Is 10.0.0.147 the MacOSX smb-client? We can see 100 to 250 read-dir requests/s from that client, with a max of 100 dir-entries per request-reply and 200 to 300 stat requests/s.
>
> You mean smb-server? which is passing traffic over to the actual Mac client?

Er sorry, yes, the smb-server running on a fhgfs-client.

>
>
>> So for 100000 files you would need about 300 to 500s (5 to 8min). So already rather slow, but not the reported 17 min.
>
>
> The readdirs are over after 1040 s, or 17 min, so that now matches the reported time.
> 100000 was my wild guess earlier, never mentioned by David.
> Now we see it's probably around 300000 files, at least within one order of
> magnitude of my guess ;-)
>
>
> Possible next steps:
> - 64-bit seekdir/telldir bug?
> - check out Mac OS X 10.9 / enable SMB2
> - back to SMB packet tracing…

It would also be interesting to see the clientstats while "du -sh" is
running on this client.
Also, is the fhgfs-client running over tcp or ib-rdma (see the fhgfs-net output)?
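
Roughly (assuming fhgfs-net is the client-side helper script mentioned above; the output file name is just an example):

fhgfs-ctl --clientstats --nodetype=meta > /tmp/meta_stats_du   # while "du -sh" runs
fhgfs-net                                                      # connection type (tcp/rdma) per server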



Thanks,
Bernd

David Minard

Dec 11, 2013, 6:51:18 PM
to fhgfs...@googlegroups.com
On 12/12/2013, at 12:24 AM, Bernd Schubert wrote:

> Hello Peter, hello David,
>
> the values shown at 0s are the sum of all operations since the meta/storage server restart. The values are raw counts, so you need to divide them by the given interval (10 s here).
>
> Is 10.0.0.147 the MacOSX smb-client? We can see 100 to 250 read-dir requests/s from that client, with a max of 100 dir-entries per request-reply and 200 to 300 stat requests/s. So for 100000 files you would need about 300 to 500s (5 to 8min). So already rather slow, but not the reported 17 min.
> Unfortunately I don't know why it is that slow and where the difference between the calculated and real time comes from.

10.0.0.215 is the Mac client. Strange that it's not mentioned in the stats at all...

10.0.0.146 - fhgfs client2/samba
10.0.0.147 - fhgfs client1/samba
10.0.0.199 - imaps server (Dovecot)/fhgfs client - only my mail client is using it at the moment.
10.0.0.137 - Storage1(XFS), management
10.0.0.138 - Storage2(XFS), meta1(ext4)
10.0.0.139 - Storage3(XFS), meta2(ext4)
10.0.0.140 - Storage4(XFS) - interesting that this one isn't mentioned at all. All data should be evenly spread across all 4 storage nodes.
>
> Hmm, maybe you run into the infamous 64-bit seekdir/telldir bug?
>
> https://bugzilla.samba.org/show_bug.cgi?id=2662
> https://bugzilla.redhat.com/show_bug.cgi?id=843765
>
> (I have to admit that it was I who brought this issue to the surface when I fixed an ext3/4 readdir issue in the kernel, which introduced 64-bit telldir/seekdir offsets. That fixed the issue for FhGFS, NFS and other network file systems, but made it worse for non-POSIX-compliant implementations such as samba and gluster, which expect only 32-bit offsets; I think gluster has fixed it in the meantime.)
>
> If you indeed run into this bug, the only practical way out would be to fix samba or to use an underlying metadata file system that is unlikely to use 64-bit directory offsets, e.g. xfs.

I could change the two meta servers from ext4 to XFS after I get the "du -s" and W7 stats. See what shows up.

David Minard

Dec 11, 2013, 6:51:23 PM
to fhgfs...@googlegroups.com
I'll get that done. I'll also get the stats with a W7 box.

> Also, is the fhgfs-client running over tcp or ib-rdma (fhgfs-net output).
>

The environment we're using is all TCP, gigabit. We're running everything on VMs (XCP). The NICs on the hosts are LACP-bonded (2 NICs). The other samba server is also in this environment, and is quite quick. Admittedly, it's running Samba 3.6.9-151.el6, while the version on the fhgfs client is Samba 3.6.21.

>
> Thanks,
> Bernd

David Minard.
Ph: 0247 360 155
Fax: 0247 360 770

School of Computing, Engineering, and Mathematics
Building Y - Penrith Campus (Kingswood)
Locked bag 1797
Penrith South DC
NSW 1797

[Sometimes waking up just isn't worth the insult of the day to come.]



David Minard

Dec 11, 2013, 7:01:34 PM
to fhgfs...@googlegroups.com
Just a quick one. The directory I'm testing has 32452 files/directories in it.

Sven Breuner

Dec 12, 2013, 5:48:23 AM
to fhgfs...@googlegroups.com, David Minard
hi david,

* for clarification on the "fhgfs-ctl --clientstats" output: it shows IP
addresses of hosts that are talking to the fhgfs servers.

for "--clientstats --nodetype=meta", this can be fhgfs clients or other
metadata servers (because the metadata servers are also talking to each
other, e.g. when a new directory is created).

for "--clientstats --nodetype=storage", this can again be fhgfs clients
or metadata servers, but no other storage servers (because the storage
servers are not talking to each other).

so it's normal that you won't see your samba client IP or the IP of a
storage server in this output.

* regarding your directory with 32452 files/subdirs: from the
clientstats output it seems like samba is doing more than one stat()
operation per file/subdir: we see 200-300 stat() ops per second, so with
a single stat per entry the listing should be complete after roughly two
minutes, not 17. so the only conclusion i can see right now is that there
is more than one stat operation per file/subdir, which might have a good
reason, but intuitively seems rather unnecessary, of course.
(one reason could be that the mac is also looking into subdirs to
generate thumbnails or so)
(one reason could be that the mac is also looking into subdirs to
generate thumbnails or so)

i'm not a mac user, so i have no idea how things work there (and if
there is something like strace or so), and thus i will say what i would
do on linux now (which is similar to the approach of how we got the
other samba patch):

1) switch to a smaller directory for testing. there's probably no reason
to always wait 17 minutes for the results ;-)

2) strace the application on the samba client. on linux, that would be
something like
$ strace ls -l 2>/tmp/strace.out

3) strace the samba server for the same workload (a rough sketch follows below)

4) if you see that the server is doing more I/O operations than the
application on the client (e.g. one stat() from the client application
results in two or more stat() operations on the server), then you could
report to the samba mailing list that you see operations that seem
unnecessary and ask if they see a way to avoid them. (or to make it
easier for them, you could also try to identify the corresponding line
in the samba code first and then include that in the report.)

if, on the other hand, you see from the strace of the application (in
my example the "ls -l" above) that the application itself already does
multiple stat() operations per file/subdir, then this is obviously not a
samba problem and you might want to consider talking to the application
people. that might not be that easy in your case, but at least for
"ls -l" we actually did that a while ago, and it resulted in ls avoiding
some unnecessary calls in more recent versions of the coreutils package.
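
a rough sketch of what 2) and 3) could look like on the samba box (the smbd pid is a placeholder; pick the smbd process that serves your test client):

smbstatus -p                        # list smbd processes and their client machines
strace -f -c -p <smbd-pid>          # syscall counts while the "Get Info" runs
strace -f -e trace=stat,lstat,fstat,getdents -p <smbd-pid> -o /tmp/smbd.strace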


best regards,
sven

Sven Breuner

Dec 12, 2013, 6:54:11 AM
to fhgfs...@googlegroups.com, David Minard
p.s.: of course, the directory offset loop problem that bernd mentioned
would also be an explanation for why this use-case results in multiple
stat() ops per file/subdir, but that should also be visible in the
strace if the test dir is sufficiently large.

best regards,
sven

David Minard

Dec 12, 2013, 6:49:58 PM
to fhgfs...@googlegroups.com
G'day Sven, Bernd, and Pete,

I'll try all your suggestions and see what happens. It may take me a while, as I can only work on this from time to time - until we can go to production, that is.

Cheers,