On Mon, Sep 22, 2014 at 11:29 AM, Summers, James B. II <
jsum...@ou.edu> wrote:
>
> The first one is writing to one of the fhgfs filesystems, second writing to an
> nfs mount, third our second fhgfs filesystem, and lastly to a local scratch
> partition that is a raid5. This provides some interesting numbers in that some
> other benchmarking I did had the fhgfs and nfs basically about the same. That
> one was reading in a large binary file and writing one back out. The users are
> really focused on running an application from nasa named Ledaps (
>
> http://ledapsweb.nascom.nasa.gov/ ). The benchmarking they provided shows
> totally different results, in that r/w to fhgfs was a lot slower than NFS and of
> course local disk. So I am starting to lean toward it being how the application
> itself is doing its I/O.
>
> For example the following are the time to complete a ledap run / scene:
>
> NFS: 824s
> fhgfs0: 7415s
> fhgfs1: 4980s
> local: 703s
>
> The ledaps software they use is a set of C programs. I wonder if there are some
> C I/O statements that could be slower than others or the parameters are not set
> correctly for r/w to network filesystems?
I think what is happening comes down to the way ledaps writes to disk --
its write pattern. I would recommend profiling the application to see what
it's doing (or emailing the developers). I have been told oprofile is a
good tool; you can also try strace and count the number of write calls, or
even attach gdb.
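A quick sketch of the strace approach (this assumes strace is installed;
the real ledaps binary names aren't shown here, so dd stands in as the
traced workload -- substitute your actual command):

```shell
# Summarize the write() syscalls a process makes. Thousands of tiny
# writes in the summary table would point straight at the small-write
# pattern. dd below is only a stand-in workload.
strace -c -e trace=write dd if=/dev/zero of=/tmp/trace_demo bs=512 count=100 status=none
rm -f /tmp/trace_demo
```

With `-c` strace prints a per-syscall count and total time on exit, which
is usually enough to see whether the application is doing many small
writes instead of a few large ones.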
For example, it was very common, and surprisingly still is, for software to
write very small files to disk as checkpoint or temporary data rather than
use system memory. Some applications let you specify where temporary data
will live; if so, always select a local hard drive or something like a
tmpfs filesystem (/dev/shm). Any filesystem will suffer under a large
number of small writes; the best solution is to cache requests. If this is
the case, consider caching more requests on the client side:
http://www.fhgfs.com/wiki/Caching
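If ledaps honors TMPDIR for its scratch files (an assumption -- check its
docs; it may have its own temp-dir setting), redirecting temp data to
tmpfs is a one-liner:

```shell
# Point scratch files at RAM-backed tmpfs instead of the parallel FS.
# mktemp below just demonstrates that new temp files now land in /dev/shm.
export TMPDIR=/dev/shm
scratch=$(mktemp)        # e.g. /dev/shm/tmp.XXXXXXXXXX
echo "$scratch"
rm -f "$scratch"
```

Keep in mind /dev/shm comes out of RAM, so this only makes sense if the
temp data fits comfortably in memory.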
This is all conjecture, but it would help explain what's happening. I know
that NFS is very good at caching requests on the client side before
writing them out to the remote server. When we used to run Gluster, this
was a trick we used to help solve this issue (an NFS server/client
loopback):
http://gluster.org/pipermail/gluster-users/2012-September/011324.html
The numbers you provided are a bit low in my opinion. I think it would be
worthwhile to run the same dd commands directly on the storage servers
(without going via network mounts or FhGFS/BeeGFS) and compare them with
the dd commands over the network mount. They should not be _that_ far off.
I would recommend starting the debugging there and working your way out
towards a client mounting BeeGFS. It is also worth making sure your
network is not saturated.
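Something like the following, run once on a storage server's local disk and
once over the client mount (the paths are examples -- substitute a real
storage-server directory and a BeeGFS mount point). conv=fsync forces the
data to disk so the page cache can't inflate the number:

```shell
# Local baseline on the storage server (path is an example):
dd if=/dev/zero of=/tmp/dd_local.bin bs=1M count=64 conv=fsync
rm -f /tmp/dd_local.bin

# Same test over the network mount, run on a client (example path):
# dd if=/dev/zero of=/mnt/beegfs/dd_net.bin bs=1M count=64 conv=fsync
```

If the local number is also poor, the problem is below the parallel
filesystem; if only the networked number is poor, look at the network or
the client tuning.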
Your third dd result, writing to your second BeeGFS instance, is pretty
bad: 25.0 MB/s. Your first result, 99.9 MB/s to the first BeeGFS instance,
is I think pretty close to GigE write speeds -- which makes me wonder if
the request was cached somehow. I would take the average of those two as a
better comparison.
Clearly the NFS request was cached as 830 MB/s exceeds GigE wire speeds.
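The arithmetic behind that claim, for anyone following along:

```shell
# GigE tops out at 1 Gbit/s = 1000/8 MB/s before any protocol overhead,
# so an 830 MB/s "write" over NFS can only be the client page cache.
echo $((1000 / 8))   # 125 (MB/s, theoretical ceiling)
```

In practice TCP/IP and NFS overhead put the real ceiling closer to
110-118 MB/s.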
Hope that helps. Maybe one of the developers can comment on this as well,
-Adam