Hi Frank,
thank you very much for your reply!
In the meantime I have found some surprising values. It seems that one
of our worker nodes shows very poor performance with fhgfs, while it
behaves normally with Lustre, Panasas and NFS. Another worker node also
shows normal behaviour with fhgfs. See the results at the end of this
email.
>
> Could you start this bonnie command: "bonnie++ -s0 -n 1:1:1:1 -r0 -d
> /FHGFS_MOUNT -u root". This command will finish in a few seconds. If
> this will not finish strace the execution of bonnie and send us the
> output. Use the following command: "strace -Ttt bonnie++ -s0 -n 1:1:1:1
> -r0 -d /FHGFS_MOUNT -u root"
>
wn001 is our "buggy" worker node
[root@wn001 ~]# bonnie++ -s0 -n 1:1:1:1 -r0 -d /mnt/fhgfs/ -u root
Using uid:0, gid:0.
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96      ------Sequential Create------ --------Random Create--------
wn001              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
        1:1:1        231   1   320   2   491   2   273   1   202   1   557   2
Latency             205ms     206ms     210ms     205ms     209ms     204ms
1.96,1.96,wn001,1,1348210770,,,,,,,,,,,,,,1,1,1,,,231,1,320,2,491,2,273,1,202,1,557,2,,,,,,,205ms,206ms,210ms,205ms,209ms,204ms
[root@wn001 ~]#
[root@wn002 ~]# bonnie++ -s0 -n 1:1:1:1 -r0 -d /mnt/fhgfs/ -u root
Using uid:0, gid:0.
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96      ------Sequential Create------ --------Random Create--------
wn002              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min       /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
        1:1:1        446   2   536   4   741   2   496   4   535   4   734   2
Latency              2431us    3340us   28355us    4159us   36860us    1663us
1.96,1.96,wn002,1,1348214493,,,,,,,,,,,,,,1,1,1,,,446,2,536,4,741,2,496,4,535,4,734,2,,,,,,,2431us,3340us,28355us,4159us,36860us,1663us
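For comparison outside of bonnie++, the same create/stat/delete pattern
can be approximated with a plain shell loop, which makes the per-phase
timing easy to repeat on any mount. This is only a minimal sketch; the
default target directory /tmp/meta-bench is an assumption, so point it
at the fhgfs mount (e.g. /mnt/fhgfs/test) to reproduce the comparison:

```shell
#!/bin/bash
# Minimal metadata micro-benchmark: create, stat and delete N small
# files in one directory, timing each phase separately. This roughly
# mirrors the operations exercised by "bonnie++ -s0 -n 1:1:1:1".
DIR="${1:-/tmp/meta-bench}"   # target directory (pass the fhgfs mount to compare)
N=100                         # number of files per phase

mkdir -p "$DIR"

echo "create:"
time for i in $(seq 1 "$N"); do : > "$DIR/f$i"; done

echo "stat:"
time for i in $(seq 1 "$N"); do stat "$DIR/f$i" > /dev/null; done

echo "delete:"
time for i in $(seq 1 "$N"); do rm "$DIR/f$i"; done

rmdir "$DIR"
```

Running it once against a local filesystem and once against the fhgfs
mount should make a per-operation latency difference like the one above
(milliseconds on wn001 vs. microseconds on wn002) immediately visible.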
The results of our kernel compilation (kernel only, no tools) are:
wn001 panasas:
real 7m14s
user 5m17s
wn001 fhgfs:
real 310m26s
user 5m50s
wn002 fhgfs: (we got the same values on a node with a 10Gb connection)
real 34m26s
user 5m45s
wn002 panasas:
real 8m1s
user 5m18s
> In your first email you wrote that your fhgfs environment has 2 servers
> one server for metadata and one server for storage. In the output of
> fhgfs-net both servers are metadata server and storage server. Did you
> reconfigure fhgfs or is there something wrong?
>
Yes, in the meantime I reconfigured our installation in order to track
down the bug.