Striping across two storage servers


Jeffrey Layton

Jan 19, 2017, 4:18:04 PM
to beegfs-user
Good afternoon,

I've been running BeeGFS on CentOS 7 with a single metadata server and two storage servers. Each storage server has 1 volume of 10TB. There is a master node and 32 clients with everything connected via 10GigE.

I'm running some tests with IOzone on the storage system, including sequential and random IOPS tests, with all 32 clients running IOzone against the BeeGFS file system at the same time. I can see the load on the storage servers going up when running the 4K random IOPS tests. However, the load on the first storage server reaches about 12 while the load on the second storage server never really gets above 0.1 (all checked with 'uptime').
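
For context, the 4K random IOPS runs use an IOzone throughput-mode invocation roughly like the one below (only a sketch; the record/file sizes and the clients.txt machine file are placeholders, not my exact command line):

iozone -i 0 -i 2 -r 4k -s 1g -t 32 -+m clients.txt

(-i 0 = write test, -i 2 = random read/write test, -r = record size, -s = file size per process, -t = number of processes, -+m = machine file listing the client nodes.)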

I believe I set the number of targets for the directory correctly.

[centos@ip-10-0-0-10 ~]$ ssh 10.0.0.100 sudo beegfs-ctl --getentryinfo /mnt/beegfs
Path:
Mount: /mnt/beegfs
EntryID: root
Metadata node: ip-10-0-0-20 [ID: 1]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 1M
+ Number of storage targets: desired: 2
Connection to 10.0.0.100 closed.


In the case of the sequential tests, a total of 120GB of data is written/read (much smaller than the volume on each storage server). For the 4K random IOPS test, a total of 40GB is written/read, which is also much less than the space on each storage server.

Any comments or tips on what to check to make sure BeeGFS is writing to both storage servers?

Thanks!

Jeff


Sven Breuner

Jan 19, 2017, 6:40:10 PM
to fhgfs...@googlegroups.com, Jeffrey Layton
Hi Jeff,

$ beegfs-ctl --serverstats --perserver --names --interval=1
...can show you live statistics for each of the servers to confirm that they are
both serving requests in your test.

Using "beegfs-ctl --getentryinfo" with a path to a specific file can also
confirm that the file gets striped across the desired number of targets.
Just to give an example: one reason why BeeGFS would not stripe files across the
desired number of targets is that one of the targets is running out of disk
space - but of course that seems rather unlikely in your test case.
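
For example (hypothetical file name, just to illustrate):

$ beegfs-ctl --getentryinfo /mnt/beegfs/iozone.tmp.0

For a file (as opposed to a directory), the output should also list the actual storage targets the chunks were placed on, not just the desired number.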

$ beegfs-df
...can show you whether BeeGFS thinks that all targets are in the "normal" free
space pool, along with the total space of each target. A common mistake is that a
RAID volume is not mounted when BeeGFS is started, or that there is a typo in the path
to the BeeGFS storage target, so BeeGFS unintentionally uses the OS
partition instead of the RAID volume.
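
A quick way to rule that out is to check on each storage server that the target path really is a separate mount (generic example; replace the path with your actual storage target directory):

$ mountpoint /path/to/storage-target
$ df -h /path/to/storage-target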

$ beegfs-ctl --listtargets --longnodes --state
...can show you whether the BeeGFS management service thinks that all the
storage targets are online. If storage targets are not online, that would be
another reason for BeeGFS not to assign the desired number of targets for new files.

Best regards,
Sven

Jeff Layton

Jan 20, 2017, 7:43:43 AM
to fhgfs...@googlegroups.com, Jeffrey Layton
Sven,

Thanks for the tips! I'm trying the commands out now to see why the
second storage server isn't being used.

Thanks!

Jeff

Jeff Layton

Jan 20, 2017, 9:21:51 AM
to fhgfs...@googlegroups.com
This morning, I rebuilt BeeGFS and took a look at the storage configuration.


[centos@ip-10-0-0-10 ~]$ ssh 10.0.0.100 beegfs-ctl --listnodes --nodetype=storage --details
ip-10-0-0-30 [ID: 1]
Ports: UDP: 8003; TCP: 8003
Interfaces: eth0(TCP)

Number of nodes: 1
Connection to 10.0.0.100 closed.


It's only showing 1 storage server (1 node), but I have 2 storage nodes
(the second storage server is 10.0.0.31). The BeeGFS log on the second
storage server shows:


(3) Jan20 14:12:40 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8003
(1) Jan20 14:12:40 Main [App] >> Waiting for beegfs...@10.0.0.10:8008...
(2) Jan20 14:12:40 RegDGramLis [Heartbeat incoming] >> New node: beegfs-mgmtd ip-10-0-0-10 [ID: 1];
(3) Jan20 14:12:40 Main [NodeConn (acquire stream)] >> Connected: beegfs...@10.0.0.10:8008 (protocol: TCP)
(1) Jan20 14:12:40 Main [App] >> Version: 6.3
(2) Jan20 14:12:40 Main [App] >> LocalNode: beegfs-storage ip-10-0-0-31 [ID: 1]
(2) Jan20 14:12:40 Main [App] >> Usable NICs: eth0(TCP)
(2) Jan20 14:12:40 Main [App] >> Storage targets: 1
(3) Jan20 14:12:40 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8003
(1) Jan20 14:12:42 Main [Register node] >> Node registration not successful. Management node offline? Will keep on trying...


I can ssh to the management node (10.0.0.10) and vice versa. All the
ports between 10.0.0.31 and 10.0.0.10 are open (particularly 8003).
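
To double-check connectivity to the management service itself (TCP 8008, per the log above), a simple netcat probe from the storage server can be used, e.g.:

$ nc -vz 10.0.0.10 8008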

I built BeeGFS with the following commands:


sudo /etc/init.d/beegfs-mgmtd start
sleep 2

#meta# - metadata servers
ssh 10.0.0.20 "sudo /opt/beegfs/sbin/beegfs-setup-meta -p /beegfs-meta0
-s 1 -m 10.0.0.10"

sleep 2
#data# - data servers
ssh 10.0.0.31 "sudo /opt/beegfs/sbin/beegfs-setup-storage -p
/beegfs-data1 -s 1 -i 101 -m 10.0.0.10"
ssh 10.0.0.30 "sudo /opt/beegfs/sbin/beegfs-setup-storage -p
/beegfs-data0 -s 0 -i 001 -m 10.0.0.10"

#= Start services
sleep 2
#=meta= metadata servers
ssh 10.0.0.20 "sudo /etc/init.d/beegfs-meta start"

sleep 2
#=data= data servers
ssh 10.0.0.31 "sudo /etc/init.d/beegfs-storage start"
ssh 10.0.0.30 "sudo /etc/init.d/beegfs-storage start"

ssh 10.0.0.100 "sudo /etc/init.d/beegfs-helperd start"

sleep 2
sudo /etc/init.d/beegfs-client start
#=client= clients
ssh 10.0.0.100 "sudo /etc/init.d/beegfs-client start"




Any suggestions on where to look for issues?

Thanks!

Jeff

Sven Breuner

Jan 24, 2017, 5:24:19 AM
to fhgfs...@googlegroups.com, Jeff Layton
Hi Jeff,

The problem is that you tried to use "-s 0" (zero) as an ID. The valid ID range
is 1..65535.

>> ssh 10.0.0.30 "sudo /opt/beegfs/sbin/beegfs-setup-storage -p /beegfs-data0 -s 0
>> -i 001 -m 10.0.0.10"

You can simply "rm -rf /beegfs-data0" on 10.0.0.30 and run the
beegfs-setup-storage command again with a different ID, e.g. "-s 2 -i 201".
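
In your setup that would look roughly like this (just a sketch reusing your earlier commands with the new IDs; restart the storage service afterwards so it picks up the new target):

ssh 10.0.0.30 "sudo rm -rf /beegfs-data0"
ssh 10.0.0.30 "sudo /opt/beegfs/sbin/beegfs-setup-storage -p /beegfs-data0 -s 2 -i 201 -m 10.0.0.10"
ssh 10.0.0.30 "sudo /etc/init.d/beegfs-storage restart"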

(Yes, I know what you're thinking and agree: There should be an error message
notifying you about the fact that 0 is invalid ...but at least it's documented
in "beegfs-setup-storage -h" ;-) ).

Best regards,
Sven

Jeff Layton

Jan 24, 2017, 2:09:15 PM
to fhgfs...@googlegroups.com
Thank you!

I plead guilty to not reading all the documentation and help. I looked
at it, but I definitely missed this important detail.

Thanks!

Jeff