BeeGFS Error


Junaid Zulfiqar

Jul 26, 2017, 4:45:05 PM
to fhgfs...@googlegroups.com
Hi all,

I am trying to set up BeeGFS using 8 servers, but I am getting the following error when starting beegfs-client. The command and its output are given below.

#sudo invoke-rc.d beegfs-helperd start

Starting BeeGFS Client:
Loading BeeGFS modules
Mounting directories from /etc/beegfs/beegfs-mounts.conf
mount: Operation canceled
invoke-rc.d: initscript beegfs-client, action "start" failed.

Does anyone know how to fix the error?

Thank you,
Junaid

Sven Breuner

Jul 26, 2017, 7:34:29 PM
to fhgfs...@googlegroups.com, Junaid Zulfiqar

Hi Junaid,

Do you see an error message in /var/log/beegfs-client.log?

Best regards
Sven

--
You received this message because you are subscribed to the Google Groups "beegfs-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fhgfs-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Junaid Zulfiqar

Jul 27, 2017, 10:44:53 AM
to Sven Breuner, fhgfs...@googlegroups.com
Hi Sven,

Thank you for responding. Yes, I do see the following error in the log file. Would it fix the issue if I added cufs1.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us to the known hosts list? The list already contains cufs1, though.


(3) Jul26 16:25:18 *mount(28823) [DatagramListener (init sock)] >> Listening for UDP datagrams: Port 8004
(1) Jul26 16:25:18 *mount(28823) [App_logInfos] >> BeeGFS Client Version: 2015.03-r25
(2) Jul26 16:25:18 *mount(28823) [App_logInfos] >> ClientID: 7097-5978FAAE-cuclient.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us
(2) Jul26 16:25:18 *mount(28823) [App_logInfos] >> Usable NICs: ib0(RDMA) ib0(TCP) eth0(TCP) eth1.4080(TCP)
(2) Jul26 16:25:18 *mount(28823) [App_logInfos] >> Net filters: 1
(3) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [NodeConn (acquire stream)] >> Connected: beegfs-...@127.0.0.1:8006 (protocol: TCP)
(2) Jul26 16:25:18 *beegfs_DGramLis(28824) [Heartbeat incoming] >> New node: beegfs-mgmtd cufs1.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us [ID: 1];
(3) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [Init] >> Management node found. Downloading node groups...
(1) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [NodeConn (acquire stream)] >> Connect failed on all available routes: beegfs-mgmtd cufs1.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us [ID: 1]
(3) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [Init] >> Node registration...
(2) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [Messaging (RPC)] >> Unable to connect to: beegfs-mgmtd cufs1.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us [ID: 1]
(3) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [Messaging (RPC)] >> Retrying communication with node: beegfs-mgmtd cufs1.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us [ID: 1]
(3) Jul26 16:25:18 *beegfs_XNodeSyn(28882) [Init] >> Init complete.
(0) Jul26 16:25:18 *mount(28823) [Stat root dir] >> Unable to proceed without a working root metadata node
(0) Jul26 16:25:18 *mount(28823) [Mount sanity check] >> Retrieval of root directory entry failed. Are all metadata servers running and registered at the management daemon? (Error: Unknown node)
(2) Jul26 16:25:18 *mount(28823) [App (stop components)] >> Stopping components...
(2) Jul26 16:25:20 *mount(28823) [App (wait for component termination)] >> Still waiting for this component to stop: beegfs_AckMgr
(2) Jul26 16:25:21 *mount(28823) [App (wait for component termination)] >> Component stopped: beegfs_AckMgr
(1) Jul26 16:25:21 *mount(28823) [App (stop)] >> All components stopped.
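
When a client log gets this long, it can help to pull out only the most severe entries. The following is a minimal Python sketch, not part of BeeGFS. The line pattern is inferred purely from the log lines quoted in this thread, and it assumes that a lower number in the leading parentheses marks a more important message (the fatal mount errors above are logged with 0). The names `LogEntry`, `parse_line` and `entries_up_to_level` are made up for this example.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Pattern inferred from the log lines quoted in this thread, e.g.
# "(0) Jul26 16:25:18 *mount(28823) [Stat root dir] >> Unable to proceed ..."
# It is an assumption, not an official description of the log format.
LINE_RE = re.compile(
    r"^\((?P<level>\d+)\)\s+"
    r"(?P<date>\w{3}\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\*?(?P<thread>\S+?)\((?P<pid>\d+)\)\s+"
    r"\[(?P<context>.*?)\]\s+>>\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    level: int
    date: str
    time: str
    thread: str
    pid: int
    context: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the pattern."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(int(m["level"]), m["date"], m["time"], m["thread"],
                    int(m["pid"]), m["context"], m["message"])


def entries_up_to_level(lines: Iterable[str], max_level: int = 1) -> List[LogEntry]:
    """Keep only parsed entries whose level number is <= max_level."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.level <= max_level]
```

Calling `entries_up_to_level(open("/var/log/beegfs-client.log"), max_level=0)` would then return only the level-0 entries, such as the "Stat root dir" and "Mount sanity check" errors shown above.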

Thank you,
Junaid


Sven Breuner

Jul 27, 2017, 10:54:50 AM
to fhgfs...@googlegroups.com, Junaid Zulfiqar
Hi Junaid,

It looks like the TCP connection to the management service is failing or being blocked.

Is a firewall enabled on the client host or on the management service host
that could block BeeGFS connections?

You can also set logLevel=5 in /etc/beegfs/beegfs-client.conf to see more
detailed information about which IP addresses and ports the client tries to
connect to.
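
Independently of BeeGFS, a plain TCP connection attempt from the client host can show whether a firewall or routing problem is in the way. Below is a minimal Python sketch of such a check. The helper name `can_connect` is made up for this example, and the host and port to test have to be taken from your own setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts and
        # name-resolution failures.
        return False
```

For example, `can_connect("cufs1", 8008)` would test the management host named in this thread. Port 8008 is an assumption here; use whatever TCP port your management service is configured to listen on.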

Best regards,
Sven


Junaid Zulfiqar

Jul 27, 2017, 11:27:53 AM
to Sven Breuner, fhgfs...@googlegroups.com
Hi Sven,

I was able to fix the connection issue, and the client now connects to the management node. However, I am now getting the following error about the root directory.

(4) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [NodeConn (acquire stream)] >> Establishing new TCP connection to: beegfs-...@127.0.0.1:8006
(3) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [NodeConn (acquire stream)] >> Connected: beegfs-...@127.0.0.1:8006 (protocol: TCP)
(2) Jul27 11:20:21 *beegfs_DGramLis(36306) [Heartbeat incoming] >> New node: beegfs-mgmtd cufs1.e2e-sos.strd-of-srv-pg0.clemson.cloudlab.us [ID: 1];
(3) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [Init] >> Management node found. Downloading node groups...
(4) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [NodeConn (acquire stream)] >> Establishing new TCP connection to: beegfs...@169.254.92.101:8008
(3) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [NodeConn (acquire stream)] >> Connected: beegfs...@169.254.92.101:8008 (protocol: TCP)
(4) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [InternodeSyncer] >> Metadata node states synced.
(4) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [Update states and mirror groups] >> Target states synced.
(3) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [Init] >> Node registration...
(2) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [Registration] >> Node registration successful.
(4) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [InternodeSyncer] >> Metadata node states synced.
(4) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [Update states and mirror groups] >> Target states synced.
(3) Jul27 11:20:21 *beegfs_XNodeSyn(36364) [Init] >> Init complete.
(0) Jul27 11:20:21 *mount(36305) [Stat root dir] >> Unable to proceed without a working root metadata node
(0) Jul27 11:20:21 *mount(36305) [Mount sanity check] >> Retrieval of root directory entry failed. Are all metadata servers running and registered at the management daemon? (Error: Unknown node)
(2) Jul27 11:20:21 *mount(36305) [App (stop components)] >> Stopping components...

Am I missing some steps, or do I need to reconfigure everything? If so, what is the best way to delete the existing configuration?

Thank you,
Junaid




Sven Breuner

Jul 27, 2017, 4:39:10 PM
to fhgfs...@googlegroups.com, Junaid Zulfiqar
Hi Junaid,

How did you fix the client's connection problem?

Did the metadata service register successfully with the management service, or
might it be running into a connection problem similar to the one the client had?

If it registered successfully, you should see a corresponding message in
/var/log/beegfs-meta.log.

You can also list the registered metadata servers by using...
$ beegfs-ctl --listnodes --nodetype=meta --nicdetails
...on a client host.

$ beegfs-check-servers
...on a client host will check whether this client can connect to all registered
servers (which also implicitly checks whether all registered services are
running, hence the name of the tool).
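
To see at a glance which service types have registered, the listing command above can be repeated for each node type. The Python sketch below only wraps the `beegfs-ctl --listnodes --nodetype=... --nicdetails` call quoted above. The node type values `mgmt` and `storage` are assumptions (only `meta` appears in this thread), and the function names are made up for this example.

```python
import shutil
import subprocess
from typing import List

# "meta" is taken from the command quoted above; "mgmt" and "storage"
# are assumed to be accepted values as well.
NODE_TYPES = ("mgmt", "meta", "storage")


def listnodes_command(nodetype: str) -> List[str]:
    """Build the argument list for listing nodes of one type."""
    return ["beegfs-ctl", "--listnodes", f"--nodetype={nodetype}", "--nicdetails"]


def list_all_nodes() -> None:
    """Run the listing for every node type and print the raw output."""
    if shutil.which("beegfs-ctl") is None:
        print("beegfs-ctl not found in PATH; run this on a BeeGFS client host")
        return
    for nodetype in NODE_TYPES:
        print(f"== {nodetype} ==")
        result = subprocess.run(listnodes_command(nodetype),
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

An empty section for `meta` or `storage` would suggest that those services have not registered with the management service.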

Best regards,
Sven



Junaid Zulfiqar

Jul 28, 2017, 12:46:10 PM
to Sven Breuner, fhgfs...@googlegroups.com
On Thu, Jul 27, 2017 at 4:39 PM, Sven Breuner <sven.b...@thinkparq.com> wrote:
> Hi Junaid,
>
> how did you fix the connection problem of the client?

I was using the wrong subnet in beegfs/netfilter.
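
A wrong subnet in the net filter is easy to miss, so it can be worth checking the filter entries against the server addresses before restarting the client. The Python sketch below assumes the net filter file holds one subnet in CIDR notation per line. The address 169.254.92.101 is the one the client log above connects to, while both subnets in the usage note are hypothetical values chosen for illustration. The function names are made up for this example.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_filter_entries(lines: Iterable[str]) -> List[Network]:
    """Parse CIDR entries, one per line, skipping blank lines."""
    networks = []
    for raw in lines:
        entry = raw.strip()
        if entry:
            # strict=False tolerates host bits being set, e.g. 10.10.1.5/24
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_allowed(address: str, networks: Iterable[Network]) -> bool:
    """Return True if the address falls inside at least one listed subnet."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)
```

With a hypothetical wrong entry such as `10.10.1.0/24`, `is_allowed("169.254.92.101", ...)` returns False, while an entry such as `169.254.0.0/16` makes it return True.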
 

> Did the metadata service successfully register at the management service or does it maybe also encounter a connection problem similar to the client before?
>
> If it successfully registered, you should see a corresponding message in /var/log/beegfs-meta.log

This log file is not present, so I assume it didn't connect successfully.

> You can also list the registered metadata servers by using...
> $ beegfs-ctl --listnodes --nodetype=meta --nicdetails
> ...on a client host.
>
> $ beegfs-check-servers
> ...on a client host will check whether this client can connect to all registered servers (which also implicitly checks whether all registered services are running, hence the name of the tool).

I only see the management node with this command. The metadata and storage lists are empty. Should I use the above command to add these nodes?
 