rocks sync config not working properly


Scott Givan

Oct 24, 2017, 11:23:03 AM
to rocks7-beta
Hi,

When I run 'rocks sync config', changes to /etc/auto.share are only propagated to one of the two compute nodes:


[root@login-0-0 install]# rocks run host compute 'hostname; cat /etc/auto.share'
compute-0-2.local
apps login-0-0.local:/export/&
org login-0-0.local:/export/&
compute-0-5.local
apps login-0-0.local:/export/&
org login-0-0.local:/export/&
[root@login-0-0 install]# echo 'test login-0-0.local:/export/&' >> /etc/auto.share
[root@login-0-0 install]# cat /etc/auto.share
apps login-0-0.local:/export/&
org login-0-0.local:/export/&
test login-0-0.local:/export/&
[root@login-0-0 install]# rocks sync config
[root@login-0-0 install]# rocks run host compute 'hostname; cat /etc/auto.share'
compute-0-2.local
apps login-0-0.local:/export/&
org login-0-0.local:/export/&
compute-0-5.local
apps login-0-0.local:/export/&
org login-0-0.local:/export/&
test login-0-0.local:/export/&
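
With more than two nodes, comparing the output above by eye gets tedious. A minimal sketch of one way to spot the stale nodes is to compare checksums instead of file contents. The function name find_stale_nodes is hypothetical (it is not part of Rocks), and the usage lines assume md5sum and hostname are available on every compute node.

```shell
#!/bin/bash
# Hypothetical helper, not a Rocks command.
# Reads lines of the form "hostname checksum ..." on stdin and prints the
# hostnames whose checksum differs from the reference checksum given as $1.
find_stale_nodes() {
    awk -v ref="$1" 'NF >= 2 && $2 != ref { print $1 }'
}

# Possible usage on the frontend (assumes the same file path as above):
#   ref=$(md5sum < /etc/auto.share | awk '{ print $1 }')
#   rocks run host compute 'echo "$(hostname) $(md5sum < /etc/auto.share)"' \
#       | find_stale_nodes "$ref"
```

Any hostname it prints has a copy of the file that does not match the frontend's.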



Trevor Cooper

Oct 24, 2017, 12:03:19 PM
to rocks7-beta
Scott,

Make sure channeld is running on compute-0-2. If it's not running, the node doesn't receive the signal to pull the updated file(s). This can happen in Rocks 6 as well.

Also, please verify that the MD5 sum of the service411.py file, as reported by the following command, matches mine:

# find /opt/rocks -type f -name service411.py -exec md5sum '{}' \;
8460551915ab7829d96e5ec221039b32  /opt/rocks/lib/python2.7/site-packages/rocks/service411.py

It wouldn't hurt to also know when you installed your frontend...

# cat /etc/motd
Rocks 7.0 (Manzanita)
Profile built 16:22 16-Oct-2017

Kickstarted 16:42 16-Oct-2017

And what the build date of the base roll installed on frontend is...

# grep timestamp /export/rocks/install/rolls/base/7.0/x86_64/roll-base.xml
<timestamp time="13:14:32" date="October 11 2017" tz="EDT"/>

There is a fix for 411[1] that you may not have depending on when you downloaded the kernel roll and/or installed your frontend.

Trevor

Philip Papadopoulos

Oct 24, 2017, 12:25:16 PM
to Scott Givan, rocks7-beta
I think this is an issue with RPCs -- I'm probably going to change the way notifications are done to tell nodes to pull new 411 information.

If you do the following:
rocks run host compute "411get --all"

Does that "fix" your issue? (It's a workaround rather than a real fix -- see below.)

TLDR;
What rocks sync config does

1. recreates the files in /etc/411.d
2. uses channeld to "notify" nodes of changes

What I'm "piloting" locally
1. create the files in /etc/411.d
2. use serf (https://www.serf.io/intro/index.html) to listen for a "411get event"
     --> when a node sees a 411get event it will pull files.


serf is a cluster membership system that is "serverless". All nodes are peers.   One can post events
and serf will use a "gossip" protocol to deliver events to all nodes in the cluster.  Everything runs in memory.


-P



--
You received this message because you are subscribed to the Google Groups "rocks7-beta" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rocks7-beta+unsubscribe@googlegroups.com.
To post to this group, send email to rocks...@googlegroups.com.
Visit this group at https://groups.google.com/group/rocks7-beta.
To view this discussion on the web visit https://groups.google.com/d/msgid/rocks7-beta/eeccc884-b349-409c-9035-770fa2101fb7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Philip Papadopoulos, Ph.D

Scott Givan

Oct 24, 2017, 12:32:59 PM
to rocks7-beta
My dates and checksums are the same as the ones you list.

I can't check channeld because, for the heck of it, I decided to re-install compute-0-2 with:

rocks set host boot compute-0-2 action=install
rocks run host compute-0-2 'reboot now'

And now I'm stuck in some kind of Hell where I can't reinstall the node. Compute-0-2 doesn't seem to PXE boot. It just boots using its local disk. I'm not in the server room, but I guess I'll have to trek over there.

Philip Papadopoulos

Oct 24, 2017, 1:20:25 PM
to Scott Givan, rocks7-beta
Scott,
Is this a UEFI node (not legacy BIOS)?
If legacy, your BIOS boot order needs to be network first.
If UEFI, you should be able to adjust the boot order remotely (if the node is up and running).

More UEFI --
As far as I can tell, UEFI boot targets are just strings and there is NO reliable way to determine which
one is a network boot target.  Furthermore, anaconda explicitly makes the disk you just installed FIRST in the UEFI
boot order, even if you have specified in BIOS that you want network (yep, it diddles it for you).

Here's an example:
[root@idigbio-1-0 ~]# efibootmgr
BootCurrent: 0007
BootOrder: 0007,0006,0004,0003,0005,0008,0009
Boot0000* BEV device:        Embedded NIC 1: BRCM MBA Slot 0200 v7.2.3
Boot0001* Harddisk:          WDC WD5003ABYX-18WERA0         
Boot0002* CDROM:             TSSTcorp DVD-ROM SN-108FB      
Boot0003* TSSTcorp DVD-ROM SN-108FB              
Boot0004* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C)
Boot0006* Rocks
Boot0007* Rocks
Boot0008* EFI Fixed Disk Boot Device 1
Boot0009* EFI Fixed Disk Boot Device 2


Which one of these is the correct PXE device?  It could be 0000, 0004, or 0005.  I can't tell if there is any difference between
0004 and 0005.
Isn't non-standard "standard" firmware fun?
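
For illustration only, here is a heuristic sketch that filters efibootmgr output for entries whose label looks like a network device. The function name list_network_candidates and the keyword list are assumptions; since the labels are free-form strings, it can miss real PXE targets or flag the wrong ones, so treat the result as a list of candidates to try, not an answer.

```shell
#!/bin/bash
# Hypothetical helper: reads efibootmgr output on stdin and prints the
# four-digit IDs of boot entries whose label contains a network-looking word.
# The keyword list (nic, ethernet, pxe, network) is a guess, not a standard.
list_network_candidates() {
    grep -E '^Boot[0-9A-Fa-f]{4}' \
        | grep -Eiw 'nic|ethernet|pxe|network' \
        | sed -E 's/^Boot([0-9A-Fa-f]{4}).*/\1/'
}

# Possible usage on a UEFI node:
#   efibootmgr | list_network_candidates
```

On the listing above this prints 0000, 0004, and 0005, the same three candidates. Once you know which one actually PXE boots, efibootmgr -o takes a comma-separated list of entry IDs to set as the new boot order.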

-P





Scott Givan

Oct 24, 2017, 3:55:06 PM
to rocks7-beta
Hi,

Yes, running

rocks run host compute "411get --all"

"fixes" the problem.

Is this something that would automatically happen on the node eventually? Is it a time-dependent thing where the message hasn't yet been received by that particular node to pull the changes?

Philip Papadopoulos

Oct 24, 2017, 3:59:56 PM
to Scott Givan, rocks7-beta
On Tue, Oct 24, 2017 at 12:55 PM, Scott Givan <f1n...@gmail.com> wrote:
> Is this something that would automatically happen on the node eventually? Is it a time-dependent thing where the message hasn't yet been received by that particular node to pull the changes?

It's supposed to happen when you run rocks sync config -- it's just not working correctly.
For now, whenever you run rocks sync config, follow it with:
rocks run host compute "411get --all"
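
If you run this often, the two steps can be wrapped in a small shell function. This is only a sketch of the workaround described here: the function name sync_and_pull and the ROCKS override variable are hypothetical, added so the function can be dry-run with ROCKS=echo on a machine without Rocks.

```shell
#!/bin/bash
# Hypothetical wrapper for the interim workaround: regenerate the 411 files,
# then explicitly tell the nodes to pull them.
# ROCKS is an assumed override (e.g. ROCKS=echo for a dry run); it defaults
# to the real "rocks" command.
sync_and_pull() {
    local rocks_cmd="${ROCKS:-rocks}"
    local hosts="${1:-compute}"      # host or appliance name, default "compute"
    "$rocks_cmd" sync config || return 1
    "$rocks_cmd" run host "$hosts" "411get --all"
}

# Possible usage on the frontend:
#   sync_and_pull              # all compute nodes
#   sync_and_pull compute-0-2  # a single node
```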

-P

Scott Givan

Oct 24, 2017, 4:02:54 PM
to rocks7-beta
These nodes are using legacy BIOS. It seems like I should switch to UEFI, though.

The problem seemed to be related to the installer not being able to properly partition the OS disks on the node. When I got on the node undergoing installation, it was stuck on selecting the installation disk, even though the partition scheme is clearly laid out in /export/rocks/install/site-profiles/7.0/nodes/replace-partition.xml and has worked numerous times previously. The only way I found to get it to work was to go through a manual installation on the node, letting the installer decide how to partition the OS disk (into LVM groups), and then go back and re-install using the Rocks PXE. Now the node is back and the OS disk is partitioned as delineated in replace-partition.xml. Not really sure why this happened, though.

Philip Papadopoulos

Oct 24, 2017, 4:14:47 PM
to Scott Givan, rocks7-beta
On Tue, Oct 24, 2017 at 1:02 PM, Scott Givan <f1n...@gmail.com> wrote:
> The problem seemed to be related to the installer not being able to properly partition the OS disks on the node.

About the only place where documentation has actually caught up is partitioning. From the updated manual:
== raw code in the usersguide ==
In Rocks 7, the &lt;pre&gt; section of any custom partitioning must be executed prior to &lt;pre&gt; section of the node <computeroutput>partition.xml</computeroutput>. In the default configuration graph the node <computeroutput>custom-partition</computeroutput> satisfies this constraint. The simplest place to put your custom partitioning code is to replace this node file.

=== Formatted (by hand) for a bit of readability ===
In Rocks 7, the <pre> section of any custom partitioning must be executed prior to <pre> section of the node partition.xml.  In the default configuration graph the node custom-partition  satisfies this constraint. The simplest place to put your custom partitioning code is to replace this node file.

-P


Scott Givan

Oct 24, 2017, 4:47:46 PM
to rocks7-beta
Thanks! I'll give it a try.

Scott Givan

Oct 24, 2017, 4:48:39 PM
to rocks7-beta
That works for now. Thanks!