cwx 3.4.0 provision: no more channels available

srouberol

Apr 26, 2008, 3:51:32 AM
to Linux Networx Users Group
Hi guys
great idea to create this group!
I currently manage a lnxi cluster (around 300 cores) at www.iap.fr and
my main concerns are hardware issues.

I have a cwx 3.4.0 question though: recently, when I tried to
provision a node, I got a

"No more channels available" error.

Does anybody know what the problem is?

I found a workaround by adding more resources in
/opt/cwx/etc/DistributionService.profile and
/opt/cwx/etc/system-Clusterworx.profile and restarting cwx,
but I wonder why I had to do that, because I have not changed my config
for a long time.

Regards,
Stephane Rouberol

Gilles

Apr 26, 2008, 11:43:03 AM
to Linux Networx Users Group
Stephane,

at any given time, there can be up to ten images (default value) in
the air.

so if you are using non-versioned images on several nodes, you can
exhaust the available
distribution channels

running
grep x-slam /etc/dhcpd.conf | sort -u | wc -l
should indicate how many different images are currently being used

you can also
ls -l /opt/cwx/distribution
and check the symlinks, this will indicate which images and payloads
( name and revision ) are currently being used
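
The first check above can be wrapped in a small helper. This is only a sketch: the function name `count_channels` and the warning text are made up for illustration, and the limit of 10 is the default mentioned above. It only counts what dhcpd.conf references, so it cannot see channels held by anything else.

```shell
#!/bin/sh
# Hypothetical helper: count distinct x-slam channels referenced in a
# dhcpd.conf-style file and compare against a channel limit.
# Usage: count_channels FILE [LIMIT]   (LIMIT defaults to 10)
count_channels() {
    conf=$1
    limit=${2:-10}
    # Same pipeline as above; tr strips the padding some wc versions add.
    used=$(grep x-slam "$conf" | sort -u | wc -l | tr -d ' ')
    echo "channels in use: $used of $limit"
    if [ "$used" -ge "$limit" ]; then
        echo "warning: channel pool may be exhausted"
    fi
}
```

For example, `count_channels /etc/dhcpd.conf` on the master, or `count_channels /etc/dhcpd.conf 20` if the pool has been enlarged.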

Best regards,

Gilles

srouberol

Apr 26, 2008, 1:01:43 PM
to Linux Networx Users Group


On Apr 26, 5:43 pm, Gilles <gilles.gouaillar...@gmail.com> wrote:
> Stephane,
>
Hi Gilles
thanx for answering

> at a given time, there can be up to ten images ( default value ) in
> the air.
>
OK
I suspect I can handle more though, now that I have modified
/opt/cwx/etc/DistributionService.profile and
/opt/cwx/etc/system-Clusterworx.profile.

> so if you are using non-versioned images on several nodes, you can
> exhaust the available
> distribution channels

I am using only versioned images (non-versioned images take too long
to provision :( )
>
> running
> grep x-slam /etc/dhcpd.conf | sort -u | wc -l
> should indicate how many different images are currently being used
>
apparently only 6:

master:~ # grep x-slam /etc/dhcpd.conf | sort -u
filename "x-slam://192.168.10.250:10002/239.192.0.129:10002";
filename "x-slam://192.168.10.250:10004/239.192.0.130:10004";
filename "x-slam://192.168.10.250:10006/239.192.0.131:10006";
filename "x-slam://192.168.10.250:10008/239.192.0.132:10008";
filename "x-slam://192.168.10.250:10010/239.192.0.133:10010";
filename "x-slam://192.168.10.250:10014/239.192.0.135:10014";

> you can also
> ls -l /opt/cwx/distribution
> and check the symlinks, this will indicate which images and payloads
> ( name and revision ) are currently being used
>

master:~ # ls -l /opt/cwx/distribution
total 8
drwxr-xr-x 2 root root 4096 Apr 21 11:15 .
drwxr-xr-x 19 root root 4096 Apr 24 12:57 ..
lrwxrwxrwx 1 root root 43 Apr 22 09:48 provisioning-00 -> /opt/cwx/provisioning/cache/Node-65----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 09:48 provisioning-01 -> /opt/cwx/provisioning/cache/Node-52--.payload
lrwxrwxrwx 1 root root 42 Apr 22 09:48 provisioning-02 -> /opt/cwx/provisioning/cache/MDS-48----.ebi
lrwxrwxrwx 1 root root 43 Apr 22 09:48 provisioning-03 -> /opt/cwx/provisioning/cache/IO-15--.payload
lrwxrwxrwx 1 root root 43 Apr 22 09:48 provisioning-04 -> /opt/cwx/provisioning/cache/SERV-24----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 09:48 provisioning-05 -> /opt/cwx/provisioning/cache/SERV-25--.payload
lrwxrwxrwx 1 root root 41 Apr 22 10:20 provisioning-06 -> /opt/cwx/provisioning/cache/IO-50----.ebi
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-07 -> /opt/cwx/provisioning/cache/IO-15--.payload
lrwxrwxrwx 1 root root 41 Apr 22 10:20 provisioning-08 -> /opt/cwx/provisioning/cache/DB-23----.ebi
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-09 -> /opt/cwx/provisioning/cache/DB-25--.payload
lrwxrwxrwx 1 root root 44 Apr 22 10:20 provisioning-10 -> /opt/cwx/provisioning/cache/Login-60----.ebi
lrwxrwxrwx 1 root root 46 Apr 22 10:20 provisioning-11 -> /opt/cwx/provisioning/cache/Login-70--.payload
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-14 -> /opt/cwx/provisioning/cache/Node-72----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 10:20 provisioning-15 -> /opt/cwx/provisioning/cache/Node-59--.payload
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-16 -> /opt/cwx/provisioning/cache/Node-60----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 10:20 provisioning-17 -> /opt/cwx/provisioning/cache/Node-47--.payload
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-18 -> /opt/cwx/provisioning/cache/Node-68----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 10:20 provisioning-19 -> /opt/cwx/provisioning/cache/Node-55--.payload
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-24 -> /opt/cwx/provisioning/cache/Node-69----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 10:20 provisioning-25 -> /opt/cwx/provisioning/cache/Node-56--.payload


master:~ # ls -Ll /opt/cwx/distribution
/bin/ls: /opt/cwx/distribution/provisioning-00: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-01: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-16: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-17: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-18: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-19: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-24: No such file or directory
/bin/ls: /opt/cwx/distribution/provisioning-25: No such file or directory
total 4938932
drwxr-xr-x 2 root root 4096 Apr 21 11:15 .
drwxr-xr-x 19 root root 4096 Apr 24 12:57 ..
-r--r--r-- 1 root root 8348096 Apr 21 09:38 provisioning-02
-rw-r--r-- 1 root root 301218969 Apr 21 09:36 provisioning-03
-r--r--r-- 1 root root 8533453 Apr 21 09:42 provisioning-04
-rw-r--r-- 1 root root 409720195 Apr 21 09:41 provisioning-05
-r--r--r-- 1 root root 8389312 Apr 21 09:37 provisioning-06
-rw-r--r-- 1 root root 301218969 Apr 21 09:36 provisioning-07
-r--r--r-- 1 root root 8510060 Apr 21 09:44 provisioning-08
-rw-r--r-- 1 root root 309072485 Apr 21 09:43 provisioning-09
-r--r--r-- 1 root root 8501290 Apr 21 09:55 provisioning-10
-rw-r--r-- 1 root root 2243515331 Apr 21 09:54 provisioning-11
-r--r--r-- 1 root root 8785594 Apr 21 11:15 provisioning-14
-rw-r--r-- 1 root root 1436614106 Apr 21 10:18 provisioning-15

Do you see any problem here?
Any recommendation for a cleanup?

Thanx
Stephane

> Best regards,
>
> Gilles

Cameron Harr

Apr 26, 2008, 1:42:05 PM
to lnx...@googlegroups.com
Stephane,
Sometimes I have removed all the files in /opt/cwx/provisioning/cache,
letting them be rebuilt on the next provision, and that has helped me
before.
Cameron

Gilles

Apr 26, 2008, 2:04:33 PM
to Linux Networx Users Group
Stephane,

as far as I understand, /opt/cwx/distribution is rebuilt automatically
each time a node is provisioned.
you can definitely remove the broken links; you should only need
provisioning-[02-09,14-15] indeed.
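
Before removing anything, it may help to list exactly which links are broken. A minimal sketch, using only the standard `test` operators (the function name `list_broken_links` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print symlinks directly inside DIR whose targets
# no longer exist. It only lists them; nothing is deleted.
list_broken_links() {
    dir=$1
    for link in "$dir"/*; do
        # -L: entry is a symlink; ! -e: its target does not resolve
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}
```

Running `list_broken_links /opt/cwx/distribution` and checking the output by hand before any `rm` is probably the safer order.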

if you remove the content of the /opt/cwx/provisioning/cache directory
and a node crashes, it will not be reprovisioned automatically.

you can run
dbix -x | grep .image:provisioning-

if you see numbers different than 02,04,06,08 or 14 this means there
is something really wrong in the cwx database

otherwise, having removed the unused provisioning-xx links should have
fixed your problems

Best regards,

Gilles

srouberol

Apr 26, 2008, 5:52:00 PM
to Linux Networx Users Group


On Apr 26, 8:04 pm, Gilles <gilles.gouaillar...@gmail.com> wrote:
> Stephane,
>
> as far as I understand, /opt/cwx/distribution is rebuilt automatically
> each time a node is provisioned.
> you can definitely remove the broken links; you should only need
> provisioning-[02-09,14-15] indeed.

ok done
master:/opt/cwx/distribution # ls -l
total 8
drwxr-xr-x 2 root root 4096 Apr 26 23:39 .
drwxr-xr-x 19 root root 4096 Apr 24 12:57 ..
lrwxrwxrwx 1 root root 42 Apr 22 09:48 provisioning-02 -> /opt/cwx/provisioning/cache/MDS-48----.ebi
lrwxrwxrwx 1 root root 43 Apr 22 09:48 provisioning-03 -> /opt/cwx/provisioning/cache/IO-15--.payload
lrwxrwxrwx 1 root root 43 Apr 22 09:48 provisioning-04 -> /opt/cwx/provisioning/cache/SERV-24----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 09:48 provisioning-05 -> /opt/cwx/provisioning/cache/SERV-25--.payload
lrwxrwxrwx 1 root root 41 Apr 22 10:20 provisioning-06 -> /opt/cwx/provisioning/cache/IO-50----.ebi
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-07 -> /opt/cwx/provisioning/cache/IO-15--.payload
lrwxrwxrwx 1 root root 41 Apr 22 10:20 provisioning-08 -> /opt/cwx/provisioning/cache/DB-23----.ebi
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-09 -> /opt/cwx/provisioning/cache/DB-25--.payload
lrwxrwxrwx 1 root root 44 Apr 22 10:20 provisioning-10 -> /opt/cwx/provisioning/cache/Login-60----.ebi
lrwxrwxrwx 1 root root 46 Apr 22 10:20 provisioning-11 -> /opt/cwx/provisioning/cache/Login-70--.payload
lrwxrwxrwx 1 root root 43 Apr 22 10:20 provisioning-14 -> /opt/cwx/provisioning/cache/Node-72----.ebi
lrwxrwxrwx 1 root root 45 Apr 22 10:20 provisioning-15 -> /opt/cwx/provisioning/cache/Node-59--.payload



>
> if you remove the content of the /opt/cwx/provisioning/cache directory
> and a node crashes, it will not be reprovisioned automatically.
>
> you can run
> dbix -x | grep .image:provisioning-
>
> if you see numbers different than 02,04,06,08 or 14 this means there
> is something really wrong in the cwx database

master:/opt/cwx/distribution # dbix -x | grep .image:provisioning- | awk -F: '{print $2}' | sort -u
dbix - Version 1.2.1 Build 4
Copyright (c) 2006 Linux Networx Incorporated.
XeroOne Framework
Copyright (c) 1998-2005 Curtis Smith. All rights reserved.
=============================================================================
provisioning-00
provisioning-02
provisioning-04
provisioning-06
provisioning-08
provisioning-10
provisioning-14
provisioning-16
provisioning-18
provisioning-24

the cwx database seems indeed to have extra unneeded items :(
but I don't feel comfortable with dbix --delete ...
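
To see which database entries have no usable link behind them, without touching the database, the dump can be cross-checked against the directory. This is a read-only sketch: the function name `stale_db_entries` is made up, and it assumes the dump lines contain `.image:provisioning-NN` as in the grep above.

```shell
#!/bin/sh
# Hypothetical helper: read a `dbix -x` dump on stdin and print every
# provisioning-NN entry that has no resolvable link in DIR.
# DIR defaults to /opt/cwx/distribution.
stale_db_entries() {
    dir=${1:-/opt/cwx/distribution}
    # Pull out the provisioning-NN names, dedupe, then test each link.
    sed -n 's/.*\.image:\(provisioning-[0-9][0-9]*\).*/\1/p' | sort -u |
    while read -r name; do
        # -e follows the symlink, so missing and dangling links both fail
        [ -e "$dir/$name" ] || echo "$name"
    done
}
```

Usage would be `dbix -x | stale_db_entries`, which only reports and changes nothing.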

Stephane

>
> otherwise, having removed the unused provisioning-xx links should have
> fixed your problems
>
> Best regards,
>
> Gilles

BTW thanx Cameron for your suggestion about clearing the cwx cache,
but I had tried that before and it had no effect on the provision error
about channels.


srouberol

Apr 29, 2008, 11:38:22 AM
to Linux Networx Users Group
For the record, the "no more channels available" error was due to
useless entries in the cwx database
eating channel resources.

The solution was to run a "fake" provision command on the faulty nodes
still registered with old images, which updates dhcpd.conf, the cwx db
and the /opt/cwx/distribution links without really provisioning anything:

cwprovision -i Node -n <faulty_nodes>

JustinW

May 7, 2008, 4:11:46 PM
to Linux Networx Users Group
FYI, you can add more channels manually in 3.x by editing
DistributionService.profile (it's either in /opt/cwx/sys or
/opt/cwx/etc, I can't remember). Just copy the lines that are there, and make
sure to increment the port numbers and multicast addresses. I had to
do this on one of my systems to go from 10 to 20 channels. Also, I
think you need to restart CWX after you make this change. Hope this
helps.

-Justin.
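
As a rough aid for the "increment the port numbers and multicast addresses" step, the numbering can be generated rather than typed by hand. This is only a sketch under a loud assumption: the pattern (port 10002 + 2 per channel, address 239.192.0.129 + 1 per channel) is inferred from the dhcpd.conf excerpt earlier in this thread, not from any documentation. The profile file's own syntax is not shown in this thread, so the helper (hypothetical name `channel_params`) prints bare values only.

```shell
#!/bin/sh
# Hypothetical helper: print "index port multicast-address" for channel
# indexes FIRST..LAST.
# Assumed pattern, inferred from the dhcpd.conf lines quoted in the thread:
#   port = 10002 + 2*index, address = 239.192.0.(129 + index)
channel_params() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$i $((10002 + 2 * i)) 239.192.0.$((129 + i))"
        i=$((i + 1))
    done
}
```

`channel_params 10 19` would list values for ten additional channels if the first ten use indexes 0 to 9. Compare the output against the existing lines in the profile before copying anything in.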

srouberol

May 8, 2008, 6:27:20 AM
to Linux Networx Users Group
Thanx for your answer.
As I said in my first mail, I did find a temporary workaround by
increasing the available channels.
I had to modify both /opt/cwx/etc/DistributionService.profile and
/opt/cwx/etc/system-Clusterworx.profile though.
New channels were indeed available after a cwx restart.
But it turned out that the unused channels were unavailable because of
a corrupted cwx database, and cwprovision -n was the real cure.
Stephane