Openstack network format parameter


Tomasz Skowron

Feb 10, 2021, 4:27:50 PM
to elasticluster

Hi,

One hurdle with the on-premises OpenStack sorted, another one comes up.

In the docs, I could not find where and under which key to specify the UUID of the network to use, resulting in this error:

ent.exceptions.BadRequest'>
Traceback (most recent call last):
  File "elasticluster/cluster.py", line 580, in _start_node
    node.start()
  File "elasticluster/cluster.py", line 1319, in start
    **self.extra)
  File "elasticluster/providers/openstack.py", line 578, in start_instance
    vm = self.nova_client.servers.create(node_name, image_id, flavor, **vm_start_args)
  File "/usr/local/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1481, in create
    return self._boot(response_key, *boot_args, **boot_kwargs)
  File "/usr/local/lib/python2.7/site-packages/novaclient/v2/servers.py", line 846, in _boot
    return_raw=return_raw, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/novaclient/base.py", line 364, in _create
    resp, body = self.api.client.post(url, body=body)
  File "/usr/local/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 392, in post
    return self.request(url, 'POST', **kwargs)
  File "/usr/local/lib/python2.7/site-packages/novaclient/client.py", line 78, in request
    raise exceptions.from_response(resp, body, url, method)
BadRequest: Bad network format: missing 'uuid' (HTTP 400) (Request-ID: req-4de2dd7e-8b49-41c5-bd70-e233bf942e58)
2021-02-10 21:25:34 002d2fc41e0b elasticluster[1] ERROR Could not start node `compute001`: Bad network format: missing 'uuid' (HTTP 400)

Is there anyone with a known solution to this?
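
[Editorial note: the "Bad network format: missing 'uuid'" rejection typically means Nova received a networks entry without a `uuid` field. With python-novaclient, networks are passed to `servers.create` via the `nics` argument, one `{'net-id': <uuid>}` dict per network; novaclient translates `net-id` into the `uuid` field Nova checks server-side. A minimal sketch (the helper name is hypothetical, not part of ElastiCluster):]

```python
def build_nics(network_ids):
    """Build the `nics` list that python-novaclient's servers.create expects:
    one {'net-id': <uuid>} dict per network.  An empty or malformed entry is
    the kind of input that produces the "missing 'uuid'" BadRequest above."""
    return [{"net-id": net_id} for net_id in network_ids]

# Usage (sketch):
#   nova_client.servers.create(name, image_id, flavor,
#                              nics=build_nics(["9d702630-5e7a-4ef6-a60a-ffe23d4f5826"]))
```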

Tomasz Skowron

Feb 10, 2021, 4:42:13 PM
to elasticluster
To be more precise, `network_ids` has been used in the cluster section, as per the documentation:

[cluster/slurm]
cloud=openstack_HPC1
network_ids=9d702630-5e7a-4ef6-a60a-ffe23d4f5826
login=ubuntu
setup=slurm
frontend_nodes=1
compute_nodes=4
ssh_to=frontend
security_group=default
image_id=7d849dea-c90a-4231-9afb-50199385a4c4
flavor=200GB_16GB_RAM_4CPU
[setup/slurm]
frontend_groups=slurm_master
compute_groups=slurm_worker

Riccardo Murri

Feb 10, 2021, 5:24:01 PM
to Tomasz Skowron, elasticluster
Hello Tomasz,

Specifying `network_ids` in the configuration file is the correct way of doing it.

Can you please re-run ElastiCluster adding a `-vvvv` option, so as to get DEBUG-level output?
Something like this:

    elasticluster -vvvv start slurm

If the config is correctly read, you should see lines like this in the output:

    Specifying networks for node compute001: 9d702630-5e7a-4ef6-a60a-ffe23d4f5826

Ciao,
R

Tomasz Skowron

Feb 10, 2021, 5:27:51 PM
to elasticluster
This appears to be similar to https://github.com/elasticluster/elasticluster/issues/584; however, in my case I am using the correct network ID.

OpenStack version: Ussuri. Could the client version shipped with ElastiCluster perhaps be incompatible?

Tomasz Skowron

Feb 10, 2021, 5:30:45 PM
to elasticluster
Data is not being populated at all...

[tomaszs@elasticluster elasticluster]$ ./elasticluster.sh -vvvv start slurm | grep "Specifying networks for node"
2021-02-10 22:28:58 aa67417d8b1a elasticluster[1] DEBUG Specifying networks for node slurm-compute002:
2021-02-10 22:28:58 aa67417d8b1a elasticluster[1] DEBUG Specifying networks for node slurm-compute001:
2021-02-10 22:28:58 aa67417d8b1a elasticluster[1] DEBUG Specifying networks for node slurm-compute004:
2021-02-10 22:28:58 aa67417d8b1a elasticluster[1] DEBUG Specifying networks for node slurm-frontend001:
2021-02-10 22:28:58 aa67417d8b1a elasticluster[1] DEBUG Specifying networks for node slurm-compute003:
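
[Editorial note: the blank output after each colon means the list of network IDs the provider built was empty. As an illustration only (a hypothetical helper, not ElastiCluster's actual parsing code), a comma-separated `network_ids` value containing stray whitespace or line-ending characters can collapse to nothing unless it is cleaned defensively:]

```python
def parse_network_ids(raw):
    """Split a comma-separated `network_ids` value into clean UUID strings,
    stripping surrounding whitespace (including a stray '\r' from a Windows
    line ending) and dropping empty fragments left by trailing commas."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```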


Kind Regards,

Tomasz

Tomasz Skowron

Feb 10, 2021, 8:16:34 PM
to elasticluster
After a bit more digging and a tidy-up of the .elasticluster/config file, a minimal working version is below.

#
[cloud/openstack_HPC1]
provider=openstack

[login/ubuntu]
image_user=ubuntu
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/id_rsa
user_key_public=~/.ssh/id_rsa.pub

[cluster/slurm1]
cloud=openstack_HPC1
network_ids=b5a0acca-646b-4a19-bb95-19c69caad085
login=ubuntu
setup=slurm
frontend_nodes=1
compute_nodes=1
ssh_to=frontend
security_group=default
image_id=7d849dea-c90a-4231-9afb-50199385a4c4
flavor=200GB_16GB_RAM_4CPU

[setup/slurm]
frontend_groups=slurm_master
compute_groups=slurm_worker


Changes: removed extra whitespace (perhaps a weird line ending?) and renamed the cluster to `slurm1`, which cleared all previously cached config elements, successfully deploying a scaled-down version on Ussuri.
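
[Editorial note: a quick way to spot such invisible characters is to scan the config file byte by byte. This sketch (written for this thread, not part of ElastiCluster) flags lines with CRLF endings or trailing whitespace:]

```python
def find_suspect_lines(path):
    """Return (line_number, line) pairs for lines in `path` that end with a
    carriage return (Windows line ending) or trailing spaces/tabs -- the
    kind of invisible characters that can corrupt a value like a UUID."""
    suspects = []
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, 1):
            line = raw.rstrip(b"\n")
            if line != line.rstrip():  # catches b'\r', spaces, and tabs
                suspects.append((lineno, raw.decode("utf-8", errors="replace")))
    return suspects
```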

Riccardo Murri

Feb 12, 2021, 3:00:22 AM
to Tomasz Skowron, elasticluster
On Wed, Feb 10, 2021 at 11:30 PM 'Tomasz Skowron' via elasticluster <elasti...@googlegroups.com> wrote:
Data is not being populated at all...

I think this is already fixed in the sources; I'll rebuild the Docker image and then ask you to try again.

Thanks,
Riccardo

Riccardo Murri

Feb 12, 2021, 10:38:43 AM
to Tomasz Skowron, elasticluster
I have just pushed an updated Docker image, can you please run:

    ./elasticluster.sh --latest

and then try again starting the cluster?  You may need to destroy it first (to remove corrupted state files):

    elasticluster stop -y slurm1

Let me know!

Thanks,
Riccardo

Tomasz Skowron

Feb 12, 2021, 10:43:58 AM
to elasticluster
Hi,
I did the update.

It looks like it runs with a few warnings, but these are relatively minor:

[tomaszs@elasticluster elasticluster]$ ./elasticluster.sh start slurm1
2021-02-12 15:41:25 771267cb1951 elasticluster[1] WARNING FutureWarning: OpenStack user domain name is present both in the environment and the config file. Environment variable OS_USER_DOMAIN_NAME takes precedence, but this may change in the future.
2021-02-12 15:41:25 771267cb1951 elasticluster[1] WARNING FutureWarning: OpenStack project domain name is present both in the environment and the config file. Environment variable OS_PROJECT_DOMAIN_NAME takes precedence, but this may change in the future.
Starting cluster `slurm1` with:
* 1 frontend nodes.
* 4 compute nodes.
(This may take a while...)
Version 2 is deprecated, use alternative version 3 instead.
Version 2 is deprecated, use alternative version 3 instead.
Version 2 is deprecated, use alternative version 3 instead.
Version 2 is deprecated, use alternative version 3 instead.
Version 2 is deprecated, use alternative version 3 instead.
2021-02-12 15:42:16 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 54698)>
2021-02-12 15:42:16 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=22, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 46020)>
2021-02-12 15:42:16 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=23, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 59846)>
2021-02-12 15:42:16 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=24, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 58998)>
2021-02-12 15:42:16 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=25, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 42158)>
2021-02-12 15:42:16 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=27, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 46030)>
2021-02-12 15:42:26 771267cb1951 elasticluster[1] WARNING ResourceWarning: unclosed <socket.socket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('172.17.0.2', 59008)>
Configuring the cluster ...
(this too may take a while)
[WARNING]: Could not match supplied host pattern, ignoring: pvfs2
[WARNING]: Could not match supplied host pattern, ignoring: pvfs2_data
[WARNING]: Could not match supplied host pattern, ignoring: pvfs2_meta
[WARNING]: Could not match supplied host pattern, ignoring: pvfs2_client
[WARNING]: Could not match supplied host pattern, ignoring: ipython_controller
[WARNING]: Could not match supplied host pattern, ignoring: ipython_engine
[WARNING]: Could not match supplied host pattern, ignoring: jenkins



The issue can be deemed resolved!

Thank you for the help!