Upgrading Cluster from 3.11.5 to Latest 3.14


Indervir Singh

May 27, 2022, 12:34:48 AM
to OpenQuake Users
Hello!

I am attempting to upgrade all of our clusters currently running version 3.11.5. They were installed with the cluster method that uses a different package for the head node than for the compute nodes (python3-oq-engine-master / python3-oq-engine-worker), and the apt repo reports 3.11.5 as the latest version for those two packages.

However, if I want to get my clusters to v3.14, should I uninstall python3-oq-engine-master from the head node and python3-oq-engine-worker from the compute nodes, and then use the universal installer on all servers? Will the cluster configuration still work? Or is 3.11.5 currently the most up-to-date version for clustered environments?

Any assistance would be greatly appreciated.

And apologies if this is already documented somewhere on the Git pages; I tried to hunt down any useful info but couldn't find a concrete method for upgrading clustered environments.

Thanks!
-Indervir

Michele Simionato

May 27, 2022, 1:04:53 AM
to OpenQuake Users
We do not provide Debian packages for non-LTS versions of the engine, so if you want to upgrade you need to use the universal installer. The good news is that there are no longer any differences between the head node and the compute nodes. The relevant documentation is here: https://github.com/gem/oq-engine/blob/master/doc/installing/cluster.md
Configuring a cluster can be tricky, so feel free to ask for help.
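In case it helps, the upgrade on each node would look roughly like this. This is only a sketch: the `server` installation type and the exact installer invocation are assumptions based on a typical cluster setup, so check the linked documentation for the flags that apply to you.

```shell
# Sketch, not a tested procedure: remove the old 3.11 Debian packages,
# then run the universal installer fetched from the engine repository.
sudo apt remove python3-oq-engine-master    # on the head node
sudo apt remove python3-oq-engine-worker    # on each compute node
curl -L -O https://raw.githubusercontent.com/gem/oq-engine/master/install.py
sudo python3 install.py server              # same command on every node
```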

Michele Simionato

PS: at GEM we are moving away from using a cluster; we recently bought a single machine more powerful than the old cluster, and this is what we recommend nowadays.

Indervir Singh

May 31, 2022, 7:34:01 PM
to OpenQuake Users
Hi Michele,

Thank you for your reply and guidance! I followed the documentation provided, but I am now receiving the following error when trying to execute any calculation (demo or our own):

Traceback (most recent call last):
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/server/views.py", line 547, in calc_run
    job_id = submit_job(request.FILES, ini, user, hazard_job_id)
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/server/views.py", line 568, in submit_job
    [job] = engine.create_jobs(
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/engine/engine.py", line 328, in create_jobs
    logs.init('job', dic, log_level, log_file, user_name, hc_id))
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/commonlib/logs.py", line 261, in init
    return LogContext(job_ini, calc_id, log_level, log_file, user_name, hc_id)
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/commonlib/logs.py", line 173, in __init__
    self.calc_id = dbcmd(
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/commonlib/logs.py", line 51, in dbcmd
    return res.get()
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/baselib/parallel.py", line 404, in get
    raise etype(msg)
sqlite3.OperationalError:
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/baselib/parallel.py", line 427, in new
    val = func(*args)
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/server/db/actions.py", line 117, in create_job
    return db('INSERT INTO job (?S) VALUES (?X)',
  File "/opt/openquake/venv/lib/python3.8/site-packages/openquake/server/dbapi.py", line 335, in __call__
    raise exc.__class__('%s: %s %s' % (exc, templ, args))
OperationalError: attempt to write a readonly database: INSERT INTO job (id, is_running, description, user_name, calculation_mode, hazard_calculation_id, ds_calc_dir) VALUES (?, ?, ?, ?, ?, ?, ?) (1, 1, 'Calculation waiting to start', 'openquake', 'preclassical', None, '/home/openquake/oqdata/calc_1')
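For what it's worth, "attempt to write a readonly database" usually means the user running the dbserver cannot write the SQLite file or its parent directory (SQLite also needs the directory for its journal files). A quick hypothetical check, using the path from the `file` setting in the `[dbserver]` section of the config:

```python
import os
import sqlite3

def check_writable(path):
    """Report whether the current user can write the SQLite database at `path`."""
    if not os.path.exists(path):
        return f"{path}: does not exist"
    parent = os.path.dirname(path) or "."
    if not os.access(parent, os.W_OK):
        # SQLite needs the directory too, for -journal/-wal files
        return f"{parent}: directory not writable"
    try:
        con = sqlite3.connect(path)
        con.execute("BEGIN IMMEDIATE")  # acquires a write lock
        con.rollback()
        con.close()
        return f"{path}: writable"
    except sqlite3.OperationalError as exc:
        return f"{path}: {exc}"

if __name__ == "__main__":
    # path taken from the [dbserver] section of the posted config
    print(check_writable("/opt/openquake/venv/db.sqlite3"))
```

Running this as the same user that systemd starts the dbserver with should reproduce the error if it is a plain permissions problem.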


I tried adjusting the config file to make it resemble the one on our other cluster running version 3.11.5, but nothing has helped so far. Here is the config:

[distribution]
# enable celery only if you have a cluster
oq_distribute = zmq
# log level for jobs spawned by the WebAPI
log_level = info
serialize_jobs = true

[dbserver]
file = /opt/openquake/venv/db.sqlite3
# daemon bind address; must be a valid IP address
listen =  (using IP address of headnode server)
# address of the dbserver; can be an hostname too
# on multi-node cluster it must be the IP or hostname
# of the master node (on the master node cfg too)

host = (using hostname of headnode server)
port = 1908
receiver_ports = 1912-1920
authkey = somethingstronger

[webapi]
server = http://localhost:8800
username =
password =

[zworkers]
host_cores = xx.xx.xx.xx -1,  xx.xx.xx.xx   -1
ctrl_port = 1909

[directory]
shared_dir = /opt/openquake
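One thing worth verifying alongside the config is that the shared_dir really is mounted and writable from every node. A hypothetical quick check (the host names below are placeholders for your actual head node and compute nodes):

```shell
# Sketch: confirm /opt/openquake is visible and writable on each node.
# Replace the host list with your real node names or IPs.
for host in headnode worker1 worker2; do
    ssh "$host" 'touch /opt/openquake/.rwtest && rm /opt/openquake/.rwtest \
        && echo "$(hostname): shared_dir writable" \
        || echo "$(hostname): shared_dir NOT writable"'
done
```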



As always, any assistance would be greatly appreciated.

Thank you!
-Indervir

Antonio Ettorre

Jun 3, 2022, 12:24:31 PM
to OpenQuake Users
OperationalError: attempt to write a readonly database: INSERT INTO job (id, is_running, description, user_name, calculation_mode, hazard_calculation_id, ds_calc_dir) VALUES (?, ?, ?, ?, ?, ?, ?) (1, 1, 'Calculation waiting to start', 'openquake', 'preclassical', None, '/home/openquake/oqdata/calc_1')

From the above, it seems the engine is not using the right shared_dir folder, since it tries to save to /home/openquake/oqdata and not to /opt/openquake, which is what is configured in openquake.cfg.
Check whether you have other python3 processes from the old dbserver still running, and kill all of them before starting the new dbserver from systemd.
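That cleanup could look something like this. The systemd unit name (openquake-dbserver) is an assumption based on common setups; check what your installation actually created before running it.

```shell
# Sketch: look for leftover dbserver processes from the old installation,
# stop them, then restart the dbserver through systemd.
# The unit name below is an assumption; verify it first with:
#   systemctl list-units 'openquake*'
pgrep -af "oq dbserver"               # list what is currently running
sudo pkill -f "oq dbserver"           # stop any leftovers
sudo systemctl restart openquake-dbserver
sudo systemctl status openquake-dbserver
```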
