openquake 2.0 hanging after submitting all tasks when using redis


Rui

Aug 4, 2016, 11:05:40 PM
to OpenQuake Users
Hi,

I have recently been trying to run openquake on top of redis in a cluster environment, using the PointSourceClassicalPSHA demo as an example. The celery worker already reports that the task was successful, but oq keeps hanging (or waiting) after submitting all tasks:

[2016-08-05 12:35:40,232 #6 INFO] Using engine version 2.1.0-git0ffe220
[2016-08-05 12:35:40,607 #6 INFO] Read 1936 hazard site(s)
[2016-08-05 12:35:40,767 #6 INFO] Parsed 1 sources from /openquake_test/demos/hazard/PointSourceClassicalPSHA/source_model.xml
[2016-08-05 12:35:41,350 #6 INFO] Processed source model 1/1 with 1 gsim path(s)
[2016-08-05 12:35:41,476 #6 INFO] Instantiated SourceManager with maxweight=0.5
[2016-08-05 12:35:41,593 #6 INFO] Filtering light sources
[2016-08-05 12:35:41,696 #6 INFO] Filtering heavy sources
[2016-08-05 12:35:41,937 #6 INFO] splitting <PointSource 2> of weight 2.0
[2016-08-05 12:35:42,085 #6 INFO] Submitting task classical #1
[2016-08-05 12:35:42,236 #6 INFO] Sent 1 sources in 1 block(s)
[2016-08-05 12:35:42,378 #6 INFO] Sent 43.89 KB of data in 1 task(s)

After I cancel the process manually, the traceback shows that it cannot connect to the db server:

^CTraceback (most recent call last):
  File "/bin/oq", line 9, in <module>
    load_entry_point('openquake.engine', 'console_scripts', 'oq')()
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/commands/__main__.py", line 41, in oq
    parser.callfunc()
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/commonlib/sap.py", line 177, in callfunc
    return self.func(**vars(namespace))
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/commonlib/sap.py", line 232, in main
    return func(**kw)
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/commands/engine.py", line 175, in engine
    exports, hazard_calculation_id=hc_id)
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/commands/engine.py", line 68, in run_job
    hazard_calculation_id=hazard_calculation_id)
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/engine/engine.py", line 185, in run_calc
    logs.LOG.critical(tb)
  File "/lib/python2.7/logging/__init__.py", line 1204, in critical
    self._log(CRITICAL, msg, args, **kwargs)
  File "/lib/python2.7/logging/__init__.py", line 1278, in _log
    self.handle(record)
  File "/lib/python2.7/logging/__init__.py", line 1288, in handle
    self.callHandlers(record)
  File "/lib/python2.7/logging/__init__.py", line 1328, in callHandlers
    hdlr.handle(record)
  File "/lib/python2.7/logging/__init__.py", line 751, in handle
    self.emit(record)
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/engine/logs.py", line 133, in emit
    record.getMessage())
  File "/oq-engine/2.0.0_src_install/src/oq-engine/openquake/engine/logs.py", line 54, in dbcmd
    raise RuntimeError('Cannot connect on %s:%s' % config.DBS_ADDRESS)
RuntimeError: Cannot connect on 127.0.0.1:1999

Since this demo runs without problems using rabbitmq in the same cluster environment, I am wondering whether openquake still supports redis. If so, is there any configuration I have missed to make it work? Thanks a lot.

Regards,
Rui

Michele Simionato

Aug 5, 2016, 2:23:14 AM
to OpenQuake Users
The error means that the dbserver is not started, see
https://github.com/gem/oq-engine/blob/engine-2.0/doc/installing/ubuntu.md

Will the engine work with redis? It used to work years ago, but we have not tried it recently, so you are on your own.
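
A quick way to check whether anything is accepting connections on the address from the traceback (127.0.0.1:1999) is a plain TCP probe. The sketch below is only an illustration using the Python standard library; the helper name `can_connect` is hypothetical and is not part of the engine.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout.
        return False
    sock.close()
    return True
```

Calling `can_connect("127.0.0.1", 1999)` on the machine where oq runs tells you only whether the port accepts connections at that moment, not whether the dbserver behind it is healthy.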

Rui

Aug 5, 2016, 3:15:26 AM
to OpenQuake Users
Hi Michele,

Many thanks for your reply. I believe the engine works with redis, as the celery worker reports that the tasks are successful. However, the results or progress cannot be returned to oq because of the issue I mentioned. I can see that the dbserver has already started on port 1999 and has created the hdf5 and sqlite3 files under the oqdata directory.
When using rabbitmq, I can see several established connections and data transfer between the oq and dbserver ports. With redis, however, the dbserver port stays in TIME_WAIT status and no connection is established with oq after all tasks are submitted.
I suspect that something wrong in the redis configuration gives rise to this problem.
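
One thing that may be worth checking, as a guess rather than a confirmed diagnosis: Celery configures the message broker and the result backend separately, so a worker can report success while the client never receives the result if the backend still points at a different transport. The sketch below uses the Celery 3.x-style setting names `BROKER_URL` and `CELERY_RESULT_BACKEND`. The URLs and the helper `summarize_transport` are hypothetical and only for illustration; the real values live in your own celeryconfig.py.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

# Hypothetical values for illustration only.
BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "redis://localhost:6379/1"


def summarize_transport(broker_url, result_backend):
    """Report the URL schemes of the broker and the result backend."""
    broker = urlparse(broker_url)
    backend = urlparse(result_backend) if result_backend else None
    return {
        "broker_scheme": broker.scheme,
        "backend_scheme": backend.scheme if backend else None,
        # True only when a backend is set and uses the same scheme as the broker.
        "same_transport": backend is not None and broker.scheme == backend.scheme,
    }
```

A mismatch is not necessarily an error, since Celery allows different transports for the two settings, but it is a cheap thing to rule out after switching from rabbitmq to redis.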

Also, I cannot find the db server log files, although I have already specified them in openquake.conf (this is the case even when I use rabbitmq and the job runs successfully). Is there any way to check the log messages of the dbserver?

Thanks.

Regards,
Rui