deadlock error?

Nicola Segata

Feb 6, 2015, 5:41:56 AM
to pytho...@googlegroups.com
Hi Eduardo,
I'm getting a weird error when running a doit script (see below). The script was running perfectly until yesterday with 40 threads (-n 40), and I haven't changed the script or updated any system packages.

Basically now with -n 40 I get this error, and only 5 doit threads are running. If I use up to 6 threads the script runs perfectly. 

I'm on Ubuntu 13.10 with Python 2.7. The machine has 64 processors and 256GB of RAM.

I understand I should post the script, but it is quite complex and needs many input files to run. Is this maybe a known issue, or is there a simple solution I can't see?

many thanks
Nicola

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/doit/doit_cmd.py", line 105, in run
    return self.sub_cmds[command].parse_execute(args)
  File "/usr/lib/python2.7/dist-packages/doit/cmd_base.py", line 76, in parse_execute
    return self.execute(params, args)
  File "/usr/lib/python2.7/dist-packages/doit/cmd_base.py", line 266, in execute
    return self._execute(**exec_params)
  File "/usr/lib/python2.7/dist-packages/doit/cmd_run.py", line 181, in _execute
    return runner.run_all(self.control.task_dispatcher())
  File "/usr/lib/python2.7/dist-packages/doit/runner.py", line 237, in run_all
    self.run_tasks(task_dispatcher)
  File "/usr/lib/python2.7/dist-packages/doit/runner.py", line 370, in run_tasks
    proc_list = self._run_start_processes(task_q, result_q)
  File "/usr/lib/python2.7/dist-packages/doit/runner.py", line 347, in _run_start_processes
    next_node = self.get_next_task(None)
  File "/usr/lib/python2.7/dist-packages/doit/runner.py", line 329, in get_next_task
    if self.select_task(node, self.tasks):
  File "/usr/lib/python2.7/dist-packages/doit/runner.py", line 111, in select_task
    if node.ignored_deps or self.dep_manager.status_is_ignore(task):
  File "/usr/lib/python2.7/dist-packages/doit/dependency.py", line 449, in status_is_ignore
    return self._get(task.name, "ignore:")
  File "/usr/lib/python2.7/dist-packages/doit/dependency.py", line 155, in wrap
    return func(self, key, *args)
  File "/usr/lib/python2.7/dist-packages/doit/dependency.py", line 238, in get
    task_data = self._dbm[task_id]
  File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in __getitem__
    return _DeadlockWrap(lambda: self.db[key])  # self.db[key]
  File "/usr/lib/python2.7/bsddb/dbutils.py", line 68, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/lib/python2.7/bsddb/__init__.py", line 270, in <lambda>
    return _DeadlockWrap(lambda: self.db[key])  # self.db[key]
DBPageNotFoundError: (-30986, 'BDB0075 DB_PAGE_NOTFOUND: Requested page not found')


Eduardo Schettino

Feb 6, 2015, 5:53:26 AM
to python-doit
Looks like your `.doit.db` file is corrupted. Just remove it and let doit create a new one.
If the error persists, I have no idea what causes it and we will need to debug it.

Notice that depending on the DBM backend `.doit.db` might be more than 1 file (secondary files containing indexes).
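A minimal Python 3 sketch of that cleanup, assuming the DB uses doit's default name `.doit.db` in the project directory. `remove_doit_db` is a hypothetical helper, not part of doit: it deletes every regular file whose name starts with that prefix, so it also catches backend companion files (for example the `.dir`/`.dat`/`.bak` files that Python's `dbm.dumb` creates).

```python
import glob
import os


def remove_doit_db(directory=".", basename=".doit.db"):
    """Hypothetical helper: delete the doit DB and any companion files.

    Assumption: every file belonging to the DB starts with ``basename``.
    Note that any unrelated file sharing that prefix is deleted too.
    """
    # glob.escape guards against special characters in the path; a pattern
    # that starts with "." is allowed to match hidden files.
    pattern = os.path.join(glob.escape(directory), glob.escape(basename) + "*")
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(os.path.basename(path))
    return removed
```

Run it from the project directory with doit stopped, e.g. `remove_doit_db()`. Since the recorded state is gone, expect most tasks to run again on the next `doit` invocation.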

cheers

--
You received this message because you are subscribed to the Google Groups "python-doit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-doit...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nicola Segata

Feb 6, 2015, 6:03:39 AM
to pytho...@googlegroups.com
Thanks for the rapid answer! After removing the .doit.db file everything worked perfectly. Great!
Thanks again!
Nicola

Tom Varga

Feb 6, 2015, 8:44:05 AM
to pytho...@googlegroups.com
Eduardo,

This brings up one of my biggest concerns about doit's db file.
I assume that deleting it results in every task having to be rebuilt.
Although I'm still developing our system, once done it will have tens of thousands of tasks that take thousands of CPU hours to build, so a corrupted db file would be catastrophic.
What would be really nice and powerful would be an option to ask doit to 'rebuild' the db file's up-to-date state using only the timestamps of all existing targets and file_deps. Even without knowing which tasks were up to date before the corruption, this might make it possible to get back to the correct state as quickly as possible.
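The proposal amounts to the classic make rule. A minimal sketch of that rule for a single task, under the assumption that only file modification times matter; `uptodate_by_timestamp` is a hypothetical helper, not doit's API, and it ignores anything else doit records about a task.

```python
import os


def uptodate_by_timestamp(file_deps, targets):
    """Hypothetical make-style check (not part of doit).

    Returns True only when every file_dep and every target exists and the
    oldest target is at least as new as the newest file_dep.
    """
    if not targets:
        return False  # no targets to compare against: treat as out of date
    paths = list(file_deps) + list(targets)
    if not all(os.path.exists(p) for p in paths):
        return False  # a missing file means the task must run
    oldest_target = min(os.path.getmtime(t) for t in targets)
    newest_dep = max((os.path.getmtime(d) for d in file_deps), default=None)
    if newest_dep is None:
        return True  # targets exist and there are no file_deps to compare
    return oldest_target >= newest_dep
```

Comparing the oldest target with the newest dependency is deliberately conservative: a single stale target marks the whole task as out of date.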

This might even be closely related to the recent threads about a mode that doesn't use md5.

Thanks,
-Tom

Eduardo Schettino

Feb 7, 2015, 3:23:54 AM
to python-doit
On Fri, Feb 6, 2015 at 9:44 PM, Tom Varga <tomv...@gmail.com> wrote:
> Eduardo,
>
> This brings up one of my biggest concerns about doit's db file.
> I assume that deleting it results in every task having to be rebuilt.
> Although I'm still developing our system, once done it will have tens of thousands of tasks that take thousands of CPU hours to build, so a corrupted db file would be catastrophic.

I have never had a corrupted file myself, but anyway...
For most people the doit DB is not important, but if it matters to you, just keep a backup or keep it in a VCS.
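A minimal sketch of such a backup, assuming the default `.doit.db` name and that it runs while no doit process is writing to the DB (copying a Berkeley DB file mid-write could capture an inconsistent state). `backup_doit_db` and the `doit-db-backup-<timestamp>` folder name are hypothetical choices for illustration.

```python
import glob
import os
import shutil
import time


def backup_doit_db(directory=".", basename=".doit.db", dest=None):
    """Hypothetical helper: copy the doit DB and any companion files that
    share its prefix into a timestamped backup folder."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest or os.path.join(directory, "doit-db-backup-" + stamp)
    os.makedirs(dest, exist_ok=True)
    pattern = os.path.join(glob.escape(directory), glob.escape(basename) + "*")
    copied = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            # copy2 also preserves modification times on the copies
            shutil.copy2(path, dest)
            copied.append(os.path.basename(path))
    return dest, copied
```

Calling `backup_doit_db()` after each long successful build leaves a copy you can restore if the DB is ever corrupted.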

> What would be really nice and powerful would be an option to ask doit to 'rebuild' the db file's up-to-date state using only the timestamps of all existing targets and file_deps. Even without knowing which tasks were up to date before the corruption, this might make it possible to get back to the correct state as quickly as possible.

I would accept a patch that does that. But note that a doit DB file holds more information than
timestamps and MD5 checksums. It also contains the results of tasks, so just marking a task as up to date could have bad
consequences and leave you with an invalid DB that produces errors when you run doit.

> This might even be closely related to the recent threads about a mode that doesn't use md5.

Even if your "check" doesn't use MD5, you would still need a backup to recover task results.

cheers,
   Eduardo