Deadlock test

Adriano dos Santos Fernandes

Nov 24, 2022, 8:50:33 PM
to firebir...@googlegroups.com
Hi!

While testing the hang reported in #7385 (with a database sent privately
to me), I verified that it is caused by plume leaving connections open
when an error happens.

fb_lock_print reports a deadlock, but the real reason is not clear to me.
It happens for me only with many threads (around 256).

I then created a test that reproduces a similar problem with only 4 threads.

The reason is still unclear to me, though.

Can someone take a look at
https://github.com/asfernandes/firebird/commit/2b66eb7cb798056e445cea059d3d4638aef92573
?

It seems the hang happens when the "AST exception" line is triggered, which
happens in most runs, but the test code seems to be like all the engine
code that deals with this.


Adriano

Vlad Khorsun

Nov 25, 2022, 4:41:10 AM
to firebir...@googlegroups.com
I didn't run it, but it seems the lock is not released in the AST because
LCK_lock in the other thread did not have time to return from the LM and
update lck_id. I.e., it looks like the following sequence of events:

thread 1: calls LCK_lock() and waits in LockManager::wait_for_request()
          here lck_id == 0, lck_logical = 0

thread 2: calls LCK_release(), and the lock request of thread 1 is granted
          thread 1 still waits, as it is not scheduled to run yet

thread 3: calls LCK_lock() and posts a blocking AST to the lock request of thread 1
          the blocking AST runs and sees lck_id == 0,
          thus AsyncContextHolder() throws an isc_unavailable exception and
          LCK_release() is not called


thread 1: is scheduled to run, returns from LCK_lock() and exits
          note: the lock request of thread 1 is not released

thread 3: loops in wait_for_request(), waiting for the release of the lock request of thread 1

main thread: waits for thread 3 indefinitely
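
The sequence above can be replayed deterministically in a small single-threaded sketch. This is only a toy model under stated assumptions: ToyLock, toyBlockingAst and replayRace are hypothetical names invented for illustration, the "threads" are just ordered steps, and none of this is Firebird's actual lock manager code. It shows only why a blocking AST that bails out on lck_id == 0 can leave the request granted forever.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a lock request; NOT Firebird's real Lock structure.
struct ToyLock {
    int lck_id = 0;         // stays 0 until the owner returns from the lock manager
    bool heldInLM = false;  // the lock manager's own view: is the request granted?
};

// Toy blocking AST (hypothetical): refuses to touch a lock whose lck_id is
// still 0, mirroring the "AST exception" path. On success it releases the lock.
void toyBlockingAst(ToyLock& lock) {
    if (lock.lck_id == 0)
        throw std::runtime_error("AST exception: lck_id == 0");
    lock.heldInLM = false;
    lock.lck_id = 0;
}

struct Outcome {
    bool astThrew = false;   // did the blocking AST bail out?
    bool stillHeld = false;  // is thread 1's request still granted at the end?
    std::vector<std::string> trace;
};

// Replays the four steps of the race in a fixed order.
Outcome replayRace() {
    Outcome out;
    ToyLock lock;  // the request owned by "thread 1"

    // Step 1: thread 1 has asked for the lock and is waiting.
    assert(lock.lck_id == 0);
    out.trace.push_back("t1: LCK_lock() called, waiting (lck_id == 0)");

    // Step 2: thread 2 releases; the lock manager grants thread 1's request,
    // but thread 1 has not run yet, so lck_id is still 0.
    lock.heldInLM = true;
    out.trace.push_back("t2: LCK_release(), request of t1 granted");

    // Step 3: thread 3 wants the lock; the blocking AST is delivered to
    // thread 1's request and throws, so nothing is released.
    try {
        toyBlockingAst(lock);
    } catch (const std::runtime_error&) {
        out.astThrew = true;
    }
    out.trace.push_back("t3: blocking AST posted, AST threw");

    // Step 4: thread 1 finally runs, records its id and exits without
    // releasing, because the blocking notification was already lost.
    lock.lck_id = 1;
    out.trace.push_back("t1: returned from LCK_lock() and exited");

    out.stillHeld = lock.heldInLM;  // thread 3 would wait on this forever
    return out;
}
```

Calling replayRace() yields an Outcome in which both astThrew and stillHeld are true: the only notification that could have led to a release was dropped before thread 1 had a valid lck_id.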


Regards,
Vlad

Adriano dos Santos Fernandes

Nov 25, 2022, 5:26:38 AM
to firebir...@googlegroups.com
And since the attachment initially created in thread 1 is not used anymore,
everyone remains locked.

So, do we consider it correct (as designed) that an inactive attachment
(live, but not used) may cause a deadlock, or do we have a bug?


Adriano

Vlad Khorsun

Nov 25, 2022, 5:39:40 AM
to firebir...@googlegroups.com
Yes

> So, do we consider correct (as designed) that inactive (live, but not
> used attachment) may cause deadlock, or do we have a bug?

There is no deadlock in the classical sense (a deadlock requires two locks),
and there is a bug in the test, of course.

Regards,
Vlad

Adriano dos Santos Fernandes

Nov 25, 2022, 5:46:55 AM
to firebir...@googlegroups.com
On 25/11/2022 07:39, Vlad Khorsun wrote:
>> So, do we consider correct (as designed) that inactive (live, but not
>> used attachment) may cause deadlock, or do we have a bug?
>
>   There is no deadlock in classical meaning (to make deadlock one need
> two locks)

fb_lock_print reports it as a deadlock (0x40 - LRQ_deadlock | LRQ_scanned).


> and there is a bug in the test of course.
>

How does the engine deal with it differently from the test code?

It locks things shared/exclusive, does a checkout, and uses an AST to flag
others to release the lock.

Or do you mean that application code holding an attachment open but not
using it is buggy, and it is OK for the engine to lock other attachments?


Adriano

Vlad Khorsun

Nov 25, 2022, 6:09:47 AM
to firebir...@googlegroups.com
25.11.2022 12:46, Adriano dos Santos Fernandes wrote:
> On 25/11/2022 07:39, Vlad Khorsun wrote:
>>> So, do we consider correct (as designed) that inactive (live, but not
>>> used attachment) may cause deadlock, or do we have a bug?
>>
>>   There is no deadlock in classical meaning (to make deadlock one need
>> two locks)
>
> fb_lock_print report it as deadlock (0x40 - LRQ_deadlock | LRQ_scanned).

To correctly interpret the fb_lock_print report, one needs to understand the LM in depth.
Hint: LRQ_deadlock doesn't mean a deadlock was detected. Also, when a deadlock is
actually detected, it is reported and LCK throws an exception.

>> and there is a bug in the test of course.
>>
>
> How engine deal with it different than the test code?

Engine objects usually have (a lot of) flags that control locks.

> It lock things shared/exclusive, checkout, use AST to flag others to
> release lock.
>
> Or you mean that application code holding an attachment opened but not
> using it is bugged and engine is ok to lock others attachments?

I mean that the test is buggy. Note, the engine code has worked OK for more than 20 years.

Regards,
Vlad

Adriano dos Santos Fernandes

Nov 25, 2022, 6:17:30 AM
to firebir...@googlegroups.com
Sorry, I understand you are super intelligent and feel annoyed by dumb
questions from less smart people, but the #7385 test currently hangs the
engine. It's probably due to a feature implemented by one of these dumb
people, which broke something that had been working for 20 years.

If you cannot help, let's see if others can point in a good direction.

Thanks for your time.


Adriano

Vlad Khorsun

Nov 28, 2022, 3:03:08 PM
to firebir...@googlegroups.com
You misunderstand.

> but #7385 test currently hangs the
> engine. It's probably due to a feature implemented by one of these dumb
> people, broken something that was working for 20 years.

It happens. Don't be like an offended kid.

> If you cannot help, let's see if other can point in a good direction.

See my recent comments on GitHub; a draft patch is here:

https://github.com/FirebirdSQL/firebird/issues/7385#issuecomment-1329674905

Regards,
Vlad

PS: with a similar patch, the test from the subject no longer hangs.
