event_t in Windows (and POSIX)

46 views
Skip to first unread message

Adriano dos Santos Fernandes

unread,
Aug 8, 2025, 10:29:28 PMAug 8
to firebir...@googlegroups.com
Hi!

The event_t struct seems not thread-safe in Windows, specially its event_count member:
    SLONG event_count;

It's not atomic and it's read/write without any synchronization, like for example:

SharedMemoryBase::eventPost:
    ++event->event_count;

event_blocked:
    if (event->event_count >= value)

In POSIX, event_blocked is also called outside synchronization in SharedMemoryBase::eventWait, so it must be problematic too.

AFAIK, there is no requirement that callers use events with outside synchronization.


Adriano

Vlad Khorsun

unread,
Aug 11, 2025, 7:38:33 AMAug 11
to firebir...@googlegroups.com
09.08.2025 5:29, Adriano dos Santos Fernandes:
I've asked myself many times about this and still can't find a case when it could
work not as expected. There are some recommendations of how to use it correctly (safely)
but they are not absolute requirements.

Regards,
Vlad

Adriano dos Santos Fernandes

unread,
Aug 19, 2025, 6:40:40 AMAug 19
to firebir...@googlegroups.com
I created a test that is bugged only in Windows:
https://github.com/FirebirdSQL/firebird/blob/work/shared-memory-tests/src/common/tests/SharedMemoryTest.cpp

The test consists of multiple producers with a shared memory lock and a
single consumer without locking the same mutex, but using a pair of
events. This is the pattern used in the profiler and AFAIK even in the
lock manager.

The problem happens only with the block marked as "// Problem - begin" /
"// Problem - end", which is not necessary in the single-process test,
but would be with multi-process.

I verified that the problem happens in event_blocked function. Removing
calls to event_blocked and letting WaitForSingleObject always be called,
things works (very slow). And this looked very logical for me.

I tried to make all event_t members atomic, but the problem still
happens. I also tried to insert some memory fences, but no success.

Can you help fix it?


Adriano

Vlad Khorsun

unread,
Aug 19, 2025, 6:43:26 AMAug 19
to firebir...@googlegroups.com


19.08.2025 13:40, Adriano dos Santos Fernandes:
Will try, sure.

Regards,
Vlad

Vlad Khorsun

unread,
Aug 22, 2025, 7:01:40 AMAug 22
to firebir...@googlegroups.com
19.08.2025 13:40, Adriano dos Santos Fernandes:
I don't see LM use such pattern. IIRC, the only "new" code, written before your ProfilerIpc,
that works asynchronously is MappingIpc.
> The problem happens only with the block marked as "// Problem - begin" /
> "// Problem - end", which is not necessary in the single-process test,
> but would be with multi-process.
>
> I verified that the problem happens in event_blocked function. Removing
> calls to event_blocked and letting WaitForSingleObject always be called,
> things works (very slow). And this looked very logical for me.
>
> I tried to make all event_t members atomic, but the problem still
> happens. I also tried to insert some memory fences, but no success.
>
> Can you help fix it?
There is two problems in this test:

1. consumer() initialized its eventCounter with zero, while eventClear(receiverEvent) must be
used instead. But consumer() runs asynchronously, thus it is not safe to call eventClear() at
that place. eventCounter = 1 fixes this problem, but it relays on the fact that there is just
one thread that waits on receiverEvent. Note, first call of eventClear() with just initialized
event returns 1, not 0.

2. More hard problem happens due to non-synchronized run of consumer against producer's.
It (consumer) sometimes read the stale data in senderEvent and eventPost(senderEvent) failed
with ERROR_INVALID_HANDLE. To demonstrate and workaround it, change

if (sharedMemory.eventPost(&header.senderEvent) != FB_SUCCESS)
(Arg::Gds(isc_random) << " eventPost failed").raise();

at line 194, by

static HANDLE h = header.senderEvent.event_handle;
while (sharedMemory.eventPost(&header.senderEvent) != FB_SUCCESS)
{
DWORD error = GetLastError();

string err;
err.printf("eventPost(senderEvent) failed: error=%d, event_handle=%d, h=%d. Continue...",
error, header.senderEvent.event_handle, h);

if (h != header.senderEvent.event_handle || error == ERROR_INVALID_HANDLE)
{
BOOST_TEST_MESSAGE(err.c_str());
continue;
}

BOOST_FAIL(err.c_str());
}

I've run the test with these changes 50 times successfully. 10 runs printed message like:

eventPost(senderEvent) failed: error=6, event_handle=864, h=676. Continue...

Regards,
Vlad

Adriano dos Santos Fernandes

unread,
Aug 22, 2025, 7:21:09 AMAug 22
to firebir...@googlegroups.com
On 8/22/25 08:01, Vlad Khorsun wrote:

>   There is two problems in this test:
>
> 1. consumer() initialized its eventCounter with zero, while
> eventClear(receiverEvent) must be
> used instead. But consumer() runs asynchronously, thus it is not safe to
> call eventClear() at
> that place. eventCounter = 1 fixes this problem, but it relays on the
> fact that there is just
> one thread that waits on receiverEvent. Note, first call of eventClear()
> with just initialized
> event returns 1, not 0.
>

Will take a look at this, thanks.


> 2. More hard problem happens due to non-synchronized run of consumer
> against producer's.
> It (consumer) sometimes read the stale data in senderEvent and
> eventPost(senderEvent) failed
> with ERROR_INVALID_HANDLE. To demonstrate and workaround it, change
>

Yes, sure, there is this concurrency problem due to non-synchronized
wrapper of Firebird around Win32 events. Not sure if the POSIX version
can be problematic in others architectures, but least in x64 it's not.

With direct Win32 events usage, WaitForSingleObject synchronizes (but is
very slow). The POSIX implementation also synchronizes.

So what we can do with the Firebird Windows implementation of event_t to
support this usage (but don't become so slow as WaitForSingleObject
makes it)?

My try was with fences and didn't work. AFAIK fences do nothing at
instruction-level in x86/x64, only at compiler-level.

Another try was with volatile and atomics together. Why don't the
consumer thread see the newly created handle if the handle is stored in
a atomic, since the consumer only run this part of the code when a
producer allows it?

Are there any other mechanism we can use?


Adriano

Adriano dos Santos Fernandes

unread,
Aug 23, 2025, 9:02:54 AMAug 23
to firebir...@googlegroups.com
I was able to reproduce the race condition with a pure Win32 API test
and fix the problems with proper usage of atomics seems to work.

I will do more tests and fix code in
https://github.com/FirebirdSQL/firebird/pull/8703

Thanks for your help, Vlad.


Adriano

Reply all
Reply to author
Forward
0 new messages