This PR addresses several of the IPC issues raised in #24639. It is a substantial rewrite of sckipc.cpp, so my attempt at writing "atomic commits" may fall short. It may be more productive to look at the entire file, along with the individual commit messages and these notes:
wxIPCMessageBase largely replaces IPCStreams. IPCStreams was burying errors, and was not detecting loss of data sync. Both of these issues are fixed with wxIPCMessageBase.
For each IPCCode, there is a message derived from wxIPCMessageBase. The derived message has methods DataFromSocket() and DataToSocket(), which simplifies verifying the symmetry in reading and writing.
Reading and writing messages to the socket are guarded by critical sections, which ensures that a message is read or written to the socket without interruption (and therefore corruption) from another thread.
When the socket sends notification, there are potentially multiple messages waiting. The previous implementation would not process more than one message per notification. To fix this, the new Client_OnRequest implements a read-and-execute loop, which continues until there is no data left in the socket. A critical section ensures that Client_OnRequest is alone in receiving messages, and locks out others like Request() to avoid race conditions.
When a method expects a reply (such as wxConnection::Request()), a critical section prevents any new request until the current request is answered, which ensures that the reply correctly matches the request. Any non-requested messages sent before the reply (eg IPC_ADVISE) are processed, and the read loop continues until the desired reply type is found.
In the issue thread for #24639, it was mentioned that ERROR_ACCESS_DENIED (5) and/or WSAECONNABORTED (10053) were being triggered on the socket for wxMSW. I've determined these errors result from attempting to read (or peek) on a socket which has no data waiting. Microsoft, in its infinite wisdom, has determined that trying to read data on a socket that has no data deserves both ERROR_ACCESS_DENIED and WSAECONNABORTED, which sound much more dramatic than just reading (or peeking) on the socket with no queued data. As such, I've moved those errors into the category of wxSOCKET_WOULDBLOCK, rather than wxSOCKET_IOERR; Otherwise, the communication drops at the first attempted read of an empty socket.
sockmsw.cpp was also filtering out some FD_READ notifications, and this was causing some hangs in the IPC reads. That filter has been eliminated. A side effect is that notifications from sockmsw.cpp will now sometimes be notifying when the socket does not have queued data waiting. I confess I do not understand why an FD_READ notification is sent in such circumstances, but again, who am I to question the infinite wisdom of Microsoft.
Data returned from something like wxConnection::Request() had a potential issue if the caller tried to use the data after another wxConnection::Request() had been received. There was only one buffer, and the new request-reply would overwrite the first returned data. The new design creates an array, so that data and the allocations have a lifetime greater than one call.
I have tested the changes successfully with both the ipc sample as well as our company's new windows-service + client. Our service+client has loops which are sending Advise() and Request() back and forth with greater speed than the ipc sample. (Testing with our service also uncovered issue #24856.)
This PR strikes me as a large change, so let me know of any questions and any recommendations as to how to best proceed.
https://github.com/wxWidgets/wxWidgets/pull/24858
(4 files)
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 2 commits.
—
View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
Update for the latest commits: I've added a test for IPC. There was some previous test code, which was disabled via #ifdef.
Some notes on the new test:
For the socket reads and writes to work, IPC needs two separate processes, not just two threads in the same process. This test implements an external process for the server, and then the main testing thread makes calls to that external server.
Catch macros will not work in the external process, so the wxIPC Request() method is used at the end of some tests fetch information from the server and then verify its state.
ipc.cpp is mostly rewritten, so it's probably more productive to read the new file directly rather than a number of incremental commits. Same for the new external server file sckipc_server.cpp
There are two variables in the beginning of ipc.cpp which are useful for development and manual testing. g_use_external_server = true turns on use of the external server. It can be set to false, in which case the test_sckipc_server should be started via CLI (or under gdb, for debugging.) g_show_message_timing = true causes the messages to be printed to std::cout, which gives you a sense of the message arrival times, and how they are interleaving when multiple threads are running. It defaults to false, ie quiet mode so as not to produce output during the automated tests.
The file ipc_setup_test.h is configured to allow only sockets for IPC. I have not tried it with DDE.
I am not familiar with bakefiles: There needs to be a modification of test.bkl to implement compilation of sckipc_server.cpp, since it needs to compile to test_sckipc_server$(EXE). I've added a commit with the tests/Makefile.in that I've been using for development, but I know that is the incorrect way to do it, and I'm hoping you'll be able to translate that into the necessary bakefile changes.
This has been a bunch of work, so I've updated authorship on the changed files. I hope that's OK.
All of the tests run successfully on wxMSW, but the most stringent test fails on wxGTK: IPC::AdviseAndRequestMultiThread. Specifically, it fails under Rocky9 linux running in VirtualBox.
I don't have a good handle on why this test fails on Linux. Since the skcipc code succeeds in stress-testing on Windows (setting MESSAGE_ITERATIONS to 5000, and running it 10 times), I'm beginning to suspect something in the linux code (wxSocketImplUnix etc) is causing hanging problems in a multithreaded environment. That will be the first place I'll dig when I have time, but let me know if there might be a better place to check.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
Hmm, it appears the automated checks are not allowing a separate process to run. For example, under "Ubuntu 18.04 wxGTK 2", the error is execvp(test_sckipc_server) failed with error 2!, and under MSW vs2022 DLL Debug x64 the test is failing on REQUIRE( m_pid != 0 ), which is saying that wxExecute resulted in no pid for the execution.
Any guidance welcome.
PS. Is there any way to run the automated checks without adding to the pull request (so I don't spam your inboxes and can try various things?)
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
Hmm, it appears the automated checks are not allowing a separate process to run. For example, under "Ubuntu 18.04 wxGTK 2", the error is
execvp(test_sckipc_server) failed with error 2!, and underMSW vs2022 DLL Debug x64the test is failing onREQUIRE( m_pid != 0 ), which is saying that wxExecute resulted in no pid for the execution.Any guidance welcome.
I don't think there is anything preventing the tests from launching other processes, there must be some other problem, e.g. wrong path (especially under Windows, where the executable would fail to run if it can't find the DLLs).
PS. Is there any way to run the automated checks without adding to the pull request (so I don't spam your inboxes and can try various things?)
You can use https://github.com/nektos/act to test Linux workflows locally. And, of course, you can always create a PR to your own repository first, like this the workflows would run there too.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 2 commits.
—
View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
OK, I will have to back-burner this for a while, but I wanted to document the state of the code.
On wxMSW with mingw64: Configure with
../configure --with-msw --disable-shared --disable-precomp-headers --build=x86_64-pc-mingw64
Running tests IPC* will be error free.
On Unix: Configure with
../configure -enable-debug --disable-shared --disable-precomp-headers
Running tests IPC* will be error free (but note that one test is omitted, see below.)
Some of the github workflows with testing for Unix (ci.yml) will pass.
The final test IPC::AdviseAndRequestMultiThread does not pass on Unix; It is omitted via #ifdef wxMSW. As mentioned in a comment above, I believe there is a multithreading bug somewhere in the wxSocket code on Unix, because this test passes on Windows. The Unix sockets code would benefit from multithread test coverage, so writing this test would be my first step in solving this issue.
The github workflow for unix fails when --enable-monolithic is turned on. This failure is likely related to issue #24909. As documented in that issue, the wxAppConsole application is (probably) erroneously using the wxGUIEventLoop instead of wxConsoleEventLoop, which might be a bug in the initialization in wxAppTraits, as you noted in the issue.
The github workflow for unix also fails for --with-cxx=20 --enable-utf8 --enable-utf8only. I have not yet investigated this.
The remaining issues revolve around test_sckipc_server(.exe). The IPC tests require two processes, and the test program
has no facility for this, so we use wxExecute to run test_sckipc_server, which serves as the second process.
So to make the github workflows work across all platforms, there needs to be attention devoted to the build process for
Unfortunately, I don't have sufficient experience with these to build the necessary files; therefore compiling test_sckipc_server to pass the github workflows a stumbling block. I have not modified the bakefile test.bkl, again not something I have experience with.
As you mentioned in #24639, very few people use wxIPC at all and weren't surprised it was broken. (Although I'd note the fact that it is broken might be the reason it is avoided.) So I suppose it's a possibility to #ifdef 0 out the tests (like it is in the master currently) and merge it, and my sense is that it would be less broken than before. It would be good, however, to get test coverage, both for wxIPC as well as whatever is causing the multi-threading wxSocket bug.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
If, by chance, anyone is interested in fixing wxIPC, any help with reviewing, testing and finishing this PR would be welcome. Unfortunately I don't think I'm going to have time to do it in the observable future.
As an aside, I have a question about why do we need to use multiple processes: couldn't we use multiple threads instead, with the server running in its own one?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
Apologies: I'm still buried, so fixing the tests for the various build issues is not possible for quite a while from me. I will note that we have been using this branch of wxIPC for several months in alpha testing, and so far it's good.
As an aside, I have a question about why do we need to use multiple processes: couldn't we use multiple threads instead, with the server running in its own one?
I don't believe it's possible for the same reason that all wxIPC testing had to be removed in the master branch: I originally mentioned in #24639 (comment)
The wxSocketImpl callbacks for the server socket and the client socket cannot run simultaneously in the same process. Because there is only one TCP port in play, whatever last callback installed will have priority, and therefore only the server or the client (but not both) will receive the notification for reading the socket, and only one will get access to the data in the socket buffer.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
Apologies: I'm still buried, so fixing the tests for the various build issues is not possible for quite a while from me. I will note that we have been using this branch of wxIPC for several months in alpha testing, and so far it's good.
This is good to know, but we just can't merge this if it breaks CI :-(
As an aside, I have a question about why do we need to use multiple processes: couldn't we use multiple threads instead, with the server running in its own one?
I don't believe it's possible for the same reason that all wxIPC testing had to be removed in the master branch: I originally mentioned in #24639 (comment)
The wxSocketImpl callbacks for the server socket and the client socket cannot run simultaneously in the same process. Because there is only one TCP port in play, whatever last callback installed will have priority, and therefore only the server or the client (but not both) will receive the notification for reading the socket, and only one will get access to the data in the socket buffer.
Sorry, I don't understand at all: why couldn't the server and the client use different ports?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
This is good to know, but we just can't merge this if it breaks CI :-(
I hesitate to mention this, but testing in the original wxIPC implementation was removed via #if 0, ie it passed CI by not doing any testing. I don't like that solution obviously, and it is likely why there were bugs in the implementation that went unfixed for a long time.
why couldn't the server and the client use different ports?
Ya know, they probably could. I took a quick look back, and as near as I can tell I was simply copying the master branch as much as possible and took it as a given that the same port should be used. It will add a bunch of work in combining the testing files, and I can't guarantee that nothing will be missed by choosing a multithread model over a multiprocess model for testing, but it sure sounds sensible.
[I think the real work will eventually be in figuring out why IPC::AdviseAndRequestMultiThread failed under unix testing. I removed it via #ifdef wxMSW, but there is something squirrelly going on, my best guess was a multithreading wxSocket bug. I've been known to be wrong though. :) ]
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 4 commits.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 35 commits.
—
View it on GitHub or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@jpmattia pushed 1 commit.
—
View it on GitHub or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Picking this back up after a long detour. The short version: your pushback (vadz) on the two-process test structure was the nudge that got me to rework the harness once I had time. I kept a separate server process, but it's now just a re-exec of the same test binary rather than a separate executable, which is what cleared the build blocker. With that plus the threading fix below, the original build/threading blocker is resolved, and the test suite passes on the GUI and console builds across the supported toolchains, with one documented exception I'll cover below. (The Unix builds check is currently red, but that is the Ubuntu 18.04 "Set up build system" step failing across many open PRs right now. It looks like an EOL-container apt/setup issue in the CI infrastructure, not a build or test failure on this branch.)
The Unix/macOS failure was a genuine threading bug, and it is now fixed. Back in 2024 I suspected "something in the linux wxSocket code is causing hanging problems in a multithreaded environment"; that turned out to be on the right track. The 2024 design let worker threads do their socket I/O directly. On wxMSW that's harmless because notifications come through a main-thread, message-based mechanism, which is why it always passed there. On Unix/macOS, event-driven sockets deliver and re-arm through the process-global wxFDIODispatcher that the main loop iterates inside epoll_wait/kqueue, so worker-thread I/O was mutating that shared dispatcher concurrently with the main loop. That's the race behind IPC::AdviseAndRequestMultiThread (and the macOS ModifyFD "modifying unregistered handler?" assert).
The fix is to marshal all socket I/O to the main thread: worker transactions hand their work to the main thread (run inline if already there, otherwise CallAfter() + block the worker on a semaphore), so exactly one thread ever touches the fd, which is the same thread that owns it for event monitoring. The earlier Linux-specific band-aids are reverted now that the root cause is gone. It's contained in sckipc.cpp plus a test dispatch helper. I might flag this as a commit worth reading as a whole rather than as a diff.
On the multiple-processes question: the actual blocker wasn't the multi-process model per se. The second process really needed its own executable (test_sckipc_server), which I couldn't feasibly wire into bakefile/VS/cmake/Xcode. That's gone, and now the server is started by re-executing the same test binary with WX_IPC_TEST_SERVER set, so there's no second executable to build. The client still runs in the main Catch2 process and queries the server for state to verify it (Catch2 macros can't run in the server). The bakefile and generated build files are properly regenerated, and the test now also runs under test_gui.
The one documented test exclusion is wxQt (via __WXQT__). While testing I found that a cross-thread CallAfter() isn't reliably processed by the wxQt event loop (wxQtEventLoopBase::WakeUp() wakes the loop without posting a Qt event), which stalls server-pushed Advise(). That's a wxQt event-loop bug, not a wxIPC bug; I'm finishing a minimal repro and a fix on a separate branch, which I plan to send as its own PR down the road.
Other notes:
#if 0, ie. not testing anything. There's real multithreaded coverage now, and it runs clean under TSan, ASAN, and UBSAN.wxIPCSocketStreams layer entirely, replaced by wxIPCMessageBase. It supersedes the recent read-validation work in #26628: the new reader (wxIPCMessageBase) validates the read length in ReadSizeAndData()/VerifyLastReadCount().Also worth mentioning: this branch has been running in production in our windows_service application since the alpha I mentioned earlier, not just in testing.
I force-pushed a curated, logically-grouped history onto this branch (the discussion above is preserved). The early commits walk through the streams-to-messages restructuring step-by-step as a reading aid. They're meant to be read in sequence and don't each compile on their own. The code builds once the message interface and its handler accessor are in place (commit ce0014c), and the functional milestones after that (the marshal rewrite, the two-request fix, the connect-timeout bound, and the multithreaded test) are self-contained and compile on their own. If you'd rather every commit build for bisect, I'm happy to squash the early restructuring into a single commit, since it is broken up only for expository purposes.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()