Hi all,
I just wanted to gently solicit a review again, with the understanding that everyone is busy! I believe that there is potentially something of value here for users of foreign callbacks.
Sincerely,
Kartik
From: Singh, Kartik S
Sent: Thursday, September 14, 2023 1:10 PM
To: 'sbcl-...@lists.sourceforge.net' <sbcl-...@lists.sourceforge.net>
Cc: 'do...@google.com' <do...@google.com>
Subject: [PATCH] Add feature to lazily clean up foreign callback threads
Hi all,
I have attached a patch proposing a feature called :fcb-lazy-thread-cleanup (open to other name suggestions!) that reduces the overhead of foreign callbacks. It reuses a single thread struct for all foreign callback invocations from a given native thread and cleans up that struct lazily when the native thread exits. The patch also contains a separate commit that adds a new foreign callback benchmark.
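I have built and tested the patch on x64 Windows, x64 Linux, and ARM64 Darwin. The patch also contains sample benchmark results for all three of these platforms. Here are the x64 Linux results, where 10,000,000 foreign callbacks drop from about 142 seconds to about 8.6 seconds (roughly 16x faster); the other platforms show similar speedups: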
------------------------------------------
#+linux #+x86-64 #-fcb-lazy-thread-cleanup
------------------------------------------
$ LD_LIBRARY_PATH=. sh ../run-sbcl.sh --script fcb.lisp
[10,000,000 calls]
control (inline, no call)
elapsed time: 0.145976 seconds
regular C call (C -> C)
elapsed time: 0.157714 seconds
alien callback (C on Lisp thread -> Lisp)
elapsed time: 0.279785 seconds
foreign callback (C on foreign thread -> Lisp)
elapsed time: 142.108533 seconds
------------------------------------------
#+linux #+x86-64 #+fcb-lazy-thread-cleanup
------------------------------------------
$ LD_LIBRARY_PATH=. sh ../run-sbcl.sh --script fcb.lisp
[10,000,000 calls]
control (inline, no call)
elapsed time: 0.145932 seconds
regular C call (C -> C)
elapsed time: 0.157795 seconds
alien callback (C on Lisp thread -> Lisp)
elapsed time: 0.281898 seconds
foreign callback (C on foreign thread -> Lisp)
elapsed time: 8.629588 seconds
The design of this feature was originally outlined by Douglas Katzman in this mailing list thread: https://groups.google.com/g/sbcl-devel/c/3dSxGvF5oGo/. A new thread-specific data key called foreign_thread_ever_lispified is created with pthread_key_create when the main Lisp thread is created. Upon the first call into Lisp from a native thread, the thread struct produced by alloc_thread_struct is permanently associated with the foreign_thread_ever_lispified key using pthread_setspecific. When the native thread exits, the destructor function for foreign_thread_ever_lispified calls detach_os_thread on the associated thread struct.
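For readers who want to see the pthread mechanism in isolation, here is a minimal sketch of the pattern described above. It is not code from the patch: struct demo_thread, demo_key, and every other demo_* name are hypothetical stand-ins (for the thread struct, for foreign_thread_ever_lispified, and for the destructor that calls detach_os_thread), and it assumes a POSIX threads platform. It only shows that the per-thread struct is allocated once on the first callback and released by the key destructor when the native thread exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's thread struct. */
struct demo_thread { long callbacks_run; };

/* Plays the role of foreign_thread_ever_lispified (assumed name: demo_key). */
static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static atomic_int demo_allocs;    /* thread structs allocated so far */
static atomic_int demo_detaches;  /* thread structs cleaned up so far */

/* Key destructor: runs on the exiting native thread when its value is
   non-NULL. Stands in for the destructor that calls detach_os_thread. */
static void demo_destructor(void *value) {
    atomic_fetch_add(&demo_detaches, 1);
    free(value);
}

/* Create the key once, with the destructor attached. */
static void demo_create_key(void) {
    int rc = pthread_key_create(&demo_key, demo_destructor);
    assert(rc == 0);
    (void)rc;
}

/* Simulates one call into Lisp from a foreign thread. */
static void demo_foreign_callback(void) {
    struct demo_thread *th = pthread_getspecific(demo_key);
    if (th == NULL) {
        /* First call from this native thread: allocate once, remember it. */
        th = calloc(1, sizeof *th);
        if (th == NULL) abort();
        atomic_fetch_add(&demo_allocs, 1);
        if (pthread_setspecific(demo_key, th) != 0) abort();
    }
    th->callbacks_run++;  /* later calls reuse the same struct */
}

static void *demo_thread_main(void *arg) {
    int calls = *(int *)arg;
    for (int i = 0; i < calls; i++)
        demo_foreign_callback();
    return NULL;  /* thread exit triggers demo_destructor */
}

/* Runs `calls` simulated callbacks on a fresh native thread and waits for
   it to exit. Returns 0 on success. */
int demo_run(int calls) {
    pthread_t tid;
    pthread_once(&demo_once, demo_create_key);
    if (pthread_create(&tid, NULL, demo_thread_main, &calls) != 0)
        return -1;
    return pthread_join(tid, NULL);
}

int demo_alloc_count(void) { return atomic_load(&demo_allocs); }
int demo_detach_count(void) { return atomic_load(&demo_detaches); }
```

Calling demo_run(1000) from a main function and then reading demo_alloc_count() and demo_detach_count() should show one allocation and one cleanup for the 1000 callbacks, which is the saving the feature is after. The sketch deliberately leaves out everything GC- and signal-related.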
To the best of my knowledge, I have addressed and documented all the problematic signal-related race conditions that arise when another thread initiates a GC after a foreign callback thread has exited but before it has finished running detach_os_thread in its destructor. I think these races could be avoided altogether by inserting and removing the foreign callback thread from all_threads around every call into Lisp, but I could not get that approach to pass the tests.
Any feedback would be much appreciated!
Sincerely,
Kartik