[Sbcl-devel] Binary release sb-thread policy

Brian Mastenbrook

Oct 31, 2008, 10:56:07 PM

to SBCL Devel-list

Which platforms should have SB-THREAD enabled in binary releases? When
I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
FreeBSD, and Solaris). Is this in line with what other maintainers do,
and is it a reasonable policy? Are threads on Darwin stable enough
that releases should be built with threads?
--
Brian Mastenbrook
br...@mastenbrook.net
http://brian.mastenbrook.net/


James Y Knight

Oct 31, 2008, 11:04:12 PM

to SBCL Devel-list

On Oct 31, 2008, at 10:56 PM, Brian Mastenbrook wrote:
> Which platforms should have SB-THREAD enabled in binary releases? When
> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
> FreeBSD, and Solaris). Is this in line with what other maintainers do,
> and is it a reasonable policy? Are threads on Darwin stable enough
> that releases should be built with threads?

IMO the binary releases should always be built with the default
settings, and the default build settings should be adjusted so that
threads are enabled on platforms where they are considered
release-worthy.

James
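
For reference, SBCL's build reads an optional customize-target-features.lisp
file from the top of the source tree and uses it to adjust the default
feature set. A minimal sketch of such a file that turns threads on, modeled
on the example in SBCL's INSTALL notes, might look like this. Whether doing
so is appropriate for a given platform is exactly the question in this
thread.

```lisp
;; customize-target-features.lisp (sketch)
;; The build calls this function with the default feature list and
;; uses the list it returns as the target's feature set.
(lambda (features)
  (flet ((enable (x) (pushnew x features)))
    ;; Request thread support regardless of the platform default.
    (enable :sb-thread)
    features))
```

In a running image, (member :sb-thread *features*) shows whether the build
ended up with thread support.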

Daniel Pezely

Nov 2, 2008, 5:49:27 PM

to Brian Mastenbrook, SBCL Devel-list

Brian Mastenbrook wrote:
> Which platforms should have SB-THREAD enabled in binary releases? When
> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
> FreeBSD, and Solaris). Is this in line with what other maintainers do,
> and is it a reasonable policy? Are threads on Darwin stable enough
> that releases should be built with threads?

Please enable threads by default on Darwin and FreeBSD.

Threads on MacOSX 10.5 and FreeBSD 7.0 (both x86-64) seem very stable,
including under certain types of sustainable stress/load testing.

I emphasize "sustainable" because we've seen problems with certain
library combinations, which are outside the scope of your question.

For completeness, however, here's what we experienced:

Using current SBCL as of June and July, the issues we saw were likely
due to Hunchentoot's lack of a thread pool on SBCL. Looking at its
port-sbcl.lisp file, each inbound HTTP request creates a new thread,
which is expected to be garbage collected upon completion of that one
request. You know where this is going...

Under heavy stress (both load and capacity) testing we'd eventually
exhaust the heap (e.g., sustained 500-2000 requests per second,
parallel requests, etc).

Considering each layer in this chain (libthr + Hunchentoot + its SBCL
port + HTTP protocol closing of socket + TCP/IP connection shutdown
lag + releasing thread-related structures + gc finally taking action),
there may simply not be enough time available for gc under heavy load,
with tests driven from secondary hosts on a local 100 Mb network.
Pausing the tests to allow for clean-up and then starting again DID
help in some tests.
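
To illustrate the thread-pool alternative to creating one thread per
request, here is a minimal sketch built on SBCL's sb-thread primitives. It
is not Hunchentoot's code: the names pool, make-pool, submit and shutdown
are hypothetical, and it assumes an SBCL built with :sb-thread. A fixed set
of workers pulls tasks from a shared queue, so the number of live threads
stays bounded no matter how many requests arrive.

```lisp
;; Sketch only: a fixed-size worker pool. Requires :sb-thread.
(defstruct (pool (:constructor %make-pool))
  (lock (sb-thread:make-mutex :name "pool lock"))
  (wakeup (sb-thread:make-waitqueue))
  (tasks '())        ; pending thunks, oldest first
  (stopping nil)     ; set to T by SHUTDOWN
  (workers '()))

(defun worker-loop (pool)
  ;; Take tasks until the queue is empty and the pool is stopping.
  (loop
    (let ((task
            (sb-thread:with-mutex ((pool-lock pool))
              (loop
                (cond ((pool-tasks pool)
                       (return (pop (pool-tasks pool))))
                      ((pool-stopping pool)
                       (return nil))
                      (t
                       ;; Releases the lock while waiting, reacquires
                       ;; it before returning.
                       (sb-thread:condition-wait (pool-wakeup pool)
                                                 (pool-lock pool))))))))
      (if task
          (funcall task)   ; run outside the lock
          (return)))))

(defun make-pool (size)
  (let ((pool (%make-pool)))
    (setf (pool-workers pool)
          (loop repeat size
                collect (sb-thread:make-thread
                         (lambda () (worker-loop pool))
                         :name "pool worker")))
    pool))

(defun submit (pool thunk)
  ;; Queue THUNK and wake one idle worker.
  (sb-thread:with-mutex ((pool-lock pool))
    (setf (pool-tasks pool) (append (pool-tasks pool) (list thunk)))
    (sb-thread:condition-notify (pool-wakeup pool))))

(defun shutdown (pool)
  ;; Let workers drain the queue, then wait for them to exit.
  (sb-thread:with-mutex ((pool-lock pool))
    (setf (pool-stopping pool) t)
    (sb-thread:condition-broadcast (pool-wakeup pool)))
  (mapc #'sb-thread:join-thread (pool-workers pool))
  nil)
```

A server would call (submit pool (lambda () ...)) once per accepted
connection instead of calling make-thread. The queue here is unbounded, so
under sustained overload it would still grow; a real pool would also want a
limit on pending tasks.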

We were developing on 15" MacOSX laptops (Santa Rosa chipsets;
dual-core), and production hosts were running FreeBSD 7.0 (current
Xeon; dual quad-core).

Some tests with comparable results were seen on Debian Linux and
CentOS/RHEL5 (same models of production hardware).

The only Lisp systems which DID NOT suffer heap exhaustion under
comparable load were AllegroCL and Clojure, neither of which uses
libthr. We used SBCL, CCL and, I believe, LispWorks' free version.
Scieneer's free version wasn't available then and still doesn't run on
MacOSX or FreeBSD.

I'm trying to get the guy who ran the tests to provide more detail...

-Daniel

Nikodemus Siivola

Nov 4, 2008, 5:39:25 AM

to Daniel Pezely, SBCL Devel-list, Brian Mastenbrook

On Mon, Nov 3, 2008 at 12:49 AM, Daniel Pezely <dpe...@gmail.com> wrote:
> Brian Mastenbrook wrote:
>> Which platforms should have SB-THREAD enabled in binary releases? When
>> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
>> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
>> FreeBSD, and Solaris). Is this in line with what other maintainers do,
>> and is it a reasonable policy? Are threads on Darwin stable enough
>> that releases should be built with threads?
>
>
> Please enable threads by default on Darwin and FreeBSD.
>
> Threads on MacOSX 10.5 and FreeBSD 7.0 (both x86-64) seem very stable,
> including under certain types of sustainable stress/load testing.

I am reluctant to enable threads on Darwin by default as long as the
test suite fares as badly as it does. Assuming there are no such
problems on FreeBSD, that's fine by me. On Linux, building with threads
by default seems like a sensible thing.

Cheers,

-- Nikodemus

Justin Grant

Nov 5, 2008, 3:12:33 AM

to sbcl-...@lists.sourceforge.net

Brian Mastenbrook wrote:

> Which platforms should have SB-THREAD enabled in binary releases? When
> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
> FreeBSD, and Solaris). Is this in line with what other maintainers do,
> and is it a reasonable policy? Are threads on Darwin stable enough
> that releases should be built with threads?

Brian,

In my testing of SBCL threaded code across platforms/architectures I've found
that only single-core architectures are stable.

I've tested SBCL with threads on:

Linux/x86-64, OS X (Darwin)/x86-64, and FreeBSD/x86-64, which were all stable
on single-core systems. On multi-core/multi-CPU systems things were very
unpredictable, with issues ranging from memory leaks to unexplained fully
loaded CPUs and the process freezing for reasons I have not yet accurately
determined. The evidence seems to point to the mapping from SBCL threads to
the underlying OS's native threads, but as I said, these issues seemed to
arise only on multi-core systems. The same behaviour was seen with CCL-64
(Clozure, 64-bit). Allegro CL (64-bit) and Clojure (with a J, also 64-bit)
did not have these issues.

I've attached output from these tests run on different systems.

Hope this helps.
-Justin

lisp_stacks_anon.txt

Attila Lendvai

Nov 5, 2008, 3:45:08 AM

to Justin Grant, sbcl-...@lists.sourceforge.net

On Wed, Nov 5, 2008 at 9:12 AM, Justin Grant <jgra...@gmail.com> wrote:
> SBCL deadlocks and CPUs max out at 100% after a few 10 thousand
> requests from the repl doing CTRL-C drops into the debugger where the
> process can be resumed but a memory fault is reported : 'debugger
> invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread #<THREAD "initial
> thread" {1002428E61}>: Unhandled memory fault at #x8040FE6F0.'
>
> After resuming hunchentoot is again responsive until the problem
> occurs again. top output shows that when deadlock occurs the sbcl
> process state is often 'sigwait' or 'umtxn' (When 'umtxn' is the
> stalled state the CTRL-C break no longer works to get the sbcl process
> to resume without restarting). The sbcl process seems to be waiting
> for a signal(from the OS?) on when to continue (and probably on which
> CPU to switch to and run).

i think i was seeing this yesterday when i tried to quickly upgrade
our servers to sbcl head to avoid some bugs. unfortunately this new
one was bringing them down (seemingly once the servers reached about
half the size of the available memory, but it may be coincidence
because it linearly grows with the number of sessions and parallel
requests).

i've attached a gdb to one of them, and i seem to remember seeing some
gc related function on the c stack. but i had to restart the server
quickly, because users were swearing on the other side.

the same code is running fine (well, does not expose *this*
misbehaviour) on 1.0.12.2. maybe something around the gc was changed
causing this?

for Nikodemus' request, using (sb-ext:get-bytes-consed) i've checked
if the code conses more on HEAD possibly due to some DX patches, but i
couldn't see a significant difference.
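
A comparison like this can be sketched with a small helper around
sb-ext:get-bytes-consed. The name bytes-consed-by is hypothetical and this
is only an illustration, not the code used above. The counter is
process-wide, so in a threaded image it also includes allocation done by
other threads while the thunk runs, and the figure is approximate.

```lisp
;; Sketch: approximate bytes allocated while running THUNK.
;; BYTES-CONSED-BY is a hypothetical helper name.
(defun bytes-consed-by (thunk)
  (let ((before (sb-ext:get-bytes-consed)))
    (funcall thunk)
    (- (sb-ext:get-bytes-consed) before)))

;; Example: run the same workload on both SBCL versions and compare.
;; (bytes-consed-by (lambda () (make-list 100000)))
```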

Linux foo 2.6.20-17-generic #2 SMP Wed Aug 20 15:14:36 UTC 2008 x86_64 GNU/Linux

the fresh version i've tried was: 1.0.22.11

please take this all as some background info with a grain of salt - it
was a hectic day.

--
attila

Martin Cracauer

Nov 5, 2008, 10:52:16 AM

to Justin Grant, sbcl-...@lists.sourceforge.net

Justin Grant wrote on Wed, Nov 05, 2008 at 12:12:33AM -0800:
> In my testing of SBCL threaded code across platforms/architectures
> I've found that only single core architectures are stable.

[...]
> Linux test.machine 2.6.24-1-amd64 #1 SMP Sat May 10 09:28:10 UTC 2008 x86_64 GNU/Linux
> Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz / 1 GB ram
> Debian Lenny 5
>
> SBCL 1.0.17(threads)
> (appears stable - threads are reported stable in SBCL on Linux)
> apachebench command :
> ab -n 10000000 -c 10 http://127.0.0.1:8000/publisher-info?id=1234598
> results :
> sbcl appears to always stick to one CPU (reason for being somewhat stable ?)
> Successfully completed once all 10 million requests after ?? hours of runtime but
> mostly fails.

What software are you using here?

I bash the hell out of threads on Linux/amd64 and apart from some
problems such as the siesta semaphores and the time-endless-loop-zeros
bug I only see crashes I blame on the application. And it's definitely
not stuck to one CPU.

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <crac...@cons.org> http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/

Eric Marsden

Nov 6, 2008, 3:41:56 AM

to sbcl-...@lists.sourceforge.net

Martin Cracauer wrote:

> I bash the hell out of threads on Linux/amd64 and apart from some
> problems such as the siesta semaphores and the time-endless-loop-zeros
> bug I only see crashes I blame on the application. And it's definitely
> not stuck to one CPU.

I am seeing very regular crashes on Linux/AMD64, running Hunchentoot with
some pg-dot-lisp database connections. I'm seeing it with current CVS,
but it has been happening over the last 6 months. Features are

(:RUN-IS-INTEGER :ASDF :SB-THREAD :ANSI-CL :COMMON-LISP :SBCL :SB-DOC :SB-TEST
 :SB-LDB :SB-PACKAGE-LOCKS :SB-UNICODE :SB-EVAL :SB-SOURCE-LOCATIONS
 :IEEE-FLOATING-POINT :X86-64 :UNIX :ELF :LINUX :GENCGC
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :C-STACK-IS-CONTROL-STACK :LINKAGE-TABLE
 :COMPARE-AND-SWAP-VOPS :UNWIND-TO-FRAME-AND-CALL-VOP :RAW-INSTANCE-INIT-VOPS
 :STACK-ALLOCATABLE-CLOSURES :ALIEN-CALLBACKS :CYCLE-COUNTER
 :OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T)

fatal error encountered in SBCL pid 13874(tid 46912569784656):
sig_stop_for_gc_handler: wrong thread state on wakeup: 2

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> backtrace
Backtrace:
0: Foreign fp = 0x2aaaaf0e9590, ra = 0x40fb06
1: Foreign fp = 0x2aaaaf0e9640, ra = 0x410e1a
2: Foreign fp = 0x2aaaaf0e9b40, ra = 0x2aaaaaedaa80
3: Foreign fp = 0x2aaaaf0e9b60, ra = 0x40d55b
4: Foreign fp = 0x2aaaaf0e9ca0, ra = 0x4118a3
5: Foreign fp = 0x2aaaaf0ea1f8, ra = 0x2aaaaaedaa80
6: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-[CALL-WITH-SYSTEM-MUTEX/WITHOUT-GCING]279)
7: SB-EXT::CANCEL-FINALIZATION
8: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-[FORM-FUN-[RELEASE-FD-STREAM-RESOURCES]5659]5661)
9: SB-IMPL::RELEASE-FD-STREAM-RESOURCES
10: (SB-C::HAIRY-ARG-PROCESSOR SB-IMPL::FD-STREAM-MISC-ROUTINE)
11: (SB-C::TL-XEP (SB-PCL::FAST-METHOD SB-GRAY::PCL-CLOSE (SB-KERNEL::ANSI-STREAM)))
12: (COMMON-LISP::FLET CLEANUP-FUN-[PROCESS-CONNECTION]94)
13: HUNCHENTOOT::PROCESS-CONNECTION
14: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK)
15: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]477)
16: SB-THREAD::CALL-WITH-MUTEX
17: (COMMON-LISP::LAMBDA ())
18: Foreign fp = 0x2aaaaf0eb120, ra = 0x41ec62
19: Foreign fp = 0x2aaaaf0eb140, ra = 0x41624a

--
Eric