[Sbcl-devel] Binary release sb-thread policy

Brian Mastenbrook

Oct 31, 2008, 10:56:07 PM
to SBCL Devel-list
Which platforms should have SB-THREAD enabled in binary releases? When
I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
FreeBSD, and Solaris). Is this in line with what other maintainers do,
and is it a reasonable policy? Are threads on Darwin stable enough
that releases should be built with threads?
--
Brian Mastenbrook
br...@mastenbrook.net
http://brian.mastenbrook.net/


_______________________________________________
Sbcl-devel mailing list
Sbcl-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel

James Y Knight

Oct 31, 2008, 11:04:12 PM
to SBCL Devel-list
On Oct 31, 2008, at 10:56 PM, Brian Mastenbrook wrote:
> Which platforms should have SB-THREAD enabled in binary releases? When
> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
> FreeBSD, and Solaris). Is this in line with what other maintainers do,
> and is it a reasonable policy? Are threads on Darwin stable enough
> that releases should be built with threads?

IMO the binary releases should always be built with the default
settings, and the default build settings should be adjusted so that
threads are enabled on platforms where they are considered release-
worthy.

James

Daniel Pezely

Nov 2, 2008, 5:49:27 PM
to Brian Mastenbrook, SBCL Devel-list
Brian Mastenbrook wrote:
> Which platforms should have SB-THREAD enabled in binary releases? When
> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
> FreeBSD, and Solaris). Is this in line with what other maintainers do,
> and is it a reasonable policy? Are threads on Darwin stable enough
> that releases should be built with threads?


Please enable threads by default on Darwin and FreeBSD.

Threads on MacOSX 10.5 and FreeBSD 7.0 (both x86-64) seem very stable,
including under certain types of sustainable stress/load testing.


I emphasize "sustainable" because we've seen problems with certain
library combinations, which are outside the scope of your question.


For completeness, however, here's what we experienced:

Using current SBCL as of June and July, issues we saw were likely due
to Hunchentoot's lack of a thread pool on SBCL. Looking at its
port-sbcl.lisp file, each inbound HTTP request creates a new thread
that is expected to be garbage collected upon completion of that one
request. You know where this is going...

Under heavy stress (both load and capacity) testing we'd eventually
exhaust the heap (e.g., sustained 500-2000 requests per second,
parallel requests, etc).

Considering each layer in this chain (libthr + Hunchentoot + its SBCL
port + HTTP protocol closing of socket + TCP/IP connection shutdown
lag + releasing thread related structures + gc finally taking action),
there may simply not be enough time available for gc when under heavy
load with tests driven from secondary hosts on a local 100mb network.
Pausing the tests and allowing for clean-up then starting again DID
help in some tests.
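
As an illustration only (this is not Hunchentoot's code, and it was not part of the tests described here), one way to keep a thread-per-request design from outrunning the collector is to cap how many worker threads exist at once. The sketch below assumes a threaded SBCL build; the names *max-workers*, *worker-slots* and call-in-bounded-thread are hypothetical, and the limit of 8 is arbitrary.

```lisp
;;; Sketch: bound the number of live worker threads with a semaphore.
;;; All names here are hypothetical; requires SBCL built with :sb-thread.

(defparameter *max-workers* 8
  "Arbitrary cap on concurrently running worker threads (an assumption).")

(defvar *worker-slots*
  (sb-thread:make-semaphore :name "worker slots" :count *max-workers*)
  "Counts the free worker slots.")

(defun call-in-bounded-thread (fn)
  "Run FN in a new thread. Blocks the caller while *MAX-WORKERS* threads
are already running, so thread creation cannot outpace thread exit."
  ;; Take a slot before creating the thread.
  (sb-thread:wait-on-semaphore *worker-slots*)
  (sb-thread:make-thread
   (lambda ()
     (unwind-protect (funcall fn)
       ;; Give the slot back even if FN signals an error.
       (sb-thread:signal-semaphore *worker-slots*)))
   :name "bounded worker"))
```

In an accept loop, the call that spawns a handler thread would go through call-in-bounded-thread, so the accepting thread stalls instead of piling up thousands of short-lived threads. Whether this would have avoided the heap exhaustion we saw is untested.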

We were developing on 15" MacOSX laptops (santa rosa chip sets; dual-
core), and production hosts were running FreeBSD 7.0 (current Xeon;
dual quad-core).

Some tests with comparable results were seen on Debian Linux and
CentOS/RHEL5 (same models of production hardware).

The only Lisp systems which DID NOT suffer heap exhaustion under
comparable load were AllegroCL and Clojure, neither of which use
libthr. We used SBCL, CCL and, I believe, LispWorks' free version.
Scieneer's free version wasn't available then and still doesn't run on
MacOSX or FreeBSD.

I'm trying to get the guy who ran the tests to provide more detail...

-Daniel

Nikodemus Siivola

Nov 4, 2008, 5:39:25 AM
to Daniel Pezely, SBCL Devel-list, Brian Mastenbrook
On Mon, Nov 3, 2008 at 12:49 AM, Daniel Pezely <dpe...@gmail.com> wrote:
> Brian Mastenbrook wrote:
>> Which platforms should have SB-THREAD enabled in binary releases? When
>> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
>> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
>> FreeBSD, and Solaris). Is this in line with what other maintainers do,
>> and is it a reasonable policy? Are threads on Darwin stable enough
>> that releases should be built with threads?
>
>
> Please enable threads by default on Darwin and FreeBSD.
>
> Threads on MacOSX 10.5 and FreeBSD 7.0 (both x86-64) seem very stable,
> including under certain type of sustainable stress/load testing.

I am reluctant to enable threads on Darwin by default as long as the
test suite fares as badly as it does. Assuming there are no such
problems on FreeBSD, that's fine by me. On Linux building with threads
by default seems like a sensible thing.

Cheers,

-- Nikodemus

Justin Grant

Nov 5, 2008, 3:12:33 AM
to sbcl-...@lists.sourceforge.net
Brian Mastenbrook wrote:
> Which platforms should have SB-THREAD enabled in binary releases? When
> I build SBCL for upload to SourceForge, I enable SB-THREAD on Linux/
> x86 and Linux/x86-64, but not on platforms that use lutexes (Darwin,
> FreeBSD, and Solaris). Is this in line with what other maintainers do,
> and is it a reasonable policy? Are threads on Darwin stable enough
> that releases should be built with threads?

Brian,

In my testing of SBCL threaded code across platforms/architectures I've found
that only single core architectures are stable.

I've tested SBCL with threads on :

Linux/x86-64, OS X (Darwin)/x86-64, and FreeBSD/x86-64, which were all stable
on single core systems. On multi-core/cpu systems things were very unpredictable,
with issues ranging from memory leaks to unexplained fully loaded CPUs and
the process freezing for reasons I have not yet accurately determined. The evidence
seems to point to the mapping from SBCL to the underlying OS's native threads, but as I said
these issues seemed to arise only on multi-core systems. The same behaviour was seen
with CCL-64 (Clozure 64 bit). Allegro CL (64 bit) and Clojure (with a J, also 64 bit) did not have these issues.

I've attached output from these tests run on different systems.

Hope this helps.
-Justin





lisp_stacks_anon.txt

Attila Lendvai

Nov 5, 2008, 3:45:08 AM
to Justin Grant, sbcl-...@lists.sourceforge.net
On Wed, Nov 5, 2008 at 9:12 AM, Justin Grant <jgra...@gmail.com> wrote:
> SBCL deadlocks and CPUs max out at 100% after a few 10 thousand
> requests from the repl doing CTRL-C drops into the debugger where the
> process can be resumed but a memory fault is reported : 'debugger
> invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread #<THREAD "initial
> thread" {1002428E61}>: Unhandled memory fault at #x8040FE6F0.'
>
> After resuming hunchentoot is again responsive until the problem
> occurs again. top output shows that when deadlock occurs the sbcl
> process state is often 'sigwait' or 'umtxn' (When 'umtxn' is the
> stalled state the CTRL-C break no longer works to get the sbcl process
> to resume without restarting). The sbcl process seems to be waiting
> for a signal(from the OS?) on when to continue (and probably on which
> CPU to switch to and run).

i think i was seeing this yesterday when i tried to quickly upgrade
our servers to sbcl head to avoid some bugs. unfortunately this new
one was bringing them down (seemingly once the servers reached about
half the size of the available memory, but it may be coincidence
because it linearly grows with the number of sessions and parallel
requests).

i've attached a gdb to one of them, and i seem to remember seeing some
gc related function on the c stack. but i had to restart the server
quickly, because users were swearing on the other side.

the same code is running fine (well, does not expose *this*
misbehaviour) on 1.0.12.2. maybe something around the gc was changed
causing this?

for Nikodemus' request, using (sb-ext:get-bytes-consed) i've checked
if the code conses more on HEAD possibly due to some DX patches, but i
couldn't see a significant difference.

Linux foo 2.6.20-17-generic #2 SMP Wed Aug 20 15:14:36 UTC 2008 x86_64 GNU/Linux

the fresh version i've tried was: 1.0.22.11

please take this all as some background info with a grain of salt - it
was a hectic day.

--
attila

Martin Cracauer

Nov 5, 2008, 10:52:16 AM
to Justin Grant, sbcl-...@lists.sourceforge.net
Justin Grant wrote on Wed, Nov 05, 2008 at 12:12:33AM -0800:
> In my testing of SBCL threaded code across platforms/architectures I've
> found
> that only single core architectures are stable.
[...]
> Linux test.machine 2.6.24-1-amd64 #1 SMP Sat May 10 09:28:10 UTC 2008 x86_64 GNU/Linux
> Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz / 1 GB ram
> Debian Lenny 5
>
> SBCL 1.0.17(threads)
> (appears stable - threads are reported stable in SBCL on Linux)
> apachebench command :
> ab -n 10000000 -c 10 http://127.0.0.1:8000/publisher-info?id=1234598
> results :
> sbcl appears to always stick to one CPU (reason for being somewhat stable ?)
> Successfully completed once all 10 million requests after ?? hours of runtime but
> mostly fails.

What software are you using here?

I bash the hell out of threads on Linux/amd64 and apart from some
problems such as the siesta semaphores and the time-endless-loop-zeros
bug I only see crashes I blame on the application. And it's definitely
not stuck to one CPU.

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <crac...@cons.org> http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/

Eric Marsden

Nov 6, 2008, 3:41:56 AM
to sbcl-...@lists.sourceforge.net
Martin Cracauer wrote:

> I bash the hell out of threads on Linux/amd64 and apart from some
> problems such as the siesta semaphores and the time-endless-loop-zeros
> bug I only see crashes I blame on the application. And it's definitely
> not stuck to one CPU.

I am seeing very regular crashes on Linux/AMD64, running Hunchentoot with
some pg-dot-lisp database connections. I'm seeing it with current CVS,
but it has been happening over the last 6 months. Features are

(:RUN-IS-INTEGER :ASDF :SB-THREAD :ANSI-CL :COMMON-LISP :SBCL :SB-DOC
:SB-TEST
:SB-LDB :SB-PACKAGE-LOCKS :SB-UNICODE :SB-EVAL :SB-SOURCE-LOCATIONS
:IEEE-FLOATING-POINT :X86-64 :UNIX :ELF :LINUX :GENCGC
:STACK-GROWS-DOWNWARD-NOT-UPWARD :C-STACK-IS-CONTROL-STACK :LINKAGE-TABLE
:COMPARE-AND-SWAP-VOPS :UNWIND-TO-FRAME-AND-CALL-VOP
:RAW-INSTANCE-INIT-VOPS
:STACK-ALLOCATABLE-CLOSURES :ALIEN-CALLBACKS :CYCLE-COUNTER
:OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T)

fatal error encountered in SBCL pid 13874(tid 46912569784656):
sig_stop_for_gc_handler: wrong thread state on wakeup: 2

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> backtrace
Backtrace:
0: Foreign fp = 0x2aaaaf0e9590, ra = 0x40fb06
1: Foreign fp = 0x2aaaaf0e9640, ra = 0x410e1a
2: Foreign fp = 0x2aaaaf0e9b40, ra = 0x2aaaaaedaa80
3: Foreign fp = 0x2aaaaf0e9b60, ra = 0x40d55b
4: Foreign fp = 0x2aaaaf0e9ca0, ra = 0x4118a3
5: Foreign fp = 0x2aaaaf0ea1f8, ra = 0x2aaaaaedaa80
6: (COMMON-LISP::FLET
WITHOUT-INTERRUPTS-BODY-[CALL-WITH-SYSTEM-MUTEX/WITHOUT-GCING]279)
7: SB-EXT::CANCEL-FINALIZATION
8: (COMMON-LISP::FLET
WITHOUT-INTERRUPTS-BODY-[FORM-FUN-[RELEASE-FD-STREAM-RESOURCES]5659]5661)
9: SB-IMPL::RELEASE-FD-STREAM-RESOURCES
10: (SB-C::HAIRY-ARG-PROCESSOR SB-IMPL::FD-STREAM-MISC-ROUTINE)
11: (SB-C::TL-XEP (SB-PCL::FAST-METHOD SB-GRAY::PCL-CLOSE
(SB-KERNEL::ANSI-STREAM)))
12: (COMMON-LISP::FLET CLEANUP-FUN-[PROCESS-CONNECTION]94)
13: HUNCHENTOOT::PROCESS-CONNECTION
14: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK)
15: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]477)
16: SB-THREAD::CALL-WITH-MUTEX
17: (COMMON-LISP::LAMBDA ())
18: Foreign fp = 0x2aaaaf0eb120, ra = 0x41ec62
19: Foreign fp = 0x2aaaaf0eb140, ra = 0x41624a


--
Eric

Nikodemus Siivola

Nov 6, 2008, 4:32:09 AM
to Eric Marsden, sbcl-...@lists.sourceforge.net
On Thu, Nov 6, 2008 at 10:41 AM, Eric Marsden <eric.m...@free.fr> wrote:

> fatal error encountered in SBCL pid 13874(tid 46912569784656):
> sig_stop_for_gc_handler: wrong thread state on wakeup: 2

Any other failure modes?

Cheers,

-- Nikodemus

Eric Marsden

Nov 6, 2008, 7:39:12 AM
to sbcl-...@lists.sourceforge.net
Nikodemus Siivola wrote:

> On Thu, Nov 6, 2008 at 10:41 AM, Eric Marsden <eric.m...@free.fr> wrote:

>> fatal error encountered in SBCL pid 13874(tid 46912569784656):
>> sig_stop_for_gc_handler: wrong thread state on wakeup: 2
>
> Any other failure modes?

I haven't been logging them, but from memory this is the most frequent.

Eric

Nikodemus Siivola

Nov 6, 2008, 11:33:25 AM
to Eric Marsden, sbcl-...@lists.sourceforge.net
On Thu, Nov 6, 2008 at 2:39 PM, Eric Marsden <eric.m...@free.fr> wrote:
> Nikodemus Siivola wrote:
>> On Thu, Nov 6, 2008 at 10:41 AM, Eric Marsden <eric.m...@free.fr> wrote:
>
>>> fatal error encountered in SBCL pid 13874(tid 46912569784656):
>>> sig_stop_for_gc_handler: wrong thread state on wakeup: 2
>>
>> Any other failure modes?
>
> I haven't been logging them, but from memory this is the most frequent.

The only times I've managed to see this for myself so far have been
with GDB attached while poking at the image -- and on Darwin at that.

Aside from deep wrongness (corrupted thread objects) this can happen
if something sends a signal to SBCL that is the same signal used
internally to wake up threads after GC -- SIGUSR2 on Darwin, and
SIGRTMIN+1 on Linux. Are you using postgres over FFI? Maybe the
library uses these signals internally?

It seems to me that using a realtime semaphore (sem_post & sem_wait)
to do the waiting should not be susceptible to mishaps like these. Is
there a specific reason we use a signal instead?

Relatedly, running multiple reader threads in the "moribund thread
entering GC" test from some time back exposes all sorts of interesting
threading issues in addition to that. I have some tentative fixes, and will
be looking at them in more detail next week.

Cheers,

-- Nikodemus

Gábor Melis

Nov 6, 2008, 12:01:08 PM
to sbcl-...@lists.sourceforge.net, Eric Marsden
On Thursday 06 November 2008, Nikodemus Siivola wrote:
> Aside from deep wrongness (corrupted thread objects) this can happen
> if something sends a signal to SBCL that is the same signal used
> internally to wake up threads after GC -- SIGUSR2 on Darwin, and
> SIGRTMIN+1 on Linux. Are you using postgres over FFI? Maybe the
> library uses these signals internally?

Years ago I ran into trouble with postgres linked in while having no
problems with it used via a socket.

> It seems to me that using a realtime semaphore (sem_post & sem_wait)
> to do the waiting should not be susceptible to mishaps like these. Is
> there a specific reason we use a signal instead?

Because sig_stop_for_gc needs a signal anyway to be able to stop a
thread at an arbitrary point, and on most platforms that signal is sent to
resume the thread.

Eric Marsden

Nov 6, 2008, 2:46:47 PM
to sbcl-...@lists.sourceforge.net
>>>>> "ns" == Nikodemus Siivola <niko...@random-state.net> writes:

ns> Aside from deep wrongness (corrupted thread objects) this can happen
ns> if something sends a signal to SBCL that is the same signal used
ns> internally to wake up threads after GC -- SIGUSR2 on Darwin, and
ns> SIGRTMIN+1 on Linux. Are you using postgres over FFI? Maybe the
ns> library uses these signals internally?

pg-dot-lisp accesses PostgreSQL using its network protocol, without
FFI. Hunchentoot and S-XML have a fair number of dependencies,
some of which use FFI, but I don't think any of them use signals.

I'll see whether I can produce a test case.

--
Eric Marsden

Justin Grant

Nov 6, 2008, 6:18:21 PM
to Martin Cracauer, sbcl-...@lists.sourceforge.net
The application uses a basic hunchentoot stack to service urls.
There is no explicit threading used in this application; it's only hunchentoot doing that type of thing.
Basically that url handles a request and looks up some data in hash tables. However even disabling all of the logic and just returning a static payload results in the same issues but it just takes longer.

Regardless though, why would the same application code work just fine under heavy load on single core systems but become very unstable on multi-core systems ?

-Justin

Nikodemus Siivola

Nov 7, 2008, 5:28:02 AM
to Justin Grant, sbcl-...@lists.sourceforge.net, Martin Cracauer
On Fri, Nov 7, 2008 at 1:18 AM, Justin Grant <jgra...@gmail.com> wrote:

> why would the same application code work just fine under
> heavy load on single core systems but become very unstable on multi-core
> systems ?

Timing issues.

Assume there is a non-thread-safe codepath that is relatively short
(say a couple of dozen instructions, no syscalls.)

To see the problem on a single core system you need to hit a context
switch in the middle of it, *and* then get the other thread to execute
the same codepath. On a multicore system your chances of a context
switch are multiplied (assuming same average time-slice size), and
additionally you can get two threads to really execute the piece in
parallel.

The *real* parallelism is the bit that makes the big difference, but
more context-switches / real-time-unit helps too.
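
To make the timing argument concrete, here is a minimal sketch (an illustration, not code from any of the systems discussed in this thread). It assumes a threaded SBCL build; *counter*, *counter-lock*, bump-unsafe, bump-safe and run-threads are hypothetical names. The unprotected INCF is a short read-modify-write sequence of the kind described above: on a multicore machine two threads can run it truly in parallel and lose updates, while the mutex-protected version cannot.

```lisp
;;; Sketch: a short non-thread-safe codepath versus a locked one.
;;; All names are hypothetical; requires SBCL built with :sb-thread.

(defvar *counter* 0
  "Shared counter; the global value is visible to every thread.")

(defvar *counter-lock* (sb-thread:make-mutex :name "counter lock"))

(defun bump-unsafe (n)
  "Increment *COUNTER* N times with no locking (racy on purpose)."
  (dotimes (i n)
    (incf *counter*)))

(defun bump-safe (n)
  "Increment *COUNTER* N times, holding *COUNTER-LOCK* for each update."
  (dotimes (i n)
    (sb-thread:with-mutex (*counter-lock*)
      (incf *counter*))))

(defun run-threads (fn n-threads n)
  "Reset the counter, run FN with argument N in N-THREADS threads,
wait for all of them, and return the final counter value."
  (setf *counter* 0)
  (let ((threads (loop repeat n-threads
                       collect (sb-thread:make-thread
                                (lambda () (funcall fn n))))))
    (mapc #'sb-thread:join-thread threads)
    *counter*))
```

Calling (run-threads #'bump-safe 4 100000) always yields 400000. Calling (run-threads #'bump-unsafe 4 100000) may come up short, and how often it does depends on the number of cores and on scheduling, which is the point being made here.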

Cheers,

-- Nikodemus

Gábor Melis

Nov 7, 2008, 8:41:54 AM
to sbcl-...@lists.sourceforge.net, Martin Cracauer
On Friday 07 November 2008, Nikodemus Siivola wrote:
> On Fri, Nov 7, 2008 at 1:18 AM, Justin Grant <jgra...@gmail.com> wrote:
> > why would the same application code work just fine under
> > heavy load on single core systems but become very unstable on
> > multi-core systems ?
>
> Timing issues.
>
> Assume there is a non-thread-safe codepath that is relatively short
> (say a couple of dozen instructions, no syscalls.)
>
> To see the problem on a single core system you need to hit a context
> switch in the middle of it, *and* then get the other thread to
> execute the same codepath. On a multicore system your chances of a
> context switch are multiplied (assuming same average time-slice
> size), and additionally you can get two threads to really execute the
> piece in parallel.

On multicore systems there is also the topic of memory model, cache
coherency, barriers. A deep look into this would be most beneficial.

> The *real* parallelism is the bit that makes the big difference, but
> more context-switches / real-time-unit helps too.
>
> Cheers,


Martin Cracauer

Nov 7, 2008, 11:17:30 AM
to Justin Grant, sbcl-...@lists.sourceforge.net, Martin Cracauer
Justin Grant wrote on Thu, Nov 06, 2008 at 03:18:21PM -0800:
> The application uses a basic hunchentoot stack to service urls.

How did you verify that hunchentoot is thread-safe?

> There is no explicit threading used in this application it's only
> hunchentoot doing that type of thing.
> Basically that url handles a request and looks up some data in hash tables.
> However even disabling all of the logic and just returning a static payload
> results in the same issues but it just takes longer.

And that still show load only on one CPU in a multi-cpu system?

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <crac...@cons.org> http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/


Justin Grant

Nov 7, 2008, 8:52:28 PM
to Martin Cracauer, sbcl-...@lists.sourceforge.net
> How did you verify that hunchentoot is thread-safe?

I'm not sure it is. I need to dig up some code I wrote that avoids hunchentoot completely
and does the same thing with a socket listener. I recall the problem still exists.

> And that still show load only on one CPU in a multi-cpu system?

The application load was fairly negligible whether the hash-table lookups are done or a static payload is returned. Most of the overhead was in the socket/threads layer from what I could tell.
However when the SBCL process breaks down then sometimes the CPU becomes fully loaded
and completely unresponsive.

-Justin

Cyrus Harmon

Nov 8, 2008, 12:40:34 AM
to Justin Grant, SBCL Devel-list

I haven't dug into this much, but on FreeBSD with the development
hunchentoot (with sb-threads), I see occasional crashes. I'm wondering
if usocket might be part of the problem here. Mind you, I have no
evidence, and lots of other things have changed in hunchentoot since
the last stable version, but the usocket code is new, I think, rather
than calling sb-bsd-sockets directly.

Just a guess...

Cyrus

Justin Grant

Nov 8, 2008, 5:03:13 AM
to Martin Cracauer, sbcl-...@lists.sourceforge.net
Ok so with further testing I've confirmed that hunchentoot is the culprit.
I've bypassed hunchentoot by coding up a socket server for stress
testing sb-bsd-sockets and sb-thread.
So far my tests are perfectly stable(no memory leaks and no more
crashes) on my macbook core 2 duo using the following apache bench
settings :

ab -v 1 -n 10000000 -c 100 http://127.0.0.1:8080/

And here is the socket server test code :


;;;; Simple Threaded Socket listener
;;;; using SB-BSD-SOCKETS.

(in-package :cl-user)
(require :sb-bsd-sockets)

(defpackage :myServer
  (:use :cl
        :sb-thread
        :sb-bsd-sockets
        :sb-unix
        :sb-ext)
  (:export :run
           :start
           :stop))

(in-package :myServer)

(defparameter *default-server-address* '(127 0 0 1)
  "The default address on which instances of the server listen.")

(defparameter *default-server-port* 8080
  "The default port on which instances of the server listen.")

(defparameter *default-server-backlog* 100
  "The listen backlog: the number of pending connections the OS may queue.")

(defconstant +null+ (code-char 0)
  "A null character.")

(defvar *server-thread* nil
  "The thread running the accept loop, or NIL if START has not been called.")

(defun read-upto-null (stream char-array)
  "Read characters from STREAM into CHAR-ARRAY until a null character,
a newline, or EOF. Returns CHAR-ARRAY."
  (do ((c (read-char stream nil nil) (read-char stream nil nil)))
      ((or (null c)
           (char= c +null+)
           (char= c #\Newline))
       char-array)
    (vector-push-extend c char-array)))

(defmethod handle-client ((socket inet-socket))
  "Handle a client request."
  ;; UNWIND-PROTECT so the socket is closed even if the handler errors.
  (unwind-protect
       (let ((client-stream
               (socket-make-stream socket
                                   :input t
                                   :output t
                                   ;:element-type '(unsigned-byte 8)
                                   :element-type 'character
                                   ;:external-format :utf-8
                                   :buffering :full))
             (s (make-array 1024 :fill-pointer 0
                                 :adjustable t
                                 :element-type 'character)))
         (let ((message (coerce (read-upto-null client-stream s) 'string)))
           (write-string "STATIC RESPONSE TEXT." client-stream)
           (write-line message))
         (finish-output client-stream))
    (write-line "Closing client connection.")
    (socket-close socket)))

(defmacro with-socket (socket &body body)
  "Create and close a socket around the body."
  `(let ((,socket (make-instance 'inet-socket
                                 :type :stream
                                 :protocol :tcp)))
     (unwind-protect (progn ,@body)
       (socket-close ,socket))))

(defun run ()
  "Run the server, listening on the specified port and dispatching
client requests."
  (write-line "Starting server.")
  (with-socket server-socket
    (setf (sockopt-reuse-address server-socket) t)
    (socket-bind server-socket *default-server-address* *default-server-port*)
    (socket-listen server-socket *default-server-backlog*)
    (do ((client-socket (socket-accept server-socket)
                        (socket-accept server-socket))) ; ignore peer value
        (nil)                                           ; infinite loop
      (write-line "New client")
      ;; Rebind so each thread closes over its own socket, not the loop variable.
      (let ((client-socket client-socket))
        (make-thread
         (lambda () (handle-client client-socket)) :name "handle-client")))))

(defun start ()
  "Start the accept loop in its own thread."
  (setf *server-thread* (make-thread (lambda () (run)) :name "myServer")))

(defun stop ()
  "Terminate the server thread if it is running."
  (let ((st *server-thread*))
    (cond
      ((and st (thread-alive-p st))
       (write-line "Server stopped.")
       (terminate-thread st))
      (t
       (write-line "Server is not running.")
       nil))))

;; start the server
(start)

Martin Cracauer

Nov 8, 2008, 5:00:41 PM
to Justin Grant, sbcl-...@lists.sourceforge.net, Martin Cracauer
Justin Grant wrote on Fri, Nov 07, 2008 at 05:52:28PM -0800:
> > How did you verify that hunchentoot is thread-safe?
>
>
> I'm not sure it is. I need to dig up some code I wrote that avoids
> hunchentoot completely
> and does the same thing with a socket listener. I recall the problem still
> exists.

Maybe the TCP layers in SBCL aren't thread safe?

> > And that still show load only on one CPU in a multi-cpu system?
> >
>
> The application load was fairly negligible whether the hash-table lookups are
> done or a static payload is returned. Most of the overhead was in the
> socket/threads layer from what I could tell.

You probably spend almost all your time in stream locking on the TCP
streams and the rest of the time in locks around that hashtable.

Lock-only work would explain why it pops up only on one CPU.

You should be able to do this in a design with less locking.
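
For illustration, a minimal sketch of keeping the locked region around a shared hash table as small as possible. This is an assumption about what such code could look like, not the application's actual code; *hits*, record-hit and hit-count are hypothetical names. It relies on SB-EXT:WITH-LOCKED-HASH-TABLE, and every access to the table goes through it.

```lisp
;;; Sketch: lock only the table operation itself, nothing around it.
;;; All names are hypothetical; requires SBCL built with :sb-thread.

(defvar *hits* (make-hash-table :test 'equal)
  "Shared table, accessed only inside WITH-LOCKED-HASH-TABLE.")

(defun record-hit (key)
  "Atomically increment the count stored under KEY."
  (sb-ext:with-locked-hash-table (*hits*)
    (incf (gethash key *hits* 0))))

(defun hit-count (key)
  "Return the count stored under KEY, or 0 if there is none."
  (sb-ext:with-locked-hash-table (*hits*)
    (values (gethash key *hits* 0))))
```

Any I/O on the TCP stream would happen outside these functions, so no thread holds the table lock while it waits on a socket. If the table is filled before the worker threads start and only read afterwards, the locking may not be needed at all; that depends on the application.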

How many outgoing ... things are there per TCP client connecting?

> However when the SBCL process breaks down then sometimes the CPU becomes
> fully loaded
> and completely unresponsive.

"breaks down"? What does that mean?

JTK

Nov 9, 2008, 3:50:27 PM
to Justin Grant, sbcl-...@lists.sourceforge.net

On Nov 8, 2008, at 12:03 AM, Justin Grant wrote:
> Ok so with further testing I've confirmed that hunchentoot is the
> culprit.
> I've bypassed hunchentoot by coding up a socket server for stress
> testing sb-bsd-sockets and sb-thread.


Hello,

A failure of Hunchentoot might be of interest to a lot of people.

Have you tried to repeat the experiment with H'toot running in OpenMCL,
another native-thread implementation? I've found that OpenMCL is
very easy to set up.


Also, H'toot has a lot of layers between user streams and the socket.

eg
http://www.weitz.de/hunchentoot/#performance
http://common-lisp.net/pipermail/tbnl-devel/2007-March/001099.html

Have you tried using some of the tricks suggested on the above pages
to bypass some of the intermediate layers, to see if H'toot stabilizes?
For example, publishing static files, or returning an (unsigned-byte 8)
instead of a string from your page handler?

This might at least pin down where in H'toot the problem lies.

I would be very interested if you learn anything more.


Jan
