Currently I'm developing a multithreaded application that uses
TraceMonkey, and I am starting to run into performance problems. The
application model is simple: each thread has a task to do, and tasks
are non-blocking. Each thread gets a context from a pool (the pool is
big, so the problem is not there), gets a precompiled script, and
executes it. During execution I do not share data between threads.
Execution happens inside a request (JS_BeginRequest/JS_EndRequest), and
after each request I call JS_MaybeGC().
So I have used all the recommended techniques for improving performance
listed on MDC (using a different global object in each context, not
putting blocking operations inside requests, etc.), but performance
still decreases dramatically when multiple threads (10-20) are running.
For example, with only 1 thread running, a task is done in time T; with
10 threads running, the same task takes about 5T to 7T. (There are 16
CPUs.) Plain C++ code does not degrade nearly as much in the same
_multithreaded_ test: it slows down only about 2x, while TraceMonkey
slows down 5-7x.
The question is: does TraceMonkey have a lot of internal locking or
blocking, or something like that, which slows it down? Or are there
other ways to increase performance? Thanks.
#0 0xb7fe7524 in __kernel_vsyscall ()
#1 0xb79da56e in __lll_mutex_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb79d7182 in _L_mutex_lock_152 () from /lib/tls/i686/cmov/libpthread.so.0
#3 0x6e617669 in ?? ()
#4 0xb7f54b21 in js_NativeGet () from /home/itroot/build/prefix/lib/libmozjs.so
#5 0xb7885b22 in PR_Lock () from /home/itroot/build/prefix/lib/libnspr4.so
#6 0xb7f46c28 in js_Enqueue () from /home/itroot/build/prefix/lib/libmozjs.so
#7 0xb7f0b362 in js_AtomizeString () from /home/itroot/build/prefix/lib/libmozjs.so
#8 0xb7f0b659 in js_Atomize () from /home/itroot/build/prefix/lib/libmozjs.so
#9 0xb7efe431 in DefineProperty () from /home/itroot/build/prefix/lib/libmozjs.so
or
Thread 49 (Thread -1633506384 (LWP 19430)):
#0 0xb7fe7524 in __kernel_vsyscall ()
#1 0xb79da56e in __lll_mutex_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb79d7182 in _L_mutex_lock_152 () from /lib/tls/i686/cmov/libpthread.so.0
#3 0x00000004 in ?? ()
#4 0xb7f73c13 in dosprintf () from /home/itroot/build/prefix/lib/libmozjs.so
#5 0xb7885b22 in PR_Lock () from /home/itroot/build/prefix/lib/libnspr4.so
#6 0xb7f46c28 in js_Enqueue () from /home/itroot/build/prefix/lib/libmozjs.so
#7 0xb7f0b362 in js_AtomizeString () from /home/itroot/build/prefix/lib/libmozjs.so
#8 0xb7f0b659 in js_Atomize () from /home/itroot/build/prefix/lib/libmozjs.so
#9 0xb7f4ca8b in js_GetClassId () from /home/itroot/build/prefix/lib/libmozjs.so
#10 0xb7f52f1a in js_NewObject () from /home/itroot/build/prefix/lib/libmozjs.so
#11 0xb7f03562 in JS_NewPropertyIterator () from /home/itroot/build/prefix/lib/libmozjs.so
or
Thread 30 (Thread -1474045008 (LWP 19411)):
#0 0xb7fe7524 in __kernel_vsyscall ()
#1 0xb79d7e16 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb7885992 in PR_WaitCondVar () from /home/itroot/build/prefix/lib/libnspr4.so
#3 0xb7f46c9b in js_Enqueue () from /home/itroot/build/prefix/lib/libmozjs.so
#4 0xb7f0b362 in js_AtomizeString () from /home/itroot/build/prefix/lib/libmozjs.so
#5 0xb7f0b659 in js_Atomize () from /home/itroot/build/prefix/lib/libmozjs.so
#6 0xb7f0257e in JS_GetProperty () from /home/itroot/build/prefix/lib/libmozjs.so
Can I configure TraceMonkey so that it does not do additional locking
while accessing properties, etc.?
I am not an expert here, but I am wondering why you are hitting PR_Lock()
from js_Enqueue.
Is it possible that your compiler/CPU combination is not supported for
compare-and-swap in jslock.cpp, and so falls through to PR_Lock(), which
uses a pthread mutex instead?
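To illustrate the idea, here is a minimal C++11 sketch of a "thin lock": an uncontended acquire is a single compare-and-swap, and only a contended acquire falls back to a mutex plus condition variable (the role PR_Lock/PR_WaitCondVar play under js_Enqueue in the backtraces). ThinLock and count_with_threads are hypothetical names; this is a model of the technique, not the jslock.cpp implementation. In this model, reaching the mutex means either the lock was contended or, as suspected above, the CAS fast path is unavailable.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical thin-lock model (NOT SpiderMonkey's jslock.cpp code).
// All atomics use the default seq_cst ordering, which the wakeup logic relies on.
class ThinLock {
public:
    void lock() {
        if (try_acquire()) return;  // fast path: one CAS, no mutex, no syscall
        // Slow path (contention): register as a waiter and sleep until released.
        std::unique_lock<std::mutex> guard(m_);
        ++waiters_;
        cv_.wait(guard, [this] { return try_acquire(); });
        --waiters_;
    }

    void unlock() {
        held_.store(false);
        if (waiters_.load() > 0) {  // touch the mutex only if someone is waiting
            std::lock_guard<std::mutex> guard(m_);
            cv_.notify_one();
        }
    }

private:
    bool try_acquire() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true);
    }

    std::atomic<bool> held_{false};
    std::atomic<int> waiters_{0};
    std::mutex m_;
    std::condition_variable cv_;
};

// Usage: several threads increment a shared counter under the lock.
long count_with_threads(int threads, int iterations) {
    ThinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

With one thread the mutex is never touched; with many threads hammering the same lock, most acquires end up on the slow path, which is why heavy contention shows up as mutex and condvar frames.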
What compiler, OS, and CPU are you using? If you are using a SPARC CPU or
SunStudio, this patch might help:
https://bugzilla.mozilla.org/show_bug.cgi?id=502696
If you are using icc on non-Windows, you should look at submitting a
patch of your own; I am pretty sure it falls through to PR_Lock().
Wes
--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
js_AtomizeString uses a hashtable to map strings to atoms, so you
have to lock around the hashtable access.
If your threads really share no state whatsoever, you might want
to just use a separate JSRuntime per thread, no?
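A minimal C++ sketch of why a shared atom table needs a lock, assuming an atom table can be modeled as a string-to-integer hash map. AtomTable, SharedAtomTable, and thread_atoms are hypothetical names, not SpiderMonkey internals. One table shared by every thread (like a single JSRuntime) must lock on each atomize call; one table per thread (like a JSRuntime per thread) needs no lock but duplicates the data.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical atom table: maps each distinct string to a stable integer id.
class AtomTable {
public:
    int atomize(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(ids_.size());
        ids_.emplace(s, id);
        return id;
    }
    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, int> ids_;
};

// Shared by all threads: every lookup takes the mutex, so threads that
// atomize concurrently serialize here (the contention seen in the traces).
class SharedAtomTable {
public:
    int atomize(const std::string& s) {
        std::lock_guard<std::mutex> guard(m_);
        return table_.atomize(s);
    }

private:
    std::mutex m_;
    AtomTable table_;
};

// One table per thread: no lock at all, at the cost of one copy per thread.
AtomTable& thread_atoms() {
    thread_local AtomTable table;
    return table;
}
```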
-Boris
Really, I looked into jslock.h/jslock.cpp just now; this is strange.
OS: Linux 2.6.26
Compiler: g++ (GCC) 4.1.2
CPU: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz, x16
Thank you for the advice and the direction to look.
I just tried your advice, and it seems to have been very
helpful. I will write about the results later, thanks.
This shows contention, as Boris said. I assume the next frame is being
mislabeled as js_NativeGet:
This again shows contention on the thin lock used by js_AtomizeString,
indeed.
Using a JSRuntime per thread will avoid all of this. If you do use a
runtime per thread, you should try going further by compiling without
JS_THREADSAFE defined. This will run into a few bugs, but they may be
easy to fix; if you are game, let's try it, get them filed, and
fix them. One is already on file:
https://bugzilla.mozilla.org/show_bug.cgi?id=509857
> #7 0xb7f0b362 in js_AtomizeString () from /home/itroot/build/prefix/lib/libmozjs.so
> #8 0xb7f0b659 in js_Atomize () from /home/itroot/build/prefix/lib/libmozjs.so
> #9 0xb7f4ca8b in js_GetClassId () from /home/itroot/build/prefix/lib/libmozjs.so
> #10 0xb7f52f1a in js_NewObject () from /home/itroot/build/prefix/lib/libmozjs.so
> #11 0xb7f03562 in JS_NewPropertyIterator () from /home/itroot/build/prefix/lib/libmozjs.so
The JS_NewPropertyIterator API is expensive when many threads call it
on a shared JSRuntime. Please file a bug on this if you can at
bugzilla.mozilla.org, in product Core, component JavaScript Engine.
Thanks.
>
> or
>
> Thread 30 (Thread -1474045008 (LWP 19411)):
> #0 0xb7fe7524 in __kernel_vsyscall ()
> #1 0xb79d7e16 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
> #2 0xb7885992 in PR_WaitCondVar () from /home/itroot/build/prefix/lib/libnspr4.so
> #3 0xb7f46c9b in js_Enqueue () from /home/itroot/build/prefix/lib/libmozjs.so
> #4 0xb7f0b362 in js_AtomizeString () from /home/itroot/build/prefix/lib/libmozjs.so
> #5 0xb7f0b659 in js_Atomize () from /home/itroot/build/prefix/lib/libmozjs.so
> #6 0xb7f0257e in JS_GetProperty () from /home/itroot/build/prefix/lib/libmozjs.so
>
> Can I configure TraceMonkey so that it does not do additional locking
> while accessing properties, etc.?
BTW, tracing is not even in the picture here. You would improve your
embedding's performance in any case by avoiding JS_GetProperty in
favor of JS_GetPropertyById. The latter takes a jsid, which avoids
all the re-atomizing (interning) and inflating (from ASCII to Unicode)
of the const char *name parameter to JS_GetProperty. You have to hoist
all that out to generate an id that you reuse, of course, but this is
easy to do.
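A hypothetical C++ model of the hoisting described above. Id, Interner, Object, and the get_property functions stand in for jsid, the runtime's atom table, a JS object, and JS_GetProperty/JS_GetPropertyById; none of them is real SpiderMonkey code. The call counter shows that the by-name loop atomizes on every iteration, while the hoisted loop atomizes once.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: Id ~ jsid, Interner ~ atom table, Object ~ JS object.
using Id = int;

struct Interner {
    std::unordered_map<std::string, Id> ids;
    int calls = 0;  // how many times a name was (re)atomized

    Id intern(const std::string& name) {
        ++calls;
        auto it = ids.find(name);
        if (it != ids.end()) return it->second;
        Id id = static_cast<Id>(ids.size());
        ids.emplace(name, id);
        return id;
    }
};

struct Object {
    std::unordered_map<Id, double> slots;
};

// Analogue of JS_GetPropertyById: the caller already holds the id.
double get_property_by_id(const Object& obj, Id id) {
    auto it = obj.slots.find(id);
    return it == obj.slots.end() ? 0.0 : it->second;
}

// Analogue of JS_GetProperty: a C-string name is atomized on every call
// (in a shared thread-safe runtime, that means taking the atom-table lock).
double get_property(Interner& in, const Object& obj, const char* name) {
    return get_property_by_id(obj, in.intern(name));
}

// Reads obj.x n times, atomizing "x" on each iteration.
double sum_by_name(Interner& in, const Object& obj, int n) {
    double total = 0;
    for (int i = 0; i < n; ++i) total += get_property(in, obj, "x");
    return total;
}

// Same loop with the id hoisted out: "x" is atomized exactly once.
double sum_hoisted(Interner& in, const Object& obj, int n) {
    const Id x = in.intern("x");
    double total = 0;
    for (int i = 0; i < n; ++i) total += get_property_by_id(obj, x);
    return total;
}
```

In a real embedding the shape is the same: compute the jsid for each hot property name outside the loop and pass it to JS_GetPropertyById. How long such an id stays valid depends on the SpiderMonkey version, so check that version's documentation before caching it globally.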
/be
Does your application share objects among threads? Apparently not, if
multiple runtimes work for you. The MDC doc is worded a bit awkwardly,
for sure. It could say that you can use a runtime per thread for best
performance (soon, I hope, without JS_THREADSAFE), but that doing so
misses opportunities to share some of the VM's data structures
(duplicating overhead) and precludes sharing objects across threads.
The JSRuntime : Process :: JSContext : Thread (or stack) analogy shows
SpiderMonkey's roots in the '90s. In hindsight, a shared-nothing design
that requires proxying across threads looks like it would be better.
My current thinking is that we will keep the ability to share objects
across threads within a single JSRuntime's GC heap, but change the JS
API so that objects are created to be used by only one thread for
their entire life, and you have to use a new API to get an object that
is accessible from multiple threads from its birth.
> And I cannot find any information on MDC
> about the performance gain from using multiple runtimes. I'm sure
> that it would be very useful information.
Help wanted. It's not hard: just use JS_NewRuntime and
JS_DestroyRuntime appropriately.
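A minimal C++11 sketch of the runtime-per-thread pattern, assuming each worker thread owns exactly one runtime for its whole lifetime. FakeRuntime, worker, and run_pool are hypothetical placeholders so the example runs on its own; in a real embedding the constructor would call JS_NewRuntime, the destructor JS_DestroyRuntime, and the loop body would create a context and run scripts in that runtime only.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical placeholder for a JSRuntime. A real wrapper would call
// JS_NewRuntime in the constructor and JS_DestroyRuntime in the destructor.
struct FakeRuntime {
    static std::atomic<int> live;  // runtimes currently alive, for checking
    int tasks_run = 0;

    FakeRuntime() { ++live; }
    ~FakeRuntime() { --live; }
    FakeRuntime(const FakeRuntime&) = delete;             // never shared or copied
    FakeRuntime& operator=(const FakeRuntime&) = delete;
};
std::atomic<int> FakeRuntime::live{0};

// Each worker creates its runtime on entry and destroys it on exit, so
// atoms, GC heap, and locks are never shared with other workers.
void worker(int tasks, std::atomic<int>& done) {
    FakeRuntime rt;                   // ~ JS_NewRuntime on this thread
    for (int i = 0; i < tasks; ++i) {
        ++rt.tasks_run;               // ~ compile/execute a script in rt
    }
    done += rt.tasks_run;
}                                     // ~ JS_DestroyRuntime on the same thread

int run_pool(int threads, int tasks_per_thread) {
    std::atomic<int> done{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker, tasks_per_thread, std::ref(done));
    for (auto& th : pool) th.join();
    return done.load();
}
```

Scoping the runtime to the thread (RAII) ensures it is created and destroyed on the same thread and is never touched by another one, which is the property that makes building without JS_THREADSAFE plausible.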
/be
I just built my library without JS_THREADSAFE, and all tests (of my
library) pass. There are no additional threads (of course, because we
no longer need libnspr). So I am going to try using this version.
I'm asking because I found no bugs using TraceMonkey this
way, so I wonder: might they show up under high load, high
concurrency, etc.?