jemalloc to replace default allocator


Joel Reymont

May 6, 2010, 5:51:47 PM
to Darwin Dev
I would like to use jemalloc to replace the system memory allocator in Firefox.

My understanding is that it needs to hook into the zone system and there's existing code in Firefox that sort of does it [1]. According to Mozilla, "it is difficult to get our code to load fast enough to replace the default zone before other allocations happen" so a linker trick is used [2] to get it to happen early.

What is the proper way to make it work (fast!) in 10.5 and 10.6?

I'm not familiar with the zone system but I'm a quick learner, in case someone has pointers for me.

Thanks in advance, Joel

[1] http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#6380
[2] http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/Makefile.in#141

---
http://twitter.com/wagerlabs

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-dev mailing list (Darwi...@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/darwin-dev/darwin-dev-garchive-73044%40googlegroups.com

This email sent to darwin-dev-g...@googlegroups.com


Alastair Houghton

May 7, 2010, 5:59:40 AM
to Joel Reymont, Darwin Dev
On 6 May 2010, at 22:51, Joel Reymont wrote:

> I would like to use jemalloc to replace the system memory allocator in Firefox.

Why do that? My understanding is that jemalloc's primary advantage is massive scalability for heavily threaded code. Firefox isn't *that* heavily threaded, right?

In any event, the system allocator on Mac OS X has always been pretty good, and it looks like the Snow Leopard version is already designed with per-processor "magazines" for small allocations.

Personally, unless you have a really good justification for using jemalloc instead (and I'd argue that, for that, you need measurements showing that it's significantly faster than the system implementation), I'd stick with the system malloc. Not only does it mean that Firefox will get faster with every optimisation Apple makes to the system malloc, but it also means that all of the malloc debugging tools and so on will work with Firefox.

Kind regards,

Alastair.

--
http://alastairs-place.net

Joel Reymont

May 7, 2010, 6:08:13 AM
to Alastair Houghton, Darwin Dev
Alastair,

On May 7, 2010, at 10:59 AM, Alastair Houghton wrote:

> Personally, unless you have a really good justification for using jemalloc instead (and I'd argue that, for that, you need measurements showing that it's significantly faster than the system implementation), I'd stick with the system malloc.

I get your point but let's just say this is something I've been tasked with. Apparently, the Javascript team discovered that too much time is being spent in free, or something like that. My goal is to plug in jemalloc and then let the Javascript team profile again.

Thanks, Joel

---
http://twitter.com/wagerlabs

Jean-Daniel Dupas

May 7, 2010, 6:23:55 AM
to Alastair Houghton, Darwin Dev

On 7 May 2010, at 11:59, Alastair Houghton wrote:

> On 6 May 2010, at 22:51, Joel Reymont wrote:
>
>> I would like to use jemalloc to replace the system memory allocator in Firefox.
>
> Why do that? My understanding is that jemalloc's primary advantage is massive scalability for heavily threaded code. Firefox isn't *that* heavily threaded, right?
>
> In any event, the system allocator on Mac OS X has always been pretty good, and it looks like the Snow Leopard version is already designed with per-processor "magazines" for small allocations.
>
> Personally, unless you have a really good justification for using jemalloc instead (and I'd argue that, for that, you need measurements showing that it's significantly faster than the system implementation), I'd stick with the system malloc.
> Not only does it mean that Firefox will get faster with every optimisation Apple makes to the system malloc, but it also means that all of the malloc debugging tools and so on will work with Firefox.
>
> Kind regards,
>

Good advice, but how do you perform measurements without a way to replace the system allocator? ;-)

Joel, I think you can override the default malloc implementation by using DYLD_INSERT_LIBRARIES and DYLD_FORCE_FLAT_NAMESPACE to inject your malloc library at launch time.

But I agree with Alastair: I doubt you'll get any performance improvement from a custom allocator. The system allocator is pretty good for general-purpose applications (like Firefox).

-- Jean-Daniel

Jean-Daniel Dupas

May 7, 2010, 6:25:28 AM
to Joel Reymont, Darwin Dev
On 7 May 2010, at 12:08, Joel Reymont wrote:

> Alastair,
>
> On May 7, 2010, at 10:59 AM, Alastair Houghton wrote:
>
>> Personally, unless you have a really good justification for using jemalloc instead (and I'd argue that, for that, you need measurements showing that it's significantly faster than the system implementation), I'd stick with the system malloc.
>
> I get your point but let's just say this is something I've been tasked with. Apparently, the Javascript team discovered that too much time is being spent in free, or something like that. My goal is to plug in jemalloc and then let the Javascript team profile again.

Why not use a custom allocator for JavaScript only? The JavaScript engine should be designed with memory allocation hooks, so you can plug in any allocator you want just for that specific part.



-- Jean-Daniel




Alastair Houghton

May 7, 2010, 6:34:27 AM
to Jean-Daniel Dupas, Darwin Dev
On 7 May 2010, at 11:23, Jean-Daniel Dupas wrote:

> Good advice, but how do you perform measurements without a way to replace the system allocator? ;-)

Indeed, I do realise that you could only *easily* perform artificial benchmarks. :-)

Kind regards,

Alastair.

--
http://alastairs-place.net



Alastair Houghton

May 7, 2010, 6:39:55 AM
to Joel Reymont, Darwin Dev
On 7 May 2010, at 11:08, Joel Reymont wrote:

> On May 7, 2010, at 10:59 AM, Alastair Houghton wrote:
>
>> Personally, unless you have a really good justification for using jemalloc instead (and I'd argue that, for that, you need measurements showing that it's significantly faster than the system implementation), I'd stick with the system malloc.
>
> I get your point but let's just say this is something I've been tasked with. Apparently, the Javascript team discovered that too much time is being spent in free, or something like that. My goal is to plug in jemalloc and then let the Javascript team profile again.

So maybe the solution is to look at why the Javascript interpreter is releasing (and presumably allocating) so much? Or perhaps to consider whether giving it its own zone would be a useful thing to do (though you may very well not need to use a different allocation routine).

Do you have the output from Shark for one of these cases to show you which code path it's hitting and where it's really spending its time?

Kind regards,

Alastair.

--
http://alastairs-place.net



Joel Reymont

May 7, 2010, 7:45:59 AM
to Jean-Daniel Dupas, Darwin Dev

On May 7, 2010, at 11:23 AM, Jean-Daniel Dupas wrote:

> But I agree with Alastair, I doubt you get any performance improvement using a custom allocator. The system allocator is pretty good for general purpose applications (like Firefox)

What about Safari and Chrome using tcmalloc instead of the default allocator?

Here's a thread from the Chromium mailing list:

http://groups.google.com/group/chromium-dev/browse_thread/thread/60f5b665c9c536c6/d1f8e73fc7c33b4b?lnk=gst&q=tcmalloc#d1f8e73fc7c33b4b

"From a performance perspective, it may be critical to use tcmalloc to match
safari performance."

---
http://twitter.com/wagerlabs

David Leimbach

May 7, 2010, 10:37:33 AM
to Alastair Houghton, Darwin Dev
On Fri, May 7, 2010 at 2:59 AM, Alastair Houghton <alas...@alastairs-place.net> wrote:

> On 6 May 2010, at 22:51, Joel Reymont wrote:
>
>> I would like to use jemalloc to replace the system memory allocator in Firefox.
>
> Why do that? My understanding is that jemalloc's primary advantage is massive scalability for heavily threaded code. Firefox isn't *that* heavily threaded, right?

They already did that for Linux and Windows, did they not? Mac OS X was the one platform left out of this optimization.

> In any event, the system allocator on Mac OS X has always been pretty good, and it looks like the Snow Leopard version is already designed with per-processor "magazines" for small allocations.

Yeah, Mac OS X's allocation scheme is already optimized quite well for threads, and this was necessary for GCD (the libdispatch threading stuff) to work as well as it does. There's a lot of copying that has to go on for those blocks to work appropriately.

> Personally, unless you have a really good justification for using jemalloc instead (and I'd argue that, for that, you need measurements showing that it's significantly faster than the system implementation), I'd stick with the system malloc. Not only does it mean that Firefox will get faster with every optimisation Apple makes to the system malloc, but it also means that all of the malloc debugging tools and so on will work with Firefox.
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net



Jens Alfke

May 7, 2010, 4:30:59 PM
to Alastair Houghton, Darwin Dev

On May 7, 2010, at 2:59 AM, Alastair Houghton wrote:

> Why do that? My understanding is that jemalloc's primary advantage
> is massive scalability for heavily threaded code. Firefox isn't
> *that* heavily threaded, right?
> In any event, the system allocator on Mac OS X has always been
> pretty good, and it looks like the Snow Leopard version is already
> designed with per-processor "magazines" for small allocations.

Safari has long used a custom allocator (tcmalloc) for performance
reasons. It makes a significant difference on standard browser
benchmarks, I'm told, although I heard that before 10.6 came out.
Chrome uses tcmalloc on Windows and Linux, and probably will
eventually on Mac too.

Neither of these goes as far as overriding malloc itself. Instead they
define their own allocator functions and then use those instead of
calling malloc directly. (Simply overriding ::operator new() gets you
a lot of the way there, for a C++ app.)
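
As a rough illustration of the ::operator new() route, here is a minimal C++ sketch. The names my_alloc, my_free, g_allocs, g_frees and check_redirection are hypothetical, and the allocator entry points simply forward to std::malloc/std::free; this is not how Safari or Chrome actually wire in tcmalloc, only the general shape of the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters, present only to make the redirection observable.
// They are not thread-safe; a real allocator would not need them.
static std::size_t g_allocs = 0;
static std::size_t g_frees = 0;

// Hypothetical entry points of a custom allocator. In this sketch they just
// forward to the C library; a real build would call into the custom
// allocator here instead.
static void* my_alloc(std::size_t n) {
    ++g_allocs;
    return std::malloc(n ? n : 1);  // never ask malloc for 0 bytes
}

static void my_free(void* p) {
    if (p) ++g_frees;
    std::free(p);
}

// Replacing the global allocation functions routes every plain `new` and
// `delete` in the program through the functions above. The default array
// forms call these, so they are covered as well.
void* operator new(std::size_t n) {
    if (void* p = my_alloc(n)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { my_free(p); }
void operator delete(void* p, std::size_t) noexcept { my_free(p); }

// Usage check: a direct call to the replaced functions must hit the counters.
inline void check_redirection() {
    const std::size_t allocs_before = g_allocs;
    const std::size_t frees_before = g_frees;
    void* p = ::operator new(32);
    ::operator delete(p);
    assert(g_allocs == allocs_before + 1);
    assert(g_frees == frees_before + 1);
}
```

Direct calls to malloc() from C code or from system frameworks are untouched by this, which is the difference from replacing the default zone.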

In the case of JavaScript (or other GC'd languages) you get the
biggest speed boost from not using a normal allocator at all. The
usage patterns of a GC system are very different because freeing
blocks happens rarely and in huge batches. A typical GC allocator will
allocate space by simply bumping a global pointer, and trigger a minor
collection when the pointer hits the end of the space. This is one of
the things that makes V8 fast.
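
A minimal sketch of that bump-pointer idea, under stated assumptions: BumpArena and bump_demo are hypothetical names, the pointer is a class member rather than a true global, and the "minor collection" is reduced to a reset() that discards everything. It is not V8's allocator, only the allocation pattern described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical nursery: one contiguous region handed out front to back.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : storage_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    // Allocation is just "round up, check the limit, advance the offset".
    // Returns nullptr when the region is exhausted; a collector would run a
    // minor collection at that point and then call reset().
    void* allocate(std::size_t n) {
        if (n == 0) n = 1;  // keep distinct allocations at distinct addresses
        const std::size_t rounded = (n + kAlign - 1) & ~(kAlign - 1);
        if (rounded < n || rounded > capacity_ - used_) return nullptr;
        void* p = storage_.get() + used_;
        used_ += rounded;
        return p;
    }

    // Releases every block at once; there is no per-object free().
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    static constexpr std::size_t kAlign = 8;  // assumed alignment for the sketch
    std::unique_ptr<unsigned char[]> storage_;
    std::size_t capacity_;
    std::size_t used_;
};

// Usage example: two small allocations, an overflow, then a bulk release.
inline void bump_demo() {
    BumpArena nursery(64);
    unsigned char* a = static_cast<unsigned char*>(nursery.allocate(5));
    unsigned char* b = static_cast<unsigned char*>(nursery.allocate(8));
    assert(b - a == 8);                       // 5 bytes rounded up to 8
    assert(nursery.allocate(64) == nullptr);  // full: time to collect
    nursery.reset();
    assert(nursery.used() == 0);
}
```

The cost per allocation is a few arithmetic instructions, and the cost of freeing is paid once per batch rather than once per object.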

--Jens

Alastair Houghton

May 7, 2010, 5:19:56 PM
to Jens Alfke, Darwin Dev
On 7 May 2010, at 21:30, Jens Alfke wrote:

> Neither of these goes as far as overriding malloc itself. Instead they define their own allocator functions and then use those instead of calling malloc directly.

My impression is that the OP was actually talking about constructing a new default malloc() zone but using jemalloc's routines rather than the historic OS X implementation, which is what's making matters difficult.

As I hinted, if it were *my* application, I'd be looking at why it's calling free() so much. If that's happening, perhaps it's allocating and releasing too much and maybe caching and re-using objects might help (for instance).
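
To make the caching suggestion concrete, here is a minimal C++ sketch of a fixed-size block cache that keeps released blocks on a free list instead of handing them back to free(). BlockCache, cache_demo and the malloc_calls counter are hypothetical names for illustration; nothing here is taken from the Firefox source, and the class is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical cache of same-sized blocks. Released blocks are threaded onto
// an intrusive free list and reused before malloc() is called again.
class BlockCache {
    struct Node { Node* next; };

public:
    explicit BlockCache(std::size_t block_size)
        : block_size_(block_size < sizeof(Node) ? sizeof(Node) : block_size) {}

    BlockCache(const BlockCache&) = delete;
    BlockCache& operator=(const BlockCache&) = delete;

    // Cached blocks are only returned to the system when the cache dies.
    ~BlockCache() {
        while (free_list_) {
            Node* n = free_list_;
            free_list_ = n->next;
            std::free(n);
        }
    }

    // Reuse a cached block if there is one, otherwise fall back to malloc().
    void* get() {
        if (free_list_) {
            Node* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        ++malloc_calls_;
        return std::malloc(block_size_);
    }

    // Push the block onto the free list instead of calling free().
    void put(void* p) {
        if (!p) return;
        free_list_ = new (p) Node{free_list_};
    }

    std::size_t malloc_calls() const { return malloc_calls_; }

private:
    Node* free_list_ = nullptr;
    std::size_t block_size_;
    std::size_t malloc_calls_ = 0;
};

// Usage example: the second get() is served from the cache, not malloc().
inline void cache_demo() {
    BlockCache cache(48);
    void* first = cache.get();
    cache.put(first);
    void* second = cache.get();
    assert(second == first);
    assert(cache.malloc_calls() == 1);
    cache.put(second);  // the destructor releases it
}
```

Whether this helps depends on the allocation pattern; it trades memory held in the cache for fewer trips through malloc() and free().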

Kind regards,

Alastair.

--
http://alastairs-place.net



Jens Alfke

May 7, 2010, 6:12:14 PM
to Alastair Houghton, Darwin Dev

On May 7, 2010, at 2:19 PM, Alastair Houghton wrote:

> As I hinted, if it were *my* application, I'd be looking at why it's
> calling free() so much. If that's happening, perhaps it's
> allocating and releasing too much and maybe caching and re-using
> objects might help (for instance).

I believe Joel said it's primarily JavaScript objects, and there's not
much the app can do about that. As a scripting language, JS creates
objects promiscuously, and the collector has to free them all.
(Collectors that do their own allocation can avoid most of the free
overhead, but not ones that are layered on malloc.)

--Jens

Jean-Daniel Dupas

May 7, 2010, 8:11:15 PM
to Jens Alfke, Darwin Dev

On 7 May 2010, at 22:30, Jens Alfke wrote:

>
> On May 7, 2010, at 2:59 AM, Alastair Houghton wrote:
>
>> Why do that? My understanding is that jemalloc's primary advantage is massive scalability for heavily threaded code. Firefox isn't *that* heavily threaded, right?
>> In any event, the system allocator on Mac OS X has always been pretty good, and it looks like the Snow Leopard version is already designed with per-processor "magazines" for small allocations.
>
> Safari has long used a custom allocator (tcmalloc) for performance reasons. It makes a significant difference on standard browser benchmarks, I'm told, although I heard that before 10.6 came out. Chrome uses tcmalloc on Windows and Linux, and probably will eventually on Mac too.
>
> Neither of these goes as far as overriding malloc itself. Instead they define their own allocator functions and then use those instead of calling malloc directly. (Simply overriding ::operator new() gets you a lot of the way there, for a C++ app.)
>
> In the case of JavaScript (or other GC'd languages) you get the biggest speed boost from not using a normal allocator at all. The usage patterns of a GC system are very different because freeing blocks happens rarely and in huge batches. A typical GC allocator will allocate space by simply bumping a global pointer, and trigger a minor collection when the pointer hits the end of the space. This is one of the things that makes V8 fast.

I agree for JavaScript. The OP didn't tell us it was for JavaScript in his first mail.
As suggested earlier, if the bottleneck is in the JavaScript library, it makes sense to use a custom allocator for that specific part. But I would not recommend replacing the default malloc for the whole application unless it proves useful.

-- Jean-Daniel