Re: MmapAllocator


Reid Kleckner

Aug 2, 2010, 4:27:19 PM
to Steven Noonan, Unladen Swallow
The SlabAllocator interface should already solve this problem. You
don't need to replace the BumpPtrAllocator, you just need to pass some
class that inherits from SlabAllocator to its constructor. Note that
the Deallocate method takes a MemSlab * and not a void *. The
comments above it mention that it does this for exactly the reason
you're describing. You should just be able to access the Size
field of the MemSlab.
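
To make that concrete, here is a minimal self-contained sketch of the idea, not the actual patch. The MemSlab/SlabAllocator shapes below are reproduced from memory of the 2010-era llvm/Support/Allocator.h, so verify the field and method names against the real header:

```cpp
#include <sys/mman.h>
#include <cstddef>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

// Stand-ins for the LLVM types (normally from llvm/Support/Allocator.h);
// names here are from memory, so double-check against the tree.
struct MemSlab {
  std::size_t Size;
  MemSlab *NextPtr;
};

class SlabAllocator {
public:
  virtual ~SlabAllocator() {}
  virtual MemSlab *Allocate(std::size_t Size) = 0;
  virtual void Deallocate(MemSlab *Slab) = 0;
};

// The point above: Deallocate() receives a MemSlab*, and the slab records
// its own size, so munmap() gets its length without any extra bookkeeping.
class MmapSlabAllocator : public SlabAllocator {
public:
  virtual MemSlab *Allocate(std::size_t Size) {
    void *Addr = ::mmap(0, Size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (Addr == MAP_FAILED)
      return 0;
    MemSlab *Slab = static_cast<MemSlab *>(Addr);
    Slab->Size = Size;     // remembered here, read back in Deallocate()
    Slab->NextPtr = 0;
    return Slab;
  }
  virtual void Deallocate(MemSlab *Slab) {
    ::munmap(Slab, Slab->Size);  // no separate pointer/size table needed
  }
};
```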

Reid

On Mon, Aug 2, 2010 at 12:15 PM, Steven Noonan <ste...@uplinklabs.net> wrote:
> Hey Reid,
>
> I'm working on the mmap-based allocator, and encountering an issue.
> free()/delete/delete[] don't demand that the programmer specify the
> size of the block to be deallocated, but munmap() does. Any idea what
> would be the best way to keep track of the size of previous
> allocations? It just seems like I'd need to track every single
> pointer/size tuple, and I can't think of a particularly efficient way
> to do so.
>
> - Steven
>

Steven Noonan

Aug 2, 2010, 5:33:42 PM
to Reid Kleckner, Unladen Swallow
Whoops, totally missed that. I should have looked at what MemSlab consisted of.

- Steven

Reid Kleckner

Aug 2, 2010, 8:17:45 PM
to Steven Noonan, Unladen Swallow
My first instinct is to keep all the mmap related code in
llvm/lib/System/Unix/Memory.inc . They already have a variety of
portability ifdefs for Unix platforms, so I would refactor that so you
can allocate read/write as well as read/write/execute memory, and put
an AllocateRW stub in the Windows implementation that just calls
llvm_report_error or something.

Then, when we thread through the option for using the mmap-based
allocator, in our code we can avoid using the mmap-based allocator on
Windows.
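
The shape of that refactor might look like the following sketch. The AllocateRW/AllocateRWX names are my guesses at what the refactored entry points could be called; the real Memory.inc deals in sys::MemoryBlock and has many more portability ifdefs:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

#ifndef _WIN32
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

// One mmap() call site, two protection policies: the allocator only
// needs plain read/write data pages; JIT code memory keeps RWX.
static void *AllocateProtected(std::size_t Size, int Prot) {
  void *Addr = ::mmap(0, Size, Prot, MAP_PRIVATE | MAP_ANON, -1, 0);
  return Addr == MAP_FAILED ? 0 : Addr;
}

void *AllocateRW(std::size_t Size) {
  return AllocateProtected(Size, PROT_READ | PROT_WRITE);
}

void *AllocateRWX(std::size_t Size) {
  return AllocateProtected(Size, PROT_READ | PROT_WRITE | PROT_EXEC);
}

#else
// The Windows stub described above: fail loudly until someone ports it
// (VirtualAlloc would be the natural implementation there).
void *AllocateRW(std::size_t) {
  std::fprintf(stderr, "AllocateRW: not implemented on Windows\n");
  std::abort();
}
#endif
```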

That's all kind of a pain in the butt, though, so I'd do whatever's
easiest first so we can see if this whole thing is worth the effort of
moving upstream. Long-term, it probably wouldn't be that hard to
implement for Windows. Looking at the current AllocateRWX code, it
seems that VirtualAlloc is roughly the same as mmap.

Reid

On Mon, Aug 2, 2010 at 4:59 PM, Steven Noonan <ste...@uplinklabs.net> wrote:
> What do you think should be done for Windows? Should we just #ifdef
> LLVM_ON_UNIX around the definition of MmapAllocator in Allocator.h?
> Also, what about the variations between different UNIX
> implementations? For instance, on Mac OS X, they provide MAP_ANON,
> while on Linux you have to mmap /dev/zero. Should we #ifdef __APPLE__
> with an #else for everything else?
>
> - Steven

Steven Noonan

Aug 2, 2010, 7:59:09 PM
to Reid Kleckner, Unladen Swallow
What do you think should be done for Windows? Should we just #ifdef
LLVM_ON_UNIX around the definition of MmapAllocator in Allocator.h?
Also, what about the variations between different UNIX
implementations? For instance, on Mac OS X, they provide MAP_ANON,
while on Linux you have to mmap /dev/zero. Should we #ifdef __APPLE__
with an #else for everything else?
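
One way to paper over that particular difference is to feature-test the macro rather than the OS. This is only a sketch of the idiom, not code from the patch under discussion:

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Anonymous mapping that works whether or not the platform defines
// MAP_ANON: fall back to mapping /dev/zero, the older idiom mentioned
// above for Linux.
static void *AnonymousPages(std::size_t Size) {
#ifdef MAP_ANON
  void *Addr = ::mmap(0, Size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);
#else
  int FD = ::open("/dev/zero", O_RDWR);
  if (FD < 0)
    return 0;
  void *Addr = ::mmap(0, Size, PROT_READ | PROT_WRITE, MAP_PRIVATE, FD, 0);
  ::close(FD);  // the mapping keeps its own reference to the file
#endif
  return Addr == MAP_FAILED ? 0 : Addr;
}
```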

- Steven

Reid Kleckner

Aug 3, 2010, 11:13:29 AM
to Steven Noonan, Unladen Swallow
On Tue, Aug 3, 2010 at 5:59 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
> Well, it may actually not matter. Trying this out on Mac OS X, I'm
> noticing that it doesn't seem to make a difference as far as
> steady-state goes. Some of the spikes are somehow reduced in size (49M
> -> 40M with bm_spambayes.py), but overall, I'm not seeing much of a
> difference. I'm going to Linuxify my changes and then test with
> perf.py and see if the difference is significant.

That's unexpected. It may be because Instruments doesn't track memory
that is mmap'd instead of malloc'd. I wouldn't expect Instruments to
report a drop in the steady-state memory usage, though.

> I still think the Unladen Swallow performance tester (perf.py) is kind
> of weak. It only watches the _maximum_ memory usage, and doesn't track
> other potentially important data. It seems like there should be a
> better set of data to track as far as memory usage goes, but I can't
> think of one off the top of my head. One reason this is so difficult
> is because with most testing methods, you either have to monitor the
> extremes (i.e. the max, which we watch right now) or the
> average/median. If you don't just do sampling of the memory usage, you
> have to assume that each run will have a nearly identical memory
> profile (versus time) and compare two runs at specific points in time.
> Not sure how perf.py can be improved, but tracking maximum memory
> usage isn't winning us any points.

perf.py doesn't track the maximum; it tracks dirty pages. The problem
is that if one uses malloc/free, even after the free the page is still
dirty until it gets swapped out. In the absence of pressure from
other system activity, the kernel won't swap that out so that other
processes can use it. By using mmap, we can tell it that we really
don't need this memory right now, and other processes can allocate it.

The metric perf.py uses is also important because it's what people
will see in top or their favorite task manager.
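
The residency Reid is describing can be observed directly with mincore(). This is a Linux/BSD-specific demonstration of the concept, not anything from the patch; resident-set accounting is kernel-dependent:

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

// True if the first page at Addr is resident, i.e. counted in the
// process's RSS -- roughly the number top reports. A freed malloc page
// can stay resident; munmap() drops the mapping immediately.
static bool PageResident(void *Addr) {
  unsigned char Vec = 0;
  if (::mincore(Addr, 1, &Vec) != 0)
    return false;               // e.g. ENOMEM after munmap()
  return (Vec & 1) != 0;
}
```

Touching a page dirties it and sets this bit; munmap() clears it right away, which is why the mmap-based allocator shrinks the numbers perf.py (and top) reports.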

Reid

Steven Noonan

Aug 3, 2010, 11:26:26 AM
to Reid Kleckner, Unladen Swallow
On Tue, Aug 3, 2010 at 8:13 AM, Reid Kleckner <reid.k...@gmail.com> wrote:
> On Tue, Aug 3, 2010 at 5:59 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
>> Well, it may actually not matter. Trying this out on Mac OS X, I'm
>> noticing that it doesn't seem to make a difference as far as
>> steady-state goes. Some of the spikes are somehow reduced in size (49M
>> -> 40M with bm_spambayes.py), but overall, I'm not seeing much of a
>> difference. I'm going to Linuxify my changes and then test with
>> perf.py and see if the difference is significant.
>
> That's unexpected.  It may be because Instruments doesn't track memory
> that is mmap'd instead of malloc'd.  I wouldn't expect instruments to
> report a drop in the steady state memory usage, though.

The entire memory profile looks exactly the same, save for the spikes
being shorter than before.

>> I still think the Unladen Swallow performance tester (perf.py) is kind
>> of weak. It only watches the _maximum_ memory usage, and doesn't track
>> other potentially important data. It seems like there should be a
>> better set of data to track as far as memory usage goes, but I can't
>> think of one off the top of my head. One reason this is so difficult
>> is because with most testing methods, you either have to monitor the
>> extremes (i.e. the max, which we watch right now) or the
>> average/median. If you don't just do sampling of the memory usage, you
>> have to assume that each run will have a nearly identical memory
>> profile (versus time) and compare two runs at specific points in time.
>> Not sure how perf.py can be improved, but tracking maximum memory
>> usage isn't winning us any points.
>
> perf.py doesn't track the maximum, it tracks dirty pages.  The problem
> is that if one uses malloc/free, even after the free the page is still
> dirty until it gets swapped out.  In the absence of pressure from
> other system activity, the kernel won't swap that out so that other
> processes can use it.  By using mmap, we can tell it that we really
> don't need this memory right now, and other processes can allocate it.
>
> The metric perf.py uses is also important because it's what people
> will see in top or their favorite task manager.

I see. Okay.

Steven Noonan

Aug 3, 2010, 8:59:56 AM
to Reid Kleckner, Unladen Swallow
Well, it may actually not matter. Trying this out on Mac OS X, I'm
noticing that it doesn't seem to make a difference as far as
steady-state goes. Some of the spikes are somehow reduced in size (49M
-> 40M with bm_spambayes.py), but overall, I'm not seeing much of a
difference. I'm going to Linuxify my changes and then test with
perf.py and see if the difference is significant.

I still think the Unladen Swallow performance tester (perf.py) is kind
of weak. It only watches the _maximum_ memory usage, and doesn't track
other potentially important data. It seems like there should be a
better set of data to track as far as memory usage goes, but I can't
think of one off the top of my head. One reason this is so difficult
is because with most testing methods, you either have to monitor the
extremes (i.e. the max, which we watch right now) or the
average/median. If you don't just do sampling of the memory usage, you
have to assume that each run will have a nearly identical memory
profile (versus time) and compare two runs at specific points in time.
Not sure how perf.py can be improved, but tracking maximum memory
usage isn't winning us any points.

- Steven

Steven Noonan

Aug 3, 2010, 2:12:43 PM
to Reid Kleckner, Unladen Swallow
On Tue, Aug 3, 2010 at 8:26 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
> On Tue, Aug 3, 2010 at 8:13 AM, Reid Kleckner <reid.k...@gmail.com> wrote:
>> On Tue, Aug 3, 2010 at 5:59 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
>>> Well, it may actually not matter. Trying this out on Mac OS X, I'm
>>> noticing that it doesn't seem to make a difference as far as
>>> steady-state goes. Some of the spikes are somehow reduced in size (49M
>>> -> 40M with bm_spambayes.py), but overall, I'm not seeing much of a
>>> difference. I'm going to Linuxify my changes and then test with
>>> perf.py and see if the difference is significant.
>>
>> That's unexpected.  It may be because Instruments doesn't track memory
>> that is mmap'd instead of malloc'd.  I wouldn't expect instruments to
>> report a drop in the steady state memory usage, though.
>
> The entire memory profile looks exactly the same, save for the spikes
> being shorter than before.

Here's the result on Linux: tinyurl.com/32lco84

The one labeled 'unladen-trunk-release' is the official LLVM tree,
while 'unladen-test-release' is the LLVM with mmap allocator.

It's something, anyway. Thoughts?

Reid Kleckner

Aug 4, 2010, 12:12:00 AM
to Steven Noonan, Unladen Swallow
On Tue, Aug 3, 2010 at 11:12 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
> On Tue, Aug 3, 2010 at 8:26 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
>> On Tue, Aug 3, 2010 at 8:13 AM, Reid Kleckner <reid.k...@gmail.com> wrote:
>>> On Tue, Aug 3, 2010 at 5:59 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
>>> That's unexpected.  It may be because Instruments doesn't track memory
>>> that is mmap'd instead of malloc'd.  I wouldn't expect instruments to
>>> report a drop in the steady state memory usage, though.
>>
>> The entire memory profile looks exactly the same, save for the spikes
>> being shorter than before.
>
> Here's the result on Linux: tinyurl.com/32lco84
>
> The one labeled 'unladen-trunk-release' is the official LLVM tree,
> while 'unladen-test-release' is the LLVM with mmap allocator.
>
> It's something, anyway. Thoughts?

Hey, that's cool!

I wonder why it stitches up and down instead of flatlining like it used
to. I can't imagine that we're doing compilations at each of those
little spikes. I'd look into it, but it's probably not worth the
effort.

I'm also curious why the graph is shifted right almost perfectly. I
guess it has to do with the way that perf.py monitors memory usage
asynchronously.

I guess going from here I'd look at more places that LLVM uses
BumpPtrAllocators and play the same game.

I'm looking at the RecyclingAllocator on line 150 of
include/llvm/CodeGen/SelectionDAG.h . Those are the SDNodes I think
you saw previously in Instruments.

Alternatively, we can whack those allocations by switching to
fastisel, but that has other performance consequences.

Speaking of which, does this change affect performance at all?
mmap/munmap is more expensive than malloc/free.

We should gauge the LLVM community's interest in this kind of change.
Depending on how the allocators are used in clang/llvm, we may be able
to just change the default to use mmap. However, if they create and
destroy these allocators for every function they codegen, the
mmap/munmap overhead might be too much, and we'd need to see how
receptive they are to threading an option to enable mmap through the
code somehow.

Reid

Reid Kleckner

Aug 4, 2010, 12:24:26 AM
to Steven Noonan, Unladen Swallow

Oh yeah, one thing I forgot is that since we no longer have LLVM in
tree, we should figure out some way of doing code review, so I can
give you more feedback on this as it comes along. We should use
Rietveld + whatever DVCS you're most comfortable with, probably.
There are a couple of existing mirrors for hg and git I think. Don't
create your own and try to pull all the history, because the SVN
server won't like you.

Reid

Reid Kleckner

Aug 4, 2010, 1:25:36 AM
to Steven Noonan, Unladen Swallow
On Tue, Aug 3, 2010 at 10:17 PM, Steven Noonan <ste...@uplinklabs.net> wrote:
> On Tue, Aug 3, 2010 at 10:17 PM, Steven Noonan <ste...@uplinklabs.net> wrote:
>>
>> I've been using a git mirror (which I am in turn mirroring on
>> git.uplinklabs.net). See here:
>> http://git.uplinklabs.net/tycho/mirrors/llvm/llvm.git and here:
>> http://git.uplinklabs.net/tycho/mirrors/llvm/clang.git
>>
>> I can also set up Reitveld on that server, if that works for you.
>
> Doh. "Rietveld". Butterfingers.

You don't need to set anything up, it should just work with
codereview.appspot.com and upload.py (option 3 in the link below).
I've never used it with git, so our mileage may vary.
http://code.google.com/p/rietveld/wiki/CodeReviewHelp

Rietveld has a variety of usability issues, but the goal is to be able
to make inline comments on the code like this:
http://codereview.appspot.com/1905048/diff/7001/8012

Reid

Steven Noonan

Aug 4, 2010, 1:17:07 AM
to Reid Kleckner, Unladen Swallow

I've been using a git mirror (which I am in turn mirroring on
git.uplinklabs.net). See here:
http://git.uplinklabs.net/tycho/mirrors/llvm/llvm.git and here:
http://git.uplinklabs.net/tycho/mirrors/llvm/clang.git

I can also set up Reitveld on that server, if that works for you.

- Steven

Steven Noonan

Aug 4, 2010, 1:17:44 AM
to Reid Kleckner, Unladen Swallow

Doh. "Rietveld". Butterfingers.

- Steven

Steven Noonan

Aug 4, 2010, 4:21:54 AM
to Reid Kleckner, Unladen Swallow

Hah, got it working, with Google Apps.

Here's the relevant code review item: http://codereview.uplinklabs.net/1905049

- Steven

Reid Kleckner

Aug 4, 2010, 12:27:18 PM
to Steven Noonan, Unladen Swallow
Ouch, I see what you mean about measuring the max. We should probably
also report something else there, like "steady state". I dunno, I'd
just take the average of all of the sample points after the middle
point.
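
That suggestion amounts to something like the following (perf.py itself is Python; this is just the arithmetic, in C++ for consistency with the rest of the thread, and the name SteadyState is mine):

```cpp
#include <vector>
#include <cstddef>

// "Average of all of the sample points after the middle point": one
// possible steady-state summary of a memory-usage timeline, ignoring
// the warm-up/compilation spikes in the first half.
double SteadyState(const std::vector<double> &Samples) {
  if (Samples.empty())
    return 0.0;
  std::size_t Mid = Samples.size() / 2;
  double Sum = 0.0;
  for (std::size_t I = Mid; I < Samples.size(); ++I)
    Sum += Samples[I];
  return Sum / static_cast<double>(Samples.size() - Mid);
}
```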

Reid

On Wed, Aug 4, 2010 at 1:33 AM, Steven Noonan <ste...@uplinklabs.net> wrote:

> Here are the full stats for MmapAllocator v. MallocAllocator as far as
> memory goes. I'm going to re-run this to compare performance
> overnight.
>
>
> ### 2to3 ###
> Mem max: 39008.000 -> 38904.000: 1.0027x smaller
> Usage over time: http://tinyurl.com/3axczjc
>
> ### bzr_startup ###
> Mem max: 11996.000 -> 11984.000: 1.0010x smaller
> Usage over time: http://tinyurl.com/2ucnbhb
>
> ### call_method ###
> Mem max: 11632.000 -> 11544.000: 1.0076x smaller
> Usage over time: http://tinyurl.com/22r6y9r
>
> ### call_method_slots ###
> Mem max: 10908.000 -> 10820.000: 1.0081x smaller
> Usage over time: http://tinyurl.com/3yqb2r4
>
> ### call_method_unknown ###
> Mem max: 11216.000 -> 11152.000: 1.0057x smaller
> Usage over time: http://tinyurl.com/3ypudfj
>
> ### call_simple ###
> Mem max: 10692.000 -> 10540.000: 1.0144x smaller
> Usage over time: http://tinyurl.com/2a62cbv
>
> ### django ###
> Mem max: 21600.000 -> 20672.000: 1.0449x smaller
> Usage over time: http://tinyurl.com/35rclpd
>
> ### float ###
> Mem max: 15904.000 -> 15852.000: 1.0033x smaller
> Usage over time: http://tinyurl.com/2vokmep
>
> ### hg_startup ###
> Mem max: 7000.000 -> 7012.000: 1.0017x larger
> Usage over time: http://tinyurl.com/3x4wneu
>
> ### iterative_count ###
> Mem max: 9992.000 -> 9908.000: 1.0085x smaller
> Usage over time: http://tinyurl.com/24dy7ql
>
> ### nbody ###
> Mem max: 13552.000 -> 13240.000: 1.0236x smaller
> Usage over time: http://tinyurl.com/23dstyu
>
> ### normal_startup ###
> Mem max: 5380.000 -> 5396.000: 1.0030x larger
> Usage over time: http://tinyurl.com/2fh7cmv
>
> ### nqueens ###
> Mem max: 12832.000 -> 12756.000: 1.0060x smaller
> Usage over time: http://tinyurl.com/29whema
>
> ### pickle ###
> Mem max: 6856.000 -> 6844.000: 1.0018x smaller
> Usage over time: http://tinyurl.com/3az5v6y
>
> ### pickle_dict ###
> Mem max: 6848.000 -> 6836.000: 1.0018x smaller
> Usage over time: http://tinyurl.com/2bkjdoh
>
> ### pickle_list ###
> Mem max: 6836.000 -> 6824.000: 1.0018x smaller
> Usage over time: http://tinyurl.com/23llzct
>
> ### pybench ###
> Benchmark does not report memory usage yet
>
> ### regex_compile ###
> Mem max: 39176.000 -> 38536.000: 1.0166x smaller
> Usage over time: http://tinyurl.com/33wylgu
>
> ### regex_effbot ###
> Mem max: 12340.000 -> 12084.000: 1.0212x smaller
> Usage over time: http://tinyurl.com/37u84z4
>
> ### regex_v8 ###
> Mem max: 33596.000 -> 33828.000: 1.0069x larger
> Usage over time: http://tinyurl.com/397hyfm
>
> ### richards ###
> Mem max: 12760.000 -> 12680.000: 1.0063x smaller
> Usage over time: http://tinyurl.com/25n3wkl
>
> ### rietveld ###
> Mem max: 29008.000 -> 28636.000: 1.0130x smaller
> Usage over time: http://tinyurl.com/25uu4x3
>
> ### slowpickle ###
> Mem max: 14096.000 -> 13804.000: 1.0212x smaller
> Usage over time: http://tinyurl.com/2wzxmu2
>
> ### slowspitfire ###
> Mem max: 94292.000 -> 93992.000: 1.0032x smaller
> Usage over time: http://tinyurl.com/2wo4lrs
>
> ### slowunpickle ###
> Mem max: 11620.000 -> 11516.000: 1.0090x smaller
> Usage over time: http://tinyurl.com/26pw2cr
>
> ### spambayes ###
> Mem max: 35896.000 -> 35860.000: 1.0010x smaller
> Usage over time: http://tinyurl.com/2dhtkeb
>
> ### spitfire ###
> Command '['/home/tycho/Development/unladen-test-release/bin/python2.6',
> 'setup.py', 'build', '--build-lib=/tmp/tmpWt4m4M']' returned non-zero
> exit status 1
>
> ### startup_nosite ###
> Mem max: 4876.000 -> 4868.000: 1.0016x smaller
> Usage over time: http://tinyurl.com/3agcts8
>
> ### threaded_count ###
> Mem max: 10048.000 -> 9972.000: 1.0076x smaller
> Usage over time: http://tinyurl.com/322hltw
>
> ### unpack_sequence ###
> Mem max: 11252.000 -> 11244.000: 1.0007x smaller
> Usage over time: http://tinyurl.com/34lsbqd
>
> ### unpickle ###
> Mem max: 6872.000 -> 6860.000: 1.0017x smaller
> Usage over time: http://tinyurl.com/2aeqeua
>
> ### unpickle_list ###
> Mem max: 6860.000 -> 6844.000: 1.0023x smaller
> Usage over time: http://tinyurl.com/36q766k
>
>
> - Steven
>

Steven Noonan

Aug 4, 2010, 4:33:24 AM
to Reid Kleckner, Unladen Swallow

Here are the full stats for MmapAllocator v. MallocAllocator as far as
memory goes. I'm going to re-run this to compare performance
overnight.

Steven Noonan

Aug 4, 2010, 12:40:14 PM
to Reid Kleckner, Unladen Swallow
On Wed, Aug 4, 2010 at 9:27 AM, Reid Kleckner <reid.k...@gmail.com> wrote:
> Ouch, I see what you mean about measuring the max.  We should probably
> also report something else there, like "steady state".  I dono, I'd
> just take the average of all of the sample points after the middle
> point.
>

And here are the performance numbers. Overall, it looks like
mmap()/munmap() has no *real* performance impact. If anything, most of
these are faster.


Report on Linux xerxes 2.6.33.6 #1 SMP Fri Jul 9 02:53:04 PDT 2010
i686 Genuine Intel(R) CPU T2300 @ 1.66GHz
Total CPU cores: 2

### 2to3 ###
35.590589 -> 35.824554: 1.0066x slower

### bzr_startup ###
Min: 0.157976 -> 0.155976: 1.0128x faster
Avg: 0.167575 -> 0.168924: 1.0081x slower
Not significant
Stddev: 0.00334 -> 0.00716: 2.1463x larger
Timeline: http://tinyurl.com/39thymp

### call_method ###
Min: 0.878663 -> 0.884666: 1.0068x slower
Avg: 0.887148 -> 0.888667: 1.0017x slower
Not significant
Stddev: 0.02062 -> 0.02074: 1.0058x larger
Timeline: http://tinyurl.com/2fm39l2

### call_method_slots ###
Min: 0.872706 -> 0.867387: 1.0061x faster
Avg: 0.877261 -> 0.872754: 1.0052x faster
Significant (t=2.510615)
Stddev: 0.01523 -> 0.01586: 1.0410x larger
Timeline: http://tinyurl.com/3x84s9m

### call_method_unknown ###
Min: 1.031445 -> 1.028433: 1.0029x faster
Avg: 1.039063 -> 1.034296: 1.0046x faster
Not significant
Stddev: 0.03708 -> 0.03702: 1.0016x smaller
Timeline: http://tinyurl.com/395gevs

### call_simple ###
Min: 0.594110 -> 0.589934: 1.0071x faster
Avg: 0.606276 -> 0.594366: 1.0200x faster
Significant (t=5.874137)
Stddev: 0.01760 -> 0.01752: 1.0049x smaller
Timeline: http://tinyurl.com/2a2zv56

### django ###
Min: 0.997650 -> 0.993266: 1.0044x faster
Avg: 0.999423 -> 0.995495: 1.0039x faster
Significant (t=18.075408)
Stddev: 0.00093 -> 0.00122: 1.3050x larger
Timeline: http://tinyurl.com/28oa6wo

### float ###
Min: 0.102826 -> 0.102910: 1.0008x slower
Avg: 0.110088 -> 0.110280: 1.0017x slower
Not significant
Stddev: 0.02758 -> 0.02762: 1.0015x larger
Timeline: http://tinyurl.com/2w6ol8d

### hg_startup ###
Min: 0.045993 -> 0.044993: 1.0222x faster
Avg: 0.053388 -> 0.053510: 1.0023x slower
Not significant
Stddev: 0.00250 -> 0.00258: 1.0322x larger
Timeline: http://tinyurl.com/2ec392w

### iterative_count ###
Min: 0.157216 -> 0.156526: 1.0044x faster
Avg: 0.166971 -> 0.166897: 1.0004x faster
Not significant
Stddev: 0.06835 -> 0.07249: 1.0604x larger
Timeline: http://tinyurl.com/2g9agwl

### nbody ###
Min: 0.443087 -> 0.464941: 1.0493x slower
Avg: 0.456435 -> 0.475809: 1.0424x slower
Not significant
Stddev: 0.05609 -> 0.05523: 1.0156x smaller
Timeline: http://tinyurl.com/2wd6z8r

### normal_startup ###
Min: 0.438015 -> 0.437763: 1.0006x faster
Avg: 0.438425 -> 0.438810: 1.0009x slower
Not significant
Stddev: 0.00024 -> 0.00274: 11.6231x larger
Timeline: http://tinyurl.com/34nunk3

### nqueens ###
Min: 0.693033 -> 0.698259: 1.0075x slower
Avg: 0.698948 -> 0.704770: 1.0083x slower
Not significant
Stddev: 0.02644 -> 0.02590: 1.0208x smaller
Timeline: http://tinyurl.com/39ydyjs

### pickle ###
Min: 1.654750 -> 1.669246: 1.0088x slower
Avg: 1.660298 -> 1.673813: 1.0081x slower
Significant (t=-17.007317)
Stddev: 0.00391 -> 0.00403: 1.0298x larger
Timeline: http://tinyurl.com/36zz8yk

### pickle_dict ###
Min: 1.859310 -> 1.862217: 1.0016x slower
Avg: 1.864953 -> 1.863408: 1.0008x faster
Significant (t=2.269051)
Stddev: 0.00300 -> 0.00377: 1.2590x larger
Timeline: http://tinyurl.com/32kz4l6

### pickle_list ###
Min: 1.059003 -> 1.045209: 1.0132x faster
Avg: 1.065780 -> 1.048728: 1.0163x faster
Significant (t=21.791102)
Stddev: 0.00413 -> 0.00368: 1.1223x smaller
Timeline: http://tinyurl.com/27rpxol

### pybench ###
Min: 11461 -> 11472: 1.0010x slower
Avg: 16029 -> 16073: 1.0027x slower

### regex_compile ###
Min: 0.828427 -> 0.832179: 1.0045x slower
Avg: 0.890980 -> 0.894830: 1.0043x slower
Not significant
Stddev: 0.26185 -> 0.26180: 1.0002x smaller
Timeline: http://tinyurl.com/38c3z8m

### regex_effbot ###
Min: 0.162540 -> 0.162873: 1.0020x slower
Avg: 0.167092 -> 0.167389: 1.0018x slower
Not significant
Stddev: 0.02830 -> 0.02830: 1.0001x smaller
Timeline: http://tinyurl.com/33r7s5y

### regex_v8 ###
Min: 0.164368 -> 0.163174: 1.0073x faster
Avg: 0.417027 -> 0.416113: 1.0022x faster
Not significant
Stddev: 0.86580 -> 0.86190: 1.0045x smaller
Timeline: http://tinyurl.com/3yabd5v

### richards ###
Min: 0.352872 -> 0.353289: 1.0012x slower
Avg: 0.354989 -> 0.355434: 1.0013x slower
Not significant
Stddev: 0.00543 -> 0.00549: 1.0108x larger
Timeline: http://tinyurl.com/36weagp

### rietveld ###
Min: 0.693800 -> 0.695649: 1.0027x slower
Avg: 0.984689 -> 0.984075: 1.0006x faster
Not significant
Stddev: 0.36204 -> 0.36497: 1.0081x larger
Timeline: http://tinyurl.com/2wfw2z5

### slowpickle ###
Min: 0.772514 -> 0.757389: 1.0200x faster
Avg: 0.821586 -> 0.805534: 1.0199x faster
Not significant
Stddev: 0.17998 -> 0.18494: 1.0275x larger
Timeline: http://tinyurl.com/37rb8d5

### slowspitfire ###
Min: 1.022256 -> 1.023268: 1.0010x slower
Avg: 1.023244 -> 1.024326: 1.0011x slower
Significant (t=-6.305389)
Stddev: 0.00022 -> 0.00119: 5.3064x larger
Timeline: http://tinyurl.com/29h96r4

### slowunpickle ###
Min: 0.384167 -> 0.380310: 1.0101x faster
Avg: 0.410666 -> 0.409001: 1.0041x faster
Not significant
Stddev: 0.08844 -> 0.09114: 1.0305x larger
Timeline: http://tinyurl.com/286o5wt

### spambayes ###
Min: 0.417142 -> 0.398649: 1.0464x faster
Avg: 0.598665 -> 0.574443: 1.0422x faster
Not significant
Stddev: 0.58158 -> 0.57470: 1.0120x smaller
Timeline: http://tinyurl.com/23ozclq

### spitfire ###
Command '['/home/tycho/Development/unladen-test-release/bin/python2.6',
'setup.py', 'build', '--build-lib=/tmp/tmpzTxep8']' returned non-zero
exit status 1

### startup_nosite ###
Min: 0.333111 -> 0.332463: 1.0019x faster
Avg: 0.338470 -> 0.335088: 1.0101x faster
Significant (t=3.325272)
Stddev: 0.00886 -> 0.00500: 1.7726x smaller
Timeline: http://tinyurl.com/23ob2vy

### threaded_count ###
Min: 0.181111 -> 0.182459: 1.0074x slower
Avg: 0.214138 -> 0.211476: 1.0126x faster
Not significant
Stddev: 0.14987 -> 0.14531: 1.0314x smaller
Timeline: http://tinyurl.com/2dkupks

### unpack_sequence ###
Min: 0.000217 -> 0.000218: 1.0033x slower
Avg: 0.000223 -> 0.000222: 1.0048x faster
Significant (t=11.155116)
Stddev: 0.00002 -> 0.00001: 3.0782x smaller
Timeline: http://tinyurl.com/26tzfco

### unpickle ###
Min: 1.187517 -> 1.173118: 1.0123x faster
Avg: 1.212250 -> 1.178640: 1.0285x faster
Significant (t=9.457625)
Stddev: 0.02461 -> 0.00508: 4.8419x smaller
Timeline: http://tinyurl.com/2djz64c

### unpickle_list ###
Min: 1.106733 -> 1.086683: 1.0185x faster
Avg: 1.147736 -> 1.129444: 1.0162x faster
Significant (t=4.530626)
Stddev: 0.01237 -> 0.02573: 2.0800x larger
Timeline: http://tinyurl.com/2fctpe9


- Steven

Reid Kleckner

Aug 7, 2010, 8:52:03 PM
to Steven Noonan, Unladen Swallow
On Wed, Aug 4, 2010 at 9:40 AM, Steven Noonan <ste...@uplinklabs.net> wrote:
> On Wed, Aug 4, 2010 at 9:27 AM, Reid Kleckner <reid.k...@gmail.com> wrote:
>> Ouch, I see what you mean about measuring the max.  We should probably
>> also report something else there, like "steady state".  I dono, I'd
>> just take the average of all of the sample points after the middle
>> point.
>>
>
> And here are the performance numbers. Overall, it looks like
> mmap()/munmap() has no *real* performance impact. If anything, most of
> these are faster.

In the future, I wouldn't worry about benchmarks not in the default set.

I'm worried about the 5% swings in nbody and some others, but I'm
willing to ignore them and call it noise or cache effects.

Looks good though. We need to kick off a discussion with the LLVM
folks about this. I showed the perf.py memory usage graph to Nick
Lewycky, and he said other users might be interested in this, so I
think it's worth pursuing.

Reid

Steven Noonan

Aug 7, 2010, 9:00:23 PM
to Reid Kleckner, Unladen Swallow

Cool to hear. So where do we start with getting feedback from the LLVM
folks? Should I submit a patch and put it on their bug tracker, or
send a message to their mailing list, or what?

- Steven

Reid Kleckner

Aug 7, 2010, 9:11:39 PM
to Steven Noonan, Unladen Swallow
Whoops, +list.

On Sat, Aug 7, 2010 at 6:11 PM, Reid Kleckner <reid.k...@gmail.com> wrote:

> For this, I'd send mail to llv...@cs.uiuc.edu with the relevant graph
> from the LiveRanges change.  If they're positive on this idea, then
> they'll probably have opinions about how they want it implemented, i.e.
> a command line flag (the easy cop-out) or some kind of option threaded
> through the JIT to CodeGen.
>
> Reid
>

Steven Noonan

Aug 7, 2010, 10:05:20 PM
to LLVM Development Mailing List, Unladen Swallow
Hi folks,

I've been doing work on memory reduction in Unladen Swallow, and
during testing, LiveRanges seemed to be consuming one of the largest
chunks of memory. I wrote a replacement allocator for use by
BumpPtrAllocator which uses mmap()/munmap() in place of
malloc()/free(). It has worked flawlessly in testing, and reduces
memory usage quite nicely in Unladen Swallow.

The code is available for review here. I'd appreciate feedback if
there's interest in integrating this into LLVM trunk:
http://codereview.uplinklabs.net/1905049

Here are the results of our memory utilization tests. The 'Mem max'
numbers aren't particularly revealing though, so take a look at the
graphs. I think the spambayes benchmark was one of the most
interesting.

### regex_compile ###
Mem max: 39176.000 -> 38536.000: 1.0166x smaller
Usage over time: http://tinyurl.com/33wylgu

### startup_nosite ###
Mem max: 4876.000 -> 4868.000: 1.0016x smaller
Usage over time: http://tinyurl.com/3agcts8

### threaded_count ###
Mem max: 10048.000 -> 9972.000: 1.0076x smaller
Usage over time: http://tinyurl.com/322hltw

### unpack_sequence ###
Mem max: 11252.000 -> 11244.000: 1.0007x smaller
Usage over time: http://tinyurl.com/34lsbqd

### unpickle ###
Mem max: 6872.000 -> 6860.000: 1.0017x smaller
Usage over time: http://tinyurl.com/2aeqeua

### unpickle_list ###
Mem max: 6860.000 -> 6844.000: 1.0023x smaller
Usage over time: http://tinyurl.com/36q766k


And to gauge the performance impact, I also ran the speed tests. It
seems using mmap()/munmap() has very little performance impact in
either direction, so that's good:

### regex_compile ###
Min: 0.828427 -> 0.832179: 1.0045x slower
Avg: 0.890980 -> 0.894830: 1.0043x slower
Not significant
Stddev: 0.26185 -> 0.26180: 1.0002x smaller
Timeline: http://tinyurl.com/38c3z8m

### startup_nosite ###
Min: 0.333111 -> 0.332463: 1.0019x faster
Avg: 0.338470 -> 0.335088: 1.0101x faster
Significant (t=3.325272)
Stddev: 0.00886 -> 0.00500: 1.7726x smaller
Timeline: http://tinyurl.com/23ob2vy
Any thoughts?

- Steven
