kmem-pool-uvm

Lars Heidieker

Apr 8, 2011, 2:07:23 PM


Hi,

I took up Mindaugas' idea of not converting kmem to use the kmem_map,
and instead used the malloc to kmem transition as an opportunity to
figure out which allocations are actually made from interrupt context,
and to either move those allocations out of interrupt context or make
dedicated pools for them.

The patch provided includes the changed extent as well as a changed
implementation of kmem, which provides page-aligned memory for
allocations >= PAGE_SIZE. The interface between the pool subsystem and
uvm_km is changed by passing the pool_allocator's page size to the
uvm_km functions. The idea behind this is to have multiple default
pool_allocators with different pool page sizes to lower internal
fragmentation within the pools.
In order to support those different pool page sizes, the kernel_map and
kmem_map gained caches for virtual addresses not only of PAGE_SIZE
but also of low integer multiples of PAGE_SIZE.
These larger-than-PAGE_SIZE caches are used by the larger-than-PAGE_SIZE
allocations in kmem.

Initialization of the kmem_map is moved to the uvm_km initialization.

The next step I'd like to tackle is to remove the "lazy"
initialization from the pool subsystem and have it initialized
during the uvm_km phase instead. This will allow for some cleanup in
the pool code as well as bringing different pool page sizes for the
default allocator to life.
This requires moving some pool_init calls in certain pmaps from
pmap_bootstrap to pmap_init. This should have no ill effects, as all
those pools require either the kernel_map or kmem_map to be installed,
which isn't the case until uvm_km_init has run; therefore no
allocations could have taken place before that point in uvm_init
anyway.

http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch

Lars


Mindaugas Rasiukevicius

Apr 9, 2011, 10:24:51 PM

Lars Heidieker <la...@heidieker.de> wrote:
> The patch provided includes the changed extent as well as a changed
> implementation of kmem, which provides page aligned memory for
> allocation >= PAGE_SIZE. The interface between the pool-subsystem and
> uvm_km is changed by passing the pool_allocators page_size to the
> uvm_km functions, the idea behind this is to have multiply default
> pool_allocators with different pool-page-sizes to lower inner
> fragmentation within the pools.
> In order to support those different-pool-page-sizes the kernel_map and
> kmem_map gained caches for virtual addresses not only for PAGE_SIZE
> but low integer multiplies of PAGE_SIZE.
> These large then PAGE_SIZE caches are used by the the larger the
> PAGE_SIZE allocations in kmem.

Concerns/questions:

- What are the effects of replacing vmem(9) on fragmentation? Do you
have some numbers showing that kmem_cache_big_sizes is doing a good job?

- The patch decreases the variety of small-sized caches (e.g. no more
8, 24, 40, 56, etc). On what basis? Was it measured? It might be
sensitive! See here:

http://mail-index.netbsd.org/tech-kern/2009/01/11/msg003989.html

- Can you try regress/sys/kern/allocfree/allocfree.c with your patch?

If yamt@ says OK to replacing the use of vmem(9), as he probably has the
best understanding of fragmentation issues, then you have a green light
for the patch! :)

Also, I suppose by mistake, the patch has removed kmem_poison_check() and
the following checks:

- KASSERT(!cpu_intr_p());
- KASSERT(!cpu_softintr_p());

These are must-have, critical mechanisms for catching bugs!

uvm_km_alloc_poolpage_cache() and uvm_km_alloc_poolpage() now take a size
argument. Please KASSERT() that size is divisible by PAGE_SIZE. Also,
in uvm_km_free_poolpage_cache(), the following was removed:

-#if defined(DEBUG)
- pmap_update(pmap_kernel());
-#endif

Since the interaction between layers here is already a bit confusing, this
definitely needs a comment, i.e. why leaving a stale entry is OK. That is
because of the KVA reservation (as it is a KVA cache), where the TLB flush
would be performed in km_vacache_free(). (A case of KVA starvation, normally?)
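
A minimal userland sketch of the size check requested above, assuming a
4096-byte page. SKETCH_PAGE_SIZE, size_is_page_multiple() and
poolpage_pages() are hypothetical names, and assert() stands in for
KASSERT(); this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size, for illustration only; a kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Returns nonzero when size is a positive multiple of the page size. */
static int
size_is_page_multiple(size_t size)
{
	return size != 0 && (size & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * Hypothetical stand-in for the entry of a pool-page allocation
 * function: it only validates its argument and reports the page count.
 * The assert() marks the spot where a KASSERT() would go.
 */
static size_t
poolpage_pages(size_t size)
{
	assert(size_is_page_multiple(size));
	return size / SKETCH_PAGE_SIZE;
}
```

The mask test relies on the page size being a power of two, which holds
for the assumed value here.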

>
> http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch
>

A few other comments on the patch.

Please use the diff -p option when generating patches. Also, splitting
the mechanical and functional parts into separate diffs would ease review,
and e.g. malloc to kmem conversions can/should be committed separately.

- hd = malloc(sizeof(*hd), M_DEVBUF, M_NOWAIT);
+ hd = kmem_alloc(sizeof(struct hook_desc), KM_NOSLEEP);

All such M_NOWAIT cases should be audited to confirm that there are no
uses from interrupt context and, if there are none, changed to KM_SLEEP
(the check for NULL can be removed too). Generally, there should be no
KM_NOSLEEP uses without a very good reason.

- t = malloc(len + sizeof(struct m_tag), M_PACKET_TAGS, wait);
+ t = kmem_alloc(len + sizeof(struct m_tag),
+ wait ? KM_SLEEP : KM_NOSLEEP);

mbuf tags are created from interrupt context, so this is a bug (which
those removed KASSERT()s would have caught)!

- free(corename, M_TEMP);
+ kmem_free(corename, strlen(corename) + 1);

Hmm, these are a bit inefficient. I have been thinking about this.
If, in nearly all cases, the size is either constant or is (or can be)
saved in some structure, and strings are the only real, justified case
for storing the size within the allocation, e.g. in a byte or word before
the actual string, then perhaps add kmem_stralloc() and kmem_strfree()?
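
A rough userland sketch of the idea of keeping the size in a word in
front of the string. The sketch_* functions are hypothetical (no such
kmem interface is being quoted here), and malloc()/free() stand in for
kmem_alloc()/kmem_free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a string and record the total
 * allocation size in a size_t header placed just before the string.
 */
static char *
sketch_stralloc(const char *src)
{
	size_t len, *hdr;

	assert(src != NULL);
	len = strlen(src) + 1;		/* include the terminating NUL */
	hdr = malloc(sizeof(*hdr) + len);
	if (hdr == NULL)
		return NULL;
	*hdr = sizeof(*hdr) + len;	/* total size, header included */
	memcpy(hdr + 1, src, len);
	return (char *)(hdr + 1);
}

/* Recover the recorded allocation size without calling strlen(). */
static size_t
sketch_strsize(const char *str)
{
	assert(str != NULL);
	return ((const size_t *)(const void *)str)[-1];
}

/* Free a string obtained from sketch_stralloc(). */
static void
sketch_strfree(char *str)
{
	size_t *hdr;

	if (str == NULL)
		return;
	hdr = (size_t *)(void *)str - 1;
	/* A kernel version would pass *hdr as the size to kmem_free(). */
	free(hdr);
}
```

The caller never has to recompute strlen() at free time, at the cost of
one extra word per string.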

+	size += REDZONE_SIZE + SIZE_SIZE;
+	if ((index = ((size - 1) >> KMEM_TINY_SHIFT)) < (KMEM_TINY_MAXSIZE >> KMEM_TINY_SHIFT)) {
+		pc = kmem_cache_tiny[index];
+	} else if ((index = ((size - 1) >> KMEM_BIG_SHIFT)) < (KMEM_BIG_MAXSIZE >> KMEM_BIG_SHIFT)) {
+		pc = kmem_cache_big[index];
+	} else {
+		uvm_km_free(kernel_map, (vaddr_t)p, round_page(size), UVM_KMF_WIRED);
+		return;
+	}

Please KNF and clean up (the whole patch). Lines should be no longer than 80
characters. When wrapping long lines, second-level indents have four extra
spaces (not a tab). There are many cases in the patch with trailing,
added or missing whitespace, etc.
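
One way the quoted lookup might read once it fits in 80 columns, written
as a userland sketch. The macro names come from the quoted diff, but the
values given to them here are assumptions, as are the kmem_classify()
helper and its enum; the redzone and size-header adjustment is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only; the patch may use others. */
#define KMEM_TINY_SHIFT		3
#define KMEM_TINY_MAXSIZE	1024
#define KMEM_BIG_SHIFT		11
#define KMEM_BIG_MAXSIZE	16384

enum kmem_class { KMEM_CLASS_TINY, KMEM_CLASS_BIG, KMEM_CLASS_DIRECT };

/*
 * Hypothetical helper: pick the cache class for a request and store the
 * index into that class's cache array in *indexp.  Requests too large
 * for both classes are reported as KMEM_CLASS_DIRECT.
 */
static enum kmem_class
kmem_classify(size_t size, size_t *indexp)
{
	size_t index;

	assert(size > 0 && indexp != NULL);

	index = (size - 1) >> KMEM_TINY_SHIFT;
	if (index < (KMEM_TINY_MAXSIZE >> KMEM_TINY_SHIFT)) {
		*indexp = index;
		return KMEM_CLASS_TINY;
	}

	index = (size - 1) >> KMEM_BIG_SHIFT;
	if (index < (KMEM_BIG_MAXSIZE >> KMEM_BIG_SHIFT)) {
		*indexp = index;
		return KMEM_CLASS_BIG;
	}

	*indexp = 0;
	return KMEM_CLASS_DIRECT;
}
```

Splitting each assignment from its comparison keeps every line short
without needing wrapped conditions.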

- * to only a part of an amap). if the malloc of the array fails
+ * to only a part of an amap). if the kmem of the array fails

In such places use the word "allocation"; no need to be allocator-specific. :)

 	rw_enter(&swap_syscall_lock, RW_WRITER);

-	userpath = malloc(SWAP_PATH_MAX, M_TEMP, M_WAITOK);
+	userpath = kmem_alloc(SWAP_PATH_MAX, KM_SLEEP);
..
-	free(userpath, M_TEMP);
+	kmem_free(userpath, SWAP_PATH_MAX);
 	rw_exit(&swap_syscall_lock);

An opportunity to move the kmem_*() calls outside the lock, as it is safe!
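
A small userland model of that reordering, assuming a toy lock that only
tracks whether it is held. Every sketch_* name is hypothetical, and
malloc()/free() stand in for kmem_alloc()/kmem_free(); the point is only
the ordering of the calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PATH_MAX 256	/* assumed size, stands in for SWAP_PATH_MAX */

static int sketch_lock_held;		/* toy lock: 1 while "held" */
static int sketch_allocs_under_lock;	/* allocator calls made while held */
static char sketch_current_path[SKETCH_PATH_MAX];

static void
sketch_lock_enter(void)
{
	assert(!sketch_lock_held);
	sketch_lock_held = 1;
}

static void
sketch_lock_exit(void)
{
	assert(sketch_lock_held);
	sketch_lock_held = 0;
}

/* Allocator wrappers that count calls made with the lock held. */
static void *
sketch_alloc(size_t size)
{
	if (sketch_lock_held)
		sketch_allocs_under_lock++;
	return malloc(size);
}

static void
sketch_free(void *p)
{
	if (sketch_lock_held)
		sketch_allocs_under_lock++;
	free(p);
}

static int
sketch_set_path(const char *path)
{
	char *buf;

	assert(path != NULL);

	buf = sketch_alloc(SKETCH_PATH_MAX);	/* before taking the lock */
	if (buf == NULL)
		return -1;
	strncpy(buf, path, SKETCH_PATH_MAX - 1);
	buf[SKETCH_PATH_MAX - 1] = '\0';

	sketch_lock_enter();
	memcpy(sketch_current_path, buf, SKETCH_PATH_MAX);
	sketch_lock_exit();

	sketch_free(buf);			/* after dropping the lock */
	return 0;
}
```

With this ordering a sleeping allocation never extends the time the lock
is held.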

- sdp = malloc(sizeof *sdp, M_VMSWAP, M_WAITOK);
- spp = malloc(sizeof *spp, M_VMSWAP, M_WAITOK);
+ sdp = kmem_alloc(sizeof *sdp, KM_SLEEP);
+ spp = kmem_alloc(sizeof *spp, KM_SLEEP);
memset(sdp, 0, sizeof(*sdp));

KNF: sizeof *sdp -> sizeof(*sdp). Also, -memset(), +kmem_zalloc().

--
Mindaugas

Lars Heidieker

Apr 10, 2011, 3:51:10 PM


On 04/10/11 04:24, Mindaugas Rasiukevicius wrote:
> Concerns/questions:
>
> - What are the effects of replacing vmem(9) to fragmentation? Do you
> have some numbers that kmem_cache_big_sizes is doing a good job?
>
> - The patch decreases the variety of small-sized caches (e.g. no more
> 8, 24, 40, 56, etc). On what basis? Was it measured? It might be
> sensitive! See here:
> http://mail-index.netbsd.org/tech-kern/2009/01/11/msg003989.html

The basis was to reduce the number of caches, with the allocations from
a smaller, less populated cache going to the next bigger one; this is
subject to tuning.
The performance problems mentioned came from there being no cache
larger than 128 bytes, so nothing has changed for the caches up
to page_size.

Whether it is worth having the large caches is a good question. There
are some allocations of 8kb and less in the larger pools, but they
are few and they seem to be static, so it might be a good idea to live
without them.
Especially with the more generalized uvm_km_alloc_poolpage* functions
(see below).

> - Can you try regress/sys/kern/allocfree/allocfree.c with your patch?

I've tested different sizes up to page_size, with both kmem and
pool_cache scaling linearly (on my 4-core machine).

> If yamt@ says OK for replacing the use of vmem(9), as he probably has a
> best understanding of fragmentation issues, then you have a green light
> for the patch! :)
>

I've further changed uvm_km_alloc_poolpage_cache /
uvm_km_free_poolpage_cache to call uvm_km_alloc_poolpage /
uvm_km_free_poolpage when the requested size is larger than the
largest VMK_VACACHE for the map.

This enables kmem to use uvm_km_alloc_poolpage_cache for allocations
that are larger than the largest cache.
This essentially makes uvm_km_alloc_poolpage_cache a function that
can allocate arbitrarily sized chunks of memory utilizing the VA_CACHE,
and uvm_km_alloc_poolpage the same without the cache.
(These functions should probably get more general names, as they
allocate a slab of memory for higher-level memory allocators.)
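
The fallback described above can be sketched as a simple dispatch on the
request size. This is a userland illustration only: the sketch_* names
and the assumed limit of four pages for the largest VA cache are
hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE		4096u	/* assumed page size */
#define SKETCH_VACACHE_MAX_PAGES	4u	/* assumed largest VA cache */

enum sketch_source { SKETCH_FROM_VACACHE, SKETCH_FROM_MAP };

/*
 * Decide where a page-multiple request would be served from: one of the
 * VA caches when it fits, otherwise straight from the map (the uncached
 * path).
 */
static enum sketch_source
sketch_poolpage_source(size_t size)
{
	assert(size != 0 && size % SKETCH_PAGE_SIZE == 0);

	if (size / SKETCH_PAGE_SIZE <= SKETCH_VACACHE_MAX_PAGES)
		return SKETCH_FROM_VACACHE;
	return SKETCH_FROM_MAP;
}
```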

vmem could use these changed functions as its back end too; it
currently uses uvm_km_alloc / uvm_km_free directly, which should result
in higher fragmentation of the kernel map.

> Also, I suppose by mistake, patch has removed kmem_poison_check() and the
> following checks:
>
> - KASSERT(!cpu_intr_p());
> - KASSERT(!cpu_softintr_p());
>
> These are must-have, critical mechanisms for catching bugs!
>

Yes, definitely. I dropped them when I made kmem interrupt
safe and forgot to bring them back.

I'll recheck those malloc -> kmem changes and separate them out.

> - free(corename, M_TEMP);
> + kmem_free(corename, strlen(corename) + 1);
>
> Hmm, these are a bit inefficient. I have been thinking about this..
> If, for nearly all cases, size is either constant or can-be/is normally
> saved in some structures, and only strings are real, justified cases to
> store the size within allocation, e.g. in a byte or word before the
> actual string - perhaps add kmem_stralloc() and kmem_strfree()?
>

It's not efficient ;-) I think we need a malloc/free-like function
that encodes the allocated size in front of the memory anyway; there
are libs like libz that require such functions, and it can be used for
strings as well.
But besides those libs there are only strings.

KNFing in progress.


YAMAMOTO Takashi

Apr 14, 2011, 3:05:31 AM

hi,

> The patch provided includes the changed extent as well as a changed
> implementation of kmem, which provides page aligned memory for
> allocation >= PAGE_SIZE. The interface between the pool-subsystem and
> uvm_km is changed by passing the pool_allocators page_size to the
> uvm_km functions, the idea behind this is to have multiply default
> pool_allocators with different pool-page-sizes to lower inner
> fragmentation within the pools.
> In order to support those different-pool-page-sizes the kernel_map and
> kmem_map gained caches for virtual addresses not only for PAGE_SIZE
> but low integer multiplies of PAGE_SIZE.
> These large then PAGE_SIZE caches are used by the the larger the
> PAGE_SIZE allocations in kmem.

why do you want to make subr_kmem use uvm_km directly?
to simplify the code?
i don't want to see that change, unless there's a clear benefit.

let me explain some background. currently there are a number of
kernel_map related problems:

A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.

A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
the allocate-for-free problem. ie. to free memory, you might need to
split map-entries thus allocate some memory.

A-3. to solve A-2, there is map-entry-reservation mechanism. it's complicated
and broken.

B. kernel fault handling is complicated because it needs memory allocation
(eg. vm_anon) which needs some trick to avoid deadlock.

C. KVA allocation is complicated because it needs memory allocation
(eg. vm_map_entry) which needs some trick to avoid deadlock.

the most of the above can be solved by separating KVA allocation and
kernel fault handling. (except C, which will be merely moved to a
different place.)

i implemented subr_vmem so that eventually it can be used as the primary
KVA allocator. ie. when allocating from kernel_map, allocate KVA from
kernel_va_arena first and then, if and only if necessary, register it to
kernel_map for fault handling. it probably allows us to remove VACACHE
stuff, too. kmem_alloc will be backed by a vmem arena which is backed by
kernel_va_arena.

(well, optimizations like direct-mapping etc would be useful, but they
don't change the big picture.)

(while the current implementation of vmem depends on malloc(), it can be
fixed. vmem branch has the bootstrap code. it's a little outdated, tho.)

YAMAMOTO Takashi

Lars Heidieker

Apr 15, 2011, 5:51:01 AM


Hi,

On 04/14/11 09:05, YAMAMOTO Takashi wrote:
> why do you want to make subr_kmem use uvm_km directly?
> to simplify the code?
> i don't want to see that change, unless there's a clear benefit.
>

The reason was to simplify the code, yes, and to reduce redundancy:
in the current implementation vmem allocates PAGE_SIZE memory from
the uvm_km backend for requests <= PAGE_SIZE without utilizing the
vacache and, more importantly, vmem is essentially just taking the
address allocations made by uvm_map.
With the changes I see about 15% fewer kernel map entries.

> let me explain some background. currently there are a number of
> kernel_map related problems:
>
> A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.
>
> A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
> the allocate-for-free problem. ie. to free memory, you might need to
> split map-entries thus allocate some memory.
>
> A-3. to solve A-2, there is map-entry-reservation mechanism. it's
> complicated and broken.
>
> B. kernel fault handling is complicated because it needs memory allocation
> (eg. vm_anon) which needs some trick to avoid deadlock.
>
> C. KVA allocation is complicated because it needs memory allocation
> (eg. vm_map_entry) which needs some trick to avoid deadlock.
>
> the most of the above can be solved by separating KVA allocation and
> kernel fault handling. (except C, which will be merely moved to a
> different place.)
>

A-1: with vmem_btag being slightly less than half the size of
vm_map_entry...
A-2 solves A-1, but A-3 solves A-2 with the pitfall of reintroducing a
part of A-1: we still have fewer map entries in the map, but we don't
save memory, as all the entries not in the map are cached aside for
potential merging.
In this sense it seems broken to me, and it is complicated.
Reducing the overall number of allocated map entries will help here, as
vacaches do.

C seems to be inevitable; it's only a question of where it happens...

B is a result of having pageable memory, which can fault, and
non-pageable memory in the same map, with the need to allocate
non-pageable memory in the event of a page fault.

> i implemented subr_vmem so that eventually it can be used as the primary
> KVA allocator. ie. when allocating from kernel_map, allocate KVA from
> kernel_va_arena first and then, if and only if necessary, register it to
> kernel_map for fault handling. it probably allows us to remove VACACHE
> stuff, too. kmem_alloc will be backed by a vmem arena which is backed by
> kernel_va_arena.
>

Originally I thought about two options, with option one being what my
patch does and option two being:

If vmem is made the primary kva allocator, we should carve out a
kernel heap entirely controlled by vmem, probably as one special
vm_map_entry in the kernel_map that spans the heap, or as a submap that
never has any map entries.
This essentially separates pageable and non-pageable memory
allocations, and would allow for removing the vacaches in the kernel
maps as well as the map-entry-reservation mechanism.

Questions that follow:
- how to size it properly.....
- this might be the kmem_map? or two heaps, an interrupt-safe one and
  a non-interrupt-safe one?

I think having two "allocators" (vmem and the vm_map_entries
themselves) controlling the kernel_map isn't a good idea, as both have
to be in sync; at the least, every allocation that is made by
vm_map_entries needs to be made in vmem as well. There is no clear
responsibility for either.

Option two is more challenging and will solve problem B and the A
problems, while option one solves most of the A problems, leaving B
untouched.

Lars


YAMAMOTO Takashi

Apr 19, 2011, 9:22:52 PM

hi,

is this about limiting total size for a particular allocation?

> - - this might be the kmem_map? or two heaps an interrupt safe one and
> one non interrupt safe?

because kernel_va_arena would have its quantum cache disabled,
most users would use another arena stacked on it.
(like what we currently have as kmem_arena.)
interrupt-safe allocations can either use kernel_va_arena directly or
have another arena eg. kmem_arena_intrsafe.

>
> I think having two "allocators" (vmem and the vm_map_(entries) itself)
> controlling the kernel_map isn't a good idea as both have to be in
> sync, at least every allocation that is made by vm_map_entries need to
> be made in vmem as well. There is no clear responsibility for either.

i agree that having two allocator for KVA is bad.
my idea is having just one. (kernel_va_arena)
no allocation would be made by vm_map_entries for kernel_map.
kernel_map is kept merely for fault handling.

essentially kva allocation would be:

	va = vmem_alloc(kernel_va_arena, ...);
	if (pageable)
		create kernel_map entry for the va
	else
		...
	return va;
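
The same flow as a toy userland model, under stated assumptions: a bump
allocator stands in for vmem_alloc() on kernel_va_arena, a counter
stands in for kernel_map entries, and every sketch_* name is
hypothetical. The non-pageable branch is elided in the pseudocode above,
so nothing is modelled for it here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_ARENA_BASE	((uintptr_t)0x10000000u)
#define SKETCH_ARENA_SIZE	((size_t)64 * SKETCH_PAGE_SIZE)

static size_t sketch_arena_used;	/* bytes handed out so far */
static unsigned sketch_map_entries;	/* entries kept for fault handling */

/* Toy VA allocator: returns 0 when the arena is exhausted. */
static uintptr_t
sketch_va_alloc(size_t size)
{
	uintptr_t va;

	assert(size != 0 && size % SKETCH_PAGE_SIZE == 0);
	if (size > SKETCH_ARENA_SIZE - sketch_arena_used)
		return 0;
	va = SKETCH_ARENA_BASE + sketch_arena_used;
	sketch_arena_used += size;
	return va;
}

/* Allocate VA first; register a map entry only for pageable ranges. */
static uintptr_t
sketch_kva_alloc(size_t size, bool pageable)
{
	uintptr_t va = sketch_va_alloc(size);

	if (va == 0)
		return 0;
	if (pageable) {
		/* Only pageable ranges need an entry for fault handling. */
		sketch_map_entries++;
	}
	/* Non-pageable case: left unspecified in the message. */
	return va;
}
```

In this model the arena is the only place addresses come from, and the
entry count grows only with pageable allocations.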

>
> Option two is more challenging and will solve problems B and As while
> option one solves most of the As leaving B untouched.

sure, it's more challenging and involves more work.
(so it hasn't finished yet. :-)

YAMAMOTO Takashi

Lars Heidieker

May 18, 2011, 2:52:52 PM


On 04/20/11 03:22, YAMAMOTO Takashi wrote:
> hi,

Hi,

I've made some progress in exploring both options further.
Two patches implementing either option:
a) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch
b) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm-extent.patch

Option a has extended kva caches for both kernel_map and kmem_map, with
interfaces to them that are used by kmem(9), malloc(9) and pool(9), with
the exception that the pool_allocator_meta goes directly to the
kmem_map. (This means malloc(9) and kmem(9) use kva caches, resulting in
a lower vm_map_entry count.)

Option b has one vm_map_entry in the kernel_map spanning the
kernel_heap, which in turn is controlled by vmem(9).
There is the heap_arena, from which the heap_va_arena (with quantum
caches) imports, as well as an internal arena for vmem's meta data.
On top of the heap_va_arena are interfaces used by kmem(9), malloc(9)
and pool(9), with the pool meta data allocator going to vmem's meta
arena.
Originally I had another arena on top of the heap_va_arena, which backed
the virtual memory with physical pages on import and from which
malloc(9), kmem(9) and pool(9) allocated; let's call this option c.
I replaced this arena with interface functions for efficiency reasons.

Findings after having run the system for a while and having about 1.1gig
in the pool(9)s:
Option a: about 30000 allocated kernel map_entries (not in the map but
allocated)
Option b: about 100000 allocated boundary tags.
Option c: about 400000 allocated boundary tags.

With boundary tags being about half the size of vm_map_entries, the vmem
version uses slightly more memory, but not that much.

Both versions use a modified kmem(9) that interfaces either with vmem or
the extended kva caches, and which returns page-aligned memory for
allocations of page_size and larger and cache-line-aligned memory for
allocations between cache_line size and page_size.
This should resolve some problems Xen kernels have.
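
The alignment rule just described can be written down as a small
function. This is a sketch with assumed constants: a 4096-byte page, a
64-byte cache line, and an 8-byte minimum alignment for smaller
requests, which the message does not specify. sketch_kmem_align() is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed page size */
#define SKETCH_CACHE_LINE	64u	/* assumed cache line size */
#define SKETCH_MIN_ALIGN	8u	/* assumed; not stated in the thread */

/* Alignment a request of the given size would get under the rule. */
static size_t
sketch_kmem_align(size_t size)
{
	assert(size != 0);

	if (size >= SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SIZE;
	if (size >= SKETCH_CACHE_LINE)
		return SKETCH_CACHE_LINE;
	return SKETCH_MIN_ALIGN;
}
```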

The vmem version isn't quite finished: the vmem_size function required
by zfs needs to be adapted, etc. (And malloc(9) is just replaced by some
arena and is not gathering statistics anymore...)

So far the status report.

Greetings,

Lars

Lars Heidieker

May 21, 2011, 3:14:00 AM


Hi,

I suggest using option a for the time being and, once option b is ready,
replacing uvm_km* and its kva caches with the vmem implementation.
This will give us the benefits of fewer vm_map_entries and a kmem(9)
that does page-aligned allocations.

Lars

--
------------------------------------

Mystical explanations:
Mystical explanations are considered deep;
the truth is that they are not even superficial.

-- Friedrich Nietzsche
[ The Gay Science, Book 3, 126 ]


Lars Heidieker

May 21, 2011, 7:26:07 AM

> Have you done some performance testing so we can be sure that any of those
> patches will not hurt performance too much ? (Simple build.sh -j4 times on
> your machine)
>
> Regards
>
> Adam.
>
>

Here are the results

boot:
rm -rf /usr/obj/*
time ./build.sh -u -j4 distribution

first run
            real    user     sys   map-entries
current     1701    4425    1016           549
option a    1697    4420     997           425

rm -rf /usr/obj/*
time ./build.sh -u -j4 distribution

second run
            real    user     sys   map-entries
current     1656    4407     989           724
option a    1653    4405     980           591

map entry count obtained with pmap -R 0 | wc

Most of the difference seems to be in the noise, except for the map entry
count, with a slight edge for option a.
This comes as no surprise, as most scalability comes from the pool(9)s
anyway.
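
The map-entries column can be turned into relative reductions with a
small helper; sketch_percent_reduction() is a hypothetical name used
only for this illustration. For the figures above it gives roughly 23%
fewer entries in the first run (549 to 425) and roughly 18% fewer in the
second (724 to 591).

```c
#include <assert.h>

/* Percentage by which 'after' is lower than 'before'. */
static double
sketch_percent_reduction(unsigned before, unsigned after)
{
	assert(before != 0 && after <= before);
	return 100.0 * (double)(before - after) / (double)before;
}
```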

Lars

Lars Heidieker

May 23, 2011, 4:39:49 PM


I've uploaded patches with the conversion split out to
ftp://ftp.netbsd.org/pub/NetBSD/misc/para

Lars


YAMAMOTO Takashi

May 25, 2011, 9:51:26 PM

hi,

>> Findings after having run the system for a while and having about 1.1gig
>> in the pool(9)s:
>> Option a: about 30000 allocated kernel map_entries (not in the map but
>> allocated)
>> Option b: about 100000 allocated boundary tags.
>> Option c: about 400000 allocated boundary tags.
>>
>> With boundary tags beeing about half the size of vm_map_entries the vmem
>> version uses slightly more memory but not so much.

why did you use different numbers for heap_va_arena's qcache_max
(8 * PAGE_SIZE) and VMK_VACACHE_MAP_QUANTUM (32 * PAGE_SIZE)?

if i read your patches correctly, the number of map entries/boundary tags
will be smaller if these constants are bigger, right?

>> Both versions use a modified kmem(9) that interfaces either with vmem or
>> the extended kva caches, which has page_aligned memory for allocations
>> of page_size and larger and cache_line aligned allocations for
>> allocations between cache_line size and page_size.
>> This should resolve some problems xen-kernels do have.

does the original (solaris) version of kmem_alloc provide aligned
allocations?

YAMAMOTO Takashi

Lars Heidieker

May 28, 2011, 4:21:43 AM

Hi,

On 05/26/11 03:51, YAMAMOTO Takashi wrote:
> hi,

>
>>> Findings after having run the system for a while and having about 1.1gig
>>> in the pool(9)s:
>>> Option a: about 30000 allocated kernel map_entries (not in the map but
>>> allocated)
>>> Option b: about 100000 allocated boundary tags.
>>> Option c: about 400000 allocated boundary tags.
>>>
>>> With boundary tags beeing about half the size of vm_map_entries the vmem
>>> version uses slightly more memory but not so much.
>

> why did you use different numbers for heap_va_arena's qcache_max
> (8 * PAGE_SIZE) and VMK_VACACHE_MAP_QUANTUM (32 * PAGE_SIZE)?
>
> if i read your patches correctly, the number of map entries/boundary tags
> will be smaller if these constants are bigger, right?
>

I chose 8 * PAGE_SIZE for qcache_max as the quantum caches are
pool_caches, so if we have only two or three allocations of a particular
size made by different cpus, we have 2 or 3 times the va in the pool
caches, with a lot of va wasted.
This might or might not be a valid point, but it was the motivation to
start with a lower value.
If the size is increased, the number of boundary tags goes down a bit
further; these caches very much have an influence on the control
structure allocation count.

One could argue that the vmk_vacaches should be pool_caches as well
(I tried that, no problem to switch them) for scalability reasons; then
they will have to deal with the same wastage argument.
Currently, having these as pool_caches doesn't buy us much, as they have
to get backed with physical memory, which is a process most likely
serializing the allocation anyway... But this is no different between
the two options ;-)

>>> Both versions use a modified kmem(9) that interfaces either with vmem or
>>> the extended kva caches, which has page_aligned memory for allocations
>>> of page_size and larger and cache_line aligned allocations for
>>> allocations between cache_line size and page_size.
>>> This should resolve some problems xen-kernels do have.
>

> does the original (solaris) version of kmem_alloc provide aligned
> allocations?
>

Yes it does, it switches to cache_line size for alignment for
allocations >= cache_line size and to page_size alignment for
allocations >= page_size.

> YAMAMOTO Takashi
>

Lars

YAMAMOTO Takashi

Jun 7, 2011, 10:11:32 PM

hi,

> Hi,
>
> On 05/26/11 03:51, YAMAMOTO Takashi wrote:
>> hi,
>>
>>>> Findings after having run the system for a while and having about 1.1gig
>>>> in the pool(9)s:
>>>> Option a: about 30000 allocated kernel map_entries (not in the map but
>>>> allocated)
>>>> Option b: about 100000 allocated boundary tags.
>>>> Option c: about 400000 allocated boundary tags.
>>>>
>>>> With boundary tags beeing about half the size of vm_map_entries the vmem
>>>> version uses slightly more memory but not so much.
>>
>> why did you use different numbers for heap_va_arena's qcache_max
>> (8 * PAGE_SIZE) and VMK_VACACHE_MAP_QUANTUM (32 * PAGE_SIZE)?
>>
>> if i read your patches correctly, the number of map entries/boundary tags
>> will be smaller if these constants are bigger, right?
>>
>
> I choose the 8 * PAGE_SIZE for qcache_max as the quantum caches are
> pool_caches, so if we have only two or three allocation of a particular
> size made by different cpus we have 2 or 3 times the va in the pool
> caches, with a lot va wasted.

in that case, the "two or three allocation" will likely be served by
a single pool page, won't it? ie. the waste is same as the direct use of pool.

> This might or might not be a point but was the motivation to start with
> a lower value.
> If the size is increased the amount of boundary tags goes down a bit
> further, these caches very much have an influence on the control
> structure allocation count.
>
> One could argument that the vmk_vacaches should be pool_caches as well
> (I tried that, no problem to switch them) for scalability reasons, then
> they will have to deal with the same wastage argument.
> Currently having these as pool_caches doesn't buy us much, as they have
> to get backed with physical memory, which is a process most likely
> serializing the allocation anyway... But this is no different between
> the two options ;-)
>
>>>> Both versions use a modified kmem(9) that interfaces either with vmem or
>>>> the extended kva caches, which has page_aligned memory for allocations
>>>> of page_size and larger and cache_line aligned allocations for
>>>> allocations between cache_line size and page_size.
>>>> This should resolve some problems xen-kernels do have.
>>
>> does the original (solaris) version of kmem_alloc provide aligned
>> allocations?
>>
>
> Yes it does, it switches to cache_line size for alignment for
> allocations >= cache_line size and to page_size alignment for
> allocations >= page_size.

kmem_alloc(9F) says:

The allocated memory is at least double-word aligned, so it can
hold any C data structure. No greater alignment can be
assumed.

% uname -sr
SunOS 5.10

so i don't think it's api-wise guaranteed.
IMO it's better to use a low-level allocator (eg. uvm_km_alloc) for
alignment-sensitive users.

YAMAMOTO Takashi

Mindaugas Rasiukevicius

Jun 8, 2011, 7:44:57 PM

ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> > Yes it does, it switches to cache_line size for alignment for
> > allocations >= cache_line size and to page_size alignment for
> > allocations >= page_size.
>

> kmem_alloc(9F) says:
>
> The allocated memory is at least double-word aligned, so it can
> hold any C data structure. No greater alignment can be
> assumed.
>
> % uname -sr
> SunOS 5.10
>
> so i don't think it's api-wise guaranteed.
> IMO it's better to use a low-level allocator (eg. uvm_km_alloc) for
> alignment-sensitive users.

While for page-size alignment it makes sense to use the uvm_km(9) allocator,
there are quite a few allocations where it is useful to give a structure
its own cache line. I am not sure it is desirable to sprinkle
pad & align magic in each caller, instead of adding such
support to kmem(9). Perhaps kmem_cacheline_{alloc,free}?
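
A userland sketch of what such a pair could look like, assuming a
64-byte cache line and using C11 aligned_alloc() in place of a kernel
allocator. The sketch_* names are hypothetical; no
kmem_cacheline_alloc() interface is being quoted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE 64u	/* assumed cache line size */

/* Round size up to a whole number of cache lines. */
static size_t
sketch_round_to_line(size_t size)
{
	return (size + SKETCH_CACHE_LINE - 1) &
	    ~((size_t)SKETCH_CACHE_LINE - 1);
}

/*
 * Allocate memory that starts on a cache-line boundary and occupies
 * whole cache lines, so the object shares no line with a neighbour.
 */
static void *
sketch_cacheline_alloc(size_t size)
{
	assert(size != 0);
	/* aligned_alloc() expects size to be a multiple of the alignment. */
	return aligned_alloc(SKETCH_CACHE_LINE, sketch_round_to_line(size));
}

/* kmem-style free takes the size back; free() itself does not need it. */
static void
sketch_cacheline_free(void *p, size_t size)
{
	(void)size;
	free(p);
}
```

Doing the padding and alignment once in the allocator keeps it out of
every caller, which is the trade-off raised above.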

--
Mindaugas

Mindaugas Rasiukevicius

Jun 8, 2011, 8:01:02 PM

ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
> >> let me explain some background. currently there are a number of
> >> kernel_map related problems:
> >>
> >> A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.
> >>
> >> A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
> >> the allocate-for-free problem. ie. to free memory, you might need to
> >> split map-entries thus allocate some memory.
> >>
> >> A-3. to solve A-2, there is map-entry-reservation mechanism. it's
> >> complicated and broken.
> >>

> <...>

>
> i agree that having two allocator for KVA is bad.
> my idea is having just one. (kernel_va_arena)
> no allocation would be made by vm_map_entries for kernel_map.
> kernel_map is kept merely for fault handling.
>
> essentially kva allocation would be:
>
> 	va = vmem_alloc(kernel_va_arena, ...);
> 	if (pageable)
> 		create kernel_map entry for the va
> 	else
> 		...
> 	return va;

I like this a lot. Seems an overall win.

--
Mindaugas
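
The allocation flow quoted above can be sketched as a userland toy. Everything here is an assumption made for illustration: toy_arena is a bump allocator standing in for a vmem arena, toy_map stands in for kernel_map, and none of the names are real kernel interfaces. The non-pageable branch, which the quoted pseudocode elides, is left elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define TOY_MAX_ENTRIES	16

/* Toy stand-in for a vmem arena: a bump allocator over a VA range. */
struct toy_arena {
	uintptr_t next, end;
};

/* Toy stand-in for kernel_map: only ranges that need fault handling. */
struct toy_map {
	struct { uintptr_t start; size_t len; } e[TOY_MAX_ENTRIES];
	int n;
};

/* Hand out page-rounded VA from the arena; 0 means failure. */
static uintptr_t
toy_arena_alloc(struct toy_arena *a, size_t len)
{
	len = (len + TOY_PAGE_SIZE - 1) & ~(size_t)(TOY_PAGE_SIZE - 1);
	if (a->end - a->next < len)
		return 0;
	uintptr_t va = a->next;
	a->next += len;
	return va;
}

/* All VA comes from the arena; the map is touched only if pageable. */
static uintptr_t
toy_kva_alloc(struct toy_arena *a, struct toy_map *m, size_t len,
    bool pageable)
{
	if (pageable && m->n == TOY_MAX_ENTRIES)
		return 0;
	uintptr_t va = toy_arena_alloc(a, len);
	if (va == 0)
		return 0;
	if (pageable) {
		m->e[m->n].start = va;
		m->e[m->n].len = len;
		m->n++;
	}
	/* else: elided in the quoted pseudocode, so elided here too */
	return va;
}
```

The point of the sketch is that the map never takes part in address allocation, so freeing VA never has to split a map entry.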

Lars Heidieker

Jun 12, 2011, 5:55:17 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/09/11 01:44, Mindaugas Rasiukevicius wrote:

> ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
>>> Yes it does, it switches to cache_line size for alignment for
>>> allocations >= cache_line size and to page_size alignment for
>>> allocations >= page_size.
>>

>> kmem_alloc(9F) says:
>>
>> The allocated memory is at least double-word aligned, so it can
>> hold any C data structure. No greater alignment can be
>> assumed.
>>
>> % uname -sr
>> SunOS 5.10
>>
>> so i don't think it's api-wise guaranteed.
>> IMO it's better to use a low-level allocator (eg. uvm_km_alloc) for
>> alignment-sensitive users.
>
> While for page-size alignment, it makes sense to use uvm_km(9) allocator,
> there are quite a few allocations when it is useful to give a separate
> cache-line for a structure. I am not sure if it is desirable to sprinkle
> pad & align magic each time in the caller's side, instead of adding such
> support to kmem(9). Perhaps kmem_cacheline_{alloc,free}?
>

It's not API-wise guaranteed, but what do we lose if we get those
alignments? Nothing, I think; they are in place in the OpenSolaris
implementation as well ;-)
I don't think it's a good idea to have a different API for
cache-line-aligned memory. This would require separate pools for
cache-line-aligned and non-cache-line-aligned memory, which just
spreads out the allocations and increases fragmentation.

If the KVA is controlled by a vmem arena, then those page-size-aligned
allocations should go to that arena, whose quantum is page-size anyway.

Lars

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk30jQQACgkQcxuYqjT7GRYGHgCdErYJzFuB6sM5iJlucc/GO51r
r+8AnA5QjSfnKsB1+44DQQtJ7osS8e5w
=whHX
-----END PGP SIGNATURE-----
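
The size-dependent alignment discussed in this exchange (cache-line alignment for allocations >= cache-line size, page alignment for allocations >= page size) can be written down as a small policy function. This is a sketch under assumptions: the constants are typical values rather than anything taken from the patch, the 8-byte minimum is an assumed reading of the "double-word" wording in kmem_alloc(9F), and toy_kmem_alignment is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN_ALIGN		8u	/* assumption: "double-word" minimum */
#define TOY_CACHE_LINE_SIZE	64u	/* assumption: typical, not universal */
#define TOY_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */

/*
 * Alignment an allocation of the given size would receive under the
 * policy described in the thread.
 */
static size_t
toy_kmem_alignment(size_t size)
{
	assert(size > 0);
	if (size >= TOY_PAGE_SIZE)
		return TOY_PAGE_SIZE;
	if (size >= TOY_CACHE_LINE_SIZE)
		return TOY_CACHE_LINE_SIZE;
	return TOY_MIN_ALIGN;
}
```

Note that a policy like this is a property of one implementation; as pointed out earlier in the thread, the documented interface promises nothing beyond the minimum.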

Lars Heidieker

Jun 12, 2011, 6:07:42 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/09/11 02:01, Mindaugas Rasiukevicius wrote:
> ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
>>>> let me explain some background. currently there are a number of
>>>> kernel_map related problems:
>>>>
>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation purpose.
>>>>
>>>> A-2. kernel-map-entry-merging is there to solve A-1. but it introduced
>>>> the allocate-for-free problem. ie. to free memory, you might need to
>>>> split map-entries thus allocate some memory.
>>>>
>>>> A-3. to solve A-2, there is map-entry-reservation mechanism. it's
>>> complicated and broken.
>>>>
>> <...>
>>
>> i agree that having two allocator for KVA is bad.
>> my idea is having just one. (kernel_va_arena)
>> no allocation would be made by vm_map_entries for kernel_map.
>> kernel_map is kept merely for fault handling.
>>
>> essentially kva allocation would be:
>>
>> 	va = vmem_alloc(kernel_va_arena, ...);
>> 	if (pageable)
>> 		create kernel_map entry for the va
>> 	else
>> 		...
>> 	return va;
>
> I like this a lot. Seems an overall win.
>

Yes, definitely, it simplifies locking a lot. But I don't like the idea of
mixing vmem and map entries. I think it's cleaner to have one
vm_map_entry spanning the entire heap, which in turn is controlled by a
vmem arena.
From there on, two options emerge:
making kmem(9) interrupt safe, which seems to have some runtime overhead,
or retrofitting malloc(9) (for as long as it is still there) into the
vmem-backed world.

Lars
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk30j+4ACgkQcxuYqjT7GRZXIgCdHpk9oc0Bm8nYBYxmMWhyhrGZ
WMUAmwbGnre42AWDuMGiE0dc0Efmq1eE
=MhnY
-----END PGP SIGNATURE-----

Lars Heidieker

Jun 12, 2011, 7:47:59 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Also, cache-line-size alignment should yield better performance,
especially on SMP, because it avoids cache-line sharing effects...

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEUEARECAAYFAk30p28ACgkQcxuYqjT7GRaU9QCfR6/as/mGPmSg6mSHarBYL4EZ
QUcAl0tI03B2MVmTAS7RsDz5jYBq8Fk=
=OGyo
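
The cache-line sharing effect mentioned in this message is easiest to see with per-CPU data. The following is a userland sketch, not kernel code: the 64-byte line size, the CPU count, and all toy_ names are assumptions. Each counter is aligned and padded to a full line, so a CPU updating its own counter does not invalidate the line holding another CPU's counter.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE_SIZE	64	/* assumption: 64-byte lines */
#define TOY_NCPU		4	/* assumption: illustrative CPU count */

/* alignas on the member pads the whole struct to one cache line. */
struct toy_percpu_counter {
	alignas(TOY_CACHE_LINE_SIZE) uint64_t value;
};

static struct toy_percpu_counter toy_counters[TOY_NCPU];

/* Each CPU only ever writes its own slot. */
static void
toy_counter_add(unsigned cpu, uint64_t n)
{
	assert(cpu < TOY_NCPU);
	toy_counters[cpu].value += n;
}

/* Readers sum over all slots. */
static uint64_t
toy_counter_sum(void)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < TOY_NCPU; i++)
		sum += toy_counters[i].value;
	return sum;
}
```

An allocator that returns cache-line-aligned memory for sufficiently large objects gives dynamically allocated structures the same separation without per-structure padding.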

Lars Heidieker

Jun 13, 2011, 5:28:05 AM

hi,

On 06/08/11 04:11, YAMAMOTO Takashi wrote:
> hi,

>
>> Hi,
>>
>> On 05/26/11 03:51, YAMAMOTO Takashi wrote:
>>> hi,
>>>
>>>>> Findings after having run the system for a while and having about 1.1gig
>>>>> in the pool(9)s:
>>>>> Option a: about 30000 allocated kernel map_entries (not in the map but
>>>>> allocated)
>>>>> Option b: about 100000 allocated boundary tags.
>>>>> Option c: about 400000 allocated boundary tags.
>>>>>
>>>>> With boundary tags beeing about half the size of vm_map_entries the vmem
>>>>> version uses slightly more memory but not so much.
>>>
>>> why did you use different numbers for heap_va_arena's qcache_max
>>> (8 * PAGE_SIZE) and VMK_VACACHE_MAP_QUANTUM (32 * PAGE_SIZE)?
>>>
>>> if i read your patches correctly, the number of map entries/boundary tags
>>> will be smaller if these constants are bigger, right?
>>>
>>
>> I choose the 8 * PAGE_SIZE for qcache_max as the quantum caches are
>> pool_caches, so if we have only two or three allocation of a particular
>> size made by different cpus we have 2 or 3 times the va in the pool
>> caches, with a lot va wasted.
>

> in that case, the "two or three allocation" will likely be served by
> a single pool page, won't it? ie. the waste is same as the direct use of pool.
>

True, the wastage will only be larger if more puts and gets happen, as
constructed objects are kept in the caches. However, I don't think this
is a problem at all.

YAMAMOTO Takashi

Jun 15, 2011, 12:04:26 AM

hi,

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 06/09/11 01:44, Mindaugas Rasiukevicius wrote:
>> ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
>>>> Yes it does, it switches to cache_line size for alignment for
>>>> allocations >= cache_line size and to page_size alignment for
>>>> allocations >= page_size.
>>>
>>> kmem_alloc(9F) says:
>>>
>>> The allocated memory is at least double-word aligned, so it can
>>> hold any C data structure. No greater alignment can be
>>> assumed.
>>>
>>> % uname -sr
>>> SunOS 5.10
>>>
>>> so i don't think it's api-wise guaranteed.
>>> IMO it's better to use a low-level allocator (eg. uvm_km_alloc) for
>>> alignment-sensitive users.
>>
>> While for page-size alignment, it makes sense to use uvm_km(9) allocator,
>> there are quite a few allocations when it is useful to give a separate
>> cache-line for a structure. I am not sure if it is desirable to sprinkle
>> pad & align magic each time in the caller's side, instead of adding such
>> support to kmem(9). Perhaps kmem_cacheline_{alloc,free}?
>>
>
> It's not api-wise guaranteed, but what do we loose if we get those
> alignments? Nothing I think, they are in place with the OpenSolaris
> implementation as well ;-)

we lose the flexibility of the implementation.

> I don't think it's a good idea to have a different api for
> cache-line-aligned memory, this would require different pools for
> cache-size-aligned memory and not cache-line-aligned memory just
> spreading out the allocations and increasing fragmentation.
>
> If the kva is controlled by a vmem arena then those page-size aligned
> allocation should go to that arena, which quantum is page-size anyway.

i guess page-alignment and cache-line-alignment should be considered
separately.
i currently have no idea how many of kmem_alloc users are sensitive to
cache-line-alignment.

YAMAMOTO Takashi

Lars Heidieker

Jun 15, 2011, 1:54:47 AM

hi,

On 06/15/11 06:04, YAMAMOTO Takashi wrote:
> hi,

>
> On 06/09/11 01:44, Mindaugas Rasiukevicius wrote:
>>>> ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
>>>>>> Yes it does, it switches to cache_line size for alignment for
>>>>>> allocations >= cache_line size and to page_size alignment for
>>>>>> allocations >= page_size.
>>>>>
>>>>> kmem_alloc(9F) says:
>>>>>
>>>>> The allocated memory is at least double-word aligned, so it can
>>>>> hold any C data structure. No greater alignment can be
>>>>> assumed.
>>>>>
>>>>> % uname -sr
>>>>> SunOS 5.10
>>>>>
>>>>> so i don't think it's api-wise guaranteed.
>>>>> IMO it's better to use a low-level allocator (eg. uvm_km_alloc) for
>>>>> alignment-sensitive users.
>>>>
>>>> While for page-size alignment, it makes sense to use uvm_km(9) allocator,
>>>> there are quite a few allocations when it is useful to give a separate
>>>> cache-line for a structure. I am not sure if it is desirable to sprinkle
>>>> pad & align magic each time in the caller's side, instead of adding such
>>>> support to kmem(9). Perhaps kmem_cacheline_{alloc,free}?
>>>>
>
> It's not api-wise guaranteed, but what do we loose if we get those
> alignments? Nothing I think, they are in place with the OpenSolaris
> implementation as well ;-)
>

>> we loose the flexibility of the implementation.
>

By flexibility, do you mean that we could allocate, let's say, 5200 bytes
(as 8- or 64-byte-aligned memory), whereas in both my patches it would be
rounded up to 8192 bytes?
I see the point there. If that's the case, I still think we would be
better off with cache-line-aligned allocations: once sizes exceed the
largest kmem pool, we switch to an arena (which does not exist yet) with
a quantum of cache-line size for large allocations.
This way we avoid having a lot of boundary tags for small allocations
(because of small pool-page-sizes in the current implementation) and
retain this flexibility.

> I don't think it's a good idea to have a different api for
> cache-line-aligned memory, this would require different pools for
> cache-size-aligned memory and not cache-line-aligned memory just
> spreading out the allocations and increasing fragmentation.
>
> If the kva is controlled by a vmem arena then those page-size aligned
> allocation should go to that arena, which quantum is page-size anyway.
>

>> i guess page-alignment and cache-line-alignemnt should be considered
>> separately.
>> i currently have no idea how many of kmem_alloc users are sensitive to
>> cache-line-alignment.
>

Those two are distinct, except that page alignment implies
cache-line alignment (at least I don't know of any architecture where
it doesn't) ;-)
Putting allocations in different cache lines will be generally
beneficial.

>> YAMAMOTO Takashi
>
>
> Lars

Lars
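
The 5200-byte example above is plain rounding arithmetic, sketched here with hypothetical helper names and assumed quantum sizes (4096-byte pages, 64-byte cache lines). With a page-size quantum, 5200 bytes round up to 8192 and 2992 bytes are wasted; with a cache-line quantum they round up to 5248 and only 48 bytes are wasted.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a whole number of quanta. */
static size_t
round_to_quantum(size_t size, size_t quantum)
{
	assert(quantum > 0);
	return (size + quantum - 1) / quantum * quantum;
}

/* Bytes lost to rounding for one allocation. */
static size_t
rounding_waste(size_t size, size_t quantum)
{
	return round_to_quantum(size, quantum) - size;
}
```

This is the trade-off under discussion: a smaller quantum wastes less per allocation but needs more boundary tags to describe the same amount of memory.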

Lars Heidieker

Aug 14, 2011, 1:17:08 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/12/11 12:07, Lars Heidieker wrote:
> On 06/09/11 02:01, Mindaugas Rasiukevicius wrote:

>> ya...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) wrote:
>>>>> let me explain some background. currently there are a number
>>>>> of kernel_map related problems:
>>>>>
>>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation
>>>>> purpose.
>>>>>
>>>>> A-2. kernel-map-entry-merging is there to solve A-1. but it
>>>>> introduced the allocate-for-free problem. ie. to free memory,
>>>>> you might need to split map-entries thus allocate some
>>>>> memory.
>>>>>
>>>>> A-3. to solve A-2, there is map-entry-reservation mechanism.
>>>>> it's
>>>> complicated and broken.
>>>>>
>>> <...>
>>>
>>> i agree that having two allocator for KVA is bad. my idea is
>>> having just one. (kernel_va_arena) no allocation would be made by
>>> vm_map_entries for kernel_map. kernel_map is kept merely for
>>> fault handling.
>>>
>>> essentially kva allocation would be:
>>>
>>> 	va = vmem_alloc(kernel_va_arena, ...);
>>> 	if (pageable)
>>> 		create kernel_map entry for the va
>>> 	else
>>> 		...
>>> 	return va;
>
>> I like this a lot. Seems an overall win.
>
>

> Yes, definitly it simplifies locking a lot. But I don't like the idea
> of mixing vmem and map entries. I think it's cleaner to have one
> vm_map_entry spawning the entire heap which in turn is controlled by
> a vmem_arena. - From there on two options emerge: Making kmem(9)
> interrupt safe, which seems to have some runtime overhead or to
> retrofit malloc(9) (for the time it is still there) into the vmem
> backed world.
>
> Lars
>

Hi,

I uploaded a new version of the kmem-pool-vmem-uvm patch:
ftp://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm.patch

vmem(9) is now in charge of controlling the entire kernel heap.
vmem_create is split into vmem_create and vmem_xcreate, which take
different import function signatures; this allows the use of vmem_alloc
as an import function.
vmem has two private arenas for its internal purposes.

Special kernel map entry support is removed; it is no longer required,
and neither is map entry reservation.

pool(9) and kmem(9) are now both backed by a quantum-cached arena.
malloc(9) is replaced by a small wrapper around kmem(9).
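
One common way to build a malloc-style interface on top of a sized allocator like kmem(9) is to store the allocation size in a small header in front of the returned memory, because the free side of kmem needs the size back. The sketch below shows that general technique in userland. It is an assumption about how such a wrapper can look, not the code from the patch, and the toy_kmem_* functions are stand-ins built on malloc/free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins for kmem_alloc/kmem_free: free needs the size back. */
static size_t toy_kmem_inuse;

static void *
toy_kmem_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		toy_kmem_inuse += size;
	return p;
}

static void
toy_kmem_free(void *p, size_t size)
{
	toy_kmem_inuse -= size;
	free(p);
}

/* Header in front of each block; the union keeps the payload aligned. */
union toy_malloc_header {
	size_t size;		/* total size, header included */
	max_align_t align;
};

static void *
toy_malloc(size_t size)
{
	union toy_malloc_header *h;

	if (size > SIZE_MAX - sizeof(*h))
		return NULL;
	h = toy_kmem_alloc(sizeof(*h) + size);
	if (h == NULL)
		return NULL;
	h->size = sizeof(*h) + size;
	return h + 1;		/* payload starts right after the header */
}

static void
toy_free(void *p)
{
	if (p == NULL)
		return;
	union toy_malloc_header *h = (union toy_malloc_header *)p - 1;
	toy_kmem_free(h, h->size);
}
```

The cost of this approach is the header itself, which also shifts the payload away from whatever alignment the underlying allocation had.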

This is what my kernel_map looks like after having made a release build:
(8gb amd64)

$ pmap -R 0

FFFF800000000000 239364K read/write/exec [ anon ]
FFFF80000E9C1000 4067176K read/write/exec [ anon ]
FFFF800106D9B000 16384K read/write/exec [ pager_map ]
FFFF800107D9B000 96K read/write/exec [ anon ]
FFFF800107DB3000 1200K read/write/exec [ phys_map ]
FFFF800107EDF000 8196K read/write/exec [ anon ]
FFFF8001086E0000 8192K read/write/exec [ ubc_pager ]
FFFF800108EE0000 68K read/write/exec [ anon ]
FFFF800108EF1000 2816K read/write/exec [ uvm_aobj ]
total 4343492K

The "big" entry in the second line is the heap controlled by vmem.

There are no more failed allocations caused by allocating while the map
is locked, etc.

$ vmstat -mv

vmstat: Kmem statistics are not being gathered by the kernel.
Memory resource pool statistics
Name            Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
aio_jobs_pool    128        0    0        0     0     0     0     0     0   inf    0
aio_lio_pool      40        0    0        0     0     0     0     0     0   inf    0
amappl            80     2636    0        0    53     0    53    53     0   inf    0
anonpl            32   274412    0        0  2178     0  2178  2178     0   inf    0
ataspl            96   692473    0   692473     1     0     1     1     0   inf    1
biopl            280     4023    0        0   288     0   288   288     0   inf    0
brtpl             56        0    0        0     0     0     0     0     0   inf    0
buf16k         16384    28091    0     4746  5990   153  5837  5837     1     1    0
buf1k           1024        0    0        0     1     0     1     1     1     1    1
buf2k           2048   179497    0     5924  5434     9  5425  5425     1     1    0
buf32k         32768        0    0        0     1     0     1     1     1     1    1
buf4k           4096     2745    0     1274   102     8    94    94     1     1    0
buf512b          512        0    0        0     1     0     1     1     1     1    1
buf64k         65536        0    0        0     1     0     1     1     1     1    1
buf8k           8192     1504    0      726   129    29   100   100     1     1    0
bufpl            280   199167    0        0 14227     0 14227 14227     0   inf    0
cryptkop         352        0    0        0     0     0     0     0     0   inf    0
cryptodesc        72        0    0        0     0     0     0     0     0   inf    0
cryptop          272        0    0        0     0     0     0     0     0   inf    0
csepl            208        0    0        0    17     0    17    17    17   inf   17
cwdi              64      173    0        0     3     0     3     3     0   inf    0
execargs      262144   710463    0   710463     4     0     4     4     0    16    4
extent            40        0    0        0     0     0     0     0     0   inf    0
fcrpl            168        0    0        0     3     0     3     3     3   inf    3
fdfile            64     2549    0        0    41     0    41    41     0   inf    0
ffsdino1         128   419052    0        0 13518     0 13518 13518     0   inf    0
ffsdino2         256        0    0        0     0     0     0     0     0   inf    0
ffsino           256   419056    0        0 26191     0 26191 26191     0   inf    0
file             128      281    0        0    10     0    10    10     0   inf    0
filedesc         832      173    0        0    44     0    44    44     0   inf    0
fstrans           32      154    0        0     2     0     2     2     0   inf    0
igmppl            32        0    0        0     0     0     0     0     0   inf    0
in6pcbpl         232      155    0      144     1     0     1     1     0   inf    0
inmltpl           48        3    0        0     1     0     1     1     0   inf    0
inpcbpl          192      172    0      162     1     0     1     1     0   inf    0
ipfrenpl          64        0    0        0     0     0     0     0     0   inf    0
kcpuset           64        1    0        0     1     0     1     1     0   inf    0
kcredpl          192       70    0        0     4     0     4     4     0   inf    0
kmem-1024       1024     2370    0        0   593     0   593   593     0   inf    0
kmem-112         112      894    0        0    25     0    25    25     0   inf    0
kmem-128         128     1334    0        0    42     0    42    42     0   inf    0
kmem-16           16      988    0        0     4     0     4     4     0   inf    0
kmem-160         160      214    0        0     9     0     9     9     0   inf    0
kmem-192         192      191    0        0    10     0    10    10     0   inf    0
kmem-2048       2048     1225    0        0   613     0   613   613     0   inf    0
kmem-224         224       85    0        0     5     0     5     5     0   inf    0
kmem-24           24      232    0        0     2     0     2     2     0   inf    0
kmem-256         256      123    0        0     8     0     8     8     0   inf    0
kmem-32           32      486    0        0     4     0     4     4     0   inf    0
kmem-320         320      285    0        0    24     0    24    24     0   inf    0
kmem-384         384      328    0        0    33     0    33    33     0   inf    0
kmem-40           40     4810    0        0    48     0    48    48     0   inf    0
kmem-4096       4096       55    0        0    55     0    55    55     0   inf    0
kmem-448         448      127    0        0    15     0    15    15     0   inf    0
kmem-48           48     1021    0        0    13     0    13    13     0   inf    0
kmem-512         512       97    0        0    13     0    13    13     0   inf    0
kmem-56           56      306    0        0     5     0     5     5     0   inf    0
kmem-64           64     4264    0        0    67     0    67    67     0   inf    0
kmem-768         768      245    0        0    49     0    49    49     0   inf    0
kmem-8             8      790    0        0     2     0     2     2     0   inf    0
kmem-80           80     2021    0        0    40     0    40    40     0   inf    0
kmem-96           96      200    0        0     5     0     5     5     0   inf    0
ksiginfo          72       83    0        0     2     0     2     2     0   inf    0
ktrace           120        0    0        0     0     0     0     0     0   inf    0
kva-12288      12288      223    0        0    11     0    11    11     0   inf    0
kva-16384      16384       17    0        0     2     0     2     2     0   inf    0
kva-20480      20480       13    0        0     2     0     2     2     0   inf    0
kva-24576      24576        1    0        0     1     0     1     1     0   inf    0
kva-28672      28672        1    0        0     1     0     1     1     0   inf    0
kva-32768      32768        1    0        0     1     0     1     1     0   inf    0
kva-36864      36864        2    0        0     1     0     1     1     0   inf    0
kva-4096        4096   118955    0        0  1859     0  1859  1859     0   inf    0
kva-40960      40960        0    0        0     0     0     0     0     0   inf    0
kva-49152      49152        1    0        0     1     0     1     1     0   inf    0
kva-65536      65536    11462    0        0  2866     0  2866  2866     0   inf    0
kva-8192        8192      112    0        0     4     0     4     4     0   inf    0
lockf            112       12    0        0     1     0     1     1     0   inf    0
lwppl           1056      217    0        0    73     0    73    73     0   inf    0
mbpl             512      153    0        0    22     0    22    22     2   inf    2
mclpl           2048      149    0        0    79     0    79    79     4 65536    4
mqmsgpl         1024        0    0        0     0     0     0     0     0   inf    0
mutex             64   422019    0        0  6699     0  6699  6699     0   inf    0
ncache           192   419038    0        0 19955     0 19955 19955     0   inf    0
pcache           896       59    0        0    15     0    15    15     0   inf    0
pcachecpu         64      237    0        0     4     0     4     4     0   inf    0
pcglarge        1024     4852    0        0  1213     0  1213  1213     0   inf    0
pcgnormal        256     7359    0        0   460     0   460   460     0   inf    0
pdict128         184        0    0        0     0     0     0     0     0   inf    0
pdict16           72       24    0        8     1     0     1     1     0   inf    0
pdict32           88       10    0        2     1     0     1     1     0   inf    0
pdppl           4096      168    0        0   168     0   168   168     0   inf    0
pewpl             24        0    0        0     1     0     1     1     1     1    1
phpool-0          56    41440    0      199   573     0   573   573     0   inf    0
phpool-1024      176        0    0        0     0     0     0     0     0   inf    0
phpool-128        64       70    0        0     2     0     2     2     0   inf    0
phpool-2048      304        0    0        0     0     0     0     0     0   inf    0
phpool-256        80        6    0        0     1     0     1     1     0   inf    0
phpool-4096      560        0    0        0     0     0     0     0     0   inf    0
phpool-512       112        2    0        0     1     0     1     1     0   inf    0
phpool-64         56     4970    0        0    70     0    70    70     0   inf    0
piperd           320      112    0        0    10     0    10    10     0   inf    0
pipewr           320      119    0        0    10     0    10    10     0   inf    0
plimitpl         216      123    0        0     7     0     7     7     0   inf    0
pmappl           392      168    0        0    17     0    17    17     0   inf    0
pnbufpl         1024       72    0        0    18     0    18    18     0   inf    0
procpl           656      161    0        0    27     0    27    27     0   inf    0
proparay          48        0    0        0     0     0     0     0     0   inf    0
propdata          40        0    0        0     0     0     0     0     0   inf    0
propdict          48      196    0       24     3     0     3     3     0   inf    0
propnmbr          56       26    0        7     1     0     1     1     0   inf    0
propstng          40      269    0       11     3     0     3     3     0   inf    0
pstatspl         448      161    0        0    18     0    18    18     0   inf    0
ptimerpl         264       11    0        5     1     0     1     1     0   inf    0
ptimerspl        280       11    0        5     1     0     1     1     0   inf    0
pvpl              40    38951    0        0   386     0   386   386     0   inf    0
ractx             32    68471    0        0   544     0   544   544     0   inf    0
rndsample        536    10808    0    10804     1     0     1     1     0   inf    0
rtentpl          272       27    0        0     2     0     2     2     0   inf    0
rttmrpl           64        0    0        0     0     0     0     0     0   inf    0
rwlock            64        0    0        0     0     0     0     0     0   inf    0
sackholepl        32        0    0        0     0     0     0     0     0   inf    0
sigacts         3088      173    0        0   173     0   173   173     0   inf    0
socket           584       41    0        0     6     0     6     6     0   inf    0
swp vnd          288        0    0        0     0     0     0     0     0   inf    0
swp vnx           32        0    0        0     0     0     0     0     0   inf    0
synpl            280        1    0        1     1     0     1     1     0   inf    1
tcpcbpl          792       17    0       12     2     0     2     2     0   inf    1
tcpipqepl         80        0    0        0     0     0     0     0     0   inf    0
tstilepl          96      217    0        0     6     0     6     6     0   inf    0
uaoeltpl          96        0    0        0     0     0     0     0     0   inf    0
uarea          12288      217    0        0   217     0   217   217     0   inf    0
ufsdir           264        4    0        0     1     0     1     1     0   inf    0
vmmpepl          136     5583    0        0   193     0   193   193     0   inf    0
vmsppl           368      168    0        0    16     0    16    16     0   inf    0
vnodepl          296   419079    0        0 32237     0 32237 32237     0   inf    0
wapblinopl        32   752908    0   752888     1     0     1     1     0   inf    0

In use 1206022K, total allocated 1224292K; utilization 98.5%

kind regards,
Lars

--
------------------------------------

Mystical explanations:
Mystical explanations are considered deep;
the truth is that they are not even superficial.

-- Friedrich Nietzsche
[ The Gay Science, Book 3, 126 ]

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5IAxMACgkQcxuYqjT7GRZ9UwCeMazMDvYTns5LAq7K6TtiV5Ql
G6AAoMBtGvBfk4aniS8qrRt8EtYOjs1J
=moI/
-----END PGP SIGNATURE-----

YAMAMOTO Takashi

Aug 14, 2011, 10:21:44 PM

hi,

> Hi,
>
> i uploaded a new version of the kmem-pool-vmem-uvm patch:
> ftp://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm.patch

thanks for working on this.

can you provide a patch with diff -up?

have you done benchmarks? eg. src/regress/sys/kern/allocfree
i'm a little concerned about IPL_VM mutex overhead for kmem_alloc.

YAMAMOTO Takashi

> Mystical explanations:
> Mystical explanations are considered deep;
> the truth is that they are not even superficial.
>
> -- Friedrich Nietzsche
> [ The Gay Science, Book 3, 126 ]
