[ANNOUNCE] GIT 1.0.0

Junio C Hamano

Dec 21, 2005, 3:00:45 AM

to g...@vger.kernel.org, linux-...@vger.kernel.org

GIT 1.0.0 is found at the usual places:

Tarball http://www.kernel.org/pub/software/scm/git/
RPM http://www.kernel.org/pub/software/scm/git/RPMS/
Debian http://www.kernel.org/pub/software/scm/git/debian/
GIT git://git.kernel.org/pub/scm/git/git.git/

The name "1.0.0" ought to mean a significant milestone, but
actually it is not. Pre-1.0 versions have been in production use
by the kernel folks for quite some time, and the changes since
1.0rc are pretty small and primarily consist of documentation
updates, clone/fetch enhancements and miscellaneous bugfixes.

Thank you all who gave patches, comments and time.

Happy hacking, and a little early ho-ho-ho.

Eric Dumazet

Dec 21, 2005, 4:13:00 AM

to linux-...@vger.kernel.org, Andi Kleen

I wonder if the 32 and 192 byte caches are worth declaring in
include/linux/kmalloc_sizes.h, at least on x86_64.

(x86_64 : PAGE_SIZE = 4096, L1_CACHE_BYTES = 64)

On my machines, I can say that the 32 and 192 sizes could be avoided in favor
of spending fewer CPU cycles in __find_general_cachep().

Could some of you post the result of the following command on your machines :

# grep "size-" /proc/slabinfo |grep -v DMA|cut -c1-40

size-131072 0 0 131072
size-65536 0 0 65536
size-32768 2 2 32768
size-16384 0 0 16384
size-8192 13 13 8192
size-4096 161 161 4096
size-2048 40564 42976 2048
size-1024 681 800 1024
size-512 19792 37168 512
size-256 81 105 256
size-192 1218 1280 192
size-64 31278 86907 64
size-128 5457 10380 128
size-32 594 784 32

Thank you

PS : I have no idea why the last lines (size-192, 64, 128, 32) are not ordered...

Eric
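
To make the cost in question concrete, here is a minimal user-space sketch of a linear size-class lookup of the kind __find_general_cachep() performs. The two tables, the ARRAY_LEN macro and the find_size_class helper are illustrative assumptions, not the kernel's code; the sizes are the ones listed in the slabinfo output above.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* General cache sizes as listed in the slabinfo output above (x86_64). */
static const size_t sizes_full[] = {
    32, 64, 128, 192, 256, 512, 1024, 2048, 4096,
    8192, 16384, 32768, 65536, 131072
};

/* Hypothetical reduced table with the 32- and 192-byte entries dropped. */
static const size_t sizes_reduced[] = {
    64, 128, 256, 512, 1024, 2048, 4096,
    8192, 16384, 32768, 65536, 131072
};

/*
 * Walk the table until the first class that fits the request.
 * Returns the class size (0 if nothing fits) and stores the number of
 * comparisons performed in *steps.
 */
static size_t find_size_class(const size_t *table, size_t n,
                              size_t request, unsigned *steps)
{
    *steps = 0;
    for (size_t i = 0; i < n; i++) {
        (*steps)++;
        if (request <= table[i])
            return table[i];
    }
    return 0;
}
```

With these tables a 200-byte request takes five comparisons in the full table and three in the reduced one. The trade-off is that a 150-byte request would be served from the 256-byte class instead of the 192-byte one.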

David S. Miller

Dec 21, 2005, 4:22:34 AM

to da...@cosmosbay.com, linux-...@vger.kernel.org, a...@suse.de

From: Eric Dumazet <da...@cosmosbay.com>
Date: Wed, 21 Dec 2005 10:11:51 +0100

> Could some of you post the result of the following command on your machines :

sparc64, PAGE_SIZE=8192, L1_CACHE_BYTES=32

size-131072 0 0 131072
size-65536 13 13 65536
size-32768 2 2 32768
size-16384 2 2 16384
size-8192 67 67 8192
size-4096 75 76 4096
size-2048 303 308 2048
size-1024 176 176 1024
size-512 251 255 512
size-256 217 217 256
size-192 1230 1230 192
size-128 106 122 128
size-96 1098 1134 96
size-64 29387 30226 64

Alok kataria

Dec 21, 2005, 4:47:21 AM

to Eric Dumazet, linux-...@vger.kernel.org, Andi Kleen

On 12/21/05, Eric Dumazet <da...@cosmosbay.com> wrote:
> I wonder if the 32 and 192 byte caches are worth declaring in
> include/linux/kmalloc_sizes.h, at least on x86_64.
>
> (x86_64 : PAGE_SIZE = 4096, L1_CACHE_BYTES = 64)
>
> On my machines, I can say that the 32 and 192 sizes could be avoided in favor
> of spending fewer CPU cycles in __find_general_cachep().
>
> Could some of you post the result of the following command on your machines :
>
> # grep "size-" /proc/slabinfo |grep -v DMA|cut -c1-40
>
> size-131072 0 0 131072
> size-65536 0 0 65536
> size-32768 2 2 32768
> size-16384 0 0 16384
> size-8192 13 13 8192
> size-4096 161 161 4096
> size-2048 40564 42976 2048
> size-1024 681 800 1024
> size-512 19792 37168 512
> size-256 81 105 256
> size-192 1218 1280 192
> size-64 31278 86907 64
> size-128 5457 10380 128
> size-32 594 784 32
>
> Thank you
>
> PS : I have no idea why the last lines (size-192, 64, 128, 32) are not ordered...

The size-32 and size-128 caches are created before any other cache, as
the array_caches (arraycache_init) and kmem_list3 structures come
from these caches.
Thus these caches are added to the cache_chain before the other caches,
and s_show just walks this chain and prints info for each cache.

Before l3 was converted into a pointer (per-node slabs) we could
initialize the caches in order, as we knew that the arraycache_init would
always fit in the first cache.

Thanks & Regards,
Alok
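
The ordering effect described above can be illustrated with a small sketch. It assumes that each new cache is linked at the head of the chain, which is what the posted output suggests but is not stated in the thread; the struct chain type, the chain_add and build_chain helpers, and which structure lives in which cache are all hypothetical.

```c
#include <stddef.h>

#define MAX_CACHES 16

/* A toy stand-in for the cache chain: index 0 is the head of the chain. */
struct chain {
    size_t size[MAX_CACHES];
    size_t count;
};

/* Link a new cache at the head, shifting the existing entries down. */
static int chain_add(struct chain *c, size_t size)
{
    if (c->count == MAX_CACHES)
        return -1;
    for (size_t i = c->count; i > 0; i--)
        c->size[i] = c->size[i - 1];
    c->size[0] = size;
    c->count++;
    return 0;
}

/*
 * Create the two bootstrap caches first, then the remaining sizes in
 * ascending order, skipping the ones that already exist.
 */
static void build_chain(struct chain *c)
{
    static const size_t all[] = { 32, 64, 128, 192, 256 };

    c->count = 0;
    chain_add(c, 32);   /* assumed home of the array caches */
    chain_add(c, 128);  /* assumed home of the kmem_list3 structures */
    for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++) {
        if (all[i] == 32 || all[i] == 128)
            continue;
        chain_add(c, all[i]);
    }
}
```

Walking the resulting chain from the head gives 256, 192, 64, 128, 32, the same tail order as in the x86_64 listing quoted above.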

Jan-Benedict Glaw

Dec 21, 2005, 5:06:43 AM

to David S. Miller, da...@cosmosbay.com, linux-...@vger.kernel.org, a...@suse.de

On Wed, 2005-12-21 01:22:12 -0800, David S. Miller <da...@davemloft.net> wrote:
> From: Eric Dumazet <da...@cosmosbay.com>
> Date: Wed, 21 Dec 2005 10:11:51 +0100
>
> > Could some of you post the result of the following command on your machines :

VAX KA650 (simulated), 4k pages (hw-size is 512 Bytes, though),
L1_CACHE_BYTES=32

# grep "size-" /proc/slabinfo |grep -v DMA|cut -c1-40

size-131072 0 0 131072
size-65536 0 0 65536
size-32768 0 0 32768
size-16384 0 0 16384
size-8192 0 0 8192
size-4096 21 21 4096
size-2048 39 42 2060
size-1024 18 21 1036
size-512 70 70 524
size-256 5 14 268
size-192 722 722 204
size-128 145 168 140
size-96 382 396 108
size-32 1040 1092 44
size-64 338 350 76

Kind regards, JBG

Ed Tomlinson

Dec 21, 2005, 7:45:24 AM

to Eric Dumazet, linux-...@vger.kernel.org, Andi Kleen

On Wednesday 21 December 2005 04:11, Eric Dumazet wrote:
> (x86_64 : PAGE_SIZE = 4096, L1_CACHE_BYTES = 64)
>
> On my machines, I can say that the 32 and 192 sizes could be avoided in favor
> of spending fewer CPU cycles in __find_general_cachep().
>
> Could some of you post the result of the following command on your machines :
>
> # grep "size-" /proc/slabinfo |grep -v DMA|cut -c1-40

size-131072 0 0 131072
size-65536 3 3 65536
size-32768 0 0 32768
size-16384 3 3 16384
size-8192 28 28 8192
size-4096 184 184 4096
size-2048 272 272 2048
size-1024 300 300 1024
size-512 275 376 512
size-256 717 720 256
size-192 1120 1220 192
size-64 7720 8568 64
size-128 45019 65830 128
size-32 1627 3333 32

amd64 up

Ed Tomlinson

Folkert van Heusden

Dec 21, 2005, 8:21:28 AM

to Ed Tomlinson, Eric Dumazet, linux-...@vger.kernel.org, Andi Kleen

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> > (x86_64 : PAGE_SIZE = 4096, L1_CACHE_BYTES = 64)
> > On my machines, I can say that the 32 and 192 sizes could be avoided in favor
> > of spending fewer CPU cycles in __find_general_cachep().
> > Could some of you post the result of the following command on your machines :
> > # grep "size-" /proc/slabinfo |grep -v DMA|cut -c1-40
> size-131072 0 0 131072
> size-65536 3 3 65536
> size-32768 0 0 32768
> size-16384 3 3 16384
> size-8192 28 28 8192
> size-4096 184 184 4096
> size-2048 272 272 2048
> size-1024 300 300 1024
> size-512 275 376 512
> size-256 717 720 256
> size-192 1120 1220 192
> size-64 7720 8568 64
> size-128 45019 65830 128
> size-32 1627 3333 32

size-131072 0 0 131072
size-65536 0 0 65536
size-32768 20 20 32768
size-16384 8 9 16384
size-8192 37 38 8192
size-4096 269 269 4096
size-2048 793 910 2048
size-1024 564 608 1024
size-512 702 856 512
size-256 1485 4005 256
size-128 1209 1350 128
size-64 2858 3363 64
size-32 1538 2714 64
Intel(R) Xeon(TM) MP CPU 3.00GHz
address sizes : 40 bits physical, 48 bits virtual

Folkert van Heusden

- --
Try MultiTail! Multiple windows with logfiles, filtered with regular
expressions, colored output, etc. etc. www.vanheusden.com/multitail/
- ----------------------------------------------------------------------
Get your PGP/GPG key signed at www.biglumber.com!
- ----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iIMEARECAEMFAkOpVq48Gmh0dHA6Ly93d3cudmFuaGV1c2Rlbi5jb20vZGF0YS1z
aWduaW5nLXdpdGgtcGdwLXBvbGljeS5odG1sAAoJEDAZDowfKNiuUUEAnR9DJq5M
x+Bj1R+djzCli3bFrJXKAJ9OmCx9FKDaGl6PocRwCZSKURerPA==
=vQhF
-----END PGP SIGNATURE-----

Eric Dumazet

Dec 21, 2005, 8:38:56 AM

to Folkert van Heusden, Ed Tomlinson, linux-...@vger.kernel.org, Andi Kleen

Folkert van Heusden wrote:

>
>
> size-131072 0 0 131072
> size-65536 0 0 65536
> size-32768 20 20 32768
> size-16384 8 9 16384
> size-8192 37 38 8192
> size-4096 269 269 4096
> size-2048 793 910 2048
> size-1024 564 608 1024
> size-512 702 856 512
> size-256 1485 4005 256
> size-128 1209 1350 128
> size-64 2858 3363 64
> size-32 1538 2714 64
> Intel(R) Xeon(TM) MP CPU 3.00GHz
> address sizes : 40 bits physical, 48 bits virtual
>
>
> Folkert van Heusden

Hi Folkert

Your results are interesting: size-32 seems to use objects of size 64!

> size-32 1538 2714 64 <<HERE>>

So I guess that the size-32 cache could be avoided, at least for EMT
(I take it you run a 64-bit kernel?)

Eric

Folkert van Heusden

Dec 21, 2005, 9:09:49 AM

to Eric Dumazet, Ed Tomlinson, linux-...@vger.kernel.org, Andi Kleen

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> >size-131072 0 0 131072
> >size-65536 0 0 65536
> >size-32768 20 20 32768
> >size-16384 8 9 16384
> >size-8192 37 38 8192
> >size-4096 269 269 4096
> >size-2048 793 910 2048
> >size-1024 564 608 1024
> >size-512 702 856 512
> >size-256 1485 4005 256
> >size-128 1209 1350 128
> >size-64 2858 3363 64
> >size-32 1538 2714 64
> >Intel(R) Xeon(TM) MP CPU 3.00GHz
> >address sizes : 40 bits physical, 48 bits virtual
>

> Your results are interesting : size-32 seems to use objects of size 64 !
> > size-32 1538 2714 64 <<HERE>>
> So I guess that size-32 cache could be avoided at least for EMT (I take you
> run a 64 bits kernel ?)

I think I do yes:
Linux xxxxx 2.4.21-37.EL #1 SMP Wed Sep 7 13:32:18 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
It is a redhat 4 x64 system.
Also from /proc/cpuinfo:

address sizes : 40 bits physical, 48 bits virtual

Folkert van Heusden

- --

Try MultiTail! Multiple windows with logfiles, filtered with regular
expressions, colored output, etc. etc. www.vanheusden.com/multitail/
- ----------------------------------------------------------------------
Get your PGP/GPG key signed at www.biglumber.com!
- ----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iIMEARECAEMFAkOpYf08Gmh0dHA6Ly93d3cudmFuaGV1c2Rlbi5jb20vZGF0YS1z
aWduaW5nLXdpdGgtcGdwLXBvbGljeS5odG1sAAoJEDAZDowfKNiugqYAoJWSoI9M
O1sYrhWfFCoyTWweGN29AKCfPy46A1XHYC598IN4TXRSV2u6QA==
=xMjS
-----END PGP SIGNATURE-----

Dave Jones

Dec 21, 2005, 11:41:44 AM

to Folkert van Heusden, Eric Dumazet, Ed Tomlinson, linux-...@vger.kernel.org, Andi Kleen

On Wed, Dec 21, 2005 at 03:09:02PM +0100, Folkert van Heusden wrote:

> > Your results are interesting : size-32 seems to use objects of size 64 !
> > > size-32 1538 2714 64 <<HERE>>
> > So I guess that size-32 cache could be avoided at least for EMT (I take you
> > run a 64 bits kernel ?)
>
> I think I do yes:
> Linux xxxxx 2.4.21-37.EL #1 SMP Wed Sep 7 13:32:18 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
> It is a redhat 4 x64 system.

Looks more like RHEL3 judging from the kernel version.

Dave

Folkert van Heusden

Dec 21, 2005, 2:36:57 PM

to Dave Jones, Eric Dumazet, Ed Tomlinson, linux-...@vger.kernel.org, Andi Kleen

> > > Your results are interesting : size-32 seems to use objects of size 64 !
> > > > size-32 1538 2714 64 <<HERE>>
> > > So I guess that size-32 cache could be avoided at least for EMT (I take you
> > > run a 64 bits kernel ?)
> > I think I do yes:
> > Linux xxxxx 2.4.21-37.EL #1 SMP Wed Sep 7 13:32:18 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
> > It is a redhat 4 x64 system.
> Looks more like RHEL3 judging from the kernel version.

Ehr yes, you're totally right.

Folkert van Heusden

--
Try MultiTail! Multiple windows with logfiles, filtered with regular
expressions, colored output, etc. etc. www.vanheusden.com/multitail/

----------------------------------------------------------------------
Get your PGP/GPG key signed at www.biglumber.com!

----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

Denis Vlasenko

Dec 28, 2005, 3:34:09 AM

to Eric Dumazet, linux-...@vger.kernel.org, Andi Kleen

size-131072 0 0 131072
size-65536 0 0 65536
size-32768 1 1 32768
size-16384 0 0 16384
size-8192 253 253 8192
size-4096 89 89 4096
size-2048 248 248 2048
size-1024 312 312 1024
size-512 545 648 512
size-256 213 270 256
size-128 5642 5642 128
size-64 1025 1586 64
size-32 2262 7854 32

Denis Vlasenko

Dec 28, 2005, 3:55:35 AM

to Eric Dumazet, linux-...@vger.kernel.org, Andi Kleen

Wow... I overlooked that you are requesting data from x86_64 boxes.
Mine is not, it's i386...
--
vda

Andreas Kleen

Dec 28, 2005, 12:57:53 PM

to Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Wed 28.12.2005 09:54, Denis Vlasenko <v...@ilport.com.ua> wrote:

> > # grep "size-" /proc/slabinfo |grep -v DMA|cut -c1-40
> > size-131072 0 0 131072
> > size-65536 0 0 65536
> > size-32768 1 1 32768
> > size-16384 0 0 16384
> > size-8192 253 253 8192
> > size-4096 89 89 4096
> > size-2048 248 248 2048
> > size-1024 312 312 1024
> > size-512 545 648 512
> > size-256 213 270 256
> > size-128 5642 5642 128
> > size-64 1025 1586 64
> > size-32 2262 7854 32
>
> Wow... I overlooked that you are requesting data from x86_64 boxes.
> Mine is not, it's i386...

This whole discussion is pointless anyway because most kmallocs are
constant sized, and with a constant-sized kmalloc the slab is selected
at compile time.

What would be more interesting would be to redo the complete kmalloc
slab list.

I remember the original slab paper from Bonwick actually mentioned that
power-of-two slabs are the worst choice for a malloc, but for some
reason Linux chose them anyway. That would require a lot of measurements
of the actual kmalloc sizes in different workloads to select a good
list, but could ultimately save a lot of memory (OK, not that much
anymore, because the memory-intensive allocations should all have their
own caches, but at least some).

Most likely the best list is different for 32-bit and 64-bit too.

Note that just looking at slabinfo is not enough for this: you need the
original sizes as passed to kmalloc, not the rounded values reported
there. It should not be too hard to hack up a simple monitoring script
for that in systemtap to generate the data.

-Andi
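
The memory argument can be made concrete with a short sketch that rounds a request up to the next power-of-two class and reports the slack. The helper names are hypothetical, the smallest class is assumed to be 32 bytes, and the request sizes in the note below are made-up examples rather than measured kmalloc traffic, which is exactly the data the message says would be needed.

```c
#include <stddef.h>

/*
 * Smallest power-of-two class, starting at 32 bytes, that holds `request`.
 * Assumes `request` is far below SIZE_MAX, so the shift cannot overflow.
 */
static size_t pow2_class(size_t request)
{
    size_t class_size = 32;

    while (class_size < request)
        class_size <<= 1;
    return class_size;
}

/* Bytes lost to rounding when `request` is served from its class. */
static size_t rounding_waste(size_t request)
{
    return pow2_class(request) - request;
}
```

A 384-byte request lands in the 512-byte class and wastes 128 bytes, a quarter of the object; a 1025-byte request wastes 1023 bytes, almost half of its 2048-byte object.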

Matt Mackall

Dec 28, 2005, 4:05:59 PM

to Andreas Kleen, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

Something like this:

http://lwn.net/Articles/124374/

--
Mathematics is the supreme nostalgia of our time.

Dave Jones

Dec 28, 2005, 8:29:43 PM

to Matt Mackall, linux-...@vger.kernel.org

> Something like this:
>
> http://lwn.net/Articles/124374/

One thing that really sticks out like a sore thumb is soft_cursor()
That thing gets called a *lot*, and every time it does a kmalloc/free
pair that 99.9% of the time is going to be the same size alloc as
it was the last time. This patch makes that alloc persistent
(and does a realloc if the size changes).
The only time it should change is if the font/resolution changes I think.

Boot tested with vesafb & fbconsole, which had the desired effect.
With this patch, it almost falls off the profile.

Signed-off-by: Dave Jones <da...@redhat.com>

--- linux-2.6.14/drivers/video/console/softcursor.c~ 2005-12-28 18:40:08.000000000 -0500
+++ linux-2.6.14/drivers/video/console/softcursor.c 2005-12-28 18:45:50.000000000 -0500
@@ -23,7 +23,9 @@ int soft_cursor(struct fb_info *info, st
 	unsigned int buf_align = info->pixmap.buf_align - 1;
 	unsigned int i, size, dsize, s_pitch, d_pitch;
 	struct fb_image *image;
-	u8 *dst, *src;
+	u8 *dst;
+	static u8 *src=NULL;
+	static int allocsize=0;

 	if (info->state != FBINFO_STATE_RUNNING)
 		return 0;
@@ -31,9 +33,15 @@ int soft_cursor(struct fb_info *info, st
 	s_pitch = (cursor->image.width + 7) >> 3;
 	dsize = s_pitch * cursor->image.height;

-	src = kmalloc(dsize + sizeof(struct fb_image), GFP_ATOMIC);
-	if (!src)
-		return -ENOMEM;
+	if (dsize + sizeof(struct fb_image) != allocsize) {
+		if (src != NULL)
+			kfree(src);
+		allocsize = dsize + sizeof(struct fb_image);
+
+		src = kmalloc(allocsize, GFP_ATOMIC);
+		if (!src)
+			return -ENOMEM;
+	}

 	image = (struct fb_image *) (src + dsize);
 	*image = cursor->image;
@@ -61,7 +69,6 @@ int soft_cursor(struct fb_info *info, st
 	fb_pad_aligned_buffer(dst, d_pitch, src, s_pitch, image->height);
 	image->data = dst;
 	info->fbops->fb_imageblit(info, image);
-	kfree(src);
 	return 0;

Dave Jones

Dec 28, 2005, 8:29:45 PM

to Matt Mackall, Andreas Kleen, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Wed, Dec 28, 2005 at 03:01:25PM -0600, Matt Mackall wrote:

> Something like this:
>
> http://lwn.net/Articles/124374/

Nice toy. Variant attached that works on 2.6.15rc7
- ->cs_size compile error fixed
- inlines kstrdup and kzalloc.
Otherwise these functions dominate the profile.

Dave

linux-2.6-debug-account-kmalloc.patch

Keith Owens

Dec 28, 2005, 8:50:41 PM

to Dave Jones, Matt Mackall, linux-...@vger.kernel.org

Dave Jones (on Wed, 28 Dec 2005 20:29:15 -0500) wrote:
>
> > Something like this:
> >
> > http://lwn.net/Articles/124374/
>
>One thing that really sticks out like a sore thumb is soft_cursor()
>That thing gets called a *lot*, and every time it does a kmalloc/free
>pair that 99.9% of the time is going to be the same size alloc as
>it was the last time. This patch makes that alloc persistent
>(and does a realloc if the size changes).
>The only time it should change is if the font/resolution changes I think.

Can soft_cursor() be called from multiple processes at the same time,
in particular with dual head systems? If so then a static variable is
not going to work.
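
The pattern under discussion, and the concern raised here, can be sketched in user space. get_scratch is a hypothetical helper that keeps one buffer alive between calls and reallocates only when the requested size changes; malloc/free stand in for kmalloc/kfree. Because the buffer and its size live in static variables, two callers running concurrently would share them, which is the problem the question points at. Unlike the posted patch, this sketch also forgets the cached size when the allocation fails, so that a later call retries instead of reusing a NULL pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Return a scratch buffer of `size` bytes, reusing the buffer from the
 * previous call when the size is unchanged.
 * Not reentrant: the static state is shared by every caller.
 */
static void *get_scratch(size_t size)
{
    static void *buf = NULL;
    static size_t bufsize = 0;

    if (size != bufsize) {
        free(buf);              /* free(NULL) is a no-op */
        buf = malloc(size);
        /* On failure, forget the size so the next call tries again. */
        bufsize = buf ? size : 0;
    }
    return buf;
}
```

Making this safe for concurrent callers would need either a lock around the static state or a buffer owned by each caller (for example, one per framebuffer); which of those fits soft_cursor() is not settled in this thread.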

Dave Jones

Dec 28, 2005, 9:43:03 PM

to Keith Owens, Matt Mackall, linux-...@vger.kernel.org

On Thu, Dec 29, 2005 at 12:50:10PM +1100, Keith Owens wrote:
> Dave Jones (on Wed, 28 Dec 2005 20:29:15 -0500) wrote:
> >
> > > Something like this:
> > >
> > > http://lwn.net/Articles/124374/
> >
> >One thing that really sticks out like a sore thumb is soft_cursor()
> >That thing gets called a *lot*, and every time it does a kmalloc/free
> >pair that 99.9% of the time is going to be the same size alloc as
> >it was the last time. This patch makes that alloc persistent
> >(and does a realloc if the size changes).
> >The only time it should change is if the font/resolution changes I think.
>
> Can soft_cursor() be called from multiple processes at the same time,
> in particular with dual head systems? If so then a static variable is
> not going to work.

My dual-head system here displays a cloned image on the second
screen, which seems to dtrt. I'm not sure how to make it show
something different on the other head to test further.

Dave

Steven Rostedt

Dec 29, 2005, 2:49:48 PM

to Andreas Kleen, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Matt Mackall, Dave Jones

On Wed, 2005-12-28 at 18:57 +0100, Andreas Kleen wrote:
[...]

>
> This whole discussion is pointless anyway because most kmallocs are
> constant sized, and with a constant-sized kmalloc the slab is selected
> at compile time.
>
> What would be more interesting would be to redo the complete kmalloc
> slab list.
>
> I remember the original slab paper from Bonwick actually mentioned that
> power-of-two slabs are the worst choice for a malloc, but for some
> reason Linux chose them anyway. That would require a lot of measurements
> of the actual kmalloc sizes in different workloads to select a good
> list, but could ultimately save a lot of memory (OK, not that much
> anymore, because the memory-intensive allocations should all have their
> own caches, but at least some).
>
> Most likely the best list is different for 32-bit and 64-bit too.
>
> Note that just looking at slabinfo is not enough for this: you need the
> original sizes as passed to kmalloc, not the rounded values reported
> there. It should not be too hard to hack up a simple monitoring script
> for that in systemtap to generate the data.
>

OK then, after reading this I figured there must be a way to dynamically
allocate slab sizes based on the kmalloc constants. So I spent last
night and some of this morning coming up with the below patch.

Right now it only works with i386, but I'm sure it can be hacked to work
with all archs. At compile time it creates a table of sizes for all
kmallocs (outside of slab.c and arch/i386/mm/init.c) that use a
constant size.

This table is then initialized in arch/i386/mm/init.c to use a cache
that is either already created (like the malloc_sizes array) or it creates
a new cache of that size (L1 cache aligned), and then updates the
pointers to use that cache.
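
The "L1 cache aligned" step amounts to rounding each recorded size up to a multiple of the cache line. A minimal sketch, assuming the line size is a power of two; round_up_to_line is a hypothetical helper name, and the 128-byte line used in the note below is only an inference from the cache names in the listing that follows, all of which are multiples of 128.

```c
/*
 * Round `size` up to the next multiple of `line`.
 * Only valid when `line` is a power of two.
 */
static unsigned long round_up_to_line(unsigned long size, unsigned long line)
{
    return (size + (line - 1)) & ~(line - 1);
}
```

With a 128-byte line, a constant kmalloc of 300 bytes would be recorded as 384, while one of exactly 384 bytes stays at 384, so both would share one cache.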

Here's what was created on my test box:

cat /proc/slabinfo
[...]
dynamic_dma-1536 0 0 1536 5 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic-1536 1 5 1536 5 2 : tunables 24 12 0 : slabdata 1 1 0
dynamic_dma-1280 0 0 1280 3 1 : tunables 24 12 0 : slabdata 0 0 0
dynamic-1280 6 6 1280 3 1 : tunables 24 12 0 : slabdata 2 2 0
dynamic_dma-2176 0 0 2176 3 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic-2176 0 0 2176 3 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic_dma-1152 0 0 1152 7 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic-1152 0 0 1152 7 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic_dma-1408 0 0 1408 5 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic-1408 0 0 1408 5 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic_dma-640 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
dynamic-640 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
dynamic_dma-768 0 0 768 5 1 : tunables 54 27 0 : slabdata 0 0 0
dynamic-768 0 0 768 5 1 : tunables 54 27 0 : slabdata 0 0 0
dynamic_dma-3200 0 0 3200 2 2 : tunables 24 12 0 : slabdata 0 0 0
dynamic-3200 8 8 3200 2 2 : tunables 24 12 0 : slabdata 4 4 0
dynamic_dma-896 0 0 896 4 1 : tunables 54 27 0 : slabdata 0 0 0
dynamic-896 9 12 896 4 1 : tunables 54 27 0 : slabdata 3 3 0
dynamic_dma-384 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0
dynamic-384 40 40 384 10 1 : tunables 54 27 0 : slabdata 4 4 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 40 40 8192 1 2 : tunables 8 4 0 : slabdata 40 40 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 34 34 4096 1 1 : tunables 24 12 0 : slabdata 34 34 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 266 266 2048 2 1 : tunables 24 12 0 : slabdata 133 133 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 24 24 1024 4 1 : tunables 54 27 0 : slabdata 6 6 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 90 112 512 8 1 : tunables 54 27 0 : slabdata 14 14 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 735 735 256 15 1 : tunables 120 60 0 : slabdata 49 49 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 2750 2760 128 30 1 : tunables 120 60 0 : slabdata 92 92 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 418 472 64 59 1 : tunables 120 60 0 : slabdata 8 8 0
size-32 1175 1243 32 113 1 : tunables 120 60 0 : slabdata 11 11 0
[...]

Not sure if this is worth looking further into, but it might actually be
a way to use less memory. For example, the above 384 size with 40
objects costs only 4 4k pages, whereas these same objects would be 40
512-byte objects (in size-512) costing 5 4k pages. Plus the 384 probably has
ON_SLAB management whereas the 512 does not.
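
The page counts in this example follow from the objects-per-slab and pages-per-slab columns of the slabinfo listing above (10 objects per one-page slab for dynamic-384, 8 per one-page slab for size-512). A small sketch of the arithmetic; slabs_needed and pages_needed are hypothetical helper names.

```c
#include <stddef.h>

/* Number of slabs needed to hold `objects`, rounding up. */
static size_t slabs_needed(size_t objects, size_t objs_per_slab)
{
    return (objects + objs_per_slab - 1) / objs_per_slab;
}

/* Pages consumed when each slab spans `pages_per_slab` pages. */
static size_t pages_needed(size_t objects, size_t objs_per_slab,
                           size_t pages_per_slab)
{
    return slabs_needed(objects, objs_per_slab) * pages_per_slab;
}
```

pages_needed(40, 10, 1) gives 4 pages for the 384-byte cache, and pages_needed(40, 8, 1) gives 5 pages for the same objects in size-512.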

Comments?

-- Steve

Index: linux-2.6.15-rc7/arch/i386/Kconfig
===================================================================
--- linux-2.6.15-rc7.orig/arch/i386/Kconfig 2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/arch/i386/Kconfig 2005-12-29 09:09:53.000000000 -0500
@@ -173,6 +173,14 @@
 	depends on HPET_TIMER && RTC=y
 	default y

+config DYNAMIC_SLABS
+	bool "Dynamically create slabs for constant kmalloc"
+	default y
+	help
+	  This enables the creation of SLABS using information created at
+	  compile time. Then on boot up, the slabs are created to fit
+	  more with what was asked for.
+
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
Index: linux-2.6.15-rc7/arch/i386/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6.15-rc7.orig/arch/i386/kernel/vmlinux.lds.S 2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/arch/i386/kernel/vmlinux.lds.S 2005-12-29 09:09:53.000000000 -0500
@@ -68,6 +68,13 @@
*(.data.init_task)
}

+#ifdef CONFIG_DYNAMIC_SLABS
+ . = ALIGN(16); /* dynamic slab table */
+ __start____slab_addresses = .;
+ __slab_addresses : AT(ADDR(__slab_addresses) - LOAD_OFFSET) { *(__slab_addresses) }
+ __stop____slab_addresses = .;
+#endif
+
/* will be freed after init */
. = ALIGN(4096); /* Init code and data */
__init_begin = .;
@@ -107,6 +114,14 @@
.altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
*(.altinstr_replacement)
}
+#ifdef CONFIG_DYNAMIC_SLABS
+ . = ALIGN(16); /* dynamic slab table */
+ __start____slab_preprocess = .;
+ __slab_preprocess : AT(ADDR(__slab_preprocess) - LOAD_OFFSET) { *(__slab_preprocess) }
+ __slab_process_ret : AT(ADDR(__slab_process_ret) - LOAD_OFFSET) { *(__slab_process_ret) }
+ __stop____slab_preprocess = .;
+#endif
+
/* .exit.text is discard at runtime, not link time, to deal with references
from .altinstructions and .eh_frame */
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
@@ -119,7 +134,7 @@
__per_cpu_start = .;
.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
__per_cpu_end = .;
- . = ALIGN(4096);
+ . = ALIGN(4096);
__init_end = .;
/* freed after init ends here */

Index: linux-2.6.15-rc7/arch/i386/mm/init.c
===================================================================
--- linux-2.6.15-rc7.orig/arch/i386/mm/init.c 2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/arch/i386/mm/init.c 2005-12-29 14:31:08.000000000 -0500
@@ -6,6 +6,7 @@
  * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999
  */

+#define DYNAMIC_SLABS_BOOTSTRAP
 #include <linux/config.h>
 #include <linux/module.h>
 #include <linux/signal.h>
@@ -748,3 +749,187 @@
 	}
 }
 #endif
+
+#ifdef CONFIG_DYNAMIC_SLABS
+extern void __start____slab_preprocess(void);
+extern unsigned long __start____slab_addresses;
+extern unsigned long __stop____slab_addresses;
+
+static __initdata LIST_HEAD(slablist);
+
+struct slab_links {
+	struct cache_sizes *c;
+	struct list_head list;
+};
+
+static struct cache_sizes *find_slab_size(int size)
+{
+	struct list_head *curr;
+	struct slab_links *s;
+
+	list_for_each(curr, &slablist) {
+		s = list_entry(curr, struct slab_links, list);
+		if (s->c->cs_size == size)
+			return s->c;
+	}
+	return NULL;
+}
+
+static void free_slablist(void)
+{
+	struct list_head *curr, *next;
+	struct slab_links *s;
+
+	list_for_each_safe(curr, next, &slablist) {
+		s = list_entry(curr, struct slab_links, list);
+		list_del(&s->list);
+		kfree(s);
+	}
+}
+
+#ifndef ARCH_KMALLOC_MINALIGN
+#define ARCH_KMALLOC_MINALIGN L1_CACHE_BYTES
+#endif
+#ifndef ARCH_KMALLOC_FLAGS
+#define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
+#endif
+#define BYTES_PER_WORD sizeof(void *)
+
+#ifdef DEBUG_ADDR
+static __init void print_slab_addresses(int hex)
+{
+	unsigned long *slab_addresses = &__start____slab_addresses;
+	unsigned long *end = &__stop____slab_addresses;
+
+
+	for (; slab_addresses < end; slab_addresses++) {
+		if (hex)
+			printk("slab %p = %lx\n",slab_addresses, *slab_addresses);
+		else
+			printk("slab %p = %ld\n",slab_addresses, *slab_addresses);
+	}
+}
+#else
+# define print_slab_addresses(x) do {} while(0)
+#endif
+
+int __init dynamic_slab_init(void)
+{
+	unsigned long *slab_addresses = &__start____slab_addresses;
+	unsigned long *end = &__stop____slab_addresses;
+	struct cache_sizes *c;
+	struct slab_links *s;
+	unsigned long sizes[] = {
+#define CACHE(C) C,
+#include <linux/kmalloc_sizes.h>
+#undef CACHE
+	};
+	int i;
+
+
+	asm (".section __slab_process_ret,\"ax\"\n"
+	     "ret\n"
+	     ".previous\n");
+
+	__start____slab_preprocess();
+
+	printk("Before update!\n");
+	print_slab_addresses(0);
+
+	/*
+	 * DYNAMIC_SLABS_BOOTSTRAP is defined, so we don't need
+	 * to worry about kmalloc hardcoded.
+	 */
+
+	/*
+	 * This is really bad, but I don't want to go monkey up the
+	 * slab.c to get to the cache_chain. So right now I just
+	 * allocate a pointer list to search for slabs that are
+	 * of the right size, and then free it at the end.
+	 *
+	 * Hey, you find a better way, then fix this ;)
+	 */
+	for (i=0; i < sizeof(sizes)/sizeof(sizes[0]); i++) {
+		s = kmalloc(sizeof(*s), GFP_ATOMIC);
+		if (!s)
+			panic("Can't create link list for slabs\n");
+		s->c = &malloc_sizes[i];
+		list_add_tail(&s->list, &slablist);
+	}
+
+	for (; slab_addresses < end; slab_addresses++) {
+		char *name;
+		char *name_dma;
+		unsigned long size = *slab_addresses;
+		struct cache_sizes **ptr = (struct cache_sizes**)slab_addresses;
+
+		if (!size)
+			continue;
+
+		size = (size + (L1_CACHE_BYTES-1)) & ~(L1_CACHE_BYTES-1);
+		if (size < BYTES_PER_WORD)
+			size = BYTES_PER_WORD;
+		if (size < ARCH_KMALLOC_MINALIGN)
+			size = ARCH_KMALLOC_MINALIGN;
+
+		c = find_slab_size(size);
+		if (c) {
+			*ptr = c;
+			continue;
+		}
+
+		/*
+		 * Create a cache for this specific size.
+		 */
+		name = kmalloc(25, GFP_ATOMIC);
+		if (!name)
+			panic("Can't allocate name for dynamic slab\n");
+
+		snprintf(name, 25, "dynamic-%ld", size);
+		name_dma = kmalloc(25, GFP_ATOMIC);
+		if (!name_dma)
+			panic("Can't allocate name for dynamic slab\n");
+
+		snprintf(name_dma, 25, "dynamic_dma-%ld", size);
+
+		c = kmalloc(sizeof(*c), GFP_ATOMIC);
+
+		if (!c)
+			panic("Can't allocate cache_size descriptor\n");
+
+		c->cs_size = size;
+
+		/*
+		 * For performance, all the general caches are L1 aligned.
+		 * This should be particularly beneficial on SMP boxes, as it
+		 * eliminates "false sharing".
+		 * Note for systems short on memory removing the alignment will
+		 * allow tighter packing of the smaller caches.
+		 */
+		c->cs_cachep = kmem_cache_create(name,
+			c->cs_size, ARCH_KMALLOC_MINALIGN,
+			(ARCH_KMALLOC_FLAGS | SLAB_PANIC), NULL, NULL);
+
+		c->cs_dmacachep = kmem_cache_create(name_dma,
+			c->cs_size, ARCH_KMALLOC_MINALIGN,
+			(ARCH_KMALLOC_FLAGS | SLAB_CACHE_DMA | SLAB_PANIC),
+			NULL, NULL);
+
+		s = kmalloc(sizeof(*s), GFP_ATOMIC);
+		if (!s)
+			panic("Can't create link list for slabs\n");
+		s->c = c;
+		list_add_tail(&s->list, &slablist);
+
+		*ptr = c;
+
+	}
+
+	free_slablist();
+
+	printk("\nAfter update!\n");
+	print_slab_addresses(1);
+
+	return 0;
+}
+#endif
Index: linux-2.6.15-rc7/include/asm-i386/dynamic_slab.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc7/include/asm-i386/dynamic_slab.h 2005-12-29 09:09:53.000000000 -0500
@@ -0,0 +1,20 @@
+
+/*
+ * Included in slab.h
+ *
+ * @c - cache pointer to return base on size
+ * @size - size of cache.
+ */
+__asm__ __volatile__ (
+	"jmp 2f\n"
+	".section __slab_preprocess,\"ax\"\n"
+	"movl %1,1f\n"
+	".previous\n"
+	".section __slab_addresses,\"aw\"\n"
+	".align 4\n"
+	"1:\n"
+	".long 0\n"
+	".previous\n"
+	"2:\n"
+	"movl 1b, %0\n"
+	: "=r"(c) : "i"(size));
Index: linux-2.6.15-rc7/include/linux/slab.h
===================================================================
--- linux-2.6.15-rc7.orig/include/linux/slab.h 2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/include/linux/slab.h 2005-12-29 09:23:44.000000000 -0500
@@ -80,6 +80,15 @@
{
if (__builtin_constant_p(size)) {
int i = 0;
+#if defined(CONFIG_DYNAMIC_SLABS) && !defined(MODULE) && !defined(DYNAMIC_SLABS_BOOTSTRAP)
+ {
+ struct cache_sizes *c;
+# include <asm/dynamic_slab.h>
+ return kmem_cache_alloc((flags & GFP_DMA) ?
+ c->cs_dmacachep :
+ c->cs_cachep, flags);
+ }
+#endif
#define CACHE(x) \
if (size <= x) \
goto found; \
Index: linux-2.6.15-rc7/mm/slab.c
===================================================================
--- linux-2.6.15-rc7.orig/mm/slab.c 2005-12-29 09:09:29.000000000 -0500
+++ linux-2.6.15-rc7/mm/slab.c 2005-12-29 14:04:44.000000000 -0500
@@ -86,6 +86,7 @@
* All object allocations for a node occur from node specific slab lists.
*/

+#define DYNAMIC_SLABS_BOOTSTRAP
#include <linux/config.h>
#include <linux/slab.h>
#include <linux/mm.h>
@@ -1165,6 +1166,19 @@
/* Done! */
g_cpucache_up = FULL;

+#ifdef CONFIG_DYNAMIC_SLABS
+ {
+ extern int dynamic_slab_init(void);
+ /*
+ * Create the caches that will handle
+ * kmallocs of constant sizes.
+ */
+ dynamic_slab_init();
+ }
+#endif
+ /*
+ */
+
/* Register a cpu startup notifier callback
* that initializes ac_data for all new cpus
*/

Andi Kleen

Dec 29, 2005, 4:17:13 PM

to Steven Rostedt, Andreas Kleen, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Matt Mackall, Dave Jones

> OK then, after reading this I figured there must be a way to dynamically
> allocate slab sizes based on the kmalloc constants. So I spent last
> night and some of this morning coming up with the below patch.

The canonical slab theory is that constant allocations are for fixed
objects. And if they are frequent they should in theory get their own kmem
cache, because in theory their object lifetimes should be similar
and clustering them together should give the best fragmentation
avoidance.

So in theory longer term the dynamic kmallocs are more important because
they cannot be handled like this - and these are not caught by
your patch.

So I'm not sure you're optimizing the right thing here.

Perhaps a good evolution of your patch would be to add some analysis of
the callers and generate a nice compile time report that people can use as a
guideline to convert kmalloc over to kmem_cache_alloc. But to do this really
well would require dynamic data from runtime.

Given that, I think a runtime patch is better. Ideally one that's easy
to use with someone collecting data from users and then submitting a patch
for a better new set of default slabs. Would need to be separate
for 32bit and 64bit too.

I guess one could run a fancy dynamic optimization algorithm to find
the best set of slabs from the data.
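
To make the "find the best set of slabs from the data" idea concrete, here is a minimal userspace sketch, not taken from any patch in this thread. It assumes a histogram of sizes as originally requested and scores a candidate table of slab sizes by the bytes lost to rounding. The names round_to_class, size_sample and total_waste are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: round a requested size up to the smallest entry
 * of a table sorted in ascending order. Returns 0 if nothing fits.
 */
static size_t round_to_class(size_t req, const size_t *classes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (req <= classes[i])
			return classes[i];
	return 0;
}

/* One histogram bucket: a requested size and how often it was seen. */
struct size_sample {
	size_t req;	/* size as passed to the allocator, before rounding */
	size_t count;	/* number of objects requested with that size */
};

/*
 * Bytes lost to rounding for the whole histogram under one candidate
 * table. Requests that no class can satisfy are skipped.
 */
static size_t total_waste(const struct size_sample *h, size_t nh,
			  const size_t *classes, size_t nc)
{
	size_t waste = 0;

	for (size_t i = 0; i < nh; i++) {
		size_t got = round_to_class(h[i].req, classes, nc);

		if (got)
			waste += (got - h[i].req) * h[i].count;
	}
	return waste;
}
```

Running total_waste() over the same histogram with two candidate tables gives a direct comparison. A real evaluation would also have to weigh the cost of each extra cache, which this sketch ignores.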

-Andi

Steven Rostedt

Dec 29, 2005, 11:07:41 PM

to Dave Jones, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Andreas Kleen, Matt Mackall

Attached is a variant that was refreshed against 2.6.15-rc7 and fixes
the logical bug that your compile error fix made ;)

It should be cachep->objsize not csizep->cs_size.

-- Steve

linux-2.6-debug-account-kmalloc.patch

Marcelo Tosatti

Dec 30, 2005, 8:16:11 PM

to Matt Mackall, Andreas Kleen, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

<snip>

> > Note that just looking at slabinfo is not enough for this - you need the
> > original
> > sizes as passed to kmalloc, not the rounded values reported there.
> > Should be probably not too hard to hack a simple monitoring script up
> > for that
> > in systemtap to generate the data.
>
> Something like this:
>
> http://lwn.net/Articles/124374/

Written with a systemtap script:
http://sourceware.org/ml/systemtap/2005-q3/msg00550.html

Andi Kleen

Jan 1, 2006, 1:23:11 PM

to Marcelo Tosatti, Matt Mackall, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Friday 30 December 2005 22:13, Marcelo Tosatti wrote:
>
> <snip>
>
> > > Note that just looking at slabinfo is not enough for this - you need the
> > > original
> > > sizes as passed to kmalloc, not the rounded values reported there.
> > > Should be probably not too hard to hack a simple monitoring script up
> > > for that
> > > in systemtap to generate the data.
> >
> > Something like this:
> >
> > http://lwn.net/Articles/124374/
>
> Written with a systemtap script:
> http://sourceware.org/ml/systemtap/2005-q3/msg00550.html

I had actually written a similar script on my own before,
but I found it was near completely unusable on a 4core Opteron
system even under moderate load because systemtap bombed out
when it needed more than one spin to take the lock of the
shared hash table.

(it basically did if (!spin_trylock()) ... stop script; ...)

The problem was that the backtraces took so long that another
CPU very often ran into the held lock.
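
The try-once pattern described above can be sketched in plain C11. This is an illustration only and not systemtap code. It assumes a single flag standing in for the hash table lock, and instead of stopping the script it drops the sample and counts the drop. All names (sample_table, table_trylock, record_sample) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sample_table {
	atomic_flag lock;	/* stands in for the shared hash table lock */
	unsigned long recorded;	/* only touched while the lock is held */
	atomic_ulong dropped;	/* samples lost because the lock was busy */
};

/* One attempt, no spinning: true means the caller now owns the lock. */
static bool table_trylock(struct sample_table *t)
{
	/* test_and_set returns the previous value; false means it was free */
	return !atomic_flag_test_and_set_explicit(&t->lock,
						  memory_order_acquire);
}

static void table_unlock(struct sample_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Record one sample. Returns false if the sample had to be dropped. */
static bool record_sample(struct sample_table *t)
{
	if (!table_trylock(t)) {
		atomic_fetch_add(&t->dropped, 1);
		return false;
	}
	t->recorded++;		/* the slow work (e.g. a backtrace) goes here */
	table_unlock(t);
	return true;
}
```

The longer the work done under the lock, the more often a second CPU finds it held, which matches the observation that removing the backtraces made the script usable.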

Still, a stripped down script without backtraces gave some
interesting results. In particular my init was reading some
file in /proc 10 times a second, allocating 4K each time (wtf did it do that?),
plus some other somewhat surprising results.

-Andi

Pekka Enberg

Jan 2, 2006, 3:41:04 AM

to Andreas Kleen, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On 12/28/05, Andreas Kleen <a...@suse.de> wrote:
> I remember the original slab paper from Bonwick actually mentioned that
> power of two slabs are the worst choice for a malloc - but for some reason Linux
> chose them anyways.

Power of two sizes are bad because memory accesses tend to concentrate
on the same cache lines but slab coloring should take care of that. So
I don't think there's a problem with using power of twos for kmalloc()
caches.
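
For readers unfamiliar with slab coloring, the following is a simplified userspace model of the idea: each new slab starts its first object at a different cache-line-sized offset, cycling through as many offsets as fit in the slab's unused slack. It is loosely modelled on how the slab allocator cycles colours, but the struct and function names are illustrative and not the kernel's.

```c
#include <stddef.h>

/* Hypothetical, simplified per-cache colouring state. */
struct colour_state {
	size_t colour;		/* number of distinct colours available */
	size_t colour_off;	/* step between colours, e.g. the L1 line size */
	size_t colour_next;	/* colour to give to the next slab */
};

/* The slack left over in a slab decides how many colours exist. */
static void colour_init(struct colour_state *c, size_t left_over, size_t line)
{
	c->colour_off = line;
	c->colour = left_over / line;
	if (c->colour == 0)
		c->colour = 1;	/* no slack: every slab starts at offset 0 */
	c->colour_next = 0;
}

/* Byte offset of the first object in the next slab to be created. */
static size_t next_slab_offset(struct colour_state *c)
{
	size_t offset = c->colour_next * c->colour_off;

	if (++c->colour_next >= c->colour)
		c->colour_next = 0;	/* wrap around to the first colour */
	return offset;
}
```

With 200 bytes of slack and 64-byte lines, successive slabs get offsets 0, 64, 128 and then 0 again, so objects at the same index in different slabs no longer all map to the same cache lines. Coloring does nothing about the bytes wasted inside each object, which is the separate point argued later in the thread.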

Pekka

Pekka Enberg

Jan 2, 2006, 3:47:35 AM

to Steven Rostedt, Dave Jones, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Andreas Kleen, Matt Mackall

Hi,

On 12/30/05, Steven Rostedt <ros...@goodmis.org> wrote:
> Attached is a variant that was refreshed against 2.6.15-rc7 and fixes
> the logical bug that your compile error fix made ;)
>
> It should be cachep->objsize not csizep->cs_size.

Isn't there any other way to do this patch other than making kzalloc()
and kstrdup() inline? I would like to see something like this in the
mainline but making them inline is not acceptable because they
increase kernel text a lot.

Pekka Enberg

Jan 2, 2006, 3:51:32 AM

to Steven Rostedt, Dave Jones, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Andreas Kleen, Matt Mackall

On 12/30/05, Steven Rostedt <ros...@goodmis.org> wrote:
> > Attached is a variant that was refreshed against 2.6.15-rc7 and fixes
> > the logical bug that your compile error fix made ;)
> >
> > It should be cachep->objsize not csizep->cs_size.

On 1/2/06, Pekka Enberg <pen...@cs.helsinki.fi> wrote:
> Isn't there any other way to do this patch other than making kzalloc()
> and kstrdup() inline? I would like to see something like this in the
> mainline but making them inline is not acceptable because they
> increase kernel text a lot.

Also, wouldn't it be better to track kmem_cache_alloc and
kmem_cache_alloc_node instead?

Steven Rostedt

Jan 2, 2006, 7:32:35 AM

to Pekka Enberg, Dave Jones, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Andreas Kleen, Matt Mackall

On Mon, 2 Jan 2006, Pekka Enberg wrote:

> Hi,
>
> On 12/30/05, Steven Rostedt <ros...@goodmis.org> wrote:
> > Attached is a variant that was refreshed against 2.6.15-rc7 and fixes
> > the logical bug that your compile error fix made ;)
> >
> > It should be cachep->objsize not csizep->cs_size.
>
> Isn't there any other way to do this patch other than making kzalloc()
> and kstrdup() inline? I would like to see something like this in the
> mainline but making them inline is not acceptable because they
> increase kernel text a lot.

Actually, yes. I was adding something to this patch to be more specific:
pass either the EIP or a __FILE__, __LINE__ pair as an extra parameter.

Using the following:

#ifdef CONFIG_KMALLOC_ACCOUNTING
# define __EIP__ , __builtin_return_address(0)
# define __DECLARE_EIP__ , void *eip
#else
# define __EIP__
# define __DECLARE_EIP__
#endif

#define kstrdup(s,g) __kstrdup(s, g __EIP__)
extern char *__kstrdup(const char *s, gfp_t g __DECLARE_EIP__);

Or a file and line can be used:

# define __EIP__ , __FILE__, __LINE__
# define __DECLARE_EIP__ , char *file, int line
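
As a rough userspace illustration of the file/line variant, the sketch below wraps a strdup-like function the same way. It is not the kernel code: ACCOUNTING stands in for CONFIG_KMALLOC_ACCOUNTING, malloc stands in for kmalloc, and the names my_strdup, SITE_ARGS, SITE_DECL and last_site are hypothetical (double-underscore names are avoided because they are reserved outside the kernel).

```c
#include <stdlib.h>
#include <string.h>

#define ACCOUNTING 1	/* stand-in for CONFIG_KMALLOC_ACCOUNTING */

#if ACCOUNTING
# define SITE_ARGS , __FILE__, __LINE__
# define SITE_DECL , const char *file, int line
#else
# define SITE_ARGS
# define SITE_DECL
#endif

/* Where the most recent allocation came from; a real tracker would keep a table. */
struct alloc_site {
	const char *file;
	int line;
	size_t bytes;
};

static struct alloc_site last_site;

/* Callers keep writing my_strdup(s); the macro appends the call site. */
#define my_strdup(s) my_strdup_impl(s SITE_ARGS)

static char *my_strdup_impl(const char *s SITE_DECL)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		return NULL;
	memcpy(copy, s, len);
#if ACCOUNTING
	/* __FILE__ and __LINE__ were expanded at the caller, not here */
	last_site.file = file;
	last_site.line = line;
	last_site.bytes = len;
#endif
	return copy;
}
```

Because the function body stays out of line, only one extra argument pair is added per call site, which is the text-size concern raised above. With ACCOUNTING set to 0 both macros expand to nothing and the signature is unchanged.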

-- Steve

Steven Rostedt

Jan 2, 2006, 7:34:45 AM

to Pekka Enberg, Dave Jones, linux-...@vger.kernel.org, Eric Dumazet, Denis Vlasenko, Andreas Kleen, Matt Mackall

On Mon, 2 Jan 2006, Pekka Enberg wrote:

>
> Also, wouldn't it be better to track kmem_cache_alloc and
> kmem_cache_alloc_node instead?
>

I believe they are very interested in when kmalloc and kfree are used,
since those are the ones for the generic slabs. And even then, they are
only profiling the ones that use a dynamic allocation. (The kmalloc and
kfree of sizeof(x) are not profiled.) This was brought up earlier in the
thread.
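
The constant versus dynamic split rests on the compiler's __builtin_constant_p(), the same builtin the inline kmalloc() in slab.h tests. A small sketch of how a call site can be classified, assuming GCC or a compatible compiler; the enum, macro and function names are hypothetical.

```c
#include <stddef.h>

enum alloc_kind { ALLOC_CONSTANT, ALLOC_DYNAMIC };

/*
 * Must be a macro: the builtin is answered at each call site, and it
 * does not evaluate its argument.
 */
#define classify_size(sz) \
	(__builtin_constant_p(sz) ? ALLOC_CONSTANT : ALLOC_DYNAMIC)

struct demo_obj {
	int id;
	char name[20];
};

/* sizeof(...) is a compile-time constant, like kmalloc(sizeof(x), ...) */
static enum alloc_kind kind_of_sizeof(void)
{
	return classify_size(sizeof(struct demo_obj));
}

/* A size only known at run time, like kmalloc(len, ...) */
static enum alloc_kind kind_of_runtime(volatile size_t *n)
{
	return classify_size(*n);
}
```

The volatile pointer is only there to keep an optimizing compiler from proving the value constant in this toy example.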

-- Steve

Andi Kleen

Jan 2, 2006, 7:46:28 AM

to Pekka Enberg, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Monday 02 January 2006 09:37, Pekka Enberg wrote:
> On 12/28/05, Andreas Kleen <a...@suse.de> wrote:
> > I remember the original slab paper from Bonwick actually mentioned that
> > power of two slabs are the worst choice for a malloc - but for some reason Linux
> > chose them anyways.
>
> Power of two sizes are bad because memory accesses tend to concentrate
> on the same cache lines but slab coloring should take care of that. So
> I don't think there's a problem with using power of twos for kmalloc()
> caches.

There is - who tells you it's the best possible distribution of memory?

-Andi

Pekka J Enberg

Jan 2, 2006, 8:04:51 AM

to Andi Kleen, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On 12/28/05, Andreas Kleen <a...@suse.de> wrote:
> > > I remember the original slab paper from Bonwick actually mentioned that
> > > power of two slabs are the worst choice for a malloc - but for some reason Linux
> > > chose them anyways.

On Monday 02 January 2006 09:37, Pekka Enberg wrote:
> > Power of two sizes are bad because memory accesses tend to concentrate
> > on the same cache lines but slab coloring should take care of that. So
> > I don't think there's a problem with using power of twos for kmalloc()
> > caches.

On Mon, 2 Jan 2006, Andi Kleen wrote:
> There is - who tells you it's the best possible distribution of memory?

Maybe it's not. But that's beside the point. The specific problem Bonwick
mentioned is related to cache line distribution and should be taken care
of by slab coloring. Internal fragmentation is painful but the worst
offenders can be fixed with kmem_cache_alloc(). So I really don't see the
problem. On the other hand, I am not opposed to dynamic generic slabs if
you can show a clear performance benefit from it. I just doubt you will.

Pekka

Andi Kleen

Jan 2, 2006, 8:59:32 AM

to Pekka J Enberg, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Monday 02 January 2006 14:04, Pekka J Enberg wrote:

> Maybe it's not. But that's beside the point.

It was my point. I don't know what your point was.

> The specific problem Bonwick
> mentioned is related to cache line distribution and should be taken care
> of by slab coloring. Internal fragmentation is painful but the worst
> offenders can be fixed with kmem_cache_alloc(). So I really don't see the
> problem. On the other hand, I am not opposed to dynamic generic slabs if
> you can show a clear performance benefit from it. I just doubt you will.

I wasn't proposing fully dynamic slabs, just a better default set
of slabs based on real measurements instead of handwaving (like
the power of two slabs seemed to have been generated). With separate
sets for 32bit and 64bit.

Also the goal wouldn't be better performance, but just less waste of memory.

I suspect such a move could save much more memory on small systems
than any of these "make fundamental debugging tools a CONFIG" patches ever.

-Andi

Helge Hafting

Jan 2, 2006, 10:00:07 AM

to Dave Jones, Keith Owens, Matt Mackall, linux-...@vger.kernel.org

On Wed, Dec 28, 2005 at 09:39:06PM -0500, Dave Jones wrote:
> On Thu, Dec 29, 2005 at 12:50:10PM +1100, Keith Owens wrote:
> > Dave Jones (on Wed, 28 Dec 2005 20:29:15 -0500) wrote:
> > >
> > > > Something like this:
> > > >
> > > > http://lwn.net/Articles/124374/
> > >
> > >One thing that really sticks out like a sore thumb is soft_cursor()
> > >That thing gets called a *lot*, and every time it does a kmalloc/free
> > >pair that 99.9% of the time is going to be the same size alloc as
> > >it was the last time. This patch makes that alloc persistent
> > >(and does a realloc if the size changes).
> > >The only time it should change is if the font/resolution changes I think.
> >
> > Can soft_cursor() be called from multiple processes at the same time,
> > in particular with dual head systems? If so then a static variable is
> > not going to work.
>
> My dual-head system here displays a cloned image on the second
> screen, which seems to dtrt. I'm not sure how to make it show
> something different on the other head to test further.

Few dualhead drivers actually support two different framebuffers,
but the matrox G550 (and G400) drivers do. Compile one
of those, make sure to configure dualhead support.
After booting up, use "matroxset" to set the
framebuffer to vga-connector mapping so that the two
outputs actually show the different framebuffers.

Another way is to use several graphics cards (AGP getting
the first framebuffer and each PCI card getting others as
the drivers load).

Helge Hafting

Pekka J Enberg

Jan 2, 2006, 10:09:58 AM

to Andi Kleen, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Mon, 2 Jan 2006, Andi Kleen wrote:

> I wasn't proposing fully dynamic slabs, just a better default set
> of slabs based on real measurements instead of handwaving (like
> the power of two slabs seemed to have been generated). With separate
> sets for 32bit and 64bit.
>
> Also the goal wouldn't be better performance, but just less waste of memory.
>
> I suspect such a move could save much more memory on small systems
> than any of these "make fundamental debugging tools a CONFIG" patches ever.

I misunderstood what you were proposing. Sorry. It makes sense to measure
it.

Pekka

Jörn Engel

Jan 2, 2006, 10:47:41 AM

to Andi Kleen, Pekka J Enberg, Denis Vlasenko, Eric Dumazet, linux-...@vger.kernel.org

On Mon, 2 January 2006 14:56:22 +0100, Andi Kleen wrote:
>
> I wasn't proposing fully dynamic slabs, just a better default set
> of slabs based on real measurements instead of handwaving (like
> the power of two slabs seemed to have been generated). With separate
> sets for 32bit and 64bit.
>
> Also the goal wouldn't be better performance, but just less waste of
> memory.

My fear would be that this leads to something like the gperf: a
perfect distribution of slab caches - until any tiny detail changes.
But maybe there is a different distribution that is "pretty good" for
all configurations and better than powers of two.

> I suspect such a move could save much more memory on small systems
> than any of these "make fundamental debugging tools a CONFIG" patches
> ever.

Unlikely. SLOB should be better than SLAB for those purposes, no
matter how you arrange the slab caches.

Jörn

--
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike

Dave Jones

Jan 4, 2006, 12:27:23 AM

to Keith Owens, Matt Mackall, linux-...@vger.kernel.org

On Thu, Dec 29, 2005 at 12:50:10PM +1100, Keith Owens wrote:

> Dave Jones (on Wed, 28 Dec 2005 20:29:15 -0500) wrote:
> >
> > > Something like this:
> > >
> > > http://lwn.net/Articles/124374/
> >
> >One thing that really sticks out like a sore thumb is soft_cursor()
> >That thing gets called a *lot*, and every time it does a kmalloc/free
> >pair that 99.9% of the time is going to be the same size alloc as
> >it was the last time. This patch makes that alloc persistent
> >(and does a realloc if the size changes).
> >The only time it should change is if the font/resolution changes I think.
>
> Can soft_cursor() be called from multiple processes at the same time,
> in particular with dual head systems? If so then a static variable is
> not going to work.

I looked at this a little closer. If my understanding of the console/fb layers
is correct, soft_cursor() is serialised by the console_sem in
drivers/video/console/fbcon.c::fb_flashcursor()
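
For reference, the persistent-allocation idea under discussion looks roughly like this in userspace C. It is a sketch of the technique, not the soft_cursor() patch itself: malloc/realloc stand in for the kernel allocator, the names cursor_buf and cursor_buf_get are hypothetical, and there is deliberately no locking because the sketch assumes callers are already serialised, as described above.

```c
#include <stdlib.h>

/* Buffer kept alive between calls instead of being freed each time. */
struct cursor_buf {
	void *data;
	size_t size;
};

/*
 * Return a buffer of exactly `size` bytes. The allocator is only
 * involved on the first call or when the size changes. Callers must
 * be serialised externally.
 */
static void *cursor_buf_get(struct cursor_buf *b, size_t size)
{
	void *p;

	if (size == 0)
		return NULL;
	if (b->data && b->size == size)
		return b->data;		/* common case: reuse as-is */

	p = realloc(b->data, size);
	if (!p)
		return NULL;		/* old buffer stays valid in b */
	b->data = p;
	b->size = size;
	return p;
}
```

If two contexts could enter at once, the shared struct would need its own lock or a per-caller buffer, which is exactly the concern Keith raised.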

Dave