
kmem_map too small with ZFS and 8.2-RELEASE


Mickaël Canévet

Mar 4, 2011, 3:23:05 AM
Hello,

I know there are a lot of threads about "kmem_map too small" problems on
former versions of FreeBSD with ZFS, but on the wiki
(http://wiki.freebsd.org/ZFSTuningGuide) it is said that "FreeBSD 7.2+
has improved kernel memory allocation strategy and no tuning may be
necessary on systems with more than 2 GB of RAM."

I have a 64-bit machine with 16GB of RAM with FreeBSD 8.2-RELEASE and no
tuning:

# sysctl -a | grep -e "vm.kmem_size_max:" -e "vm.kmem_size:" -e "vfs.zfs.arc_max:"
vm.kmem_size_max: 329853485875
vm.kmem_size: 16624558080
vfs.zfs.arc_max: 15550816256

This morning this server crashed with:

panic: kmem_malloc(1048576): kmem_map too small: 8658309120 total
allocated

At that time I was doing a fairly large "zfs send | zfs recv" to another
server, which may be the origin of the memory consumption.
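
For reference, such a transfer is generally of the form below (the dataset
and host names here are only placeholders, not the real ones):

# snapshot the source dataset, then stream it to the receiving machine
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | ssh otherhost zfs recv -F backup/data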

Does anybody know why the kernel crashed at around 8GB allocated while
vm.kmem_size and vfs.zfs.arc_max are set to around 16GB and
vm.kmem_size_max is set to around 300GB (isn't that a bit huge, by the
way?).

Should I increase these values?

Thanks a lot for your answers.

Mickaël


Ollivier Robert

Mar 4, 2011, 4:34:52 AM
According to Mickaël Canévet:

> panic: kmem_malloc(1048576): kmem_map too small: 8658309120 total
> allocated

I'd use vm.kmem_size="32G" (i.e. twice your RAM) and that's it.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- rob...@keltia.net
In memoriam to Ondine, our 2nd child: http://ondine.keltia.net/


Mickaël Canévet

Mar 4, 2011, 4:48:53 AM
> I'd use vm.kmem_size="32G" (i.e. twice your RAM) and that's it.

Should I also increase vfs.zfs.arc_max ?

Do you have any idea why the kernel panicked at only 8GB allocated ?

Thank you

signature.asc

Jeremy Chadwick

Mar 4, 2011, 5:05:17 AM
On Fri, Mar 04, 2011 at 10:48:53AM +0100, Mickaël Canévet wrote:
> > I'd use vm.kmem_size="32G" (i.e. twice your RAM) and that's it.
>
> Should I also increase vfs.zfs.arc_max ?

You should adjust vm.kmem_size, but not vm.kmem_size_max.

You can adjust vfs.zfs.arc_max to basically ensure system stability.
This thread is acting as evidence that there are probably edge cases
where the kmem too small panic can still happen despite the limited ARC
maximum defaults.

For a 16GB system, I'd probably use these settings:

vm.kmem_size="16384M"
vfs.zfs.arc_max="13312M"

I would also use these two settings:

# Disable ZFS prefetching
# http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
# Increases overall speed of ZFS, but when disk flushing/writes occur,
# system is less responsive (due to extreme disk I/O).
# NOTE: Systems with 8GB of RAM or more have prefetch enabled by
# default.
vfs.zfs.prefetch_disable="1"

# Decrease ZFS txg timeout value from 30 (default) to 5 seconds. This
# should increase throughput and decrease the "bursty" stalls that
# happen during immense I/O with ZFS.
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
vfs.zfs.txg.timeout="5"
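
These are all boot-time loader tunables, so once they are in
/boot/loader.conf it is worth confirming after a reboot that they actually
took effect, e.g.:

# verify the tunables after rebooting
sysctl vm.kmem_size vfs.zfs.arc_max vfs.zfs.prefetch_disable vfs.zfs.txg.timeout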

The advice in the Wiki is outdated, especially for 8.2-RELEASE. Best
not to follow it as of this writing.

> Do you have any idea why the kernel panicked at only 8GB allocated ?

I do not. A kernel developer will have to comment on that.

Please attempt to reproduce the problem. If you can reproduce it
reliably, this will greatly help kernel developers tracking down the
source of the problem.

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Jeremy Chadwick

Mar 4, 2011, 5:56:08 AM
On Fri, Mar 04, 2011 at 11:30:04AM +0100, Matthias Gamsjager wrote:
> > # Disable ZFS prefetching
> > # http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
> > # Increases overall speed of ZFS, but when disk flushing/writes occur,
> > # system is less responsive (due to extreme disk I/O).
> > # NOTE: Systems with 8GB of RAM or more have prefetch enabled by
> > # default.
> > vfs.zfs.prefetch_disable="1"
> >
> >
> I wonder if this is still the case, since the website you're referring to
> was from 2008 and I guess a lot has changed since then.

Yes, this is still the case. The website I linked in the comment is
dated, but still applicable in my experience (at least to FreeBSD). On
Solaris it doesn't appear applicable (every x86 Solaris 10 system at my
workplace uses ZFS exclusively, and we do not have to disable
prefetching; the performance is fine). We do have one Sol10 system,
though, that's IMAP-centric, uses a 3-disk raidz1, and sometimes has
severe performance (I/O) issues, but I'm chalking that up to it running
Dovecot 1.0.0.

Circling back to FreeBSD: if you want further confirmation, you can
read about my findings with 8.0-RC1 here -- scroll down to about the
halfway mark:

http://koitsu.wordpress.com/2009/10/12/testing-out-freebsd-8-0-rc1/

As far as I can tell from following commits since 8.0-RC1, nothing has
changed about the prefetch mechanism in FreeBSD (developers please
correct me if I'm wrong here).

I should also note that I think my "# NOTE:" comment is inaccurate; I
believe it should read "Systems with more than 4GB usable RAM default
to having prefetch enabled". I'll have to update my /boot/loader.conf
files.

> For example I get horrible performance with prefetch disabled (running on
> a 4-disk striped mirror).

Every system I manage (personally; unrelated to my above workplace) has
8GB of RAM in it and runs amd64. Some contain 2-disk mirrors, some
contain single-disk pools, and a few are 3-disk raidz1. *All* of those
systems have historically seen abysmal I/O performance when prefetching
was enabled. The hardware is mostly the same (different CPU models, but
the same SATA controllers (ICH9R)), so it's not an issue with a single
system.

If you get better performance -- really, truly, honestly -- with
prefetch enabled on your system, then I strongly recommend you keep it
enabled. However, for what it's worth (probably not much), this is the
first I've ever heard of a FreeBSD system performing better with
prefetch enabled.

To be completely fair: I should probably dig out my test/stress system
and re-test performance with prefetch enabled, keeping in mind the other
tunables I use for ZFS (some loader, some sysctl). But those settings
are performance-related, and the initial topic of discussion was kmem
exhaustion. I don't want to get off-topic for the OP's sake.
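
For anyone who wants to run that comparison themselves, a minimal sketch
(paths and sizes are just examples) is to flip the tunable in
/boot/loader.conf, reboot, and run an identical bonnie++ job each time:

# in /boot/loader.conf: 1 = prefetch off, 0 = prefetch on
vfs.zfs.prefetch_disable="1"

# after each reboot, benchmark with a working set well above RAM
bonnie++ -d /pool/bench -s 16g -u root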

Matthias Gamsjager

Mar 4, 2011, 5:30:04 AM
> # Disable ZFS prefetching
> # http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
> # Increases overall speed of ZFS, but when disk flushing/writes occur,
> # system is less responsive (due to extreme disk I/O).
> # NOTE: Systems with 8GB of RAM or more have prefetch enabled by
> # default.
> vfs.zfs.prefetch_disable="1"
>
>
I wonder if this is still the case, since the website you're referring to
was from 2008 and I guess a lot has changed since then.
For example I get horrible performance with prefetch disabled (running on
a 4-disk striped mirror).

Ollivier Robert

Mar 4, 2011, 6:13:55 AM
Mickaël Canévet said:

> Should I also increase vfs.zfs.arc_max ?

Unless you have a very busy server, I do not think so. Jeremy Chadwick has a very nice post about several sysctl/loader.conf tunables; did you see it?

http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061642.html

> Do you have any idea why the kernel panicked at only 8GB allocated ?

No.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- rob...@keltia.net
In memoriam to Ondine, our 2nd child: http://ondine.keltia.net/


Mickaël Canévet

Mar 4, 2011, 7:33:20 AM
On Fri, 2011-03-04 at 12:13 +0100, Ollivier Robert wrote:
> Mickaël Canévet said:
> > Should I also increase vfs.zfs.arc_max ?
>
> Unless you have a very busy server, I do not think so. Jeremy Chadwick has a very nice post about several sysctl/loader.conf tunables; did you see it?
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061642.html
>
> > Do you have any idea why the kernel panicked at only 8GB allocated ?
>
> No.
>

Thank you, I will read that carefully.


Adam McDougall

Mar 4, 2011, 8:23:29 AM
On 03/04/11 04:48, Mickaël Canévet wrote:
>> I'd use vm.kmem_size="32G" (i.e. twice your RAM) and that's it.
>
> Should I also increase vfs.zfs.arc_max ?
>
> Do you have any idea why the kernel panicked at only 8GB allocated ?
>
> Thank you

I believe ARC allocations in kmem can become fragmented, so when it is
searching for a place to store a new contiguous segment of memory, the
remaining fragmented free spaces may all be too small. I also set
vm.kmem_size to about twice the amount of RAM to help it avoid this
issue. I suspect that if kmem is badly fragmented, ZFS performance
can degrade, so that is another reason to keep kmem bigger.
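
One rough way to keep an eye on this -- only a sketch, using sysctls
already mentioned in this thread -- is to log the ARC size against the
kmem ceiling periodically, so a panic or slowdown can be correlated with
kernel memory pressure:

#!/bin/sh
# log the kmem ceiling, the ARC cap and the current ARC size once a minute
while :; do
    date
    sysctl vm.kmem_size vfs.zfs.arc_max kstat.zfs.misc.arcstats.size
    sleep 60
done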

Jeremy Chadwick

Mar 4, 2011, 9:38:01 AM
On Fri, Mar 04, 2011 at 08:23:29AM -0500, Adam McDougall wrote:
> On 03/04/11 04:48, Mickaël Canévet wrote:
> >>I'd use vm.kmem_size="32G" (i.e. twice your RAM) and that's it.
> >
> >Should I also increase vfs.zfs.arc_max ?
> >
> >Do you have any idea why the kernel panicked at only 8GB allocated ?
> >
> >Thank you
>
> I believe ARC allocations in kmem can become fragmented, so when it
> is searching for a place to store a new contiguous segment of
> memory, the remaining fragmented free spaces may all be too small.
> I also set
> vm.kmem_size to about twice the amount of RAM to help it avoid this
> issue. I suspect that if kmem is badly fragmented, ZFS performance
> can degrade, so that is another reason to keep kmem bigger.

My findings on 8.2-RELEASE indicate that doing this results in very
unexpected behaviour regarding the ARC maximum. As such, I cannot
recommend this model.

For example, on an amd64 system with 8GB physical RAM and these two
settings in /boot/loader.conf:

vm.kmem_size="8192M"
vfs.zfs.arc_max="6144M"

kstat.zfs.misc.arcstats.size tops out at around 6240986896, with Wired
in top(1) showing ~6.3GB. This is expected behaviour and fits (I
think) what people expect.

However, on the exact same system with these two settings:

vm.kmem_size="16384M"
vfs.zfs.arc_max="6144M"

The above ARC numbers are exactly *half* that amount. This is easily
reproducible.

Can someone 1) justify the "2x the amount of RAM for vm.kmem_size"
setting, and 2) explain in detail the above behaviour?
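
For anyone who wants to check this on their own hardware (a sketch only;
the loader.conf pairs are the ones shown above), boot once with each pair,
push the same I/O workload through the pool, and then compare:

# how close does the ARC actually get to the configured arc_max?
sysctl vm.kmem_size vfs.zfs.arc_max kstat.zfs.misc.arcstats.size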

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Joshua Boyd

Mar 6, 2011, 1:47:58 PM
On Sun, Mar 6, 2011 at 4:04 AM, Jeremy Chadwick
<fre...@jdc.parodius.com> wrote:
> On Sun, Mar 06, 2011 at 02:59:43AM -0500, Joshua Boyd wrote:

>> On Fri, Mar 4, 2011 at 5:56 AM, Jeremy Chadwick
>> <fre...@jdc.parodius.com> wrote:
>> > If you get better performance -- really, truly, honestly -- with
>> > prefetch enabled on your system, then I strongly recommend you keep it
>> > enabled. However, for what it's worth (probably not much), this is the

>> > first I've ever heard of a FreeBSD system performing better with
>> > prefetch enabled.
>>
>> I just recently turned it on after having it turned off for a long
>> time ... my speeds went from ~300MB/s to 600+MB/s in bonnie++. This is
>> a dual core AM3 system with 8GB of ram, and 15 disks in a striped
>> raidz configuration (3 sets striped).
>
> Here are some numbers for you.  This is from a 8.2-STABLE (RELENG_8)
> system built Thu Feb 24 22:06:45 PST 2011, type amd64.

Interesting results. My kernel currently has a build date 2 days
earlier than yours. Here are my results, showing the huge increase in
speed. The only major configuration difference appears to be that I've
disabled the ZIL and you have yours enabled. That shouldn't make any
difference for read speeds, though.

FreeBSD foghornleghorn.res.openband.net 8.2-PRERELEASE FreeBSD
8.2-PRERELEASE #13: Tue Feb 22 17:39:03 EST 2011
ro...@foghornleghorn.res.openband.net:/usr/obj/usr/src/sys/FOGHORNLEGHORN
amd64

/boot/loader.conf
======================
vfs.zfs.zil_disable="1"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
vm.kmem_size="8192M"
vfs.zfs.arc_max=6144M
vfs.zfs.prefetch_disable="0"
vfs.zfs.txg.timeout="5"

/etc/sysctl.conf
======================
kern.maxfiles=65536
kern.maxfilesperproc=32768
vfs.read_max=32
vfs.ufs.dirhash_maxmem=16777216
kern.maxvnodes=250000
vfs.zfs.txg.write_limit_override=1073741824

ZFS details
======================
# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  18.2T  7.73T  10.4T  42%  ONLINE  -

# zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
tank                         6.17T  8.10T  36.7K  /tank
tank/downloads               4.89T  8.10T  2.30T  /tank/downloads
tank/downloads/movies        2.59T  8.10T  2.59T  /tank/downloads/movies
tank/usr                     1.29T  8.10T  32.0K  /tank/usr
tank/usr/home                1.29T  8.10T  69.5K  /usr/home
tank/usr/home/josh           1.29T  8.10T  13.4G  /usr/home/josh
tank/usr/home/josh/hellanzb  32.0K  8.10T  32.0K  /usr/home/josh/hellanzb
tank/usr/home/josh/rtorrent  1.27T  8.10T  1.27T  /usr/home/josh/rtorrent
tank/usr/home/josh/watch     8.00M  8.10T  8.00M  /usr/home/josh/watch

# zpool status tank
pool: tank
state: ONLINE
scrub: scrub completed after 7h43m with 0 errors on Sun Mar 6 07:43:56 2011
config:

NAME          STATE     READ WRITE CKSUM
tank          ONLINE       0     0     0
  raidz1      ONLINE       0     0     0
    da8       ONLINE       0     0     0
    da18      ONLINE       0     0     0
    da19      ONLINE       0     0     0
    da6       ONLINE       0     0     0
    da7       ONLINE       0     0     0
  raidz1      ONLINE       0     0     0
    da11      ONLINE       0     0     0
    da10      ONLINE       0     0     0
    da17      ONLINE       0     0     0
    da9       ONLINE       0     0     0
    da5       ONLINE       0     0     0
  raidz1      ONLINE       0     0     0
    da0       ONLINE       0     0     0
    da1       ONLINE       0     0     0
    da3       ONLINE       0     0     0
    da2       ONLINE       0     0     0
    da4       ONLINE       0     0     0

errors: No known data errors

Controller details
======================
mpt0: <LSILogic SAS/SATA Adapter> port 0x6000-0x60ff mem 0xf75fc000-0xf75fffff,0xf75e0000-0xf75effff irq 18 at device 0.0 on pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.20.0
mpt1: <LSILogic SAS/SATA Adapter> port 0x7000-0x70ff mem 0xf78fc000-0xf78fffff,0xf78e0000-0xf78effff irq 19 at device 0.0 on pci2
mpt1: [ITHREAD]
mpt1: MPI Version=1.5.20.0
mpt2: <LSILogic SAS/SATA Adapter> port 0xd000-0xd0ff mem 0xf7ffc000-0xf7ffffff,0xf7fe0000-0xf7feffff irq 19 at device 0.0 on pci6
mpt2: [ITHREAD]
mpt2: MPI Version=1.5.19.0

Disk details
======================
da8 at mpt0 bus 0 scbus0 target 0 lun 0
da8: <ATA ST31000528AS CC38> Fixed Direct Access SCSI-5 device
da8: 300.000MB/s transfers
da8: Command Queueing enabled
da8: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da9 at mpt0 bus 0 scbus0 target 1 lun 0
da9: <ATA Hitachi HDS72101 A39C> Fixed Direct Access SCSI-5 device
da9: 300.000MB/s transfers
da9: Command Queueing enabled
da9: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da0 at mpt1 bus 0 scbus1 target 0 lun 0
da0: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da10 at mpt0 bus 0 scbus0 target 2 lun 0
da10: <ATA SAMSUNG HD103SJ 00E4> Fixed Direct Access SCSI-5 device
da10: 300.000MB/s transfers
da10: Command Queueing enabled
da10: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da11 at mpt0 bus 0 scbus0 target 3 lun 0
da11: <ATA SAMSUNG HD103SJ 00E4> Fixed Direct Access SCSI-5 device
da11: 300.000MB/s transfers
da11: Command Queueing enabled
da11: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da1 at mpt1 bus 0 scbus1 target 1 lun 0
da1: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da2 at mpt1 bus 0 scbus1 target 2 lun 0
da2: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da3 at mpt1 bus 0 scbus1 target 3 lun 0
da3: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da4 at mpt1 bus 0 scbus1 target 4 lun 0
da4: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da5 at mpt1 bus 0 scbus1 target 5 lun 0
da5: <ATA SAMSUNG HD103SI 1118> Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da6 at mpt1 bus 0 scbus1 target 6 lun 0
da6: <ATA ST31000340AS SD04> Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da7 at mpt1 bus 0 scbus1 target 7 lun 0
da7: <ATA SAMSUNG HD103SI 1118> Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da16 at mpt2 bus 0 scbus2 target 82 lun 0
da16: <ATA ST31000340AS SD04> Fixed Direct Access SCSI-5 device
da16: 300.000MB/s transfers
da16: Command Queueing enabled
da16: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da17 at mpt2 bus 0 scbus2 target 83 lun 0
da17: <ATA Hitachi HDS72101 A39C> Fixed Direct Access SCSI-5 device
da17: 300.000MB/s transfers
da17: Command Queueing enabled
da17: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da18 at mpt2 bus 0 scbus2 target 84 lun 0
da18: <ATA SAMSUNG HD103SI 1118> Fixed Direct Access SCSI-5 device
da18: 300.000MB/s transfers
da18: Command Queueing enabled
da18: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da19 at mpt2 bus 0 scbus2 target 85 lun 0
da19: <ATA SAMSUNG HD103SI 1118> Fixed Direct Access SCSI-5 device
da19: 300.000MB/s transfers
da19: Command Queueing enabled
da19: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)

Benchmark results #1
======================
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
foghornleghorn. 16G 213 99 266782 53 90296 19 480 95 218719 24 229.7 6
Latency 43348us 37929us 242ms 102ms 68306us 462ms
Version 1.96 ------Sequential Create------ --------Random Create--------
foghornleghorn.res. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 15220 44 +++++ +++ 19214 56 22371 58 +++++ +++ 22133 66
Latency 10658us 60us 82us 6540us 39us 1677us

Benchmark results #2
======================
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
foghornleghorn. 16G 201 99 276506 56 198428 38 459 97 627451 73 252.0 5
Latency 45695us 35953us 265ms 69630us 42440us 389ms
Version 1.96 ------Sequential Create------ --------Random Create--------
foghornleghorn.res. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 14988 50 +++++ +++ 18693 59 18535 51 +++++ +++ 20827 67
Latency 13309us 93us 116us 8165us 36us 1046us


--
Joshua Boyd
JBipNet

E-mail: boy...@jbip.net

http://www.jbip.net

Martin Matuska

Mar 7, 2011, 4:49:26 AM
In 8-STABLE (amd64), starting with SVN revision 214620
vm.kmem_size_scale defaults to 1.

This means that in 8.2-RELEASE, vm.kmem_size is automatically set to the
amount of your system RAM, so vm.kmem_size="16G" is automatically set on
a system with 16GB of RAM.
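
A quick way to confirm this on a given box:

# with kmem_size_scale=1, vm.kmem_size should come out close to hw.physmem
sysctl vm.kmem_size_scale vm.kmem_size hw.physmem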

On 04.03.2011 11:05, Jeremy Chadwick wrote:


> On Fri, Mar 04, 2011 at 10:48:53AM +0100, Mickaël Canévet wrote:
>>> I'd use vm.kmem_size="32G" (i.e. twice your RAM) and that's it.
>> Should I also increase vfs.zfs.arc_max ?

> You should adjust vm.kmem_size, but not vm.kmem_size_max.
>
> You can adjust vfs.zfs.arc_max to basically ensure system stability.
> This thread is acting as evidence that there are probably edge cases
> where the kmem too small panic can still happen despite the limited ARC
> maximum defaults.
>
> For a 16GB system, I'd probably use these settings:
>
> vm.kmem_size="16384M"
> vfs.zfs.arc_max="13312M"
>
> I would also use these two settings:
>

> # Disable ZFS prefetching
> # http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
> # Increases overall speed of ZFS, but when disk flushing/writes occur,
> # system is less responsive (due to extreme disk I/O).
> # NOTE: Systems with 8GB of RAM or more have prefetch enabled by
> # default.
> vfs.zfs.prefetch_disable="1"
>

> # Decrease ZFS txg timeout value from 30 (default) to 5 seconds. This
> # should increase throughput and decrease the "bursty" stalls that
> # happen during immense I/O with ZFS.
> # http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
> # http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
> vfs.zfs.txg.timeout="5"
>
> The advice in the Wiki is outdated, especially for 8.2-RELEASE. Best
> not to follow it as of this writing.
>

>> Do you have any idea why the kernel panicked at only 8GB allocated ?

> I do not. A kernel developer will have to comment on that.
>
> Please attempt to reproduce the problem. If you can reproduce it
> reliably, this will greatly help kernel developers tracking down the
> source of the problem.
>

Kai Gallasch

Mar 7, 2011, 5:37:16 AM

On 04.03.2011 at 09:23, Mickaël Canévet wrote:

> Hello,
>
> I know there are a lot of threads about "kmem_map too small" problems on
> former versions of FreeBSD with ZFS, but on the wiki
> (http://wiki.freebsd.org/ZFSTuningGuide) it is said that "FreeBSD 7.2+
> has improved kernel memory allocation strategy and no tuning may be
> necessary on systems with more than 2 GB of RAM."
>
> I have a 64-bit machine with 16GB of RAM with FreeBSD 8.2-RELEASE and no
> tuning:
>
> # sysctl -a | grep -e "vm.kmem_size_max:" -e "vm.kmem_size:" -e
> "vfs.zfs.arc_max:"
> vm.kmem_size_max: 329853485875
> vm.kmem_size: 16624558080
> vfs.zfs.arc_max: 15550816256
>
> This morning this server crashed with:
>

> panic: kmem_malloc(1048576): kmem_map too small: 8658309120 total
> allocated

Hi, Mickaël.

If you want to "get a picture" of how setting ZFS tunables in loader.conf affects the different cache sizes and cache hit ratios, I can recommend installing the FreeBSD port sysutils/munin-node together with sysutils/zfs-stats and the following munin ZFS plugins:

http://exchange.munin-monitoring.org/plugins/search?keyword=FreeBSD
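
For example, from the ports tree (just a sketch; port names as in the
ports collection):

# install the monitoring pieces
cd /usr/ports/sysutils/munin-node && make install clean
cd /usr/ports/sysutils/zfs-stats && make install clean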

Regards,
Kai.

Matthias Gamsjager

Mar 7, 2011, 1:41:10 PM
Let me back up my claim with data, too:

AMD dual core, 4GB RAM, 4x 1TB Samsung drives, OS installed on a separate UFS disk

FreeBSD fb 8.2-STABLE FreeBSD 8.2-STABLE #0 r219265: Fri Mar 4 16:47:35 CET 2011


loader.conf:
vm.kmem_size="6G"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.min_pending=1 #default = 4
vfs.zfs.vdev.max_pending=4 #default= 35

sysctl.conf:
vfs.zfs.txg.write_limit_override=805306368
kern.sched.preempt_thresh=220

Zpool:

NAME         STATE     READ WRITE CKSUM
storage      ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    ad6      ONLINE       0     0     0
    ad10     ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    ad4      ONLINE       0     0     0
    ad8      ONLINE       0     0     0

NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
storage  1.81T  1.57T   245G  86%  ONLINE  -

Prefetch disable = 1

Version 1.96     ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1  -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine     Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
fb        10000M    54  74 99180  42 35955  14   140  73 68174  11 180.6  4
Latency            295ms    1581ms    1064ms     428ms   58640us    755ms

Version 1.96     ------Sequential Create------ --------Random Create--------
fb               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
           files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
              16  6697  39 +++++ +++ 11798  74 10060  61 +++++ +++ 11104  72
Latency           213ms     134us     257us   32866us    2672us     174us

Prefetch disable = 0

Version 1.96     ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1  -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine     Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
fb        10000M    52  74 107602 46 65443  29   135  74 243760 42 186.5   4
Latency            214ms     865ms    1525ms   79771us     254ms     924ms

Version 1.96     ------Sequential Create------ --------Random Create--------
fb               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
           files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
              16  8152  56 +++++ +++  4534  36 10966  69 32607  74  9692  71
Latency           112ms   21108us     169ms   30018us    4097us     318us

Read performance 68MB/s vs 243MB/s.

Maybe the kind of workload you have does not work well with prefetch, I
don't know, but for a sequential load like the one I use my NAS for (it is
used as a media tank), it does boost performance quite a bit.

Jeremy Chadwick

Mar 7, 2011, 7:28:20 PM
On Mon, Mar 07, 2011 at 10:49:26AM +0100, Martin Matuska wrote:
> In 8-STABLE (amd64), starting with SVN revision 214620
> vm.kmem_size_scale defaults to 1.
>
> This means that in 8.2-RELEASE, vm.kmem_size is automatically set to the
> amount of your system RAM, so vm.kmem_size="16G" is automatically set on
> a system with 16GB of RAM.
>
> On 04.03.2011 11:05, Jeremy Chadwick wrote:

Thanks -- you're absolutely right. I had forgotten all about
vm.kmem_size_scale. :-)

So yeah, from 8.2-RELEASE onward (rather than get into individual SVN
revisions, I'm using 8.2-RELEASE as "a point in time"), vm.kmem_size will
default to the amount of usable memory (usually slightly less than
hw.physmem). Validation:

$ sysctl hw.realmem hw.physmem hw.usermem
hw.realmem: 9395240960
hw.physmem: 8579981312
hw.usermem: 1086521344
$ sysctl vm.kmem_size
vm.kmem_size: 8303894528

As such, one only needs to tune vfs.zfs.arc_max if desired.
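
In other words, on a stock 8.2 box something as small as this single line
in /boot/loader.conf is enough (the value is the one suggested earlier in
this thread for a 16GB machine; scale it to your own RAM):

# cap the ARC; vm.kmem_size is already sized automatically at boot
vfs.zfs.arc_max="13312M"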

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |

