I posted a thread about ZFS performance issues in the General section of the
forums and there were other people having the same issue, so I thought that
it might be helpful to send the details to the mailing list as well. The
thread can be found at:
http://forums.freebsd.org/showthread.php?p=59680&posted=1#post59680
My original post:
---
I'm having problems with ZFS performance. When my system comes up,
read/write speeds are excellent (testing with dd if=/dev/zero
of=/tank/bigfile and dd if=/tank/bigfile of=/dev/null); I get at least
100MB/s on both reads and writes, and I'm happy with that.
The longer the system is up, the worse my performance gets. Currently my
system has been up for 4 days, and read/write performance is down to about
10MB/s at best.
The system is only accessed by 3 clients: myself, my roommate, and our HTPC.
Usually, only one client will be doing anything at a time, so it is not
under heavy load or anything.
*Software:*
Code:
FreeBSD leviathan 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21
15:02:08 UTC 2009
ro...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
The following apps are running and touching data on the zpool:
- rTorrent - read and write, usually active, not doing much for reads
(not doing any major seeding)
- SABnzbd+ - write only, not always active
- Lighttpd - running ruTorrent (web interface for rTorrent); nothing else
- samba - all of our clients are running Windows, so we use samba to
network-mount the zpool
*Hardware:*
- AMD Athlon II X2 250 Dual Core Processor Socket AM3 3.0GHZ
- Gigabyte MA790GP-UD4H AMD790GX ATX AM2+/AM3 Sideport 2PCI-E Sound GBLAN
HDMI CrossFireX Motherboard
- Corsair XMS2 TWIN2X4096-6400C5 4GB DDR2 2X2GB
- Supermicro AOC-USASLP-L8I LSI 1068E 8-PORT RAID 0/1/10 Uio SATA/SAS
Controller W/ 16MB Low Profile
- *8x* Western Digital WD15EADS Caviar Green 1.5TB SATA 32MB Cache 3.5IN
*ZFS setup:*
I have the 1.5TB drives in one RAIDZ pool. All 8 drives are connected to the
Supermicro L8I controller. The controller is set to 'disabled', so it isn't
doing anything with the drives except presenting them to the system
untouched. (So I'm really only using it as an expansion card, for the extra
ports).
Code:
[root@leviathan ~]# zpool status
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0

errors: No known data errors
Any suggestions as to what might be causing the performance to degrade with
system uptime? If I missed anything or more information is needed, please
let me know. Thanks in advance.
---
Also just to note, as suggested by someone in the thread, it's probably not
directly related to system uptime, but instead related to usage - the more
usage, the worse the performance.
> I'm having problems with ZFS performance. When my system comes up,
> read/write speeds are excellent (testing with dd if=/dev/zero
> of=/tank/bigfile and dd if=/tank/bigfile of=/dev/null); I get at least
> 100MB/s on both reads and writes, and I'm happy with that.
>
> The longer the system is up, the worse my performance gets. Currently my
> system has been up for 4 days, and read/write performance is down to about
> 10MB/s at best.
>
> The system is only accessed by 3 clients: myself, my roommate, and our HTPC.
> Usually, only one client will be doing anything at a time, so it is not
> under heavy load or anything.
This could be due to the amount of memory available for ZFS caching declining
as time goes on (for reasons that are not entirely clear, though I suspect
it may be due to increasing fragmentation in the kernel memory). You might
try doing "sysctl kstat.zfs.misc.arcstats.size" repeatedly to monitor the
amount of memory the ZFS cache is taking up as your system uptime increases.
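Something as simple as the following (just a sketch; the interval and the log path are arbitrary) would log it over a few days so you can correlate the ARC size with the slowdown:

    while true; do
        date "+%Y-%m-%d %H:%M:%S"
        sysctl kstat.zfs.misc.arcstats.size
        sleep 60
    done >> /var/tmp/arcsize.log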
> I'm having problems with ZFS performance. When my system comes up,
> read/write speeds are excellent (testing with dd if=/dev/zero
> of=/tank/bigfile and dd if=/tank/bigfile of=/dev/null); I get at least
> 100MB/s on both reads and writes, and I'm happy with that.
>
> The longer the system is up, the worse my performance gets. Currently my
> system has been up for 4 days, and read/write performance is down to about
> 10MB/s at best.
Are you sure you have isolated the cause to be only the uptime of the
machine? Is there no other change between the runs? E.g. did you stop
all other services and applications on the machine before doing the test
for the second time? Can you create a big file (2x memory size) when the
machine boots, measure the time to read it, then read it again after a
few days when you notice performance problems?
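As a rough sketch (assuming 4GB of RAM, so an ~8GB test file, and enough free space on /tank), that test could look like:

    # right after boot: create a file about 2x RAM and time a full read
    dd if=/dev/zero of=/tank/testfile bs=1m count=8192
    time dd if=/tank/testfile of=/dev/null bs=1m

    # a few days later, when things feel slow: time reading the same file again
    time dd if=/tank/testfile of=/dev/null bs=1m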
I've been starting my system with different combinations of applications
running to see what access patterns cause the most slowdown. So far, I don't
have enough data to give anything concrete.
This weekend I'll try some tests such as the one you describe, and see what
happens. I have a strong suspicion that rTorrent is to blame, since I
haven't seen major slowdowns in the last few days with rTorrent not running.
rTorrent preallocates the space needed for the file download (and I'm
downloading large 4GB+ files using it), and then writes to them in an
unpredictable pattern, so maybe ZFS doesn't like being touched this way?
sysctl kstat.zfs.misc.arcstats.size
sysctl vm.stats.vm.v_inactive_count
sysctl vm.stats.vm.v_active_count
sysctl vm.stats.vm.v_cache_count
ZFS performance does degrade a lot if the ARC becomes too small. Writes
also get throttled if ZFS thinks the system is running low on memory.
One way to help the situation somewhat is to bump the vfs.zfs.arc_min
tunable, which makes ZFS somewhat less eager to give up memory.
However, write throttling seems to rely on the amount of memory on the
free list. FreeBSD appears to have somewhat different semantics for
"free" compared to Solaris, and that makes ZFS think we're running
low on memory while there's plenty of it sitting on the inactive/cache
lists that could be used.
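For example, a minimal sketch for /boot/loader.conf (the 512MB value is purely illustrative, and it only takes effect after a reboot):

    # discourage ZFS from shrinking the ARC below ~512MB (illustrative value)
    vfs.zfs.arc_min="536870912"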
One rather crude way to get ZFS back in shape in this situation is to
temporarily cause a real memory shortage on the system. That forces
trimming of the active/inactive lists (and the ARC, too), but once it's
done the ARC is free to grow again, and that may restore ZFS performance
for a while.
The following command allocates about 8G of memory on my system --
enough to start a swapout:
perl -e '$x="x"x3000000000'
--Artem
Probably unrelated, but this prefetch issue results in a slowdown:
http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007481.html
-Jonathan
With the introduction of 8.0-RC1, FreeBSD explicitly disables prefetch
if your system has less than 4GB usable[1] memory. The code does print
a warning/FYI message when this condition is met[2].
You can disable the warning message by setting prefetch_disable="1" in
loader.conf (and this naturally means prefetch is disabled). You can
also explicitly enable prefetch by setting the value to "1", and this
trumps the how-much-usable-RAM check.
The OP doesn't disclose if he's enabled prefetch, so I assume he hasn't,
which means he shouldn't be susceptible to the bug described above.
All that said -- I know what the OP is referring to, as I've seen it
myself (on RELENG_7, and possibly early releases of 8.0). The only
way to relieve the pain, AFAIK, is to reboot.
I do see some MFCs done about 13 hours ago to RELENG_7 and RELENG_8
that talk about the ARC and "paging pressure", which to me implies
decreased performance when it occurs... or maybe it helps with the kmem
exhaustion problem? The brief description in the commit simply isn't
enough; it's almost like we need a "FreeBSD ZFS Newsletter"
that documents what changes get committed, what they fix, and what's
being worked on/tested in HEAD (for potential MFC).
It would also be good to get some concise documentation with regards to
what all the kstat.zfs.misc.arcstats.* counters mean and represent;
admins like myself would love to track/graph these for helping correlate
performance or stability issues, but we don't even know what they
represent. For example, just earlier this week I read a semi-recent
message talking about how a large kstat.zfs.misc.arcstats.evict_skip
counter indicates something bad, yet I have no idea what "evict skip"
means in reference to the ZFS core/model itself.
[1]: Because of #2 (see above/below), I had to analyse the code. My
system was amd64 + 4GB RAM yet ZFS was telling me the system did not
have 4GB of RAM thus had disabled prefetch. My analysis is near the
bottom of my blog post:
http://koitsu.wordpress.com/2009/10/12/testing-out-freebsd-8-0-rc1/
[2]: The grammar of the message printed has gone through numerous
revisions (I forget the exact number, but *at least* 2), and the most
recent one is still too ambiguous, resulting in reader confusion. In my
above blog post, I propose what the kernel message should read (in my
opinion), which is significantly less ambiguous and hopefully won't
cause "but I do have 4GB!" confusion.
[3]: src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
http://freshbsd.org/2010/01/08/09/59/13 -- RELENG_8
http://freshbsd.org/2010/01/08/11/06/13 -- RELENG_7
It sure would be useful if these source files had $FreeBSD$ ID tags in
them, rather than having to grep /var/db/sup. I swore there was some
command which did this, but I might be thinking of object files.
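(The command you may be thinking of is ident(1); it only helps for files that actually carry $FreeBSD$/RCS keywords, but for the ones that do it's as simple as:)

    ident /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c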
--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
This should have read 'setting the value to "0"'; sorry.
Ok, noted :)
> I've been starting my system with different combinations of applications
> running to see what access patterns cause the most slowdown. So far, I don't
> have enough data to give anything concrete.
>
> This weekend I'll try some tests such as the one you describe, and see what
> happens. I have a strong suspicion that rTorrent is to blame, since I
> haven't seen major slowdowns in the last few days with rTorrent not running.
> rTorrent preallocates the space needed for the file download (and I'm
> downloading large 4GB+ files using it), and then writes to them in an
> unpredictable pattern, so maybe ZFS doesn't like being touched this way?
This is why I thought the test with a single large file written before
"slowdown" and read after could be helpful.
It is true that ZFS in theory doesn't do very well with random writes of
any kind - the kind that torrent clients do should actually be the worst
case for ZFS, *but*, this very much depends on the actual workload.
This is because, simplified, ZFS *always* appends data to files and then
does a type of pointer gymnastics to update the actual file logical
"flow" from the beginning to the end. For example, doing thousands of
random small writes where each of them is logically located at another
position in the file (what torrent clients do) will actually write them
all together in some not entirely predictable order - then when the file
is read in its logical/natural order, the drives' heads will have to
seek much more often to pick up the pieces. Thus, a 700 MB file could
end up in a thousand fragments. There are some very good reasons why ZFS
does things that way but it makes it sensitive to performance issues in
the described scenario. Preallocating space will not help.
Surprisingly, this doesn't appear to harm databases much - at least the
usual kinds - because they tend to avoid sequential reads by using
tricks like indexing and so are used to paying the price for disk head
seeking.
Of course, SSDs would not notice anything unusual with this mode of
operation.
ZFS has aggressive read-ahead for sequential reads, so it's worth
noting that the performance problem can be mitigated by having lots of
RAM free for read-ahead, as well as by using multiple vdevs in the zpool
(so that it can keep all the disks seeking at once).
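As a sketch only (not a suggestion to rebuild the OP's pool), the same eight disks could instead be laid out as two raidz vdevs, at the cost of a second disk of parity, which gives ZFS two vdevs to stripe I/O across:

    # one raidz vdev across all eight disks (the OP's current layout)
    zpool create tank raidz da0 da1 da2 da3 da4 da5 da6 da7

    # two raidz vdevs; I/O is striped across both
    zpool create tank raidz da0 da1 da2 da3 raidz da4 da5 da6 da7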
- Andrew
> All that said -- I know what the OP is referring to, as I've seen it
> myself (on RELENG_7, and possibly early releases of 8.0). The only
> way to relieve the pain, AFAIK, is to reboot.
I assume the problem is that the memory handling in ZFS is not the best
regarding memory fragmentation (at least in FreeBSD; I do not know if
the memory handling in Solaris shows similar behavior).
Regarding the RELENG_7 systems you have which use ZFS, are they SMP
systems? If yes, do you see problems when a lot of parallel accesses
(let's say several finds in parallel) are made? All the commits to ZFS
for 7-stable are triggered by a problem I have in this regard on a
7-stable system, but so far nothing has helped.
> I do see some MFC's done about 13 hours ago to RELENG_7 and RELENG_8
> that talk about the ARC and "paging pressure", which to me means
> decreased performance when it occurs... or maybe it helps with the
> kmem exhaustion problem? The brief description in the commit is
> simply not enough to suffice; it's almost like we need a "FreeBSD ZFS
> Newsletter" that documents what all the changes are that get
> committed, what they fix, and what's being worked on/tested in HEAD
> (for potential MFC).
The change you talk about tells ZFS to clean up the ARC when the max
size is exceeded, instead of when there is a real need to get some free
memory. The result is that the arc_max setting is followed more
strictly, and that no expensive operation has to be done to free
the ARC when the system needs free memory.
Bye,
Alexander.
Yes and no. Read-ahead will not help performance when the data is so
fragmented that the disk is seek-bound. No matter how much of the file
you can get into RAM, it still needs to be fetched from the drive
platters. (Except if it's smart enough to read sequential chunks from
the raw storage even though they are logically not located nearby and,
in the case of torrents, probably belong to different files - which I
very much doubt it does.)
I can share a munin plugin for monitoring some of the ARC L1/L2
statistics, as well as the memory decomposition.
It's still a WIP (and I'm sure not all statistics are properly
gathered), but I hope it may help. I'm also still not quite sure whether
I'm interpreting some ARC parameters correctly.
Just put attached files into /usr/local/etc/munin/plugins.
Cheers,
Wiktor Niesiobedzki
It seems that after having downloaded a few torrents, resources are grabbed
and then not released. From top:
Mem: 2326M Active, 962M Inact, 484M Wired, 82M Cache, 418M Buf, 87M Free
Nothing in userland is using a significant amount of memory; e.g. rTorrent is
using 41MB according to top. Killing rTorrent does not alleviate the
performance problems.
arcstats.size is hovering around 30-40MB.
[root@leviathan ~]# sysctl kstat.zfs.misc.arcstats.size
kstat.zfs.misc.arcstats.size: 28953448
[root@leviathan ~]# sysctl vm.stats.vm.v_inactive_count
vm.stats.vm.v_inactive_count: 237831
[root@leviathan ~]# sysctl vm.stats.vm.v_active_count
vm.stats.vm.v_active_count: 595762
[root@leviathan ~]# sysctl vm.stats.vm.v_cache_count
vm.stats.vm.v_cache_count: 21472
If no-one has any questions, I'll try Artem's suggestion of wasting a bunch
of memory in Perl/Python and forcing some memory to be swapped out. (I don't
want to do it yet in case someone wants a specific number before I do that).
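(For reference, a rough Python equivalent of Artem's Perl one-liner would be the following; the 3000000000 is just an illustrative size and should be picked to exceed free RAM:)

    python -c 'x = "x" * 3000000000'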
It would help if you could provide the entire output from:
sysctl kstat.zfs.misc.arcstats
[root@leviathan ~]# sysctl kstat.zfs.misc.arcstats
kstat.zfs.misc.arcstats.hits: 32092629
kstat.zfs.misc.arcstats.misses: 1064835
kstat.zfs.misc.arcstats.demand_data_hits: 30542262
kstat.zfs.misc.arcstats.demand_data_misses: 848959
kstat.zfs.misc.arcstats.demand_metadata_hits: 1550367
kstat.zfs.misc.arcstats.demand_metadata_misses: 215876
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 0
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 18329884
kstat.zfs.misc.arcstats.mru_ghost_hits: 114483
kstat.zfs.misc.arcstats.mfu_hits: 13762745
kstat.zfs.misc.arcstats.mfu_ghost_hits: 172573
kstat.zfs.misc.arcstats.deleted: 1735926
kstat.zfs.misc.arcstats.recycle_miss: 2076926
kstat.zfs.misc.arcstats.mutex_miss: 545
kstat.zfs.misc.arcstats.evict_skip: 532474
kstat.zfs.misc.arcstats.hash_elements: 6784
kstat.zfs.misc.arcstats.hash_elements_max: 14351
kstat.zfs.misc.arcstats.hash_collisions: 149862
kstat.zfs.misc.arcstats.hash_chains: 338
kstat.zfs.misc.arcstats.hash_chain_max: 4
kstat.zfs.misc.arcstats.p: 25819136
kstat.zfs.misc.arcstats.c: 107609280
kstat.zfs.misc.arcstats.c_min: 107609280
kstat.zfs.misc.arcstats.c_max: 860874240
kstat.zfs.misc.arcstats.size: 40148272
kstat.zfs.misc.arcstats.hdr_size: 1411072
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 10349
Additionally, from top:
Mem: 2869M Active, 440M Inact, 479M Wired, 91M Cache, 418M Buf, 63M Free
Getting <20MB/s combined read/write at this point (i.e. copying files from one
directory to another, both source and destination on the zpool). Output of
`zpool iostat 1`:
tank 2.57T 8.31T 11 215 273K 745K
tank 2.57T 8.31T 22 97 816K 540K
tank 2.57T 8.31T 6 67 13.5K 887K
tank 2.57T 8.31T 5 101 639K 190K
tank 2.57T 8.31T 14 81 1.14M 154K
tank 2.57T 8.31T 6 152 12.5K 897K
tank 2.57T 8.31T 11 153 143K 790K
tank 2.57T 8.31T 3 172 134K 566K
tank 2.57T 8.31T 3 105 7.48K 699K
tank 2.57T 8.31T 5 136 138K 383K
Combined read/write of <2MB/s -- pretty pathetic. I then tried Artem's
trick, running ` perl -e '$x="x"x3000000000'` to force a swapout. This
command completed in about 10 seconds, and I then had >5GB of memory showing
as 'Free' according to top. Checking zpool iostat 1 again showed:
tank 2.57T 8.31T 375 477 46.1M 57.9M
tank 2.57T 8.31T 18 472 1.75M 44.8M
tank 2.57T 8.31T 129 0 16.1M 0
tank 2.57T 8.31T 428 0 53.2M 0
tank 2.57T 8.31T 262 947 31.8M 103M
tank 2.57T 8.30T 80 105 9.61M 196K
tank 2.57T 8.30T 612 0 75.8M 0
tank 2.57T 8.30T 155 951 18.4M 103M
tank 2.57T 8.30T 662 0 82.1M 0
tank 2.57T 8.30T 176 945 21.1M 103M
That is obviously much better, and a respectable rate of performance. This
was all done during the same single `copy` command - I did not stop/restart
the copy when running the perl one-liner.
So it seems, based on this, that ZFS is keeping too much data cached and not
being smart about when to release 'old' cache entries in favour of new ones.
Any suggestions at this point? The only tuning I have done so far is to
disable prefetch, since my primary usage is streaming HD media, and prefetch
has been known to cause problems in that situation.
6.5GB of "active" memory seems to imply that a user process is growing or a
large number of user processes are being created. I would expect ZFS's cache
to increase the size of "wired" memory.
Sorry, I have not followed this thread closely. Are you sure that the
degradation is ZFS related? Could it be caused by, for instance, a userland
memory leak? What happens to active memory when you restart rtorrent?
Cheers,
-- Norbert Papke.
npa...@acm.org
http://saveournet.ca
Protecting your Internet's level playing field
Last night I had around 6500MB 'Active' again, 1500MB Wired, no inact, ~30MB
buf, no free, and ~100MB swap used. My performance copying ZFS->ZFS was
again slow (<1MB/s). I tried killing rTorrent and no significant amount of
memory was reclaimed - maybe 100MB. `ps aux` showed no processes using any
significant amount of memory, and I was definitely nowhere near 6500MB
usage.
I tried running the perl oneliner again to hog a bunch of memory, and almost
all of the Active memory was IMMEDIATELY marked as Free, and my performance
was excellent again.
I'm not sure what in userland could be causing the issue. The only things
I've installed are rTorrent, lighttpd, samba, smartmontools, vim, bash,
Python, Perl, and SABNZBd. There is nothing that *should* be consuming any
serious amount of memory.
I've two recommendations:
1) Have you considered "upgrading" to RELENG_8 (e.g. 8.0-STABLE) instead
of sticking with 8.0-RELEASE? There's been a recent MFC to RELENG_8
which pertains to ARC drainage. I'm referring to the commit labelled
revision 1.22.2.2 (RELENG_8):
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
2) Have you tried using vfs.zfs.arc_max in loader.conf to limit the ARC
size? I'd recommend picking something like 1GB as a cap (your machine
has 8GB total at present, if I remember right). I believe long ago
someone said this isn't an explicit hard limit on the maximum size of
the ARC, but I believe this was during the RELENG_7 days and the ARC
"stuff" on FreeBSD has changed since then. I wish the tunables were
better documented, or at least explained in detail (hello Wiki!).
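As a concrete sketch of (2), that would be something like this in /boot/loader.conf (1GB shown only as the suggested starting point; it takes effect on reboot):

    # cap the ARC at roughly 1GB (adjust after experimenting)
    vfs.zfs.arc_max="1073741824"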
Finally:
Does anyone have reservations about me crossposting this thread to
freeb...@freebsd.org, to get some additional eyes?
Reports like this often scare/worry those of us who run servers. :-)
> On Mon, Jan 18, 2010 at 11:29 AM, Norbert Papke <npa...@acm.org> wrote:
>
> > On January 17, 2010, Garrett Moore wrote:
> > > I upgraded my system to 8GB of ram to see if that would help. It hasn't
> > > made much of a difference. After having rTorrent running for a while, my
> > > performance again tanked. Around 6.5GB of memory was showing as 'Active'
> > > according to top.
> >
> > 6.5GB of "active" memory seems to imply that a user process is growing or a
> > large number of user processes are being created. I would expect ZFS's
> > cache
> > to increase the size of "wired" memory.
> >
> > Sorry, I have not followed this thread closely. Are you sure that the
> > degradation is ZFS related? Could it be caused by, for instance, a
> > userland
> > memory leak? What happens to active memory when you restart rtorrent?
--
> On Tue, Jan 19, 2010 at 11:40:50AM -0500, Garrett Moore wrote:
>> I've been watching my memory usage and I have no idea what is consuming
>> memory as 'Active'.
>>
>> Last night I had around 6500MB 'Active' again, 1500MB Wired, no inact, ~30MB
>> buf, no free, and ~100MB swap used. My performance copying ZFS->ZFS was
>> again slow (<1MB/s). I tried killing rTorrent and no significant amount of
>> memory was reclaimed - maybe 100MB. `ps aux` showed no processes using any
>> significant amount of memory, and I was definitely nowhere near 6500MB
>> usage.
>>
>> I tried running the perl oneliner again to hog a bunch of memory, and almost
>> all of the Active memory was IMMEDIATELY marked as Free, and my performance
>> was excellent again.
>>
>> I'm not sure what in userland could be causing the issue. The only things
>> I've installed are rTorrent, lighttpd, samba, smartmontools, vim, bash,
>> Python, Perl, and SABNZBd. There is nothing that *should* be consuming any
>> serious amount of memory.
>
> I've two recommendations:
>
> 1) Have you considered "upgrading" to RELENG_8 (e.g. 8.0-STABLE) instead
> of sticking with 8.0-RELEASE? There's been a recent MFC to RELENG_8
> which pertain to ARC drainage. I'm referring to the commit labelled
> revision 1.22.2.2 (RELENG_8):
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
This patch can be merged stand-alone if necessary, no need to go to
RELENG_8 if there are reservations.
> 2) Have you tried using vfs.zfs.arc_max in loader.conf to limit the ARC
> size? I'd recommend picking something like 1GB as a cap (your machine
Or even less... to be determined by experimenting.
> has 8GB total at present, if I remember right). I believe long ago
> someone said this isn't an explicit hard limit on the maximum size of
> the ARC, but I believe this was during the RELENG_7 days and the ARC
> "stuff" on FreeBSD has changed since then. I wish the tunables were
> better documented, or at least explained in detail (hello Wiki!).
The commit you refer to above just does this: it limits the ARC to
arc_max more strictly than was the case before.
This patch is in 7-stable too (in case someone is interested).
Bye,
Alexander.
--
Johnson's First Law:
When any mechanical contrivance fails, it will do so at the
most inconvenient possible time.
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137
I'm having this same issue. Although my performance does not go back to
freshly-booted levels, it still goes from <1MB/s to ~11MB/s after running
that Perl one-liner. I'm running RELENG_8 as of about 2 days ago:
FreeBSD 8.0-STABLE #2 r202777: Fri Jan 22 00:15:43 EST 2010 [...] amd64
Just to be clear, it also seems to be related to something rtorrent does
while downloading torrents, but it's not rtorrent itself using the memory:
quitting rtorrent doesn't release more than 100MB of the multiple GB of
memory marked as active, while running the Perl one-liner does.
>> I'm not sure what in userland could be causing the issue. The only things
>> I've installed are rTorrent, lighttpd, samba, smartmontools, vim, bash,
>> Python, Perl, and SABNZBd. There is nothing that *should* be consuming any
>> serious amount of memory.
>
> I've two recommendations:
>
> 1) Have you considered "upgrading" to RELENG_8 (e.g. 8.0-STABLE) instead
> of sticking with 8.0-RELEASE? There's been a recent MFC to RELENG_8
> which pertain to ARC drainage. I'm referring to the commit labelled
> revision 1.22.2.2 (RELENG_8):
I definitely have that commit, see above. I just checked my ARC size
and it's only using ~170MB of a ~600MB limit. This is after running the
Perl script and without the strangely large amount of "active" memory; I
forgot to check it before as well.
This machine is just a personal file server so I can restart it as
needed for testing and I have serial console access to it if needed.
Thanks,
Jonathan
Cheers,
-erwin
--
Erwin Lansing http://droso.org
Prediction is very difficult
especially about the future er...@FreeBSD.org
Last night I tried ZFS with a pool on an iSCSI-connected Dell MD3000i, and I
was surprised by the very low speed of a simple cp -a command (copying from a
UFS partition to ZFS). The write speed was only about 2MB/s.
After looking into the ARC stuff, I noticed some weird values:
ARC Size:
Current Size: 1 MB (arcsize)
Target Size (Adaptive): 205 MB (c)
Min Size (Hard Limit): 205 MB (zfs_arc_min)
Max Size (Hard Limit): 1647 MB (zfs_arc_max)
(stats from script http://cuddletech.com/arc_summary/
freebsd version
http://bitbucket.org/koie/arc_summary/changeset/dbe14d2cf52b/ )
I don't know why it shows a Current Size of 1MB.
I tried the perl one-liner from this thread. Then I got about 5GB of free
memory, and the Target Size and Current Size grew to 1647MB. The write speed
increased to about 8MB/s, and after a few minutes it slowly dropped back to
2MB/s and the ARC Current Size dropped to 1MB again.
This server is not in production and was idle, just copying the data
from one partition to another.
Today I tried serving the data with Lighttpd.
The iSCSI read performance is impressive - thanks to ZFS prefetch it
can achieve 880Mbit/s of reads from iSCSI - but Lighttpd serves only
about 66Mbit/s:
bce0 - internet
bce1 - iSCSI to storage MD3000i
bce0 bce1
Kbps in Kbps out Kbps in Kbps out
2423.22 65481.56 855970.7 4348.73
2355.26 63911.74 820561.3 4846.08
2424.87 65998.62 848937.1 4312.37
2442.78 66544.95 858019.0 4356.64
iostat -x
extended device statistics
device r/s w/s kr/s kw/s wait svc_t %b
da1 1596.8 3.6 102196.7 22.2 13 7.4 97
da1 1650.2 2.9 105612.7 55.7 16 7.4 103
da1 1647.3 0.0 105422.9 0.0 13 7.2 100
da1 1636.5 2.3 104735.4 20.0 16 7.3 100
da1 1642.9 0.0 105141.1 0.0 13 7.3 100
~/bin/arcstat.pl -f Time,read,hits,Hit%,miss,miss%,dmis,dm%,mmis,mm%,arcsz,c 30
Time read hits Hit% miss miss% dmis dm% mmis mm% arcsz c
12:18:05 16K 15K 94 838 5 570 3 1 0 16933296 215902720
12:18:36 16K 15K 94 839 5 571 3 0 0 21488288 215902720
12:19:06 16K 15K 94 836 5 569 3 1 0 17228688 215902720
12:19:37 16K 15K 94 839 5 572 3 4 1 22002672 215902720
12:20:07 16K 15K 94 841 5 570 3 1 0 27784960 215902720
12:20:38 16K 15K 94 838 5 569 3 0 0 21839472 215902720
12:21:08 16K 15K 94 837 5 568 3 0 0 28244992 215902720
12:21:39 16K 15K 94 833 5 565 3 1 0 28744416 215902720
12:22:09 16K 15K 94 842 5 576 3 4 1 28646656 215902720
12:22:39 16K 15K 94 840 5 575 3 3 0 28903696 215902720
12:23:10 15K 15K 94 821 5 561 3 0 0 28765904 215902720
12:23:40 16K 15K 94 828 5 566 3 0 0 28395840 215902720
12:24:11 16K 15K 94 828 5 568 3 0 0 32063408 215902720
12:24:41 16K 15K 94 834 5 570 3 0 0 29800976 215902720
12:25:12 15K 15K 94 820 5 562 3 1 0 29066512 215902720
# ~/bin/arc_summary.pl
System Memory:
Physical RAM: 8169 MB
Free Memory : 0 MB
ARC Size:
Current Size: 22 MB (arcsize)
Target Size (Adaptive): 205 MB (c)
Min Size (Hard Limit): 205 MB (zfs_arc_min)
Max Size (Hard Limit): 1647 MB (zfs_arc_max)
ARC Size Breakdown:
Most Recently Used Cache Size: 5% 11 MB (p)
Most Frequently Used Cache Size: 94% 194 MB (c-p)
ARC Efficency:
Cache Access Total: 81843958
Cache Hit Ratio: 95% 78525502 [Defined State for buffer]
Cache Miss Ratio: 4% 3318456 [Undefined State for Buffer]
REAL Hit Ratio: 95% 78418994 [MRU/MFU Hits Only]
Data Demand Efficiency: 97%
Data Prefetch Efficiency: 10%
CACHE HITS BY CACHE LIST:
Anon: --% Counter Rolled.
Most Recently Used: 2% 2209869 (mru) [ Return Customer ]
Most Frequently Used: 97% 76209125 (mfu) [ Frequent Customer ]
Most Recently Used Ghost: 1% 965711 (mru_ghost) [ Return Customer Evicted, Now Back ]
Most Frequently Used Ghost: 0% 176871 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
Demand Data: 97% 76770304
Prefetch Data: 0% 126644
Demand Metadata: 2% 1628528
Prefetch Metadata: 0% 26
CACHE MISSES BY DATA TYPE:
Demand Data: 63% 2122089
Prefetch Data: 32% 1063449
Demand Metadata: 4% 132894
Prefetch Metadata: 0% 24
---------------------------------------------
# sysctl kstat.zfs.misc.arcstats
kstat.zfs.misc.arcstats.hits: 75409326
kstat.zfs.misc.arcstats.misses: 3144748
kstat.zfs.misc.arcstats.demand_data_hits: 73731356
kstat.zfs.misc.arcstats.demand_data_misses: 2003526
kstat.zfs.misc.arcstats.demand_metadata_hits: 1551917
kstat.zfs.misc.arcstats.demand_metadata_misses: 132730
kstat.zfs.misc.arcstats.prefetch_data_hits: 126027
kstat.zfs.misc.arcstats.prefetch_data_misses: 1008468
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 26
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 24
kstat.zfs.misc.arcstats.mru_hits: 2105758
kstat.zfs.misc.arcstats.mru_ghost_hits: 914887
kstat.zfs.misc.arcstats.mfu_hits: 73197609
kstat.zfs.misc.arcstats.mfu_ghost_hits: 171171
kstat.zfs.misc.arcstats.deleted: 2367973
kstat.zfs.misc.arcstats.recycle_miss: 412788
kstat.zfs.misc.arcstats.mutex_miss: 2865
kstat.zfs.misc.arcstats.evict_skip: 17459
kstat.zfs.misc.arcstats.hash_elements: 2478
kstat.zfs.misc.arcstats.hash_elements_max: 28921
kstat.zfs.misc.arcstats.hash_collisions: 86135
kstat.zfs.misc.arcstats.hash_chains: 25
kstat.zfs.misc.arcstats.hash_chain_max: 3
kstat.zfs.misc.arcstats.p: 14908416
kstat.zfs.misc.arcstats.c: 215902720
kstat.zfs.misc.arcstats.c_min: 215902720
kstat.zfs.misc.arcstats.c_max: 1727221760
kstat.zfs.misc.arcstats.size: 30430560
kstat.zfs.misc.arcstats.hdr_size: 555072
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 135489
This is on FreeBSD 7.2-STABLE #0: Sun Dec 6 23:21:17 CET 2009
ro...@dust.hrej.cz:/usr/obj/usr/src/sys/GENERIC amd64
Can somebody tell me why the ARC Current Size drops so low (1-20MB
when arc_min is 205MB)?
The system has 8GB of memory and 8 CPU cores:
last pid: 83605; load averages: 0.17, 0.15, 0.10 up 36+10:34:34 12:29:05
58 processes: 1 running, 56 sleeping, 1 zombie
CPU: 0.1% user, 0.0% nice, 2.3% system, 1.7% interrupt, 95.8% idle
Mem: 237M Active, 6259M Inact, 1154M Wired, 138M Cache, 827M Buf, 117M Free
Swap: 8192M Total, 96K Used, 8192M Free
I have no loader.conf tuning on this machine.
Miroslav Lachman
[...]
> Last night I tried ZFS with pool on iSCSI connected Dell MD3000i and I
> was suprised by too low speed of simple cp -a command (copying from UFS
> partition to ZFS) The write speed was about 2MB/s only.
>
> After looking in to ARC stuff, I realized some weird values:
>
> ARC Size:
> Current Size: 1 MB (arcsize)
> Target Size (Adaptive): 205 MB (c)
> Min Size (Hard Limit): 205 MB (zfs_arc_min)
> Max Size (Hard Limit): 1647 MB (zfs_arc_max)
>
> (stats from script http://cuddletech.com/arc_summary/
> freebsd version
> http://bitbucket.org/koie/arc_summary/changeset/dbe14d2cf52b/ )
>
> I don't know why it shows Current Size 1MB.
[...]
> Today I tried serving the data by Lighttpd.
> There is impressive iSCSI read performance - because of ZFS prefetch, it
> can achieve 880Mbits of read from iSCSI, but serving by Lighttpd only
> about 66Mbits
>
> bce0 - internet
> bce1 - iSCSI to storage MD3000i
>
> bce0 bce1
> Kbps in Kbps out Kbps in Kbps out
> 2423.22 65481.56 855970.7 4348.73
> 2355.26 63911.74 820561.3 4846.08
> 2424.87 65998.62 848937.1 4312.37
> 2442.78 66544.95 858019.0 4356.64
[...]
> ARC Size:
> Current Size: 22 MB (arcsize)
> Target Size (Adaptive): 205 MB (c)
> Min Size (Hard Limit): 205 MB (zfs_arc_min)
> Max Size (Hard Limit): 1647 MB (zfs_arc_max)
>
> ARC Size Breakdown:
> Most Recently Used Cache Size: 5% 11 MB (p)
> Most Frequently Used Cache Size: 94% 194 MB (c-p)
[...]
> Can somebody tell me, why ARC Current Size is dropping too low? (1-20MB
> if arc_min is 205MB)
>
> The system have 8GB of memory and 8 CPU cores:
>
> last pid: 83605; load averages: 0.17, 0.15, 0.10 up 36+10:34:34 12:29:05
> 58 processes: 1 running, 56 sleeping, 1 zombie
> CPU: 0.1% user, 0.0% nice, 2.3% system, 1.7% interrupt, 95.8% idle
> Mem: 237M Active, 6259M Inact, 1154M Wired, 138M Cache, 827M Buf, 117M Free
> Swap: 8192M Total, 96K Used, 8192M Free
Hmmm, it seems related to the ZFS + sendfile bug that was pointed out in an
older thread:
Performance issues with 8.0 ZFS and sendfile/lighttpd
http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052595.html
http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052629.html
I tried the test with Lighttpd again, but with "writev" instead of
sendfile in lighttpd.conf (server.network-backend = "writev"), and with
this setting it tripled the performance!
Now Lighttpd is serving about 180Mbit/s instead of 66Mbit/s, and the ARC
Current Size stays constantly at its max size:
# ~/bin/arc_summary.pl
System Memory:
Physical RAM: 8169 MB
Free Memory : 0 MB
ARC Size:
Current Size: 1647 MB (arcsize)
Target Size (Adaptive): 1647 MB (c)
Min Size (Hard Limit): 205 MB (zfs_arc_min)
Max Size (Hard Limit): 1647 MB (zfs_arc_max)
ARC Size Breakdown:
Most Recently Used Cache Size: 99% 1643 MB (p)
Most Frequently Used Cache Size: 0% 3 MB (c-p)
ARC Efficency:
Cache Access Total: 126994437
Cache Hit Ratio: 94% 119500977 [Defined State for buffer]
Cache Miss Ratio: 5% 7493460 [Undefined State for Buffer]
REAL Hit Ratio: 93% 118808103 [MRU/MFU Hits Only]
Data Demand Efficiency: 97%
Data Prefetch Efficiency: 14%
CACHE HITS BY CACHE LIST:
Anon: --% Counter Rolled.
Most Recently Used: 2% 3552568 (mru) [ Return Customer ]
Most Frequently Used: 96% 115255535 (mfu) [ Frequent Customer ]
Most Recently Used Ghost: 1% 1277990 (mru_ghost) [ Return Customer Evicted, Now Back ]
Most Frequently Used Ghost: 0% 464787 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
Demand Data: 96% 114958883
Prefetch Data: 0% 713418
Demand Metadata: 3% 3828650
Prefetch Metadata: 0% 26
CACHE MISSES BY DATA TYPE:
Demand Data: 40% 3017229
Prefetch Data: 57% 4324961
Demand Metadata: 2% 151246
Prefetch Metadata: 0% 24
# ~/bin/arcstat.pl -f Time,read,hits,Hit%,miss,miss%,dmis,dm%,mmis,mm%,arcsz,c 30
Time read hits Hit% miss miss% dmis dm% mmis mm% arcsz c
14:04:45 5K 4K 87 672 12 53 1 2 0 1727635056 1727221760
14:05:16 5K 4K 86 679 13 48 1 1 0 1727283200 1727221760
14:05:46 5K 5K 88 674 11 55 1 1 0 1727423184 1727221760
14:06:17 5K 4K 87 668 12 51 1 0 0 1727590560 1727221760
14:06:47 5K 5K 88 665 11 56 1 1 0 1727278896 1727221760
14:07:18 5K 5K 88 664 11 53 1 1 0 1727347632 1727221760
# ifstat -i bce0,bce1 -b 10
bce0 bce1
Kbps in Kbps out Kbps in Kbps out
6673.90 184872.8 679110.0 3768.23
6688.00 185420.0 655232.8 3834.10
7737.68 214640.4 673375.7 3735.96
6993.61 193602.6 671239.3 3737.48
7198.54 198665.0 688677.0 4037.28
8062.61 222400.4 683966.9 3790.40
There is also a big change in memory usage:
last pid: 92536; load averages: 0.19, 0.16, 0.16 up 36+12:22:26 14:16:57
60 processes: 1 running, 58 sleeping, 1 zombie
CPU: 0.0% user, 0.0% nice, 2.5% system, 3.3% interrupt, 94.1% idle
Mem: 1081M Active, 172M Inact, 2800M Wired, 3844M Cache, 827M Buf, 8776K Free
Swap: 8192M Total, 104K Used, 8192M Free
More Active, less Inact (172M instead of 6259M!), more Cache (3844M
instead of 138M); Buf and Free are close in both cases.
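For anyone who wants to try the same workaround, the lighttpd.conf change used above is just this one line:

    # use writev() instead of sendfile() when serving files
    server.network-backend = "writev"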
Does somebody know about any fixes for the ZFS + sendfile problem committed
to 8.x or HEAD?
How can we test whether it is a general problem with sendfile or a local
problem with Lighttpd?
Miroslav Lachman