Unfortunately, we have to rotate the logs of this process once per
day. When we do, it fork()s and exec()s about 16-20 child processes
as helpers. Since it's got this multi-million-entry page table,
that's a disaster, because it has to copy all those page table entries
for each child, then throw them out. This takes a couple of minutes
of 100% CPU usage, during which time the machine is pretty much
unresponsive.
Someone on the squid list suggested we try the new superpages feature
(vm.pmap.pg_ps_enabled) in 7.2. We did, and after some tuning, we got
it to work.
Here's some "sysctl vm.pmap" for a similar machine with 16GB of RAM
that does NOT have this setting enabled:
vm.pmap.pv_entry_count: 2307899
vm.pmap.pde.promotions: 0
vm.pmap.pde.p_failures: 0
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 0
vm.pmap.pv_entry_max: 4276871
vm.pmap.pg_ps_enabled: 0
Now, here is the machine that does have it, just prior to the daily
rotation mentioned above:
vm.pmap.pv_entry_count: 61361
vm.pmap.pde.promotions: 23123
vm.pmap.pde.p_failures: 327946
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 17848
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1
So obviously this feature makes a huge difference and is a
brilliant idea. :-)
My (limited) understanding is that one of the primary benefits of this
feature is to help situations like ours... a page table that's 512x
smaller can be copied 512x faster. However, in practice this doesn't
happen. It's like fork() breaks up the squid process into 4kb pages
again. Here's the same machine's entries just after rotation:
vm.pmap.pv_entry_count: 1908056
vm.pmap.pde.promotions: 23212
vm.pmap.pde.p_failures: 413171
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 21470
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1
So some 3,600 superpages spontaneously turned into 1,850,000 4k pages.
Once this happens, squid seems reluctant to use more superpages until
it's restarted. We get a lot of p_failures and a slow-but-steady
stream of demotions. Here's the same machine just now:
vm.pmap.pv_entry_count: 2022786
vm.pmap.pde.promotions: 25281
vm.pmap.pde.p_failures: 996027
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 21683
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1
And a few minutes later:
vm.pmap.pv_entry_count: 2021556
vm.pmap.pde.promotions: 25331
vm.pmap.pde.p_failures: 1001773
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 21684
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1
(There were *no* p_failures or demotions in the several hours prior to
rotation.)
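As a rough sanity check on the "3,600 superpages into 1,850,000 4k pages" figure, the counters quoted above can be compared directly. This is only a sketch: it assumes each demotion can add up to 512 small-page entries (2MB / 4KB on amd64), and the function names are made up for illustration.

```c
#include <stdio.h>

/* 2MB superpage / 4KB base page on amd64. */
#define PAGES_PER_SUPERPAGE 512L

/* Superpages demoted between two readings of vm.pmap.pde.demotions. */
long demoted_between(long before, long after)
{
    return after - before;
}

/* Upper bound on the 4KB mappings created by demoting that many superpages. */
long small_pages_from(long demoted)
{
    return demoted * PAGES_PER_SUPERPAGE;
}

/* Print the comparison for the readings taken just before and after rotation. */
void print_rotation_check(void)
{
    long demoted = demoted_between(17848L, 21470L);
    long grown = 1908056L - 61361L; /* change in vm.pmap.pv_entry_count */

    printf("demoted superpages:     %ld\n", demoted);
    printf("4KB pages implied:      %ld\n", small_pages_from(demoted));
    printf("pv_entry_count grew by: %ld\n", grown);
}
```

With the readings above this gives 3,622 demotions, implying up to 1,854,464 4KB mappings, against an observed pv_entry_count growth of 1,846,695.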
This trend continues... the pv_entry_count bounces up and down even
though memory usage is increasing, so it's like it's trying to recover
and convert things back (promotions), but it's having a lot of trouble
(p_failures).
It's not clear to me if this might be a problem with the superpages
implementation, or if squid does something particularly horrible to
its memory when it forks to cause this, but I wanted to ask about it
on the list in case somebody who understands it better might know
what's going on. :-)
This is on FreeBSD-STABLE 7.2 amd64 r198976M.
Thanks!
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"
No, it isn't:
  PID USERNAME THR PRI NICE   SIZE    RES STATE  C  TIME  WCPU COMMAND
75086 squid      1   4    0 12571M 12584M kqread 6 31:31 0.68% squid
Thanks!
s/fork/vfork/ and you should be fine.
DES
--
Dag-Erling Smørgrav - d...@des.no
..and you should look into replacing Squid with Varnish, of course.
vfork() should mitigate this -- I suggest replacing.
respectfully,
=jt
I did not mean to suggest that we were asking for help solving a
problem with squid rotation. I provided that information as
background to discuss what we observed as a potential misbehavior in
the new VM superpages feature, in the hope that if there is a problem
with the new feature, we can help find/resolve it or, if this is
working as intended, hopefully gain some insight as to what's going
on.
Thanks!
I'm sure you will get a lot of lovely answers to this, but it's best to keep
things simple. Why not just syslog it off to another server and offload all the
compression to that box? You could even back it with ZFS and do on-the-fly
gzip compression at the file system level, or use syslog-ng to do it. If you
are worried about ZFS on BSD, use (Open)Solaris or another filesystem with
inline compression.
Or send squid's logs to a small buffer process which you then HUP when
rotating logs.
Also, I don't really understand why squid would fork when you tell it to
rotate, seems like a design defect.
--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Anyway. The thread is about superpage demotion and copying, not what
Squid is or isn't doing in her configuration. :)
Adrian
2009/11/27 Daniel O'Connor <doco...@gsoft.com.au>:
OK.
> Anyway. The thread is about superpage demotion and copying, not what
> Squid is or isn't doing in her configuration. :)
Yeah I understand that but if you can avoid the huge problem with a deft
rearrangement that may help your production environment and give you
more time for a real solution :)
I talked with Alan Cox some about this off-list, and there is a case that can
cause this behavior: if the parent squid process takes write faults on a
superpage before the child process has called exec(), it can result in
superpages being fragmented and never reassembled. Using vfork() should
prevent this from happening. It is a known issue, but it will probably be
some time before it is addressed. There is lower-hanging fruit in other areas
in the VM that will probably be worked on first.
--
John Baldwin
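A minimal sketch of the vfork()-then-exec() pattern being recommended here. This is not Squid's code: the function name run_helper and its arguments are hypothetical. The point it illustrates is that the parent is suspended until the child calls exec() or _exit(), so the parent cannot take write faults in that window.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a helper program and return its exit status, or -1 on error.
 * helper_path and helper_argv are hypothetical, for illustration only.
 */
int run_helper(const char *helper_path, char *const helper_argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child: it borrows the parent's address space, so do nothing
         * here except exec or _exit. Do not touch parent data.
         */
        execv(helper_path, helper_argv);
        _exit(127); /* only reached if execv() failed */
    }

    /* Parent: resumes only after the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_helper("/bin/sh", argv) with argv set to { "sh", "-c", "exit 3", NULL } returns 3. A real log-rotation helper would more likely be started without waiting for it synchronously.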
OK, as long as somebody who knows more than me knows what's going on,
that's good enough for me. :)
Thanks!
The whole thread puzzles me,
especially because vfork is often called a solution.
Scenario A
Parent with super page
fork/exec
This problem can happen because there is a race.
The parent now has its super pages fragmented permanently!?
The child throws away its pages because of the exec!?
Scenario B
Parent with super page
vfork/exec
This problem won't happen because the child has no pseudo copy of the
parents memory and then starts with a completely new map.
Scenario C
Parent with super page
fork/ no exec
The problem can happen because the child shares the same memory over
its complete lifetime.
The parent can get its super pages fragmented over time.
I don't see a use case for scenario A, because vfork has been there for
over 16 years.
I use fork myself, because it is easier sometimes, but people writing
big programs such as squid should know better.
If squid doesn't use vfork they likely have a reason.
With scenario C I don't see how vfork can help, since this is not a legal
case for vfork.
I use squid myself, but don't know how it handles its children.
But isn't the whole story about such slave children that they share memory
with the master? - How can vfork be a solution for this case?
How can fragmentation of super pages be avoided at all?
I obviously don't have enough clue about this to understand those details.
Hope that someone can enlighten me.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
On Thu, Dec 10, 2009 at 9:50 AM, Bernd Walter <ti...@cicely7.cicely.de> wrote:
> I use fork myself, because it is easier sometimes, but people writing
> big programs such as squid should know better.
> If squid doesn't use vfork they likely have a reason.
Actually they are probably going to switch to vfork(). They were
previously not using it because they thought there was some ambiguity
about whether it was going to be around long term.
I actually am not a huge fan of vfork() since it stalls the parent
process until the child exec()'s.
To me, this case actually highlights why that's an issue. If the
explanation is that stuff happening in the parent process between
fork() and the child's exec() causes the fragmentation, that's stuff
that would be deferred in a vfork() regime, with unknown potential
consequences. (At a minimum, decreased performance.)
But that's personal and largely uninformed opinion. :)
> Also...
>
> On Thu, Dec 10, 2009 at 9:50 AM, Bernd Walter <ti...@cicely7.cicely.de> wrote:
>> I use fork myself, because it is easier sometimes, but people writing
>> big programs such as squid should know better.
>> If squid doesn't use vfork they likely have a reason.
>
> Actually they are probably going to switch to vfork(). They were
> previously not using it because they thought there was some ambiguity
> about whether it was going to be around long term.
Well, the worst that would likely happen to vfork() is it would become an
alias of fork(), and you'd be back to where you are now (or better if
fork() were fixed in the meantime). I'd be more worried about the
mysterious bugs which it's so easy to introduce with vfork() if you do
anything at all nontrivial before exec() and accidentally touch the
parent's memory.
What about using posix_spawn(3)? This is implemented in terms of
vfork(), so you'll gain the same performance advantages, but it avoids
many of vfork's pitfalls. Also, since it's a POSIX standard function, you
needn't worry that it will go away or change its semantics someday.
> I actually am not a huge fan of vfork() since it stalls the parent
> process until the child exec()'s.
If you're doing so much work between vfork() and exec() that this delay is
significant, then I would think you're really abusing vfork().
> To me, this case actually highlights why that's an issue. If the
> explanation is that stuff happening in the parent process between
> fork() and the child's exec() causes the fragmentation, that's stuff
> that would be deferred in a vfork() regime, with unknown potential
> consequences. (At a minimum, decreased performance.)
Not necessarily. In the fork() case, presumably copy-on-write is to blame
for the fragmentation. In the vfork() case, there's no copy at all.
--
Nate Eldredge
na...@thatsmathematics.com
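A minimal sketch of the posix_spawn(3) alternative suggested above. The function name spawn_helper and its arguments are hypothetical. The file-actions and attribute arguments are passed as NULL, so the child simply inherits the parent's descriptors and signal state.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Spawn a helper program and return its exit status, or -1 on error.
 * path and argv are hypothetical, for illustration only.
 */
int spawn_helper(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn() returns 0 on success or an errno value on failure. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    /*
     * Some implementations report an exec failure as a child that
     * exits with status 127 rather than as an error return above.
     */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because there is no user code running between process creation and exec, there is no place to accidentally touch the parent's memory, which is the pitfall described for raw vfork().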
FYI, this comment was removed a couple of weeks ago in HEAD and
the STABLE branches.
- Christian
--
Christian Brueffer ch...@unixpages.org brue...@FreeBSD.org
GPG Key: http://people.freebsd.org/~brueffer/brueffer.key.asc
GPG Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D
If you have a busy proxy and there's some temporary shortage of something
which makes connect() take longer than usual, the main process will stall,
potentially causing the shortage to become worse.
2c,
Adrian
(With his (ex-, kinda) Squid hacker hat on.)
2009/12/10 Linda Messerschmidt <linda.mes...@gmail.com>:
Actually, the fact that vfork() doesn't let the parent execute until the
child has called exec() also closes the race, as it were, and that was the
primary reason in my mind for saying that vfork() would prevent it.
> Scenario C
> Parent with super page
> fork/ no exec
> The problem can happen because the child shares the same memory over
> its complete lifetime.
> The parent can get its super pages fragmented over time.
>
> I don't see a use case for scenario A, because vfork has been there for
> over 16 years.
> I use fork myself, because it is easier sometimes, but people writing
> big programs such as squid should know better.
> If squid doesn't use vfork they likely have a reason.
> With scenario C I don't see how vfork can help, since this is not a legal
> case for vfork.
> I use squid myself, but don't know how it handles its children.
> But isn't the whole story about such slave children that they share memory
> with the master? - How can vfork be a solution for this case?
> How can fragmentation of super pages be avoided at all?
In Linda's case it was a fork to run a log rotation binary, so it was A).
For C) I think you would want to map the pages MAP_SHARED (or use minherit(2)
with INHERIT_SHARE) in which case they would not be COW'd on fork() and you
would keep the superpages. Assuming that you explicitly want to share the
memory with your child processes you already have to do this to really do
sharing anyway.
--
John Baldwin
> What about using posix_spawn(3)? This is implemented in terms of vfork(),
> so you'll gain the same performance advantages, but it avoids many of
> vfork's pitfalls. Also, since it's a POSIX standard function, you needn't
> worry that it will go away or change its semantics someday.
Just as a note here: while we do posix_spawn(3) as a library function, Mac OS
X does it as a system call. As a result, they can implement certain spawn
flags that we can't, among others, the ability to have the newly created
process/image be suspended before its first instruction executes. This would
be very useful when debugging the runtime linker, among other things. On the
other hand, it's quite a complex kernel code path...
Robert N M Watson
Computer Laboratory
University of Cambridge
I'm not sure how you are defining "problem". If we define "problem" as I
would, i.e., that "re-promotion can never occur", then Scenario C is not
a problem scenario, only Scenario A is.
The source of the problem in Scenario A is basically that we have two ways
of handling copy-on-write faults. Before the exec() occurs, copy-on-write
faults are handled as you might intuit from the name, a new physical copy is
made. If the entirety of the 2MB region is written to before the exec(),
then this region will be promoted to a superpage. However, once the exec()
occurs, copy-on-write faults are "optimized". Specifically, the kernel
recognizes that the underlying physical page is no longer shared with the
child and simply restores write access to it. It is the combination of these
two methods that effectively blocks re-promotion because the underlying 4KB
physical pages within a 2MB region are no longer contiguous.
In other words, once the first page within a region has been copied, you
have a choice to make: Do you perform avoidable copies or do you abandon the
possibility of ever creating a superpage. The former has a significant
one-time cost and the latter has a small recurring cost. Not knowing how
much the latter will add up to, I chose the former. However, that choice
may change in time, particularly, if I find an effective heuristic for
choosing between the two options.
Anyway, please keep trying superpages with large memory applications like
this. Reports like this help me to prioritize my efforts.
Regards,
Alan