
anomalous SIGKILL


Mike

Oct 7, 2007, 12:28:37 PM
I occasionally get the simple message "Killed" when I try to execute a
particular program. Using strace on it shows only two lines: execve(...)
and ...got SIGKILL. Why is it doing this?

The program runs as a daemon under one name, "drd", and as a user
interface under another name, "dr", via a symbolic link. The user
interface code is trivially simple, using only ncurses and two files.

Also, when this happens, the running daemon is apparently unable to
execute new processes using system("...") calls, which it normally
does several times a minute. It is as if new processes can't start,
but everything else seems to work. The number of processes is small,
and the CPU load is small.

-Mike

Måns Rullgård

Oct 7, 2007, 4:34:26 PM
Mike <michael.h....@gmail.com> writes:

Sounds like you've hit the limit for number of processes. Check your
ulimit settings.

--
Måns Rullgård
ma...@mansr.com

Paul Pluzhnikov

Oct 7, 2007, 5:31:58 PM
Måns Rullgård <ma...@mansr.com> writes:

> Mike <michael.h....@gmail.com> writes:
>
>> ...got SIGKILL. Why is it doing this?

Perhaps because OOM killer decided that you consumed too much RAM?

>> Also when this happens, the running daemon is apparently unable to
>> execute new processes using system ("...") calls

You mean "just before this happens"?
It's hard to expect a process that has been killed with SIGKILL to
be able to execute anything.

>> but everything else seems to work. The number of processes is small,
>> and the CPU load is small.
>
> Sounds like you've hit the limit for number of processes.

Not really: hitting that limit will not in and of itself cause the
process to be terminated with SIGKILL, and he stated that "number
of processes is small".

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Måns Rullgård

Oct 7, 2007, 5:46:48 PM
Paul Pluzhnikov <ppluzhn...@charter.net> writes:

> Måns Rullgård <ma...@mansr.com> writes:
>
>> Mike <michael.h....@gmail.com> writes:
>>
>>> ...got SIGKILL. Why is it doing this?
>
> Perhaps because OOM killer decided that you consumed too much RAM?
>
>>> Also when this happens, the running daemon is apparently unable to
>>> execute new processes using system ("...") calls
>
> You mean "just before this happens"?
> It's hard to expect process that has been killed with SIGKILL to
> be able to execute anything.
>
>>> but everything else seems to work. The number of processes is small,
>>> and the CPU load is small.
>>
>> Sounds like you've hit the limit for number of processes.
>
> Not really: hitting that limit will not in and of itself cause the
> process to be terminated with SIGKILL, and he stated that "number

I realised this just after posting. He could be hitting some other
ulimit though.

--
Måns Rullgård
ma...@mansr.com

Mike

Oct 7, 2007, 9:51:56 PM

> > Perhaps because OOM killer decided that you consumed too much RAM?

Possibly, but the "top" program shows the daemon using only 6% of memory.

>
> >>> Also when this happens, the running daemon is apparently unable to
> >>> execute new processes using system ("...") calls
>
> > You mean "just before this happens"?
> > It's hard to expect process that has been killed with SIGKILL to
> > be able to execute anything.

No, the daemon runs continuously. The user interface program runs
occasionally (it is actually the same executable under a different
filename, like gzip/gunzip). When the interface program won't run
because it gets SIGKILL, that happens precisely when the daemon's
system("...") calls also fail.

> >>> but everything else seems to work. The number of processes is small,
> >>> and the CPU load is small.
>
> >> Sounds like you've hit the limit for number of processes.
>
> > Not really: hitting that limit will not in and of itself cause the
> > process to be terminated with SIGKILL, and he stated that "number
>
> I realised this just after posting. He could be hitting some other
> ulimit though.

The programs are all run as root. "ulimit" returns "unlimited".
However, killing other processes seems to temporarily solve the
problem.

-Mike

Paul Pluzhnikov

Oct 7, 2007, 11:07:45 PM
Mike <michael.h....@gmail.com> writes:

>> > You mean "just before this happens"?
>

> When the interface program won't run because it gets SIGKILL, that
> happens precisely when the daemon system ("...") calls fail also.

See if you can run 'strace -fpo /tmp/junk.trace <daemon-pid>' when
the problem happens, and if you can determine which part of system(3)
is failing.

Probably clone(2) is failing, but why? If it's ENOMEM, then something
is taking up all that memory; if it's EAGAIN, too many processes
(or perhaps too many threads).
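
As a rough illustration of the ENOMEM/EAGAIN distinction above, here is a
minimal Python sketch. It is not the daemon's code (which calls system(3)
from C); the helper names classify_spawn_error and try_spawn are made up
for this example. In Python a failed fork surfaces as an OSError carrying
the same errno values.

```python
import errno
import os
import subprocess


def classify_spawn_error(err: OSError) -> str:
    """Map the errno of a failed process creation to a likely cause."""
    if err.errno == errno.ENOMEM:
        return "ENOMEM: the kernel could not allocate memory for the new process"
    if err.errno == errno.EAGAIN:
        return "EAGAIN: a process or thread limit was reached"
    # Any other errno: report its symbolic name and the system's description.
    name = errno.errorcode.get(err.errno, str(err.errno))
    return f"{name}: {os.strerror(err.errno)}"


def try_spawn(argv):
    """Run argv and report its exit status, or why it could not be started."""
    try:
        result = subprocess.run(argv)
    except OSError as err:
        return classify_spawn_error(err)
    if result.returncode < 0:
        # subprocess reports death by signal N as a return code of -N.
        return f"killed by signal {-result.returncode}"
    return f"exited with status {result.returncode}"
```

A child that was killed with SIGKILL would show up here as "killed by
signal 9", which is a different failure from not being able to start the
child at all.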

> The programs are all run as root. "ulimit" returns "unlimited".

You want 'ulimit -a' (it's highly unusual for all the separate
limits to be "unlimited").
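
The same per-resource limits that 'ulimit -a' prints can also be read from
inside a process. The sketch below uses Python's standard resource module
on Linux; the LIMITS table and the function names are illustrative choices,
and the labels only loosely mirror the shell's output. Note that the shell
reports some limits in kilobytes or blocks, whereas getrlimit returns raw
values (bytes for sizes).

```python
import resource

# A few of the limits shown by 'ulimit -a' (labels are illustrative).
LIMITS = {
    "open files (-n)": resource.RLIMIT_NOFILE,
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "virtual memory (-v)": resource.RLIMIT_AS,
    "stack size (-s)": resource.RLIMIT_STACK,
    "core file size (-c)": resource.RLIMIT_CORE,
}


def format_limit(value):
    """Render a raw rlimit value the way the shell does for 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits in LIMITS."""
    return {
        label: tuple(format_limit(v) for v in resource.getrlimit(res))
        for label, res in LIMITS.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label:26} soft={soft:>12} hard={hard:>12}")
```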

Mike

Oct 9, 2007, 9:23:22 PM
On Oct 8, 3:07 am, Paul Pluzhnikov <ppluzhnikov-...@charter.net>
wrote:

> You want 'ulimit -a' (it's highly unusual for all the separate
> limits to be "unlimited").

Ah yes, I see that 'ulimit -a' is not unlimited for root for:

open files (-n) 1024
pipe size (512 bytes, -p) 8
max user processes (-u) 10234

However, I think the problem stems from an out-of-memory condition
caused by another program: the 'opera' browser. Killing the browser
cleared up the problem previously, and then today the browser crashed
while opening a new tab, producing an out-of-memory kernel message in
"/var/log/messages":
...
Oct 10 00:35:52 A241105 kernel: HighMem: 1*4kB 0*...
Oct 10 00:35:52 A241105 kernel: Swap cache: add 339056, delete 339009, find 109385/123622, race 0+0
Oct 10 00:35:52 A241105 kernel: Free swap: 0kB
Oct 10 00:35:52 A241105 kernel: 327488 pages of RAM
Oct 10 00:35:52 A241105 kernel: 98112 pages of HIGHMEM
Oct 10 00:35:52 A241105 kernel: 3790 reserved pages
Oct 10 00:35:52 A241105 kernel: 10488 pages shared
Oct 10 00:35:52 A241105 kernel: 47 pages swap cached
Oct 10 00:35:52 A241105 kernel: Out of Memory: Killed process 2218 (opera).
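
To check whether earlier "Killed" incidents were also OOM kills, the log
can be searched for entries like the last line above. This is a small
sketch that assumes the message wording shown in this log; other kernel
versions phrase it differently, so the pattern may need adjusting.
find_oom_kills is a hypothetical helper name.

```python
import re

# Matches the wording seen in this thread's log excerpt. Case is ignored,
# and \s* tolerates a line wrap between the pid and the command name.
OOM_KILL = re.compile(
    r"Out of Memory: Killed process (\d+)\s*\(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(log_text):
    """Return a list of (pid, command) for each OOM-killer entry found."""
    return [(int(pid), name) for pid, name in OOM_KILL.findall(log_text)]
```

For example, reading /var/log/messages (as root) and passing its text to
find_oom_kills would list every process the kernel killed for lack of
memory, including short-lived ones such as the "dr" interface program.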


This makes me wonder: if RAM size plus swap size were larger than 4GB,
would this still happen? 'Top' shows:

Mem: 1295052k total, 1258848k used, 36204k free, 23956k buffers
Swap: 793760k total, 793756k used, 4k free, 56492k cached
-Mike
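
The 'top' figures above can be turned into a rough estimate of how much
memory is really in demand. The sketch below assumes that "buffers" and
"cached" (top prints the latter on the Swap line, but it describes page
cache held in RAM) are reclaimable, and treats the rest of "used" plus the
swap in use as demand. memory_pressure is a made-up helper, and the result
is an approximation, not an exact accounting.

```python
def memory_pressure(mem_total_kb, mem_used_kb, buffers_kb, cached_kb,
                    swap_total_kb, swap_used_kb):
    """Estimate memory demand versus RAM+swap capacity, all in kB."""
    # RAM held by programs, excluding reclaimable buffers and page cache.
    app_ram_kb = mem_used_kb - buffers_kb - cached_kb
    # Total demand: program pages in RAM plus pages pushed out to swap.
    demand_kb = app_ram_kb + swap_used_kb
    capacity_kb = mem_total_kb + swap_total_kb
    return {
        "app_ram_kb": app_ram_kb,
        "demand_kb": demand_kb,
        "capacity_kb": capacity_kb,
        "demand_fraction": demand_kb / capacity_kb,
    }


if __name__ == "__main__":
    # Numbers taken from the 'top' output quoted in the post above.
    print(memory_pressure(1295052, 1258848, 23956, 56492, 793760, 793756))
```

With these numbers the estimate comes to about 1972156 kB of demand
against 2088812 kB of RAM plus swap, roughly 94%, with swap itself
completely full.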

Mike

Oct 10, 2007, 11:25:48 AM
The problem has returned, without 'opera' running.

'Top' shows zero free swap space. It shows the largest memory use
by X, at 35%.

As expected, when I killed a non-essential process the problem cleared.

What is going on?

-Mike

John Reiser

Oct 10, 2007, 11:48:03 AM

The kernel believes that it is out of memory. Look in /proc/meminfo,
"ps axl" (VSZ and RSS), "df" and /proc/mounts (for RAM-based filesystems
such as /dev/shm, tmpfs, etc.) to see where the pages of RAM+swap went.
Are any RAM disks in use? [/var/log/messages: "RAMDISK driver initialized:
xx RAM disks of xxxxxK size 1024 blocksize"] Show "uname -a" and "uptime".
Contrast with a freshly-booted system. Take snapshots once per hour;
more often when swap drops below 15% free. For a suspicious process,
consult /proc/<pid>/smaps .
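
The periodic snapshots suggested above could be collected with something
like the following sketch. It parses /proc/meminfo and reports the
percentage of swap still free; the 15% threshold comes from the advice
above, while the function names and the idea of returning a dict are
assumptions made for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values mostly in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_free_percent(info):
    """Return free swap as a percentage, or None if no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return None
    return 100.0 * info.get("SwapFree", 0) / total


def snapshot(path="/proc/meminfo", threshold=15.0):
    """Read the meminfo file and flag when free swap drops below threshold."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    percent = swap_free_percent(info)
    return {
        "swap_free_percent": percent,
        "low_swap": percent is not None and percent < threshold,
        "meminfo": info,
    }
```

Calling snapshot() from a cron job or a simple loop once per hour, and
more often once low_swap becomes true, would give a record to compare
against a freshly booted system.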

--

Mike

Oct 13, 2007, 2:47:14 PM
Update:
The problem seems to stem from running a browser (firefox or opera),
which then causes X to consume large amounts of memory. If those
programs aren't run after boot-up, then the 800MB of swap never gets
used at all.

Oddly, right after reboot, even the RAM usage is small, which then
gradually rises over many hours to consume almost the entire
1.3GB of RAM, except for perhaps 20 or 30MB, and settles there.

Does this seem like normal behavior?

-Mike


Måns Rullgård

Oct 13, 2007, 2:53:42 PM
Mike <michael.h....@gmail.com> writes:

Most of that used RAM is probably disk cache. Check /proc/meminfo for
a breakdown.

--
Måns Rullgård
ma...@mansr.com

Mike

Oct 13, 2007, 6:39:39 PM
On Oct 13, 1:53 pm, Måns Rullgård <m...@mansr.com> wrote:

> Most of that used RAM is probably disk cache. Check /proc/meminfo for
> a breakdown.

Cached is 745MB, so it's more than half.

Also, to answer John's question, there is no RAMDISK.

My question now is, assuming there are no bugs causing a memory leak
in the browser or X code, should it run out of swap space? Why?

-Mike

Måns Rullgård

Oct 13, 2007, 6:53:09 PM
Mike <michael.h....@gmail.com> writes:

If you try to run too many applications at once you can certainly run
out of memory, both RAM and swap space. Your only choices are to not
run all those apps at the same time, install more physical RAM in the
machine, or create more swap space.

--
Måns Rullgård
ma...@mansr.com

Mike

Oct 13, 2007, 8:42:24 PM
On Oct 13, 5:53 pm, Måns Rullgård <m...@mansr.com> wrote:
> Mike <michael.h.william...@gmail.com> writes:

> > My question now is, assuming there are no bugs causing a memory leak
> > in the browser or X code, should it run out of swap space? Why?
>
> If you try to run too many applications at once you can certainly run
> out of memory, both RAM and swap space. Your only choices are to not
> run all those apps at the same time, install more physical RAM in the
> machine, or create more swap space.

OK, thanks for the advice. I'll try more swap space.
However, I don't consider the computer to be heavily loaded.

Also, it seems odd to me that adding just one more application
(firefox or the opera browser) gradually causes all of the 800MB
of swap space to be used up, when otherwise it is never used.

-Mike

Paul Pluzhnikov

Oct 13, 2007, 10:30:54 PM
Mike <michael.h....@gmail.com> writes:

> Also, it seems odd to me, that by adding just one
> more application (firefox or the opera browser), that this
> gradually causes all of the 800MB swap space to be
> used up, when otherwise it is never used.

Running out of swap this way is almost certainly caused by a leak in
either the browser(s) or the X server.

What can you do about it? One of your messages implies that X is
"bigger" than firefox.

You may want to make sure you have the latest X; but if that's
still showing the problem, you can either try to analyze X for
leaks, or give up and just not run firefox/opera on that system
(or restart X daily).

Mike

Oct 20, 2007, 2:16:34 PM
On Oct 13, 9:30 pm, Paul Pluzhnikov <ppluzhnikov-...@charter.net>
wrote:

> or give up and just not run firefox/opera on that system
> (or restart X daily).

Update:
I have discovered that the konqueror browser does not cause the
problem. It comes with KDE for SuSE linux, but firefox and opera were
separately downloaded. With konqueror, no swap space gets used. This
appears to be true even when viewing weather satellite animations,
which I think are fairly big memory consumers.

-Mike
