What happens to a Tcl script if the system runs out of memory?

George Petasis

Jan 6, 2012, 3:33:22 PM

Hi all,

I have a Linux server that recently started to lock up, and I suspect a
Tcl script of consuming all system memory.

If I remember correctly, Tcl's core memory allocation functions never
return NULL, but instead wait until memory is available again.
Does this still hold?

Is it possible that a Tcl script consumes all the available memory,
and then stops there waiting for more memory?
And that the process never gets killed, but makes the server inaccessible?
(I observe this on a Fedora 14 64-bit Linux system.)

George

Uwe Klein

Jan 6, 2012, 4:17:32 PM

The lockup probably has other causes.

An OOM condition is dealt with by the OOM killer ;-)
http://prefetch.net/blog/index.php/2009/09/30/how-the-linux-oom-killer-works/

You can have situations where something consumes all available CPU cycles.

You can test this by pinging the host (it should respond with normal delay)
and then doing an ssh login (usually you get a login prompt and
then things get sticky).
If it takes ages or forever to get a shell prompt, "someone" is taking all the cycles.

Keep an ssh session with top running open before anything happens.
I have seen applications, and the X server, go into that mode.
The typical effect is that the culprit runs an ultrafast loop over
gettimeofday() and (I think) poll/select. I've never fathomed why that happens.
My tentative guess is that someone miscalculated the timeout ;-)

uwe

Alexandre Ferrieux

Jan 6, 2012, 6:45:44 PM

On Jan 6, 9:33 pm, George Petasis <petas...@yahoo.gr> wrote:
> Hi all,
>
> I have a Linux server that recently started to lock up, and I suspect a
> Tcl script of consuming all system memory.
>
> If I remember correctly, Tcl's core memory allocation functions never
> return NULL, but instead wait until memory is available again.
> Does this still hold?
>
> Is it possible that a Tcl script consumes all the available memory,
> and then stops there waiting for more memory?

What takes the time is paging: the allocated areas must be brought into
real RAM from the swap partition (pushing some other pages out, with a
slow write), assuming that is the issue.

> And that the process never gets killed, but makes the server inaccessible?
> (I observe this on a Fedora 14 64-bit Linux system.)

Yes, I have seen this too. The oom_killer eventually kicks in, but the
system is unusable for quite some time before that (and may awaken
dizzy ;-).

To debug this kind of thing, the best approach is to catch the growing
memory consumption before it gets dangerous. One easy way is 'ulimit'.
Give the process something like 25% of the RAM and the system as a whole
stays in the safe zone. If the program eats up memory, it will start
getting NULL from malloc and will most likely Tcl_Panic.

-Alex

George Petasis

Jan 7, 2012, 1:54:58 AM

On 6/1/2012 23:17, Uwe Klein wrote:
> George Petasis wrote:
>> Hi all,
>>
>> I have a Linux server that recently started to lock up, and I suspect
>> a Tcl script of consuming all system memory.
>>
>> If I remember correctly, Tcl's core memory allocation functions
>> never return NULL, but instead wait until memory is available again.
>> Does this still hold?
>>
>> Is it possible that a Tcl script consumes all the available memory,
>> and then stops there waiting for more memory?
>> And that the process never gets killed, but makes the server inaccessible?
>> (I observe this on a Fedora 14 64-bit Linux system.)
>>
>> George
>
> The lockup probably has other causes.
>
> An OOM condition is dealt with by the OOM killer ;-)
> http://prefetch.net/blog/index.php/2009/09/30/how-the-linux-oom-killer-works/
>
>
> You can have situations where something consumes all available CPU cycles.
>
> You can test this by pinging the host (it should respond with normal delay)

Yes, the host responds to ping.

> and then doing an ssh login (usually you get a login prompt and
> then things get sticky).

No prompt in ssh.

> If it takes ages or forever to get a shell prompt, "someone" is taking all
> the cycles.
>
> Keep an ssh session with top running open before anything happens.
> I have seen applications, and the X server, go into that mode.
> The typical effect is that the culprit runs an ultrafast loop over
> gettimeofday() and (I think) poll/select. I've never fathomed why that happens.
> My tentative guess is that someone miscalculated the timeout ;-)
>
> uwe

There are no X applications running, except a RealVNC server with a
GNOME session containing a terminal.

I have been waiting 4 days (!) now for the OOM killer to kick in,
but I don't know if it is going to happen. The server has been running for
two years now (without problems), but this is the second time in the last
three weeks that it has locked up (the previous time was resolved with a reset).

And I know beforehand that the app that can consume almost all physical
RAM is a Tcl script. This is why I suspect the Tcl allocation
routines. If Tcl panics once it cannot get more memory, the
server will come back again. But will Tcl panic?

George

George Petasis

Jan 7, 2012, 2:06:50 AM

On 7/1/2012 01:45, Alexandre Ferrieux wrote:
> On Jan 6, 9:33 pm, George Petasis <petas...@yahoo.gr> wrote:
>> Hi all,
>>
>> I have a Linux server that recently started to lock up, and I suspect a
>> Tcl script of consuming all system memory.
>>
>> If I remember correctly, Tcl's core memory allocation functions never
>> return NULL, but instead wait until memory is available again.
>> Does this still hold?
>>
>> Is it possible that a Tcl script consumes all the available memory,
>> and then stops there waiting for more memory?
>
> What takes the time is paging: the allocated areas must be brought into
> real RAM from the swap partition (pushing some other pages out, with a
> slow write), assuming that is the issue.

Yes, I know. And paging is much slower on my system, as the swap is on a
RAID-1 disk (mirroring with dmraid).

>
>> And that the process never gets killed, but makes the server inaccessible?
>> (I observe this on a Fedora 14 64-bit Linux system.)
>
> Yes, I have seen this too. The oom_killer eventually kicks in, but the
> system is unusable for quite some time before that (and may awaken
> dizzy ;-).

Have you observed inactivity lasting days? I am on the fourth day now, and
I have no physical access to the server.

>
> To debug this kind of thing, the best approach is to catch the growing
> memory consumption before it gets dangerous. One easy way is 'ulimit'.
> Give the process something like 25% of the RAM and the system as a whole
> stays in the safe zone. If the program eats up memory, it will start
> getting NULL from malloc and will most likely Tcl_Panic.

I hope Tcl eventually panics :D. And I hope it does so soon...
ulimit is an option, as is setrlimit.
But it is interesting that a Tcl script can cause so much trouble for a
Linux system...

George

Uwe Klein

Jan 7, 2012, 4:27:33 AM

George Petasis wrote:
> And I know beforehand that the app that can consume almost all physical
> RAM is a Tcl script. This is why I suspect the Tcl allocation
> routines. If Tcl panics once it cannot get more memory, the
> server will come back again. But will Tcl panic?
>

OOM won't happen, methinks.

IMHO you have the "fast loop over gettimeofday" thing on your hands.
Doesn't VNC run by way of a "virtual" X server?

uwe

to...@flughafenstr.de

Jan 7, 2012, 9:11:05 AM

Heh. Any process that is allowed to help itself to resources as it
likes is capable of causing trouble :)

I concur that it may be either some process hogging the CPU or (more
probably) the system swapping to death. (Did you configure swap space?
How big is it, compared to RAM?)

The idea with ulimit is the way to go, I think.

Regards
-- tomás

tombert

Jan 7, 2012, 12:39:15 PM

Hi,

just for info, I ran into similar questions when using up all X resources:
https://sourceforge.net/tracker/?func=detail&aid=3461051&group_id=10894&atid=110894

Unfortunately, there is no error when no resources are left...

thomas

Alexandre Ferrieux

Jan 7, 2012, 12:48:44 PM

On Jan 7, 6:39 pm, tombert <tomb...@live.at> wrote:
> Hi,
>
> just for info, I ran into similar questions when using up all X resources: https://sourceforge.net/tracker/?func=detail&aid=3461051&group_id=108...
>
> Unfortunately, there is no error when no resources are left...
>
> thomas

Sorry, I fail to see the connection with this thread... Have you measured
the memory consumption of your process when things start misbehaving?
That does not appear in your report. Is the rest of the system
unresponsive? It doesn't seem so, because you say you [exit] then (and
Tcl crashes).

If you feel the community is not responsive enough about your bug,
then ping it by adding a comment to the report, or start a separate
thread here; but please don't mix it into another, unrelated thread.

-Alex

George Petasis

Jan 7, 2012, 3:25:45 PM

On 7/1/2012 16:11, to...@flughafenstr.de wrote:
>
> Heh. Any process that is allowed to help itself to resources as it
> likes is capable of causing trouble :)
>
> I concur that it may be either some process hogging the CPU or (more
> probably) the system swapping to death. (Did you configure swap space?
> How big is it, compared to RAM?)

I still think the process is known: it's a Tcl script. I think on
Monday (when somebody will press the reset button on the PC) I am going
to find the log from the Tcl script incomplete :-)

The swap is not much, about 5 GB (the PC had 4 GB initially; now it
has 8 GB of RAM).

>
> The idea with ulimit is the way to go, I think.

I am going to place a limit, although I think this will still block the
process (Tcl will wait forever to allocate more memory, and my "after"
event, which terminates the process if it has not finished within 8 hours,
will never fire, just as it is not firing now). I am leaning towards
setrlimit, as ulimit is a bash thing...

George

Alexandre Ferrieux

Jan 7, 2012, 3:42:21 PM

On Jan 7, 9:25 pm, George Petasis <petas...@yahoo.gr> wrote:
>
> > The idea with ulimit is the way to go, I think.
>
> I am going to place a limit, although I think this will still block the
> process (Tcl will wait forever to allocate more memory, and my "after"
> event, which terminates the process if it has not finished within 8 hours,
> will never fire, just as it is not firing now).

No, there is no such thing as a long-blocking malloc when not paging.
Assuming you have no swap, or that you set limits such that your
process can be satisfied within physical RAM, Tcl will not "wait
forever to allocate memory": it will either get the memory instantly or
instantly get NULL (in which case it will panic or raise an error). And
when it gets the memory, it will be real, usable RAM, with no disk I/O.

> I am leaning towards
> setrlimit, as ulimit is a bash thing...

Yeah, setrlimit is the syscall behind bash's ulimit; if you prefer
writing C code to avoid one line of shell, be my guest ;-)

-Alex

Georgios Petasis

Jan 8, 2012, 7:15:40 AM

On 7/1/2012 22:42, Alexandre Ferrieux wrote:
> On Jan 7, 9:25 pm, George Petasis <petas...@yahoo.gr> wrote:
>>
>>> The idea with ulimit is the way to go, I think.
>>
>> I am going to place a limit, although I think this will still block the
>> process (Tcl will wait forever to allocate more memory, and my "after"
>> event, which terminates the process if it has not finished within 8 hours,
>> will never fire, just as it is not firing now).
>
> No, there is no such thing as a long-blocking malloc when not paging.
> Assuming you have no swap, or that you set limits such that your
> process can be satisfied within physical RAM, Tcl will not "wait
> forever to allocate memory": it will either get the memory instantly or
> instantly get NULL (in which case it will panic or raise an error). And
> when it gets the memory, it will be real, usable RAM, with no disk I/O.

Yes, this is correct; I just looked at the sources.

George