There was a myth that languages with a garbage collector are free of
this problem. Of course, only until someone leaves a forgotten reference to
10 MB of stuff...
How is it in Prolog?... Are memory leaks possible?
I have a Prolog application that is implemented as a "running forever"
server, and has a rather simple structure:
a) some facts are asserted. These facts contain the data of some
constraint problem
b) a predicate is executed that solves the constraint problem. This
predicate asserts a list with the results,
c) more predicates are executed that retrieve results from this list
d) when everything is done, a cleanup predicate is called that retracts
all facts and the result list.
In some circumstances, this server leaks memory.
I did the homework - I checked about a dozen papers on Prolog GC and
memory management. In one, "Understanding Memory Management in Prolog
Systems" by Castro and Costa, there is Table 3 that shows "garbage
collector efficiency". According to this table, GC efficiency for
non-deterministic predicates is not always perfect.
Questions:
1. Is a memory leak possible in the scheme mentioned above? If yes, when
and why?
2. What is the strategy for managing memory in long-running Prolog
applications?...
I am asking these questions as "generic" ones, not regarding any
specific Prolog implementation.
A.L.
Obviously memory management is implementation dependent.
Some memory managers have problems with cyclic data structures (like the
conservative garbage collectors built on top of C++).
Though not exactly a leak, some procedures allocate data that never gets
deleted.
In general, don't define a term before you need it, and make sure it
disappears when it goes out of scope.
(This is the problem with some Java libraries like Swing..)
Be warned that I usually program (Common) Lisp, not Prolog. Nevertheless,
memory management rears its ugly head in any language.
--------------
John Thingstad
There is no easy answer, and definitely not if you want it to be
`generic'.
> I have a Prolog application that is implemented as a "running forever"
> server, and has a rather simple structure:
>
> a) some facts are asserted. These facts contain the data of some
> constraint problem
> b) a predicate is executed that solves the constraint problem. This
> predicate asserts a list with the results,
> c) more predicates are executed that retrieve results from this list
> d) when everything is done, a cleanup predicate is called that retracts
> all facts and the result list.
>
> In some circumstances, this server leaks memory.
>
> I did the homework - I checked about a dozen papers on Prolog GC and
> memory management. In one, "Understanding Memory Management in Prolog
> Systems" by Castro and Costa, there is Table 3 that shows "garbage
> collector efficiency". According to this table, GC efficiency for
> non-deterministic predicates is not always perfect.
>
> Questions:
>
> 1. Is a memory leak possible in the scheme mentioned above? If yes, when
> and why?
There is not just one garbage collector in Prolog. Typically Prolog
systems refer to the stack garbage collector when they state `garbage
collector'. This one is supposed to be `perfect', provided you either
use a failure driven loop or a deterministic tail-recursive loop. Prolog
GC generally deals with cyclic terms (if the Prolog system itself deals
with them).
I don't think there is a single established tradition to GC clauses from
dynamic predicates. SWI-Prolog for example GCs clauses on predicate
basis if the predicate is sufficiently dirty and there is no open
choice point on it. The predicate itself is never deleted. Neither are
name/arity pairs nor modules. Atoms are subject to atom garbage
collection.
> 2. What is the strategy for managing memory in long-running Prolog
> applications?...
Typically, I'd say the main loop must be failure-driven or subject to
last-call optimization. Dynamic code and records must be
retracted/erased by the user (a good alternative is not to create any
:-). The user should not keep creating new predicates, name/arity pairs
or modules. Under those constraints you should be fine with SWI-Prolog
and I believe similar constraints hold for most Prolog systems.
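To make the first point concrete, here is a minimal sketch of a
deterministic, tail-recursive main loop (next_request/1, handle/2 and
send_reply/1 are invented names, not part of any library):

% Each iteration is deterministic and the recursive call is the last
% call, so the local stack stays constant and the stack GC can reclaim
% everything that was built while handling a request.
serve :-
    next_request(Request),
    handle(Request, Reply),
    send_reply(Reply),
    serve.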
> I am asking these questions as "generic" ones, not regarding any
> specific Prolog implementation.
The Prolog standard doesn't specify any of this ...
--- Jan
> How is it in Prolog?... Are memory leaks possible?
It depends on what you mean by a memory leak.
Of course the programmer can "mess up" - here is an example that
causes a memory leak in some Prolog systems:
a(N,_) :- N > 0, M is N - 1, a(M,foo(1,2,3,4,5)).
a(N,_) :- N =< 0.
Try the query ?- a(<I>,[]).
with <I> equal to 10000000 in SICStus, Yap, SWI and you will see.
Did the programmer mess up, or the implementation ?
In B-Prolog (and hProlog) this query runs in a flash.
Let's remember what went "wrong" as (1).
> a) some facts are asserted. These facts contain the data of some
> constraint problem
> b) a predicate is executed that solves the constraint problem. This
> predicate asserts a list with the results,
> c) more predicates are executed that retrieve results from this list
> d) when everything is done, a cleanup predicate is called that retracts
> all facts and the result list.
>
> In some circumstances, this server leaks memory.
Maybe I am assuming too much here, but is this process repeated, and
after some repetitions, you get into memory problems ?
If that is the case, it means that the cleanup is not good enough (for
the Prolog implementation you are working with). Again there is the
question: did you mess up, or are you dealing with a bad implementation.
> I did the homework
We know you - you are a serious guy.
> I checked about a dozen papers on Prolog GC and
> memory management. In one, "Understanding Memory Management in Prolog
> Systems" by Castro and Costa, there is Table 3 that shows "garbage
> collector efficiency". According to this table, GC efficiency for
> non-deterministic predicates is not always perfect.
Most often, the problem (as far as I know about GC in Prolog) is not
that GC efficiency for non-deterministic predicates is not always
perfect, but that Prolog systems consider some predicates to be
non-deterministic while the programmer thought they were
deterministic. The a/2 predicate above shows that.
One thing to realise is: GC can't be perfect (as in "reading the
programmer's mind about what data is useful in the future"). Being
perfect about garbage is equivalent to the Halting Problem (let me
mark this as (a)). So a programmer must help the system, however good
it is.
> 1. Is a memory leak possible in the scheme mentioned above? If yes, when
> and why?
> 2. What is the strategy for managing memory in long-running Prolog
> applications?...
>
> I am asking these questions as "generic" ones, not regarding any
> specific Prolog implementation.
First of all, the generic answer to both of them is by nature very
unsatisfactory: (1) yes, because of (a); (2) there is none, because
Prolog implementations (can) have their own ideas about reachability,
usefulness, logic, determinism ... In general, you should even be
prepared for the odd Prolog implementation that, even in the presence
of a failure-driven loop, refuses to collect any garbage (as you would
define it) because it keeps all data about every execution it ever
performed so that it can give you instant feedback on those runs.
But you are in luck today, there are some things you can do as a
programmer, at least when you use a reasonable system like Yap,
SICStus, SWI ...
Try to find out whether your (local/trail/choicepoint) stacks grow
beyond what you expected. If for the a/2 predicate above you expected
constant size, and you observe something else:
- put cuts in your "non-deterministic" predicates
- make predicates tail-recursive (a sketch follows below)
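For the a/2 predicate above, a rough sketch of such a rewrite uses
if-then-else, so the test leaves no choicepoint and the recursive call
remains a last call:

a(N, _) :-
    (   N > 0
    ->  M is N - 1,
        a(M, foo(1,2,3,4,5))   % last call, no choicepoint left behind
    ;   true                   % the N =< 0 case
    ).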
If you observe that the heap (global stack) grows more than you
expected, and your other stacks are constant size, then there are two
possibilities:
- your program really needs more space than is available
- you are responsible for keeping data in your arguments that
is actually garbage
Compare the latter to an implementation of a stack in a garbage
collected imperative language (e.g. Java): when you pop, you better
also nil the top, otherwise the popped elements potentially remain
live long after you expected them to die.
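The Prolog analogue, as a rough sketch (the predicates and their state
layout are invented for illustration):

% 'Data' is built once and never used again, but because it keeps being
% passed along it stays reachable, so the heap GC cannot reclaim it.
leaky_loop(0, _Count, _Data) :- !.
leaky_loop(N, Count, Data) :-
    N1 is N - 1,
    Count1 is Count + 1,
    leaky_loop(N1, Count1, Data).

% Dropping the dead argument lets the GC reclaim that data.
lean_loop(0, _Count) :- !.
lean_loop(N, Count) :-
    N1 is N - 1,
    Count1 is Count + 1,
    lean_loop(N1, Count1).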
It would be nice to have tools that tell you about predicates that
leave choicepoints. In hProlog, I have
?- det(somegoal).
It is very cheap, but when choicepoints are left, it prints out a
backtrace of the choicepoints. For instance:
?- det(queens(4,L)).
/* choice points are left by */
select/3.
queens/3.
select/3.
queens/3.
select/3.
queens/3.
select/3.
queens/3.
range/3.
range/3.
range/3.
The same could be done for detecting non-tail recursive predicates.
In the presence of dynamic code, your problems only get worse: does
the system detect properly that the last clause was selected (this
might depend on indexing of dynamic code)? When does the system
garbage collect dynamic code (this affects detection of the last
clause)? How good is this dynamic-code GC? ...
There is no generic answer to those questions. My advice would be:
don't use dynamic code - Jan (SWI) would probably agree to a large
extent with that.
And to agree even more with Jan, I would say: use failure driven
loops; do your cleanup after backtracking.
And also use tail-recursion and deterministic predicates whenever
possible.
Don't use dynamic predicates - use findall when you can. Paul Tarau coined
the term "poor man's garbage collection" for findall - as usual, he
was spot on :-)
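As a rough sketch of what that replacement looks like (solve/2 stands
for whatever your solver predicate is):

% All intermediate heap data created while enumerating solutions is
% released when findall/3 returns; only the result list survives, and
% nothing has to be asserted or retracted.
solutions(Problem, Solutions) :-
    findall(S, solve(Problem, S), Solutions).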
Cheers
Bart Demoen
>On Sat, 26 Jan 2008 09:19:20 -0600, A.L wrote:
>
>> How is it in Prolog?... Are memory leaks possible?
>
>It depends on what you mean by a memory leak.
>Of course the programmer can "mess up" - here is an example that
>causes a memory leak in some Prolog systems:
>
>a(N,_) :- N > 0, M is N - 1, a(M,foo(1,2,3,4,5)).
>a(N,_) :- N =< 0.
>
>Try the query ?- a(<I>,[]).
>with <I> equal to 10000000 in SICStus, Yap, SWI and you will see.
>
Great. Very insightful example. SICStus crashes with this argument;
with 1000000 it consumes 160 MB and keeps this memory forever...
>
>> a) some facts are asserted. These facts contain the data of some
>> constraint problem
>> b) a predicate is executed that solves the constraint problem. This
>> predicate asserts a list with the results,
>> c) more predicates are executed that retrieve results from this list
>> d) when everything is done, a cleanup predicate is called that retracts
>> all facts and the result list.
>>
>> In some circumstances, this server leaks memory.
>
>Maybe I am assuming too much here, but is this process repeated, and
>after some repetitions, you get into memory problems ?
>
>If that is the case, it means that the cleanup is not good enough (for
>the Prolog implementation you are working with). Again there is the
>question: did you mess up, or are you dealing with a bad implementation.
>
Yes - I suspect that there is something wrong with my programming, but
in the context of the above example I don't think that SICStus is very
supportive :)
Just to be more technical:
1. I am using SICStus,
2. I have an application written long ago, in SICStus 3.10.2. It is
embedded in J++ (Microsoft's dialect of Java, now known as J#) using
an in-process DLL that was implemented using the SICStus Visual Basic
interface. This application has been running in a production environment,
24/7, since 2004 without any problems.
3. The application mentioned above was obsoleted and rewritten in Sun
Java and SICStus 3.12.8. Since the official interface to SICStus is
now Prolog Beans, I replaced JNI with Prolog Beans. However, the Prolog
code was not changed at all. This was as simple as adding a layer of
code that registers predicates for Prolog Beans.
This application crashes after about 50,000 calls to the server.
4. A new application was written, much larger and more complex, also
with the Prolog Beans interface. This one crashes after 200 calls to the
server.
This is the motivation for my questions. Generally, I don't believe in
the "this-is-a-compiler-error" mantra, and I strongly believe that I have
screwed up. Especially in the light of the above example.
Just a warning for Prolog enthusiasts :)
From the other perspective, there is close to nothing in Prolog books
about memory management. The exception is "The Craft of Prolog", but it
is just a bit short and just a bit outdated.
Generally, there is very little about the software engineering aspects of
Prolog programming, efficiency and such, especially in the context of
implementing large projects. It would be good to have something like
"Effective Java" by Josh Bloch, but titled "Effective Prolog"...
Thanks for comments.
A.L.
>On 2008-01-26, A.L <alew...@fala2005.com> wrote:
>> Well, we know that memory leaks are everyday business when programming
>> in C++.
>>
>> There was a myth that languages with a garbage collector are free of
>> this problem. Of course, only until someone leaves a forgotten reference to
>> 10 MB of stuff...
>>
>> How is it in Prolog?... Are memory leaks possible?
>
>There is no easy answer, and definitely not if you want it to be
>`generic'.
Thanks for the comments. I wrote a longer response to Bart's post. Your
recommendations are exactly the same as his, therefore I comment on
his post only.
A.L.
> 1. I am using SICStus,
> 2. I have an application written long ago, in SICStus 3.10.2. It is
> embedded in J++ (Microsoft's dialect of Java, now known as J#) using
> an in-process DLL that was implemented using the SICStus Visual Basic
> interface. This application has been running in a production environment,
> 24/7, since 2004 without any problems.
> 3. The application mentioned above was obsoleted and rewritten in Sun
> Java and SICStus 3.12.8. Since the official interface to SICStus is
> now Prolog Beans, I replaced JNI with Prolog Beans. However, the Prolog
> code was not changed at all. This was as simple as adding a layer of
> code that registers predicates for Prolog Beans.
>
> This application crashes after about 50,000 calls to the server.
>
> 4. A new application was written, much larger and more complex, also
> with the Prolog Beans interface. This one crashes after 200 calls to the
> server.
Have you tried looking at the output from statistics after 50 and after
100 calls to the server?
This will at least uncover any growing "normal" Prolog execution stacks/heaps.
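If it helps, here is a minimal sketch of that kind of check in SICStus
(the exact statistics/2 keys differ between Prolog systems, so treat
this as an assumption to adapt):

% Print used/free sizes of the main memory areas; call this after every
% server request and watch whether the numbers keep growing.
report_memory :-
    statistics(global_stack, [GUsed, GFree]),
    statistics(local_stack,  [LUsed, LFree]),
    statistics(trail,        [TUsed, TFree]),
    format('global ~d/~d  local ~d/~d  trail ~d/~d~n',
           [GUsed, GFree, LUsed, LFree, TUsed, TFree]).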
But maybe it is just a leak in the SICStus Prolog Beans, which you need to make
SICS aware of, or, once you can locate it, work around?
Cheers
Bart Demoen
I don't know where you got that from, Bart, but it isn't *my* advice. The
big problem with failure-driven loops is that they tend to succeed without
having done all the work. The other problem is that often there is some
state to be preserved, and it goes into the dynamic database.
If there is no state at all, forall/2 is a good way to do a failure-driven
loop though. I.e., this is fine:
forall(repeat, main_goal).
Cheers --- Jan
> Did the programmer mess up, or the implementation ?
> In B-Prolog (and hProlog) this query runs in a flash.
For the non-implementers: this is because in the first three
implementations, the Prolog compiler's indexing is not clever enough
to discriminate between the two clauses, so every recursive call
pushes a useless choicepoint that is "left behind". The latter two
implementations have more clever indexing and do not push any
choicepoint.
The lesson for the user is that the degree of indexing is somewhat
implementation dependent.
SICStus comes with a static analysis tool, which knows precisely what
the compiler does and what it does not. The tool takes a piece of
Prolog code and points out predicates that might cause "choicepoints
to be left behind" that the user may not be aware of:
jastreb.sics.se>cat bart.pl
bart(N) :-
        bart(N, []).
bart(N, _) :- N > 0, M is N - 1, bart(M, foo(1,2,3,4,5)).
bart(N, _) :- N =< 0.
jastreb.sics.se>bin/spdet bart.pl
/home/matsc/sicstus4/bart.pl
* Non-determinate: user:bart/2 (clause 1)
* Indexing cannot distinguish this from clause 2.
In this case, the implementation can be helped by inserting a cut
after "N>0".
-=-=-=-=-=-=-=-
In my experience, a more difficult kind of "memory leak" is memory
fragmentation. Many Prolog implementations manage their own memory;
others rely on malloc & free or similar. In both cases, memory can
easily become fragmented when the process has run for a long while,
filling the memory with blocks that are too small to be useful. Some
memory managers (SICStus's for example) coalesce adjacent free blocks
as far as possible, but I'm not aware of any Prolog that moves blocks
around and completely defragments memory, which would be extremely
complex, what with all the necessary relocation and everything. The
only safe cure I know of is to restart the process.
--Mats
This is indeed a problem I have seen, though Prolog isn't the only
language that suffers from it. In fact, Prolog may not be that bad, as
lots of things are allocated on the stacks and these are nicely compacted.
One does need to be careful with dynamic code, records and atoms
though. Dynamic code might often not be that bad, as quite often code
asserts and retracts clauses of the same structure, which happen to
result in a perfect match. Atoms are a bigger problem, especially if
code uses lots of long atoms (SWI-Prolog has no limit except for
available memory).
Many programs that run 24x7 execute a limited set of similar routines
frequently and for them memory often stops growing after a while.
--- Jan
I am now instrumenting my code with statistics-gathering tools to
figure it out, especially where the leaks are. One possibility is that
the Java part has leaks and limits the memory available for Prolog. I
don't have a "smoking gun" yet...
A.L.
>
>In my experience, a more difficult kind of "memory leak" is memory
>fragmentation. Many Prolog implementations manage their own memory;
>others rely on malloc & free or similar. In both cases, memory can
>easily become fragmented when the process has run for a long while,
>filling the memory with blocks that are too small to be useful. Some
>memory managers (SICStus's for example) coalesce adjacent free blocks
>as far as possible, but I'm not aware of any Prolog that move blocks
>around and completely defragment memory, which would be extremely
>complex, what with all the necessary relocation and everything. The
>only safe cure I know of is to restart the process.
>--Mats
Agreed. I had this problem with the early application, although it was not
necessarily created by Prolog. Prolog was crashing because of lack of
memory, even though enough memory was available; the fragmentation was
caused by Java. Objects created using JNI are "pinned" - they cannot be
moved by the GC, and memory gets fragmented. Because of this, Java was
crashing as frequently as Prolog.
The current application doesn't run in continuous mode like the old one -
our ambition is to complete a single session without a restart. However,
if that is not possible, we will restart the server in the middle of the
process.
A.L.
Maybe a stupid question, but why?... When are atoms GC'd?...
A.L.
>On 2008-01-28, bart demoen <b...@cs.kuleuven.be> wrote:
>> And to agree even more with Jan, I would say: use failure driven
>> loops; do your cleanup after backtracking.
>
>I don't know where you got that from, Bart, but it isn't *my* advice. The
>big problem with failure-driven loops is that they tend to succeed without
>having done all the work.
All the work - regarding memory management?..
>The other problem is that often there is some
>state to be preserved, and it goes into the dynamic database.
>
>If there is no state at all, forall/2 is a good way to do a failure
>driven loop though. I.e., this is fine:
>
> forall(repeat, main_goal).
... and if there is no forall?...
A.L.
All the work, as in:
loop :-
    (   repeat,
        step1,
        step2,
        step3,
        fail
    ;   true
    ).
and step2 fails while this is not intended. This program nicely
does step1 and some bit of step2 in a loop but never warns that it
doesn't do the whole thing.
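One hedged sketch of a way around that (must/1 is a helper defined
here, not a builtin): wrap each step so an unexpected failure raises
an error instead of silently ending the iteration.

checked_loop :-
    (   repeat,
        must(step1),
        must(step2),
        must(step3),
        fail
    ;   true
    ).

% Succeed if Goal succeeds; otherwise throw rather than fail quietly.
must(Goal) :-
    (   call(Goal)
    ->  true
    ;   throw(error(unexpected_failure(Goal), checked_loop))
    ).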
>>The other problem is that often there is some
>>state to be preserved, and it goes into the dynamic database.
>>
>>If there is no state at all, forall/2 is a good way to do a failure
>>driven loop though. I.e., this is fine:
>>
>> forall(repeat, main_goal).
>
> ... and if there is no forall?...
forall(Cond, Action) :-
    \+ (Cond, \+(Action)).
--- Jan
These atoms are (at least for SWI-Prolog) allocated through malloc()
and so they are spread everywhere. GC nicely releases them, calling
free(). Repeating this process causes fragmentation, the details
depending on the malloc()/free() implementation used. The more long
atoms there are, the higher the likelihood that they have different lengths.
True, it is possible to implement a different scheme and compact the
strings. The only drawback is that in the current implementation the
C interface can give out a pointer to the string, which is guaranteed to be
safe as long as the atom remains reachable.
--- Jan
Atoms that occur literally in your code will never be gc'd.
But you must be careful if your code dynamically creates
new atoms (or functors) at runtime: they will probably only
be useful for a while, but they nevertheless end up in an
atom/functor table, which will only be cleaned up infrequently
by a separate garbage collector.
Primitives that may create new entries in this table are:
- atom_concat/3, sub_atom/5, atom_chars/2 and similar
- read/2 and similar
- functor/3, =../2
- gensym/2
since they can all create new atoms/functors that haven't
been seen before.
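As a minimal sketch of the kind of code that keeps feeding this table
(make_key/2 is an invented name, for illustration only):

% Every distinct N produces an atom that has never existed before, so a
% long-running loop keeps adding entries to the atom table until (and
% unless) atom garbage collection reclaims them.
make_key(N, Key) :-
    number_codes(N, Codes),
    atom_codes(NAtom, Codes),
    atom_concat(key_, NAtom, Key).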
-- Joachim
> Atoms that occur literally in your code will never be gc'd.
This is not universally true: it seems like those atoms are gc'd
in SWI-Prolog - and they were in BIM-Prolog.
Cheers
Bart Demoen
I guess that by 'literally in your code' Joachim means that it is part of
the program. Any atom appearing in the program is referenced by some
predicate and will not be GC'ed. I can think of only two exceptions:
abolish the code, or do term expansion such that what is 'literally in
your code' isn't what goes into the Prolog DB :-)
SWI-Prolog does reference-counting on atoms referenced from predicates,
records and foreign code. Atom-gc marks all atoms reachable from the
stacks of all threads.
Cheers --- Jan
When I wrote that atoms seem gc'd in SWI even when appearing literally in your code,
I did that based on a small experiment - here is the code:
s(N) :-
    N > 0,
    open('foo.pl', write, S),
    write(S, 'foo(asd'), write(S, N), write(S, ').'), nl(S),
    close(S),
    [foo],
    M is N - 1,
    s(M).
Calling ?- s(10000). results in 10000 programs being consulted, and each of
them has a different atom literally appearing in it - at least to my understanding
of those words. Statistics at the end says:
3,415 atoms, 1,208 functors, 1,388 predicates, 18 modules, 28,935 VM-codes
and
1 atom garbage collections gained 8,520 atoms in 0.00 seconds.
Is that consistent with what any of us has been claiming ?
Cheers
Bart Demoen
I think it is consistent with all views. Joachim probably claims that
'literally in your code' only applies to the last version you loaded.
You are right that at some point it was literally in the code, and I am
right because reloading the same file abolishes the old definitions in
SWI-Prolog.
We are all happy :-)
I also think reloading files for anything but development purposes is
a bad habit, so I think a Prolog system that keeps these atoms (or
even the 10,000 clauses) around is equally acceptable.
--- Jan
I've found that run to completion/restart is more reliable for most
environments, not just Prolog.
Dhu
OK, but it must be done when I want, not when the COMPUTER wants.
A.L.
Yes. That is "run to completion".
Dhu
> I think it is consistent with all views. Joachim probably claims that
> 'literally in your code' only applies to the last version you loaded.
> You are right that at some point it was literally in the code, and I am
> right because reloading the same file abolishes the old definitions in
> SWI-Prolog.
>
> We are all happy :-)
I recommend you for the post of EU-peacekeeper at the next ISO meeting !
> I also think reloading files for anything but development purposes is
> a bad habit, so I think a Prolog system that keeps these atoms (or
> even the 10,000 clauses) around is equally acceptable.
But now you are overdoing it :-)
Cheers
Bart Demoen