Prolog system error while in garbage collection


Marijn Schraagen

Oct 27, 2015, 12:28:27 PM
to SWI-Prolog
The following problem occurs using the foreign interface:

After many loops using PL_put_variable, PL_put_integer, PL_cons_functor, PL_put_functor, PL_open_query/PL_next_solution/PL_close_query sequences, PL_record/PL_recorded/PL_erase sequences, and assertz(Term,Ref)/erase(Ref) calls, all running without problems, at some moment a PROLOG SYSTEM ERROR appears, with information about relocation cells, garbage collection and a Prolog Stack trace containing my DCG predicates.

My question is: what is the best way of debugging this? I am aware that the question is vague, but I have no clear idea where to start. I suspect that I am introducing a leak or an invalid reference or something out of scope, but I don't know where or why.

Removing the assertz call from the loop makes the problem disappear, but that call is the exact purpose of the code, so I would like to keep it.

The error occurs when performing PL_next_solution. If I break to Prolog right before that (cin.get() in C++, ctrl-c, (b)reak) and ask the query directly in Prolog (a phrase/3 query) then the same error occurs, however with a different stack (some internals instead of my DCG predicates):
[PROLOG SYSTEM ERROR:  Thread 1
    relocation cells = 579; relocated_cells = 579, needs_relocation = 578

[While in 14-th garbage collection]


PROLOG STACK:
  [8] system:read_term_from_atom/3 [PC=1 in supervisor]
  [7] system:catch/3 [PC=2 in clause 1]
  [6] $history:read_history_/6 [PC=55 in clause 3]
  [5] $history:read_history/6 [PC=25 in clause 1]
  [4] $toplevel:read_query/3 [PC=35 in clause 1]
  [3] $toplevel:$query_loop/0 [PC=73 in clause 1]
  [2] system:$c_call_prolog/0 [PC=0 in top query clause]
  [1] phrase/3 <no clause>
  [0] system:$c_call_prolog/0 [PC=0 in top query clause]
]

If I trace the query and leap, then it is executed without any problem and the solutions are presented.

The problem is completely deterministic (i.e., it always happens at the same moment). However, if I skip a number of (successful) loop iterations before the crash, the problem appears for a different query (later in the loop, surprisingly). Any ideas? I'm willing to supply code, try to write a minimal working example, etc., if necessary.

Jan Wielemaker

Oct 27, 2015, 12:38:37 PM
to Marijn Schraagen, SWI-Prolog
I have to leave, but the best first step is to try the development
release. Most of the time it has fewer bugs :) When using C(++),
the error might indeed be either in Prolog or in your code violating
the constraints of the API. We'll get to that if the problem persists.

Cheers --- Jan


Marijn Schraagen

Oct 28, 2015, 6:30:51 AM
to SWI-Prolog, marijn.s...@phil.uu.nl
Following your suggestion I have installed version 7.3.9. The problem still occurs, unfortunately. What would be the best second step?

Thanks in advance, Marijn

Jan Wielemaker

Oct 28, 2015, 6:52:02 AM
to Marijn Schraagen, SWI-Prolog
On 10/28/2015 11:30 AM, Marijn Schraagen wrote:
> Following your suggestion I have installed version 7.3.9. The problem
> still occurs, unfortunately. What would be the best second step?

That depends on the OS and how you installed SWI-Prolog. If you have a
source installation, edit the generated src/Makefile, change COFLAGS to
`COFLAGS=-DO_DEBUG -DSECURE_GC -gdwarf-2 -g3`. Run `make clean`,
`make` and `make install` in `src`.

Now make sure to pass the flag `-d chk_secure` to Prolog (embedded, this
means add this to the argv passed to PL_initialise()/PlEngine()).

Run your program under gdb.

That will make the system a *lot* slower, but might give a hint what
is wrong. Possibly you can relate that to your code. Possibly not.
Then you need to package the stuff so I can have a closer look.
Alternatively you study the garbage collector :) I'm afraid there are
only about three people who understand enough of it :)

Success --- Jan
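
A minimal sketch of how the `-d chk_secure` flag could be added to the argument vector of an embedded engine. The helper name make_debug_argv is hypothetical; only the flag itself and the PL_initialise() call come from the advice above. The sketch does not include SWI-Prolog.h, so the actual call is shown only as a comment.

```cpp
#include <cassert>
#include <vector>

// Build an argv for an embedded SWI-Prolog engine with the extra
// consistency checks enabled ("-d chk_secure", as suggested above).
// make_debug_argv is a hypothetical helper, not part of SWI-Prolog.
std::vector<char*> make_debug_argv(char* program_name) {
    assert(program_name != nullptr);
    // Static, writable storage: PL_initialise() takes char **argv.
    static char debug_flag[] = "-d";
    static char debug_topic[] = "chk_secure";
    return {program_name, debug_flag, debug_topic};
}

// Intended use in main(), assuming <SWI-Prolog.h> is available:
//
//   std::vector<char*> pl_argv = make_debug_argv(argv[0]);
//   if (!PL_initialise(static_cast<int>(pl_argv.size()), pl_argv.data()))
//       PL_halt(1);
```

The flag only has an effect on a build compiled with the debugging COFLAGS described above.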





Marijn Schraagen

Oct 28, 2015, 12:05:44 PM
to SWI-Prolog, marijn.s...@phil.uu.nl
After installing from source (swipl-devel/src with the COFLAGS as well as swipl-devel/packages/cpp with default settings) I now have version 7.3.9-48-g4431f35-DIRTY.

Without the -d chk_secure the behaviour is the same as before (using either plain command line or gdb): normal run for a number of iterations and then a Prolog System Error.

With -d chk_secure the behaviour is different. First, there are some warnings on initial functions using assertz:
0x7f884f8685e8=local(2746): not on attvar chain
[DATA INCONSISTENCY: attvar: not on attvar chain: 0x4f8685e8]
repeated a couple of times.

Then, when entering the loop, the execution already fails in the first iteration (instead of after several iterations as before). Errors like:
[DATA INCONSISTENCY: Term at 0x7ffff7e9fb78 not on global stack]
[DATA INCONSISTENCY: Illegal functor: 0x196c0e]
[DATA INCONSISTENCY: Illegal term at: 0x7ffff7e9fb88: 0x7a908d]
[DATA INCONSISTENCY: functor with mark: 0x4910d]
[DATA INCONSISTENCY: Illegal reference pointer at 0xf7e9fc18 --> 0xf7e9fb70]
occur during PL_close_query.

Then, after
PL_recorded(rec,term_x)
PL_erase(rec)
PL_cons_functor(term1,funct2,term_x,...)
PL_call(term1,NULL)
the following errors are shown:
!Illegal cell in global stack (up) at 0x7ffff7e954a0=global(1168) (*= [M]term at global(1163))
!Illegal cell in global stack (up) at 0x7ffff7e954a8=global(1169) (*= [M]functor [|]/2)
...
!Illegal cell in global stack (down) at 0x7ffff7e97cb0 (*= 0x4912d)
...
followed by
[Thread 1 (main) at Wed Oct 28 14:33:53 2015] pl-gc.c:3663: checkStacks: Assertion failed: scan_global(0)
C-stack trace labeled "assert_fail":
  [0] save_backtrace() at /home/marijn/install/swipl-devel/src/os/pl-cstack.c:307 [0x7ffff7b72e82]
  [1] __assert_fail() at /home/marijn/install/swipl-devel/src/pl-thread.c:6040 [0x7ffff7b2733f]
  [2] checkStacks() at /home/marijn/install/swipl-devel/src/pl-gc.c:3664 [0x7ffff7ac1f86]
  [3] PL_open_query() at /home/marijn/install/swipl-devel/src/pl-wam.c:2214 [0x7ffff7a7a836]
  [4] callProlog() at /home/marijn/install/swipl-devel/src/pl-pro.c:318 [0x7ffff7ae0a3a]
  [5] PL_call() at /home/marijn/install/swipl-devel/src/pl-fli.c:3645 [0x7ffff7a76edd]
Program received signal SIGABRT, Aborted.

The (simplified) program flow is shown below. Interestingly, the error occurs on the second run of the for-loop, when i=1.

initial_functions_using_assertz();

PlCall("consult('pprint.pl')"); //official SWI-Prolog pretty print library
functor_t funct2 = PL_new_functor(PL_new_atom("print_term"),2);
term_t nl = PL_new_term_ref();
PL_put_nil(nl);

// PL_new_term_ref, PL_new_functor, PL_pred for further variables
record_t rec;
qid_t qr;

while(iterations){
    for(int i=0;i<2;i++){
        PL_put_integer(term_i,i);
        PL_cons_functor(term_f,funct1,term_a,term_b,term_i);
        qr = PL_open_query(NULL,PL_Q_NORMAL,pred1,term_f);
        bool rec_defined = false;
        while(PL_next_solution(qr)){
            PL_get_arg(1,term_f,term_x);
            PL_get_arg(3,term_f,term_y);
            PL_get_integer(term_y,j);
            if(check(j))
                rec = PL_record(term_x);
                rec_defined = true;
        }//while
        PL_close_query(qr);
        if(rec_defined == true){
            PL_recorded(rec,term_x);
            PL_erase(rec);
            PL_cons_functor(term1,funct2,term_x,nl);
            PL_call(term1,NULL);        // results in !Illegal cell in global stack, 2nd time of for-loop only
        }//if
    }//for
}//while

It seems there is more information now, but I still do not fully understand what the problem is. What do you think?

Thanks again,

Marijn

Jan Wielemaker

Oct 28, 2015, 12:26:53 PM
to Marijn Schraagen, SWI-Prolog
On 10/28/2015 05:05 PM, Marijn Schraagen wrote:
> After installing from source (swipl-devel/src with the COFLAGS as well
> as swipl-devel/packages/cpp with default settings) I now have version
> 7.3.9-48-g4431f35-DIRTY.
>
> Without the -d chk_secure the behaviour is the same as before (using
> either plain command line or gdb): normal run for a number of iterations
> and then a Prolog System Error.
>
> With -d chk_secure the behaviour is different. First, there are some
> warnings on initial functions using assertz:
> 0x7f884f8685e8=local(2746): not on attvar chain
> [DATA INCONSISTENCY: attvar: not on attvar chain: 0x4f8685e8]
> repeated a couple of times.

Are you using constraints? If not, this is most likely some data
corruption already.

> Then, when entering the loop, the execution already fails in the first
> iteration (instead of after several iterations as before). Errors like:
> [DATA INCONSISTENCY: Term at 0x7ffff7e9fb78 not on global stack]
> [DATA INCONSISTENCY: Illegal functor: 0x196c0e]
> [DATA INCONSISTENCY: Illegal term at: 0x7ffff7e9fb88: 0x7a908d]
> [DATA INCONSISTENCY: functor with mark: 0x4910d]
> [DATA INCONSISTENCY: Illegal reference pointer at 0xf7e9fc18 --> 0xf7e9fb70]
> occur during PL_close_query.
>
> Then, after
> PL_recorded(rec,term_x)
> PL_erase(rec)
> PL_cons_functor(term1,funct2,term_x,...)
> PL_call(term1,NULL)
> the following errors are shown:
> !Illegal cell in global stack (up) at 0x7ffff7e954a0=global(1168) (*= [M]term at global(1163))
> !Illegal cell in global stack (up) at 0x7ffff7e954a8=global(1169) (*= [M]functor [|]/2)
> ...
> !Illegal cell in global stack (down) at 0x7ffff7e97cb0 (*= 0x4912d)
> ...
> followed by
> [Thread 1 (main) at Wed Oct 28 14:33:53 2015] pl-gc.c:3663: checkStacks: Assertion failed: scan_global(0)
> C-stack trace labeled "assert_fail":
>   [0] save_backtrace() at /home/marijn/install/swipl-devel/src/os/pl-cstack.c:307 [0x7ffff7b72e82]
>   [1] __assert_fail() at /home/marijn/install/swipl-devel/src/pl-thread.c:6040 [0x7ffff7b2733f]
>   [2] checkStacks() at /home/marijn/install/swipl-devel/src/pl-gc.c:3664 [0x7ffff7ac1f86]
>   [3] PL_open_query() at /home/marijn/install/swipl-devel/src/pl-wam.c:2214 [0x7ffff7a7a836]
>   [4] callProlog() at /home/marijn/install/swipl-devel/src/pl-pro.c:318 [0x7ffff7ae0a3a]
>   [5] PL_call() at /home/marijn/install/swipl-devel/src/pl-fli.c:3645 [0x7ffff7a76edd]
> Program received signal SIGABRT, Aborted.

This is probably caused by the already corrupt data after the
PL_close_query().

> The (simplified) program flow is shown below. Interestingly, the error
> occurs on the second run of the for-loop, when i=1.
>
> initial_functions_using_assertz();
>
> PlCall("consult('pprint.pl')"); //official SWI-Prolog pretty print library
> functor_t funct2 = PL_new_functor(PL_new_atom("print_term"),2);
> term_t nl = PL_new_term_ref();
> PL_put_nil(nl);
>
> // PL_new_term_ref, PL_new_functor, PL_pred for further variables
> record_t rec;
> qid_t qr;
>
> while(iterations){
>     for(int i=0;i<2;i++){
>         PL_put_integer(term_i,i);
>         PL_cons_functor(term_f,funct1,term_a,term_b,term_i);

The scoping of these term references might be critical. You typically
need PL_open_foreign_frame() and PL_close/discard_foreign_frame() to
avoid resource leakage.

Can you make the entire program available somewhere?

Cheers --- Jan
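
One way to keep such open/close calls paired in C++ is a small scope guard, sketched below. This is a generic illustration under stated assumptions, not code from the thread: the class name ScopeExit is hypothetical, and the SWI-Prolog calls appear only in the trailing comment because the sketch does not include SWI-Prolog.h.

```cpp
#include <utility>

// Runs a cleanup callable when the enclosing scope is left, so an "open"
// call stays paired with its "close" even on an early return or continue.
// ScopeExit is a hypothetical helper, not part of the SWI-Prolog API.
template <typename Cleanup>
class ScopeExit {
public:
    explicit ScopeExit(Cleanup cleanup) : cleanup_(std::move(cleanup)) {}
    ~ScopeExit() { cleanup_(); }

    // Non-copyable: the cleanup must run exactly once.
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    Cleanup cleanup_;
};

// Intended use inside one loop iteration, assuming <SWI-Prolog.h>
// and a C++17 compiler (for class template argument deduction):
//
//   fid_t frame = PL_open_foreign_frame();
//   ScopeExit close_frame([frame] { PL_close_foreign_frame(frame); });
//   ... create term references, open/run/close the query ...
//   // leaving the scope closes the frame and drops the term
//   // references created since PL_open_foreign_frame()
```

Any query opened inside the frame still has to be closed with PL_close_query() before the guard goes out of scope.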


Jan Wielemaker

Oct 29, 2015, 8:47:18 AM
to Marijn Schraagen, SWI-Prolog
Dear Marijn,

I took the opportunity to describe a little more about the options for
debugging such issues. It will be online with the next version. You
can see the LaTeX source at [1].

The function PL_check_stacks() was just added to the git version. From
gdb you get the same by calling checkStacks(NULL), but that internal
function is not exported from the shared library. You can call
PL_check_stacks() from your program at various points to find out where
exactly the data becomes invalid.

Hope this helps a little. Comments for improving this section are
welcome.

Cheers --- Jan

[1]
https://github.com/SWI-Prolog/swipl-devel/blob/master/man/foreign.doc#L3715

Jan Wielemaker

Oct 30, 2015, 5:49:25 AM
to Marijn Schraagen, SWI-Prolog
On 10/28/2015 05:05 PM, Marijn Schraagen wrote:
> while(iterations){
>     for(int i=0;i<2;i++){
>         PL_put_integer(term_i,i);
>         PL_cons_functor(term_f,funct1,term_a,term_b,term_i);
>         qr = PL_open_query(NULL,PL_Q_NORMAL,pred1,term_f);
>         bool rec_defined = false;
>         while(PL_next_solution(qr)){
>             PL_get_arg(1,term_f,term_x);
>             PL_get_arg(3,term_f,term_y);
>             PL_get_integer(term_y,j);
>             if(check(j))
>                 rec = PL_record(term_x);
>                 rec_defined = true;
>         }//while
>         PL_close_query(qr);
>         if(rec_defined == true){
>             PL_recorded(rec,term_x);
>             PL_erase(rec);
>             PL_cons_functor(term1,funct2,term_x,nl);
>             PL_call(term1,NULL); // results in !Illegal cell in global stack, 2nd time of for-loop only
>         }//if
>     }//for
> }//while

I'm not really sure what the above is doing, but it looks a bit like
this:

pred1(f(A,B,I)) :-
    check(I),
    call(...).

As a general rule of thumb, do not try to write Prolog control
structures in C(++). It is hopelessly long, hard to debug and
most likely slower as data needs to pass through the C API.

Typically, do the complex term handling stuff in Prolog. Make
the interface to this predicate comfortable to work with from
C and then use a simple control structure in C.

In most cases, you are much better off turning the control around:
wrap the C functionality in Prolog predicates and do the control
from Prolog.

Cheers --- Jan

Marijn Schraagen

Nov 5, 2015, 9:18:08 AM
to SWI-Prolog, marijn.s...@phil.uu.nl
Thanks to Jan the problem has been solved. It turned out that PL_next_solution invalidates term assignments on backtracking, which creates a stack problem if the same term reference is assigned again in the next loop iteration. Manually resetting the term using PL_put_variable makes the term available for subsequent use. In code (provided by Jan):

term_t res = PL_new_term_ref();
...
while(PL_next_solution(phrase_qr)) {
    get_a = PL_get_arg(1,term_start,res);
    ...
    PL_put_variable(res);
}