Why does (reset) take so long?

88 views
Skip to first unread message

Ged Byrne

unread,
Jun 20, 2023, 5:36:46 AM6/20/23
to CLIPSESG
Greeting Clipsegers, 

I'm doing a timing exercise as part of my dissertation on graph processing, and one thing I've noticed is that (reset) takes a very long time.

I'm using ClipsPy to call Clips from a Jupyter notebook.  I'm detecting cycles of up to 20 edges in a randomly generated graph with 1,000 notes and 2,000 edges.

I do with this with a separate rule for each cycle length.  For example, here is the rule for a cycle of length 4.

( defrule FindSimpleCyclesLength4
    (edge (fromVid ?vId1 ) (toVid ?vId2&:(<> ?vId2 ?vId1 ) ) )
    (edge (fromVid ?vId2&:(< ?vId1 ?vId2 ) ) (toVid ?vId3&:(<> ?vId3 ?vId1 ?vId2 ) ) )
    (edge (fromVid ?vId3&:(< ?vId1 ?vId3 ) ) (toVid ?vId4&:(<> ?vId4 ?vId1 ?vId2 ?vId3 ) ) )
    (edge (fromVid ?vId4&:(< ?vId1 ?vId4 ) ) (toVid ?vId1 ) )
=>
    (collectCycles (str-cat  ?vId1 "->" ?vId2 "->" ?vId3 "->" ?vId4 ) )
)

The rules are generated by the python.  You can see the code below.

I populate the graph first, then define the rules.  On my laptop it takes 1m 24.7s to find 91,066 cycles.  That's a decent baseline considering I haven't started any optimizations.

What puzzles me is the time taken to perform a (reset) - 4m 54.6s! 

Why does it take so long?  Is there a faster method available?

Regards, 


Ged


Code:

### Clips Environment Initialisation
import clips

env = clips.Environment()

### The Vertex Template
env.build("""
    (deftemplate vertex
        (slot vId (type INTEGER ) )
    )
""" )

### The Edge Template

env.build ("""
(deftemplate edge
    (slot fromVid (type INTEGER ) )
    (slot toVid (type INTEGER ) )
)
""" )

vertex_template = env.find_template('vertex')

def addVertex( vId ):
    vertex_template.assert_fact( vId=vId )

edge_template = env.find_template('edge')

def addEdge( fromVid, toVid ):
    edge_template.assert_fact( fromVid=fromVid, toVid=toVid )

import matplotlib.pyplot as plt
import networkx as nx

n = 1000  
m = 2000  
seed = 20160  # seed random number generators for reproducibility

# Use seed for reproducibility
G = nx.gnm_random_graph(n, m, seed=seed, directed=True)

edges = G.edges()
vertexes = G.nodes()

actual_cycles = set()

def collectCycles( cycle ):
    global actual_cycles
    actual_cycles.add( cycle )

env.define_function( collectCycles )

for vertex in vertexes:
    addVertex( vertex )

for edge in edges:
    fromVid, toVid = edge
    addEdge( fromVid, toVid )

def buildRuleForLength(length):
    rule = "( defrule FindSimpleCyclesLength{l}\n".format(l=length)
    rule += "    (edge (fromVid ?vId1 ) (toVid ?vId2&:(<> ?vId2 ?vId1 ) ) )\n"
    for i in range( 2,length ):
        rule += "    (edge (fromVid ?vId{pre}&:(< ?vId1 ?vId{pre} ) ) (toVid ?vId{cur}&:(<> ?vId{cur} {uniquenessTest} ) ) )\n".format(
            pre=i,
            cur=i+1,
            uniquenessTest =" ".join("?vId{}".format( j ) for j in range( 1, i+1 ) )
        )
    rule += "    (edge (fromVid ?vId{l}&:(< ?vId1 ?vId{l} ) ) (toVid ?vId1 ) )\n".format(l=length)
    rule += ( "=>\n" )
    rule += "    (collectCycles (str-cat {string_template}) )\n".format(
        string_template = "\"->\"".join( " ?vId{} ".format( j ) for j in range(1, length+1) ) )
    rule += ")"
    return rule

for n in range( 20 ):
    env.build( buildRuleForLength( n + 1 ) )
1m 24.6s

env.eval("(run )")
2.5s

import numpy as np

def cycle_id( list ):
    rolled_list = np.roll( list, -np.argmin(list)  )
    return "->".join( str( item ) for item in rolled_list )

expected_cycles = set()
for cycle in nx.simple_cycles( G, length_bound=20 ):
    expected_cycles.add( cycle_id( cycle ) )

print( str( len( expected_cycles ) ) + " " + str( len( actual_cycles ) )  + " " + str( expected_cycles ^ actual_cycles ) )

14.3s

env.eval("(reset)")
4m 51.6s

CLIPS Support

unread,
Jun 20, 2023, 2:51:12 PM6/20/23
to CLIPSESG
I tried reproducing this behavior just within CLIPS, but didn't have any luck. If you create files containing your rules and facts using the save and save-facts functions and attach them, it will be possible to examine this behavior outside of the ClipsPy environment.


         CLIPS (6.4.1 4/8/23)
CLIPS>
(deftemplate edge
    (slot fromVid (type INTEGER))
    (slot toVid (type INTEGER)))
CLIPS>
( defrule FindSimpleCyclesLength20

    (edge (fromVid ?vId1 ) (toVid ?vId2&:(<> ?vId2 ?vId1 ) ) )
    (edge (fromVid ?vId2&:(< ?vId1 ?vId2 ) ) (toVid ?vId3&:(<> ?vId3  ?vId1 ?vId2 ) ) )
    (edge (fromVid ?vId3&:(< ?vId1 ?vId3 ) ) (toVid ?vId4&:(<> ?vId4  ?vId1 ?vId2 ?vId3 ) ) )
    (edge (fromVid ?vId4&:(< ?vId1 ?vId4 ) ) (toVid ?vId5&:(<> ?vId5  ?vId1 ?vId2 ?vId3 ?vId4 ) ) )
    (edge (fromVid ?vId5&:(< ?vId1 ?vId5 ) ) (toVid ?vId6&:(<> ?vId6  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ) ) )
    (edge (fromVid ?vId6&:(< ?vId1 ?vId6 ) ) (toVid ?vId7&:(<> ?vId7  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ) ) )
    (edge (fromVid ?vId7&:(< ?vId1 ?vId7 ) ) (toVid ?vId8&:(<> ?vId8  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ) ) )
    (edge (fromVid ?vId8&:(< ?vId1 ?vId8 ) ) (toVid ?vId9&:(<> ?vId9  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ) ) )
    (edge (fromVid ?vId9&:(< ?vId1 ?vId9 ) ) (toVid ?vId10&:(<> ?vId10  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ) ) )
    (edge (fromVid ?vId10&:(< ?vId1 ?vId10 ) ) (toVid ?vId11&:(<> ?vId11  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ) ) )
    (edge (fromVid ?vId11&:(< ?vId1 ?vId11 ) ) (toVid ?vId12&:(<> ?vId12  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ) ) )
    (edge (fromVid ?vId12&:(< ?vId1 ?vId12 ) ) (toVid ?vId13&:(<> ?vId13  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ) ) )
    (edge (fromVid ?vId13&:(< ?vId1 ?vId13 ) ) (toVid ?vId14&:(<> ?vId14  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ) ) )
    (edge (fromVid ?vId14&:(< ?vId1 ?vId14 ) ) (toVid ?vId15&:(<> ?vId15  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ) ) )
    (edge (fromVid ?vId15&:(< ?vId1 ?vId15 ) ) (toVid ?vId16&:(<> ?vId16  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ) ) )
    (edge (fromVid ?vId16&:(< ?vId1 ?vId16 ) ) (toVid ?vId17&:(<> ?vId17  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ) ) )
    (edge (fromVid ?vId17&:(< ?vId1 ?vId17 ) ) (toVid ?vId18&:(<> ?vId18  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ?vId17 ) ) )
    (edge (fromVid ?vId18&:(< ?vId1 ?vId18 ) ) (toVid ?vId19&:(<> ?vId19  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ?vId17 ?vId18 ) ) )
    (edge (fromVid ?vId19&:(< ?vId1 ?vId19 ) ) (toVid ?vId20&:(<> ?vId20  ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ?vId17 ?vId18 ?vId19 ) ) )
    (edge (fromVid ?vId20&:(< ?vId1 ?vId20 ) ) (toVid ?vId1 ) )
=>)
CLIPS>
(deffunction assertSomeFacts (?group ?size)
   (bind ?s (+ 1 (* (- ?group 1) ?size)))
   (bind ?os ?s)
   (loop-for-count (- ?size 1)
      (assert (edge (fromVid ?s) (toVid (+ ?s 1))))
      (bind ?s (+ ?s 1)))
   (assert (edge (fromVid ?s) (toVid ?os))))
CLIPS> (timer (loop-for-count (?i 45000) (assertSomeFacts ?i 20)) (run))
45.3337240219116
CLIPS> (timer (reset))
0.55456805229187
CLIPS> 

Ged Byrne

unread,
Jun 21, 2023, 6:09:32 AM6/21/23
to CLIPSESG
Thanks for taking the time to test it.  I was expecting it to be common knowledge that somebody could quickly share.

The slow-down must be due to the ClipsPy integration.

CLIPS Support

unread,
Jun 21, 2023, 2:07:22 PM6/21/23
to CLIPSESG
I wouldn't just assume it's ClipsPy. There could be something that's different about your data than the data I used. Try deleting the collectCycles function from your rules since that's specific to your python integration. If the problem still persists, then it's just two lines of code to save your rules and facts to files that can be loaded into CLIPS independent of ClipsPy. That way you're testing the exactly the same thing in the ClipsPy and standalone CLIPS environments. 

Ged Byrne

unread,
Jun 22, 2023, 3:41:03 AM6/22/23
to CLIPSESG
I've been able to replicate the problem in the CLIPS IDE with the batch file below.

I've written up this issue on my Wiki, which includes tables and charts that you might find useful: https://rheofy.com/dw/doku.php?id=retegraphprocessing:elements:prototyping:simple_algorithm:slowreset 

Please note that this is not a problem for me.  This exercise is all about finding the performance constraints for this approach.  Having found them I will work within them and easily avoid the issue.  I'm sharing the details here in the hope that you might find it useful for improving CLIPS.

(deftemplate vertex
    (slot vId (type INTEGER ) )
)
(deftemplate edge
    (slot fromVid (type INTEGER ) )
    (slot toVid (type INTEGER ) )
)
(deftemplate cycle
    (slot edges (type STRING ) )
)
(loop-for-count (?vId 1 1000) do
    (assert (vertex (vId ?vId)))
)

(defrule randomly-create-edges
    (vertex (vId ?vid1))
    (vertex (vId ?vid2))
    =>
    (if (= (random 1 1000) 1) then
        (assert (edge (fromVid ?vid1) (toVid ?vid2)))
    )
    (if (= (random 1 1000) 1) then
        (assert (edge (fromVid ?vid2) (toVid ?vid1)))
    )
)
( defrule FindSimpleCyclesLength20
    (edge (fromVid ?vId1 ) (toVid ?vId2&:(<> ?vId2 ?vId1 ) ) )
    (edge (fromVid ?vId2&:(< ?vId1 ?vId2 ) ) (toVid ?vId3&:(<> ?vId3 ?vId1 ?vId2 ) ) )
    (edge (fromVid ?vId3&:(< ?vId1 ?vId3 ) ) (toVid ?vId4&:(<> ?vId4 ?vId1 ?vId2 ?vId3 ) ) )
    (edge (fromVid ?vId4&:(< ?vId1 ?vId4 ) ) (toVid ?vId5&:(<> ?vId5 ?vId1 ?vId2 ?vId3 ?vId4 ) ) )
    (edge (fromVid ?vId5&:(< ?vId1 ?vId5 ) ) (toVid ?vId6&:(<> ?vId6 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ) ) )
    (edge (fromVid ?vId6&:(< ?vId1 ?vId6 ) ) (toVid ?vId7&:(<> ?vId7 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ) ) )
    (edge (fromVid ?vId7&:(< ?vId1 ?vId7 ) ) (toVid ?vId8&:(<> ?vId8 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ) ) )
    (edge (fromVid ?vId8&:(< ?vId1 ?vId8 ) ) (toVid ?vId9&:(<> ?vId9 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ) ) )
    (edge (fromVid ?vId9&:(< ?vId1 ?vId9 ) ) (toVid ?vId10&:(<> ?vId10 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ) ) )
    (edge (fromVid ?vId10&:(< ?vId1 ?vId10 ) ) (toVid ?vId11&:(<> ?vId11 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ) ) )
    (edge (fromVid ?vId11&:(< ?vId1 ?vId11 ) ) (toVid ?vId12&:(<> ?vId12 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ) ) )
    (edge (fromVid ?vId12&:(< ?vId1 ?vId12 ) ) (toVid ?vId13&:(<> ?vId13 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ) ) )
    (edge (fromVid ?vId13&:(< ?vId1 ?vId13 ) ) (toVid ?vId14&:(<> ?vId14 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ) ) )
    (edge (fromVid ?vId14&:(< ?vId1 ?vId14 ) ) (toVid ?vId15&:(<> ?vId15 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ) ) )
    (edge (fromVid ?vId15&:(< ?vId1 ?vId15 ) ) (toVid ?vId16&:(<> ?vId16 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ) ) )
    (edge (fromVid ?vId16&:(< ?vId1 ?vId16 ) ) (toVid ?vId17&:(<> ?vId17 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ) ) )
    (edge (fromVid ?vId17&:(< ?vId1 ?vId17 ) ) (toVid ?vId18&:(<> ?vId18 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ?vId17 ) ) )
    (edge (fromVid ?vId18&:(< ?vId1 ?vId18 ) ) (toVid ?vId19&:(<> ?vId19 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ?vId17 ?vId18 ) ) )
    (edge (fromVid ?vId19&:(< ?vId1 ?vId19 ) ) (toVid ?vId20&:(<> ?vId20 ?vId1 ?vId2 ?vId3 ?vId4 ?vId5 ?vId6 ?vId7 ?vId8 ?vId9 ?vId10 ?vId11 ?vId12 ?vId13 ?vId14 ?vId15 ?vId16 ?vId17 ?vId18 ?vId19 ) ) )
    (edge (fromVid ?vId20&:(< ?vId1 ?vId20 ) ) (toVid ?vId1 ) )
=>
    (assert ( cycle (edges (str-cat  ?vId1 "->" ?vId2 "->" ?vId3 "->" ?vId4 "->" ?vId5 "->" ?vId6 "->" ?vId7 "->" ?vId8 "->" ?vId9 "->" ?vId10 "->" ?vId11 "->" ?vId12 "->" ?vId13 "->" ?vId14 "->" ?vId15 "->" ?vId16 "->" ?vId17 "->" ?vId18 "->" ?vId19 "->" ?vId20 ) ) ) )
)
(timer (run))

(timer (reset))


denis.be...@gmail.com

unread,
Jun 22, 2023, 5:08:48 AM6/22/23
to CLIPSESG
As one who has been using CLIPS for almost 20 years, with thousands of complex rules and with some cases reaching millions of facts, I've been quite surprised by your last results (https://rheofy.com/dw/doku.php?id=retegraphprocessing:elements:prototyping:simple_algorithm:slowreset). 
First thing I checked is whether you used any "logicals", but no.

I then tried to partially reproduce your results and I got the following:
for max length 10:   0.20   0.04     3111
for max length 15:   0.96   0.37     5774
for max length 18:   3.11   0.43    13228
for max length 19:   5.68   0.79    20301
for max length 20: 12.54   3.53    35041

(1st number is runtime, 2nd number is reset time and 3rd number is number of facts - useful to understand what may be happening during reset.
Calculations done on a MacBooPro M1Max, apparently much faster than the machine you used. It might be useful to describe your machine and how much RAM it has.)

The reset ratio I find between 10 and 20 is only 100 (instead of the almost 10,000 you are reporting).
Note that I'm using only CLIPS core but I don't see why the CLIPS interface would change much in such matters.
I understand the 100 ratio as being proportional to the number of internal nodes to be cleaned, which grows faster than the number of facts (factor ~10 here).

I'm curious to see if anyone has different results.

Ged Byrne

unread,
Jun 22, 2023, 5:40:06 AM6/22/23
to CLIPSESG
Hi Denis,

Thanks for trying this on your machine.  Is there an easy way to get a fact count?  The only way I know at the moment is to open the Fact Browser and look at the IDs.

The number of facts doesn't appear to be the key factor here.  I believe that it's the the complexity of the rule that matters.

I'm running a ThinkPad
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz   1.99 GHz
Memory: 16.0 GB

It's not surprising that you M1 MacBookPro is doing better, but at 20 you still seem to be hitting the start of the inflection point for the Reset.  Run doubles but reset quadruples.  I'm confident that if you extend to 21 and 22 you will see the same hockey stick.

In my experience this type of behavior is usually due to hitting some kind of bottleneck because a resource is being exhausted.  It's often something that can be easily resolved once the actual constraint has been identified.

Regards, 


Ged

denis.be...@gmail.com

unread,
Jun 22, 2023, 11:01:57 AM6/22/23
to CLIPSESG
Hi Ged,
The last line of command (facts) after (run) will give you the facts count.

What matters IMO is the number of nodes in the Rete network, because they are what (reset) has to clean.
As I understand your rules, each  of them adds a new condition to the previous one, so that many inner nodes are shared and the number of nodes increases linearly with the number of rules. But the number of partial matches in each node increases close to exponentially. This might explain the behaviour of the (reset) - though I don't see why the (reset) would take time proportional to the content of the nodes..

I also thought about some bottleneck; I thought it was about reaching the limit of your RAM, but with rule20, we need only 9 GB.

PS1: For clarity, I think you should:
-  give rule randomly-create-edges a priority higher than the other rules;
- split each rule into 2 parts: one that extends an open path (which will better show what is shared between rules) and a short one that closes an open path.

PS2: I completed the list and it seems you're right about the inflection point:
for max length 10:   0.20    0.04      3111
for max length 15:   0.96    0.37      5774
for max length 18:   3.11    0.43      13228
for max length 19:   5.68    0.79      20301
for max length 20: 12.54    3.53      35041
for max length 21: 18.61 2.25  56430
for max length 22: 69.82 28.26  186249

Regards
Denis

Ged Byrne

unread,
Jun 22, 2023, 12:53:39 PM6/22/23
to clip...@googlegroups.com
Hi Denis,

Thanks, this is really helpful information. 

The approach is designed to push the RETE graph to it’s limit, but I wasn’t expecting the reset to be the breaking point. 

Regards,


Ged
--
You received this message because you are subscribed to the Google Groups "CLIPSESG" group.
To post to this group, send email to CLIP...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/CLIPSESG?hl=en
 
--> IF YOU NO LONGER WANT TO RECEIVE EMAIL <--
Visit this group at http://groups.google.com/group/CLIPSESG?hl=en
Click on "Edit my membership" link.
Select the "No Email" radio button.
Click the "Save these settings" button.

--> IF YOU WANT TO UNSUBSCRIBE <--
Visit this group at http://groups.google.com/group/CLIPSESG?hl=en
Sign in
Click on "Edit my membership" link.
Click the "Unsubscribe" button.
Note: This appears to be the most reliable way to unsubscribe
 
Alternately, send email to CLIPSESG-u...@googlegroups.com. You will receive an email which you must respond to as well to unsubscribe. Clicking the link mentioned in the unsubscribe reply does not appear to work reliably.
---
You received this message because you are subscribed to a topic in the Google Groups "CLIPSESG" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clipsesg/RsUUog8YoDs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clipsesg+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clipsesg/929e1b00-23b9-433e-82dd-5ad90f54fb31n%40googlegroups.com.

Joshua Scoggins

unread,
Jun 22, 2023, 3:00:19 PM6/22/23
to CLIPSESG
So I took the second example and did some statistical analysis the amount of ram consumed was not 9G but 18.8G on my linux box (which has 128G of ram). I think you're hitting swap is why reset is going crazy.

I got the following statistics:
run time: 74.4812948703766
reset time: 5.98521614074707
memory used (before run): 200.71836566925
memory used (after run and before reset): 16412.0006160736
memory used (after reset): 16319.6967010498


The memory used is in megabytes. There is also 1 million activation records placed on the agenda on startup before we call run (hence the 200 megabytes of memory allocated).

I agree with Dennis completely that rules need to be split apart. However, declaring the salience actually slows down things (as I've observed when using CLIPS to do elf section tracking and other things). Here is the statistics when I add (declare (salience 1)) to randomly-create-edges:

run time: 77.1641499996185
reset time: 6.27142000198364
memory used (before run): 200.718399047852
memory used (after run and before reset): 16412.0006570816
memory used (after reset): 16319.6967582703


I would suggest adding fields to the edge which you can precompute if the edge is valid (<> v1 v2). You can use the following structure to do it:

(slot valid
         (type SYMBOL)
         (allowed-symbols UNKNOWN FALSE TRUE))

This will cause the valid field to default to UNKNOWN and you can match against it to do the computation like so:

(defrule edge-is-valid
?f <- (edge (fromVid ?v1) (toVid ?v2) (valid UNKNOWN))
=>
(modify ?f (valid (<> ?v1 ?v2))))

That way, you can use it as a gate and also a way to drop out _bad_ edges.


Sorry I can't provide more concrete information.

--Josh

Ged Byrne

unread,
Jun 22, 2023, 5:16:16 PM6/22/23
to clip...@googlegroups.com
Hi Joshua,

Thank-you, this is really helpful.  

What confuses me is that memory after reset: 

run time: 74.4812948703766 
reset time: 5.98521614074707 
memory used (before run): 200.71836566925 
memory used (after run and before reset): 16412.0006160736 
memory used (after reset): 16319.6967010498

Shouldn’t that memory be released?  Is this like Java once heap has been allocated it is kept and reused rather than released?

Also, is there a way for CLIPS to tell you how much memory is used or are you using a profiler?

Regards,



Ged

Joshua Scoggins

unread,
Jun 22, 2023, 5:43:39 PM6/22/23
to CLIPSESG
Glad to help!

Actually CLIPS does not release all its memory on reset, you have to use (release-mem) for that. But that has other implications around pretty printing.

I've been using the mem-used builtin. It returns the number of bytes being used so  I do (/ (mem-used) 1024 1024) to get megabytes.

-Josh

Joshua Scoggins

unread,
Jun 22, 2023, 5:45:55 PM6/22/23
to CLIPSESG
Also! Ignore my statement about salience being slower. I didn't realize that the random number being generated could potentially be different each time you startup clips. I was sometimes getting allocations of like 24G and other times 9G! So in my subsequent testing I did (seed 0) to make sure it always is consistent.

denis.be...@gmail.com

unread,
Jun 23, 2023, 12:56:32 AM6/23/23
to CLIPSESG

Here's a version that seems to be faster on all points (but I checked only a few cases). There's some redundancy in some template slots, but it's convenient.

Rule randomly-create-edges could probably be replaced by some extension of function init.


(defglobal ?*max-length* = 20)

(defglobal ?*cycle-count* = 0)


(deftemplate vertex

    (slot vId (type INTEGER))

)

(deftemplate edge

    (slot fromVid (type INTEGER))

    (slot toVid (type INTEGER))

)

(deftemplate loopless-open-path

    (slot length (type INTEGER)) ; = number of edges = number of vertices - 1

    (slot first-vertex (type INTEGER)) ; should always be the smallest of all the vertices

    (multislot vertices) ; contains both first-vertex and last-vertex; all vertices should be different

    (slot last-vertex (type INTEGER))

)

(deftemplate cycle

    ;;; closed path with no inner loops

    (slot length (type INTEGER)) ; = number of edges = number of different vertices

    (multislot edges)

)



(deffunction print-cycle (?l $?vertices)

    ;;; ?l = number of edges = number of different vertices

    (printout t "cycle[" ?l "]: ")

    (loop-for-count (?i 1 ?l) do

        (printout t (nth$ ?i ?vertices) "->")

    )

    (printout t (nth$ 1 ?vertices))

    (printout t crlf)

)



(deffunction init(?n)

    ;;; create ?n vertices

    (bind ?*cycle-count* 0)

    (loop-for-count (?vId 1 ?n) do

        (assert (vertex (vId ?vId)))

    )

)



(defrule randomly-create-edges

    (declare (salience 20))

     (vertex (vId ?vid1))

     (vertex (vId ?vid2))

=>

    (if (= (random 1 1000) 1) then

        (assert (edge (fromVid ?vid1) (toVid ?vid2)))

    )

    (if (= (random 1 1000) 1) then

        (assert (edge (fromVid ?vid2) (toVid ?vid1)))

    )

)



(defrule FindSimpleCycles

    (declare (salience 10))

    (loopless-open-path (length ?l) (first-vertex ?vId1) (vertices $?vert) (last-vertex ?last))

    (edge (fromVid ?last) (toVid ?vId1))

=>

    (assert (cycle (length (+ ?l 1)) (edges $?vert ?vId1)))

    (bind ?*cycle-count* (+ ?*cycle-count* 1))

    ; (print-cycle (+ ?l 1) ?vert)


)



( defrule loopless-path-2

    (edge (fromVid ?vId1 ) (toVid ?vId2&:(<> ?vId2 ?vId1)))

    (edge (fromVid ?vId2&:(< ?vId1 ?vId2)) (toVid ?vId3&:(<> ?vId3 ?vId1 ?vId2)))

=>

    (assert (loopless-open-path (length 2) (first-vertex ?vId1) (vertices ?vId1 ?vId2 ?vId3) (last-vertex ?vId3) ))

)



(defrule extend-loopless-open-path

    (loopless-open-path (length ?l&:(< ?l ?*max-length*)) (first-vertex ?vId1) (vertices $?vert) (last-vertex ?last))

    (edge (fromVid ?last) (toVid ?newVid&:(< ?newVid ?vId1)&:(not (member$ ?newVid $?vert))))

=>

    (assert (loopless-open-path (length (+ ?l 1)) (first-vertex ?vId1) (vertices $?vert ?newVid) (last-vertex ?newVid)))

)



;;; tests


(reset)

(release-mem)

(init 1000)


(timer (run))

?*cycle-count*


(timer (reset))

CLIPS Support

unread,
Jun 23, 2023, 3:44:51 PM6/23/23
to CLIPSESG
If you want to count the facts without having to display them, use either the get-fact-list or find-all-facts function to get a list of facts in a multifield value that can be passed to the length$ function:

(length$ (get-fact-list))
(length$ (find-all-facts ((?f cycle)) TRUE))

CLIPS Support

unread,
Jun 23, 2023, 4:01:02 PM6/23/23
to CLIPSESG
You can also use (mem-requests) to determine the total number of calls that have been made to the operating system and not yet returned. Presumably there's going to be some additional memory allocated by the operating system to keep track of malloc/free calls so mem-used will only reflect the total amount of memory requested by CLIPS and will not include any system overhead. 

CLIPS Support

unread,
Jun 24, 2023, 3:56:20 PM6/24/23
to CLIPSESG
Swapping seems like a plausible explanation. I profiled the code and the majority of the reset is taken up by the code unlinking the partial matches as they're being removed by the retraction of facts during the reset. I have 16 GB on my machine and I see the reset taking more time than the actual run. The reset works in the most straightforward way by explicitly retracting the facts, but knowing that the partial matches are all (mostly) going to be removed it might be more efficient to have a specialized routine clear out the partial matches from the rete network rather than using retract to do that work.

98.4% Reset
98.4%   ResetFacts
98.4%     RetractAllFacts
98.4%       Retract
98.4%         RetractDriver
98.3%           NetworkRetract
98.3%             PosEntryRetractAlpha
87.3%               PosEntryRetractBeta
74.8%                 UnlinkNonLeftLineage
10.5%               UnlinkBetaPMFromNodeAndLineage

I've attached files for running the program with a fixed set of facts if anyone has interest in comparing results on different platforms/configurations.

edge.clp
edge.fct
edge.tst

denis.be...@gmail.com

unread,
Jun 25, 2023, 12:20:10 AM6/25/23
to CLIPSESG
Timings on MacBookPro M1Max 64GB, run with a freshly launched version of CLIPS-core (compiled with "make")

CLIPS> (timer (clear))

4.91142272949219e-05


CLIPS> (load "edge.clp")

CLIPS> (timer (load-facts "edge.fct"))

31.1876230239868

CLIPS> (timer (run))

0.331338167190552

CLIPS> (timer (reset))

3.35309410095215


Notes: 

1) Unsurprisingly, most of the time is spent in "load-facts" (i.e. when propagation occurs in the Rete)

2) If repeating the same, there are some small random fluctuations

3) Changing between 64.x and 63.x, or using CLIPS-IDE instead of CLIPS-core, fluctuations remain in the same ranges.


;;; checks before reset:

(length$ (get-fact-list))

65182

CLIPS> (length$ (find-all-facts ((?f vertex)) TRUE))

1000

CLIPS> (length$ (find-all-facts ((?f edge)) TRUE))

2009

CLIPS> (length$ (find-all-facts ((?f cycle)) TRUE))

62172



About the idea of making "reset" faster, I think it would be occasionally positive (at least for me): when launching calculations on long series of cases, I need to do a "reset" after each case. Most cases don't raise any reset problem (maybe because they don't need any swapping). But, as a few rare cases create millions of facts, it might be useful for them. I'll check this  later in the week.

Ged Byrne

unread,
Jun 27, 2023, 6:08:50 AM6/27/23
to CLIPSESG
Hi Denis,

I developed my own approach to path extension that looks to be equivalent to your code.

The findings were that while this implementation performed better for small cycle sizes the use of multiple rules sets scales significantly better.


Regards, 



Ged

Reply all
Reply to author
Forward
0 new messages