Consider two (large) sets of rules R1 and R2, with R1 included in R2, designed to solve the same class of problems.
Both sets are error-free with extremely high probability: R1 has been tested on millions of cases and R2, the larger set, on tens of thousands, and the problem is such that any error is detected after only a few instances are tested.
Here is what I observe in very rare instances of the problem (say 1 in 10,000): the problem is solved by R1 but not by R2.
In and of itself, this shouldn't be too much of a surprise: some of the new rules might fire before the old ones, preventing the old ones from firing later. This is a standard problem of non-commutativity in the set of rules.
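To make the non-commutativity point concrete, here is a minimal sketch in Python rather than CLIPS. It is a toy forward-chaining loop, not the CLIPS engine: the `Rule` tuple, the `run` function, and the `solve` and `prune` rules are all hypothetical names chosen for illustration, and refraction is simplified to "each rule fires at most once".

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    salience: int
    needs: frozenset    # facts that must all be present for the rule to activate
    adds: frozenset     # facts asserted when the rule fires
    removes: frozenset  # facts retracted when the rule fires


def run(rules, facts):
    """Fire the highest-salience activated rule until no rule is activated.

    Returns the final fact set and the list of rule names in firing order.
    """
    facts = set(facts)
    fired = []
    while True:
        ready = [r for r in rules if r.name not in fired and r.needs <= facts]
        if not ready:
            return facts, fired
        rule = max(ready, key=lambda r: r.salience)
        facts = (facts - rule.removes) | rule.adds
        fired.append(rule.name)


# Hypothetical rules: "solve" is the old rule, "prune" is a new rule with
# higher salience that retracts the fact "solve" depends on.
solve = Rule("solve", 0, frozenset({"candidate"}), frozenset({"solved"}), frozenset())
prune = Rule("prune", 10, frozenset({"candidate"}), frozenset(), frozenset({"candidate"}))

R1 = [solve]
R2 = [solve, prune]  # R1 is included in R2

facts_r1, order_r1 = run(R1, {"candidate"})
facts_r2, order_r2 = run(R2, {"candidate"})
print("R1:", order_r1, "solved" in facts_r1)
print("R2:", order_r2, "solved" in facts_r2)
```

With R1 alone, `solve` fires and the problem is solved. With R2, `prune` fires first, retracts `candidate`, and `solve` is never activated again. This is the ordinary, expected kind of failure, and it is not the one I am asking about.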
However, what I observe after a deeper study of the situation is different and much more diabolical. Suppose I run the two sets of rules in different instances of CLIPS, for the same problem instance. After some common rules fire, a state is reached where all the facts are the same (easy to check). However, at this point, a rule of R1 that fires in the first version doesn't fire in the second (and it should because it has higher salience than the rule that fires next).
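For reference, the "all the facts are the same" check can be sketched as follows, in Python. This assumes the output of the `(facts)` command has been saved to text for each run, with one fact per line in the form `f-N     (...)`; the helper names `fact_multiset` and `diff_facts` and the sample listings are hypothetical. Fact indices are deliberately ignored, since they need not match between the two runs.

```python
import re
from collections import Counter

# Assumed line format of a saved (facts) listing: "f-12    (some fact body)".
FACT_LINE = re.compile(r"^f-\d+\s+(\(.*\))\s*$")


def fact_multiset(listing: str) -> Counter:
    """Collect fact bodies from a listing, ignoring the f-N indices."""
    bodies = Counter()
    for line in listing.splitlines():
        match = FACT_LINE.match(line.strip())
        if match:  # skips the trailing "For a total of N facts." line
            bodies[match.group(1)] += 1
    return bodies


def diff_facts(listing_a: str, listing_b: str):
    """Return (facts only in A, facts only in B) as sorted lists."""
    a, b = fact_multiset(listing_a), fact_multiset(listing_b)
    return sorted((a - b).elements()), sorted((b - a).elements())


# Made-up sample listings standing in for the two saved runs.
run_r1 = "f-1     (cell 1 1 5)\nf-2     (candidate 1 2 7)\nFor a total of 2 facts.\n"
run_r2 = "f-1     (cell 1 1 5)\nf-3     (candidate 1 2 7)\nFor a total of 2 facts.\n"

only_r1, only_r2 = diff_facts(run_r1, run_r2)
print(only_r1, only_r2)
```

Two empty lists mean the two runs hold the same facts at that point, up to fact indices.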
So, I wonder if there is any limit on the number of rule instantiations in the agenda(s) and what happens when some limit is reached.
PS: this happens in both CLIPS 6.3 and 6.4, latest release.