Logic Module


Soumya Biswas

Jan 24, 2014, 2:35:18 AM
to sy...@googlegroups.com
Hello

There are a couple of things present in the Logic module that I would like to discuss:

1. Ask Function in PropKB:
If KB 'A' entails formula 'B', then A U {~B} is unsatisfiable. However, the current code seems to check if A U {B} is satisfiable. To the best of my understanding, this will give the right answer in many cases, but not all.
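For illustration, the corrected check could look like this (a sketch using satisfiable from sympy.logic.inference; the entails helper name is mine, not the module's):

```python
from sympy.abc import a, b
from sympy.logic.boolalg import And, Implies, Not
from sympy.logic.inference import satisfiable

def entails(kb, query):
    # A entails B iff A U {~B} is unsatisfiable
    return not satisfiable(And(kb, Not(query)))

kb = And(a, Implies(a, b))
print(entails(kb, b))  # True: {a, a >> b} entails b
print(entails(a, b))   # False: a alone does not entail b
```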

2. Ask Function on a PropKB with no clauses:
The function returns False if there are no clauses. Theoretically, an empty clause is unsatisfiable, but an empty clause set is valid and hence satisfiable in every interpretation. So an empty KB entails exactly the formulas that are satisfiable in every interpretation, i.e. the valid ones. Shouldn't an empty KB therefore entail a tautology?

3. DNF Satisfiability:
As discussed before, the time taken by to_cnf is actually more than the time taken by the SAT solver. A relatively quick and dirty way to decide whether the formula resembles CNF or DNF is to compare the number of conjuncts and disjuncts present in it. This seems to give reasonable results. So I want to change the code as follows: if the formula is more conjunctive, run dpll; otherwise run DNF satisfiability. DNF satisfiability is decidable in linear time and would be easier to compute. Is this okay?
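To make the DNF part concrete, here is the kind of linear check I mean, in plain Python with literals encoded as signed integers (a toy representation, not the module's):

```python
def dnf_satisfiable(terms):
    """terms: a DNF as a list of terms, each term a list of signed
    integer literals (n = variable n, -n = its negation).
    A DNF is satisfiable iff some term contains no complementary
    pair of literals, so one pass over the formula suffices."""
    return any(not (set(term) & {-l for l in term}) for term in terms)

print(dnf_satisfiable([[1, -1], [2, -3]]))  # True: the term [2, -3] is consistent
print(dnf_satisfiable([[1, -1]]))           # False: every term is contradictory
```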

I am done with the code for the above issues and will send a PR once someone confirms my ideas.

Sachin Joglekar

Jan 24, 2014, 6:41:47 AM
to sy...@googlegroups.com
I agree on point 2. An empty set of clauses WILL entail a tautology. So I guess you could just simplify the input, and if it comes out to be True, return True; else return False.
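Concretely, I imagine something like this sketch (the empty_kb_ask helper is made up for illustration; it just checks validity by refutation):

```python
from sympy.abc import a, b
from sympy.logic.boolalg import Implies, Not
from sympy.logic.inference import satisfiable

def empty_kb_ask(query):
    # An empty KB entails exactly the tautologies:
    # query is valid iff ~query is unsatisfiable.
    return not satisfiable(Not(query))

print(empty_kb_ask(Implies(a, Implies(b, a))))  # True: a tautology
print(empty_kb_ask(Implies(a, b)))              # False: falsifiable
```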

About point 3, what is the time complexity of the algorithm that you propose? Also note how the 'disjuncts' function (for example) works:
>>> f = And(a, Or(b, c), Not(Or(a, b)))
>>> disjuncts(f)
frozenset([And(Not(a), Not(b), Or(b, c), a)])
>>> disjuncts(to_dnf(f))
frozenset([And(Not(a), Not(b), a, b), And(Not(a), Not(b), a, c)])

You may want to do an amortized analysis of your implementation and check, because if I get your point right, you would probably want the second output in the case shown above (for which you will have to do the heavy to_dnf anyways).

About point 1, I do agree that A U {B} being satisfiable does not imply entailment, since there may be models of A that do not satisfy B. @asmeurer, am I missing something here?

Aaron Meurer

Jan 24, 2014, 7:46:59 PM
to sy...@googlegroups.com, Christian Muise
On Fri, Jan 24, 2014 at 1:35 AM, Soumya Biswas <sdb...@gmail.com> wrote:
> Hello
>
> There are a couple of things present in the Logic module that I would like
> to discuss:
>
> 1. Ask Function in PropKB:
> If KB 'A' entails formula 'B' then A U {~B} is unsatisfiable. However the
> current code seems to check if A U {B} is satisfiable. To the best of my
> understanding, this will give the right answer in many cases, but not all.

Can you point to the specific part of the code that you're referring to?

Anyway, I'm not sure what PropKB is supposed to do. A simple grep
shows that it's not used anywhere. I'm guessing it's a leftover from
before the dpll algorithm was implemented, and should be removed if
that's the case. The function you should focus on is satisfiable(),
and the algorithms it calls.

Aaron Meurer

> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> For more options, visit https://groups.google.com/groups/opt_out.

Sachin Joglekar

Jan 24, 2014, 10:22:54 PM
to sy...@googlegroups.com
@Aaron, I don't think getting rid of PropKB is a good idea. But what IS needed is to make sure it's foolproof. A PropKB (propositional knowledge base) is essentially a tool to check whether a set of logic expressions 'A' _entails_ a certain expression 'B', i.e. whether ALL models of A are models of B. This is a nice tool to use for deductions. Currently it calls the dpll algorithm, but apparently incorrectly, so that should be fixed.

About satisfiable(), and especially for the assumptions system, may I suggest looking at other, lighter SAT solvers like MiniSAT? We may have to confirm the soundness and completeness, but that's still an avenue to explore.

@Soumya, I would suggest you send a PR with what modifications you propose, and we take it from there.



Aaron Meurer

Jan 24, 2014, 11:38:39 PM
to sy...@googlegroups.com
Ondrej has a branch that uses pycosat in SymPy. But I think we found that, for the problems I've tried so far in my newassump branch, it didn't really make a difference. That could change if they get larger, though.

Aaron Meurer

Soumya Biswas

Jan 28, 2014, 2:40:20 PM
to sy...@googlegroups.com
I worked on improving the way the formulas are converted to CNF. Using the Tseitin transformation, the conversion to CNF can be optimized (a lot). The summary of the test is as follows:

Atoms: 20, Clauses: 100 (Using a propositional statement generator), a = average number of arguments to And/Or

for a=2: both algorithms produce almost identical results; normal does better with DPLL2, while Tseitin does better with DPLL
for a=3: Tseitin produces much better results than normal (with both algorithms)
for a=4 and above: Tseitin shows a remarkable improvement over normal

A sample test:
No. of formulas = 10, a=3
Tseitin: 1,982,682 µs
Normal: 6,989,382 µs

Will post the detailed test results once I finish testing more formally.
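For anyone curious, the core of the encoding I am testing looks roughly like this (a toy version over signed-integer literals, not the actual PR code); note that an n-ary And/Or is handled as a single node:

```python
import itertools

def tseitin(expr, clauses, fresh):
    """Return a literal p, appending clauses that enforce p <-> expr.
    expr is a signed int (literal) or a tuple
    ('not', e) / ('and', e1, ..., en) / ('or', e1, ..., en)."""
    if isinstance(expr, int):
        return expr
    op = expr[0]
    args = [tseitin(e, clauses, fresh) for e in expr[1:]]
    if op == 'not':
        return -args[0]
    p = next(fresh)  # one auxiliary variable per non-atomic node
    if op == 'and':
        # p <-> (a1 & ... & an): n clauses (~p | ai), one clause (p | ~a1 | ... | ~an)
        clauses.extend([-p, a] for a in args)
        clauses.append([p] + [-a for a in args])
    else:  # 'or'
        # p <-> (a1 | ... | an): n clauses (p | ~ai), one clause (~p | a1 | ... | an)
        clauses.extend([p, -a] for a in args)
        clauses.append([-p] + args)
    return p

def to_cnf_tseitin(expr, nvars):
    """Equisatisfiable CNF; auxiliary variables start at nvars + 1."""
    clauses = []
    root = tseitin(expr, clauses, itertools.count(nvars + 1))
    clauses.append([root])  # assert the formula at the top level
    return clauses

# (a & b) | c over variables 1..3: the output size is linear in the input
print(to_cnf_tseitin(('or', ('and', 1, 2), 3), 3))
```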


SD
Soumya Dipta Biswas
BITS, Pilani - K. K. Birla Goa Campus
Contact: +918087953906
E-Mail: sdb...@gmail.com



Aaron Meurer

Jan 28, 2014, 8:32:33 PM
to sy...@googlegroups.com
You have to be careful, though. Apparently Tseitin formulas can slow
down the SAT solver, due to the auxiliary variables that are
introduced. See
https://code.google.com/p/sympy/issues/detail?id=4075#c2.

Aaron Meurer

Sachin Joglekar

Jan 29, 2014, 1:41:15 AM
to sy...@googlegroups.com
I guess Aaron is right. But looking at issue 4075, I think our first focus SHOULD be to improve the 'to_cnf' function in the logic module. @Soumya, could you send a WIP PR to let us see how you are going about it? As Aaron said, the auxiliary variables may create problems, so we need to be sure about what's happening both before and after the CNF conversion.

Soumya Biswas

Jan 29, 2014, 6:11:02 AM
to sy...@googlegroups.com
It seems I had somehow missed this conversation (on the issues page). It is definitely true that this transformation adds many auxiliary variables (as many as the number of non-atomic clauses). However, what I am working on is optimizing this encoding to such an extent that the encoding plus satisfiability time is reduced overall.

Firstly (as observed on the issues thread), handling duplicate subexpressions optimizes the process to some extent. Additionally, I worked out a way (I don't know if it is an obvious generalization or something new) such that And(a, b, c, ...) can be treated as only one operator (though the number of clauses this encoding produces will still increase linearly with the number of arguments). This is what makes the method much faster for cases where the average number of arguments to And/Or is around 3 or more.

I find the point made on the issues page quite interesting, that many of the encodings are quite redundant. Will look into optimization from this approach. There is yet another polarity-based optimization that I am yet to look into. I think we will have to treat encoding plus solving as the unit of comparison, i.e. is the entire process faster or slower.

Sachin Joglekar

Jan 30, 2014, 8:29:37 AM
to sy...@googlegroups.com
This may seem like an ad-hoc way of doing things, but can we think of using DNF for satisfiability of formulae with a small number of atoms? Since DNF generally causes an extremely dangerous explosion in the size of an expression, its use in satisfiability checking is usually frowned upon by the logic community. However, checking the SAT of a DNF is trivial. Therefore, since _we_ can check the number of atoms used in a given expression, can we just use the DNF method when the number of atoms falls below a certain threshold (say 5 or 6)?
@asmeurer, what do you think?
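As a sketch of what I mean (using to_dnf from sympy.logic.boolalg; the sat_via_dnf name is made up, and note that SymPy's automatic simplification already collapses many contradictory terms on its own):

```python
from sympy.abc import a, b
from sympy.logic.boolalg import (And, Not, Or, conjuncts, disjuncts,
                                 to_dnf, true, false)

def sat_via_dnf(expr):
    """Satisfiability via DNF: only sensible for a few atoms, since
    to_dnf can blow up. A DNF is satisfiable iff some disjunct is
    free of complementary literals."""
    dnf = to_dnf(expr)
    if dnf in (true, false):  # SymPy collapses trivial cases itself
        return dnf is true
    return any(not any(Not(lit) in lits for lit in lits)
               for term in disjuncts(dnf)
               for lits in [set(conjuncts(term))])

print(sat_via_dnf(And(Or(a, b), Not(a))))          # True
print(sat_via_dnf(And(Or(a, b), Not(a), Not(b))))  # False
```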


Soumya Biswas

Jan 30, 2014, 1:57:41 PM
to sy...@googlegroups.com
Using Tseitin for small formulas with few arguments is overkill. In such cases, switching to some other method sounds like a good idea. Will work on it and post the test results for comparison.

Aaron Meurer

Jan 30, 2014, 8:04:33 PM
to sy...@googlegroups.com
I would time the various ways. Unless someone understands the theory of DPLL well enough to know what will and won't be fast, I think this is the only way we can know which tricks to use when.

In general, though, if something is only faster for very small inputs,
it's not really worth doing, because it's already not slow the way it
is. It only serves to add unnecessary complexity.

By the way, a good GSoC idea for someone would be to set up a good
benchmarking system for us. We are good at avoiding behavioral
regressions with testing and Travis, but performance regressions are
tougher to guard against, because we don't check it very often.

Aaron Meurer
Christian Muise

Jan 31, 2014, 2:07:09 AM
to sy...@googlegroups.com
  If the formulae are small, then the approach there is just fine. If the formula is huge but the number of variables is small, then just enumerating the models should be done. If neither of those cases holds, then the best you could hope for is to try to detect "how far" the input is from either CNF or DNF (note -- a way to predict one could be used to predict the other). With that in hand, you can do the following:
- if easy to convert to DNF, do so (at least partially) until at least one term is produced
- else if easy to convert to CNF, do so and run the SAT procedure
- else convert to CNF with Tseitin and run the SAT procedure

  Cheers,
   Christian*

*: Can be considered a pseudo-expert on the subject, as I've done graduate work in the area and had a hand in implementing the current DPLL solver in sympy.

Sachin Joglekar

Jan 31, 2014, 12:44:39 PM
to sy...@googlegroups.com
Agree with Christian. However, we would need a very good heuristic to estimate how 'far' a formula is from CNF or DNF. Frankly, if the number of variables is small, I don't think any method would be that bad: for even 6-7 variables, the number of models would be at most 64-128, a decent number to iterate over directly.
But as Aaron suggests, maybe we don't want to add too much complexity? In that direction, I do think we should deprecate dpll1. dpll2, with clause learning involved, is certainly better.
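For comparison, brute-force enumeration over a handful of variables is only a few lines (a sketch, not proposed API; the helper name is mine):

```python
from itertools import product

from sympy.abc import a, b, c
from sympy.logic.boolalg import And, Not, Or

def sat_by_enumeration(expr):
    """Try all 2**n assignments; fine for roughly n <= 7,
    i.e. at most 128 models to check."""
    atoms = sorted(expr.free_symbols, key=str)
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if expr.subs(model) == True:
            return model  # a satisfying model
    return False

print(sat_by_enumeration(And(Or(a, b), Not(a), c)))
```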

Aaron Meurer

Jan 31, 2014, 7:22:13 PM
to sy...@googlegroups.com
I disagree about the symmetry. Consider the case where you have Or(big
expression 1, big expression 2). Instead of trying to convert the
whole thing into CNF or DNF, you can recursively apply the algorithm.
First just try satisfiable(big expression 1). If it's satisfiable,
that's your model. If not, then apply satisfiable(big expression 2).

Superficially, it doesn't matter what the args are to do this. In
reality it might be more efficient to convert the whole thing to CNF
instead of doing this, or applying some other heuristic. So it would
need to be balanced.

But this again is something I am not sure about. Would it be a bad
idea to unconditionally have something like

def satisfiable(expr):
    if isinstance(expr, Or):
        return any(satisfiable(i) for i in expr.args)
    else:
        return dpll2(to_cnf(expr))  # or whatever

Aaron Meurer


Christian Muise

Feb 1, 2014, 7:25:34 PM
to sy...@googlegroups.com
  Two big CNF expressions with an OR tying them together is just a special case, I reckon. We'll start to face feature creep with all of the different types of input that might be used -- ultimately what is needed is an assessment of the types of input the satisfiable function actually sees. There will /always/ be some other corner case that you can solve better, but it's not worth the effort unless that corner case is commonly used.

  Cheers,
   Christian

Aaron Meurer

Feb 2, 2014, 1:44:04 AM
to sy...@googlegroups.com
With my work with the assumptions, I am able to keep it as CNF, or
nearly CNF, although in some cases, I have to be careful how I do
things. For example, there are simple CNF and DNF forms for "exactly
one of x1, ..., xn is true". You need to use the DNF one if it is the
first part of an implication and the CNF form if it is the second
(because p >> q == q | ~p). See
https://github.com/asmeurer/sympy/blob/newassump/sympy/assumptions/newhandlers.py#L163.

The expression passed to satisfiable is basically a large And, mostly
of implications. They are often something like x | And(...) (i.e.,
x >> And(...) or something similar), meaning it is necessary to do a
very small distribution. So far, this hasn't seemed to be an issue.

One thing I am not sure how to do is "an even number of x1, ..., xn is
true" (the complement of n-ary XOR). The CNF or DNF of this is
exponential. But unfortunately some facts are like this (e.g., if you
know that each of x1, ..., xn is positive or negative, then x1*...*xn
is positive iff an even number are negative). My best thought for this
is to just put a limit in the code on how large it tries to get.

For now, though, I think it would be great if I basically didn't have
to worry about using the correct form of a fact to get good
performance. That will make it a lot easier for people to extend the
system in the future, without having to have a deep knowledge of
logic.

Aaron Meurer

Soumya Biswas

Feb 23, 2014, 11:19:18 AM
to sy...@googlegroups.com
In many (most?) cases the (un)satisfiability is of greater importance than the satisfying assignment (e.g. validity, entailment). For all such operations, switching to a semantic tableau seems to give results MUCH faster than the previous method. I still haven't run a comparison with Tseitin, but this method should, theoretically and practically, give better results. Of course, to find a model one would still need the SAT solver.

Secondly, pl_true(a >> (b >> a), {}) would return None, even though the expression is a tautology. The fact that pl_true cannot handle this is mentioned in the comments. Using some extra time (proportional to the number of clauses), we can return the actual result not just for tautologies but for all expressions under a partial interpretation. Is the correct answer worth the extra time? Of course, we could modify the function into pl_true(expr, deep=False) and do the above computation only if the user wants the exact result.
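Roughly, the computation I have in mind (sketched here as a wrapper around pl_true rather than a change to it; the pl_true_deep name is mine):

```python
from sympy.abc import a, b
from sympy.logic.boolalg import Implies, Not
from sympy.logic.inference import pl_true, satisfiable

def pl_true_deep(expr, model=None):
    model = model or {}
    shallow = pl_true(expr, model)
    if shallow is not None:
        return shallow
    # Fall back to SAT on the partially substituted formula: it is
    # True if its negation is unsatisfiable, False if it is itself
    # unsatisfiable, and genuinely undetermined otherwise.
    partial = expr.subs(model)
    if not satisfiable(Not(partial)):
        return True
    if not satisfiable(partial):
        return False
    return None

print(pl_true(Implies(a, Implies(b, a)), {}))       # None
print(pl_true_deep(Implies(a, Implies(b, a)), {}))  # True
```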


Sachin Joglekar

Feb 23, 2014, 11:26:09 AM
to sy...@googlegroups.com
Adding a 'deep' parameter to pl_true seems the better idea. We wouldn't want to break backwards compatibility for users who may have written code assuming the former input/output combinations of pl_true. But yes, the modification you suggest does seem like a good idea to me.
About the idea of using a semantic tableau: we should probably stick to a traditional SAT solver for the logic module (since users looking to use a SAT solver usually need it to return a model, or at least expect it to). However, what we _can_ do is use the tableau method for the assumptions system (maybe), or anywhere else in the core where a model is not essential.
@asmeurer, do you think we should implement that?
@Soumya, you could probably run a few tests and check out the timing results. But before that, you may want to get your current pipeline-PRs merged. I guess there are 2-3 of them?

Aaron Meurer

Feb 23, 2014, 11:47:32 AM
to sy...@googlegroups.com

On Feb 23, 2014, at 10:26 AM, Sachin Joglekar <srjogl...@gmail.com> wrote:

@asmeurer, do you think we should implement that?

I don't know enough about it to say. I can confirm that in the assumptions, the model is not important. Also, satisfiable is called twice in the assumptions system: once to see if the expression can be true and once to see if it can be false. In the cases where it returns an answer (not None), one of the two is unsatisfiable.

Do semantic tableaux give proofs of unsatisfiability? That could be useful to the user in some cases, even in the assumptions, to answer 'why' (the model is also important for this). 

Aaron Meurer

Sachin Joglekar

Feb 23, 2014, 12:00:06 PM
to sy...@googlegroups.com
In the case of a semantic tableau, an open branch (one not containing conflicting literals) is a proof of satisfiability. In other words, if all branches close, the formula is _definitely_ unsatisfiable. So checking for validity would mean checking the unsatisfiability of the negation.
If assumptions don't require models in the general case, then this method makes sense. And since it's a DFS over the semantic tree, the overheads may not be much (or we can try out an iterative version too). But if we DO require models, then a traditional SAT solver is the way to go.
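For concreteness, a toy tableau along these lines (plain Python; formulas in negation normal form, encoded with signed integers as literals and ('and', ...)/('or', ...) tuples — a representation made up for this sketch):

```python
def tableau_sat(todo, literals=frozenset()):
    """DFS semantic tableau for NNF formulas: satisfiable iff at
    least one branch stays open (free of conflicting literals)."""
    if not todo:
        return True                       # open branch: a model exists
    first, rest = todo[0], todo[1:]
    if isinstance(first, int):            # a (signed) literal
        if -first in literals:
            return False                  # branch closes on a conflict
        return tableau_sat(rest, literals | {first})
    op, args = first[0], list(first[1:])
    if op == 'and':                       # extend the current branch
        return tableau_sat(args + rest, literals)
    # 'or': split the branch, one child per disjunct
    return any(tableau_sat([arg] + rest, literals) for arg in args)

print(tableau_sat([('or', ('and', 1, 2), 3)]))  # True
print(tableau_sat([('and', 1, -1)]))            # False
```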



Soumya Biswas

Feb 23, 2014, 2:21:44 PM
to sy...@googlegroups.com

Also, satisfiable is called twice in the assumptions system, once to see if the expression can be true and once to see if it can be false.

I believe this can be done with only one call to satisfiable. Put the expression into pl_true with an arbitrary interpretation, e.g. pl_true(expr, {a1: True, a2: True, ..., an: True}). Assume this gives the value True; then check whether the expr is valid. If yes, you know it can't be false; if no, the expr can be both true and false (under different interpretations, of course). If pl_true gives False, check whether the formula is unsatisfiable. If yes, it can't be true; if no, it can be both true and false. This reduces the 2 SAT calls to 1 SAT call and 1 pl_true call.
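In sketch form (the truth_range helper is hypothetical; it follows exactly the case analysis above):

```python
from sympy.abc import a, b
from sympy.logic.boolalg import And, Implies, Not, Or
from sympy.logic.inference import pl_true, satisfiable

def truth_range(expr):
    """Return (can_be_true, can_be_false) using one pl_true call
    and one SAT call."""
    model = {x: True for x in expr.free_symbols}
    if pl_true(expr, model):
        # expr can be true; it can also be false unless it is valid,
        # i.e. unless its negation is unsatisfiable
        return True, bool(satisfiable(Not(expr)))
    # expr can be false; it can also be true unless it is unsatisfiable
    return bool(satisfiable(expr)), True

print(truth_range(Or(a, b)))                       # (True, True)
print(truth_range(Implies(a, Implies(b, a))))      # (True, False)
print(truth_range(And(Or(a, b), Not(a), Not(b))))  # (False, True)
```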