First Order Logic - GSoC Proposal

Soumya Biswas

unread,

Mar 11, 2014, 6:24:02 PM3/11/14

to sy...@googlegroups.com

I have almost completed my GSoC proposal on extending the propositional knowledge base and creating a new module for first order logic. However there are a couple of doubts that I would like to clear before putting up the proposal online.

Will first order logic be integrated with its propositional counterpart or exist independently? Going for integration will make the code messy as appropriate checks will have to be made at EVERY point. On the other hand, independent existence will introduce a good amount of redundancy.
Many concepts from first order logic are implemented in the assumption system (for e.g. predicates). However these implementations seem (I am not sure if they are) to be quite specific in nature. Should I try to make changes to the pre-existing code and build the rest of the codebase around it? Right now my proposal involves building a generic first order logic module from scratch. Additionally, once the logic module is mature enough to handle the requirements of the assumption system, would the new assumption use the new sympy.logic or continue to use the code in sympy.core / sympy.assumptions (which seems to fulfill the need except for efficiency parameters).
One of my ideas is to create a faster SAT solver. Picosat and minisat appear to be good candidates. As suggested by @asmeurer we could always use something on the lines of pycosat (without making it a hard dependency) but the native solver should still be as fast as possible. Currently I am inclined towards implementing minisat (as pycosat can be used for picosat). Some suggestion for an appropriate solver would be helpful. Is it mandatory for me to declare the solver I am going to use in my proposal? I believe this decision can only be taken after actual implementation and thorough benchmarking.
Considering that the assumption system is supposed to use the logic module heavily, some indication about the functions required (by the assumption system from the logic module) would be an enormous help.
If my proposal were to get selected, who would be the potential mentor? A.K.A who am I supposed to bug for the next couple of days for suggestions and improvements. There are some implementation issues that I really want to discuss and put in the final proposal.

Thanks in advance for the help.

Sachin Joglekar

unread,

Mar 13, 2014, 12:48:27 PM3/13/14

to sy...@googlegroups.com

On Wednesday, March 12, 2014 3:54:02 AM UTC+5:30, Soumya Biswas wrote:

I have almost completed my GSoC proposal on extending the propositional knowledge base and creating a new module for first order logic. However there are a couple of doubts that I would like to clear before putting up the proposal online.

Will first order logic be integrated with its propositional counterpart or exist independently? Going for integration will make the code messy as appropriate checks will have to be made at EVERY point. On the other hand, independent existence will introduce a good amount of redundancy.

Ideally, we should have the FOL core running on the propositional logic framework. Like in a prenex normal form, the relevant terms can be replaced with dummy Symbols and then solved using a propositional KB. Things like cnf conversion are better kept just in one module, and reused - though it _may_ mean a little complex code. Having the same functionality in different forms in different places introduces redundancy - and also the problem of changing code twice, if an enhancement has to be added.

Many concepts from first order logic are implemented in the assumption system (for e.g. predicates). However these implementations seem (I am not sure if they are) to be quite specific in nature. Should I try to make changes to the pre-existing code and build the rest of the codebase around it? Right now my proposal involves building a generic first order logic module from scratch. Additionally, once the logic module is mature enough to handle the requirements of the assumption system, would the new assumption use the new sympy.logic or continue to use the code in sympy.core / sympy.assumptions (which seems to fulfill the need except for efficiency parameters).

I would suggest you add your code to sympy.logic module, and then later (after GSoC) try to update the assumptions system with the functionality. Trying to do it the other way round may make the overall system quite inconsistent and messy in the intermediate stages of your work - it may prove difficult to come out of it (my thoughts).

One of my ideas is to create a faster SAT solver. Picosat and minisat appear to be good candidates. As suggested by @asmeurer we could always use something on the lines of pycosat (without making it a hard dependency) but the native solver should still be as fast as possible. Currently I am inclined towards implementing minisat (as pycosat can be used for picosat). Some suggestion for an appropriate solver would be helpful. Is it mandatory for me to declare the solver I am going to use in my proposal? I believe this decision can only be taken after actual implementation and thorough benchmarking.

+1 to that. You can keep both the solvers in the module, and use the 'faster' one for the default SAT solving in sympy.

Considering that the assumption system is supposed to use the logic module heavily, some indication about the functions required (by the assumption system from the logic module) would be an enormous help.

@asmeurer would be a better person to comment on this. Though for now you should probably focus on the work you have previously proposed (your proposal should make things clearer). Once Aaron has a look at the proposal, you guys can think of how your code will help the assumptions module.

Soumya Biswas

unread,

Mar 14, 2014, 6:53:50 AM3/14/14

to sy...@googlegroups.com

I have posted my proposal. Please suggest improvements.

Thank You

Aaron Meurer

unread,

Mar 16, 2014, 3:10:32 PM3/16/14

to sy...@googlegroups.com

Sorry for not responding sooner. This thread must have fallen through
the cracks.

I've answered your questions inline. Let me just comment on the
proposal itself here.

It feels a bit like a laundry list. I realize that you don't really
know what is priority. I would rather focus the proposal. Some
thoughts on focus areas (depending on what interests you):

- Implementation of first-order logic. If you want, you could focus on
monadic calculus (only one variable), which is more decidable than
full first-order logic. Take a look at what Maple and Mathematica have
implemented to get an idea of what is possible, and also what sort of
syntax to use.
- Improvements to the SAT solver (see below for a link to some ideas)
- Implement some other kind of "solver" like clause learning, truth
maintenance system, etc. You should put priority on the sorts of
things that would benefit the assumptions.

I think a lot of what makes your proposal feel like a laundry list is
that some of your ideas are a single pull request, and some could be a
whole summer. For example, changing the printing of the logic
functions is easy; improving the SAT solver with ideas from minisat is
not. But both take up the same amount of space in your proposal. For
the ideas that are actual implementations of algorithms, you should
spend some time discussing their implementation, not just what they
are.

I would try to clean it up, writing more about the hard things and
less about the easy things. Otherwise, it comes off as if you don't
really have a good idea of how difficult things are. Maybe you could
make a more extensive full TODO list on the wiki somewhere that
includes everything, easy or hard.

On Tue, Mar 11, 2014 at 5:24 PM, Soumya Biswas <sdb...@gmail.com> wrote:
> I have almost completed my GSoC proposal on extending the propositional
> knowledge base and creating a new module for first order logic. However
> there are a couple of doubts that I would like to clear before putting up
> the proposal online.
>
> Will first order logic be integrated with its propositional counterpart or
> exist independently? Going for integration will make the code messy as
> appropriate checks will have to be made at EVERY point. On the other hand,
> independent existence will introduce a good amount of redundancy.

I'm not sure I follow you here. Can you give a short example to show
what you mean in each case (integrated or not integrated)?

> Many concepts from first order logic are implemented in the assumption
> system (for e.g. predicates). However these implementations seem (I am not
> sure if they are) to be quite specific in nature. Should I try to make
> changes to the pre-existing code and build the rest of the codebase around
> it? Right now my proposal involves building a generic first order logic
> module from scratch. Additionally, once the logic module is mature enough to
> handle the requirements of the assumption system, would the new assumption
> use the new sympy.logic or continue to use the code in sympy.core /
> sympy.assumptions (which seems to fulfill the need except for efficiency
> parameters).

This is unclear at this point. Currently, the code in facts.py is
quite fast. It's possible that we may want several "cores" for the
assumptions system, a fast but less powerful one and a slower but more
powerful one.

> One of my ideas is to create a faster SAT solver. Picosat and minisat appear
> to be good candidates. As suggested by @asmeurer we could always use
> something on the lines of pycosat (without making it a hard dependency) but
> the native solver should still be as fast as possible. Currently I am
> inclined towards implementing minisat (as pycosat can be used for picosat).
> Some suggestion for an appropriate solver would be helpful. Is it mandatory
> for me to declare the solver I am going to use in my proposal? I believe
> this decision can only be taken after actual implementation and thorough
> benchmarking.

As I noted on your proposal in Melange, what does "implementing
MiniSAT" mean? Do you intend to write Python bindings for MiniSAT (if
so, you should demonstrate to us that you know C++, and the Python C
API)?

Or do you really mean, "improve the DPLL algorithm", which is already
in SymPy? There are some ideas for this at
https://groups.google.com/forum/#!searchin/sympy/muise%7Csort:date/sympy/Cb_uhh3MuE4/k-hSQnxpyv4J
(the posts by Christian Muise in particular).

> Considering that the assumption system is supposed to use the logic module
> heavily, some indication about the functions required (by the assumption
> system from the logic module) would be an enormous help.

Anything you can think of can probably be useful. If you can imagine a
mathematical fact that can be expressed and queried about in the
assumptions system that would use the given logic, that is a use-case.

With that being said, the most important stuff is already implemented.
It just needs to be improved to be more efficient, and to better be
able to handle things like efficient conversion to cnf.

> If my proposal were to get selected, who would be the potential mentor?
> A.K.A who am I supposed to bug for the next couple of days for suggestions
> and improvements. There are some implementation issues that I really want to
> discuss and put in the final proposal.

Well, no one wants to say "I'll be the mentor" for exactly that
reason, because then they'll get bugged more often :)

>
> Thanks in advance for the help.
>

> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/3ab02e25-4ec7-4cfb-a696-e0fef118a25e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Soumya Biswas

unread,

Mar 17, 2014, 7:37:20 PM3/17/14

to sy...@googlegroups.com

@asmeurer: Thank You for reviewing my proposal and suggesting changes.

It feels a bit like a laundry list. I realize that you don't really
know what is priority. I would rather focus the proposal. Some
thoughts on focus areas (depending on what interests you):

At this point even I think that my focus is diluted and I may have underestimated the time
required for particular jobs. I will restructure my timeline to primarily work on 2 major things,
SAT solver and FoL framework. The extra ideas, Semantic Tableaux and Inference Engine
for FoL will probably have to take a step back (will add those to future ideas).

- Implementation of first-order logic. If you want, you could focus on
monadic calculus (only one variable), which is more decidable than
full first-order logic. Take a look at what Maple and Mathematica have
implemented to get an idea of what is possible, and also what sort of
syntax to use.

I had a look at Mathematica (earlier) and Maple (now). The syntax I wish to implement is
quite consistent with both their syntax. When I first imagined this project, I wanted SymPy
to be able to do (atleast the basic) stuff that Prolog does (preferably with similar syntax).
Focusing on monadic calculus will probably make the use scenarios for the module very concentrated.
Additionally, going for a full FOPC will give a more academic approach to the module, wherein it can be
used by institutions for demonstration of concepts and algorithms related to First Order Logic (which are
not as intuitive as their propositional counterpart). I have been planning to add an optional
step by step output for functions like conversion to normal forms and resolution.

I think a lot of what makes your proposal feel like a laundry list is
that some of your ideas are a single pull request, and some could be a
whole summer. For example, changing the printing of the logic
functions is easy; improving the SAT solver with ideas from minisat is
not. But both take up the same amount of space in your proposal. For
the ideas that are actual implementations of algorithms, you should
spend some time discussing their implementation, not just what they
are.

I perfectly understand the thing you are trying to say. The ironic thing is there

is a perfect negative correlation between space on my proposal and time required

to complete the job. I will add the implementation details to the major algorithms
and try to condense the small tasks. Additionally I will tag each task depending

on its size.

I'm not sure I follow you here. Can you give a short example to show
what you mean in each case (integrated or not integrated)?

I like the idea given by @srjoglekar246 on the issue, and will go for the integrated version.

"Ideally, we should have the FOL core running on the propositional logic framework.

Having the same functionality in different forms in different places introduces redundancy

and also the problem of changing code twice, if an enhancement has to be added."

As I noted on your proposal in Melange, what does "implementing

MiniSAT" mean? Do you intend to write Python bindings for MiniSAT (if
so, you should demonstrate to us that you know C++, and the Python C
API)?

Or do you really mean, "improve the DPLL algorithm", which is already
in SymPy? There are some ideas for this at
https://groups.google.com/forum/#!searchin/sympy/muise%7Csort:date/sympy/Cb_uhh3MuE4/k-hSQnxpyv4J
(the posts by Christian Muise in particular).

Analogous to my Melange reply, I mean improve the current algorithm to use the heuristics implemented in MiniSAT. After

receiving this mail, I went into some heavy duty history hunting and went through all the posts by Muise on the mailing list

as well as on his blog and have got a much better idea on what is already present and what more should be added. I will
update the proposal to better describe my original ideas and some newly acquired ones.

As noted above, I have underestimated the time required for the SAT solver. Even after working through the

community bonding period on the solver, I will need more time to implement all the required features, test and benchmark

the results. Will make changes to the timeline accordingly.

Side Remark: Inspired from @haz's blog, I will add graphs to my pull requests to show the comparison between running time
of all the algorithms.

Well, no one wants to say "I'll be the mentor" for exactly that
reason, because then they'll get bugged more often :)

Thank You :)

Soumya Biswas

unread,

Mar 18, 2014, 6:53:05 PM3/18/14

to sy...@googlegroups.com

Currently I have 4 outstanding pull requests which target 6 issues but are blocked due to many reasons.

#2869:
1. Support for Equivalent with multiple arguments (Ready to be merged)

#2964:
1. Added Semantic Tableaux (Explanation below)
2. improved pl_true (Ready to be merged, but blocked because of 1)

#2830:
1. Fixed PropKB (Ready to be merged but blocked because of 2)

2. Added Valid Function (complete but configuring it to use semantic tableaux will make it MUCH faster)

#2852

1. Added Tseitin Transformation (Explanation below)

Explaination

Both Semantic Tableaux and Tseitin work towards a common goal i.e. faster SAT solving. A significant

part of my GSoC proposal is to make the overall SAT solving faster and depending upon the result
these functions could become obsolete in future.
1. I plan on implementing much faster to_cnf (to_dnf). This is based on rudimentary benchmarks rather than

just a willingness. The Semantic Tableaux works by iteratively manipulating sets, and is quite fast. I came up
with a modification of the same algorithm for conversion to cnf/dnf. The formula generated by Tseitin takes
longer SAT solving time than the same formula converted to CNF using to_cnf. But the overall time taken,

encoding + SAT is very much smaller. Now, with the new to_cnf and new SAT solver, will this overall time remain

much smaller? Theoretically it still should, but I can't comment on the numbers. So it is probably better to benchmark

this with the new system in place.

2. Semantic Tableaux is currently acting as a lightweight decision procedure. This might very well change once
the new system is in place (same reasoning as above). It is critical to benchmark to_cnf+SAT, Tseitin+SAT and SemTab together

3. Currently both SemTab and Tseitin call eliminate_implications to remove (bi)implications, but this step can be prevented

in exchange of efficiency gain (considerably so in some cases). The traditional algorithm requires this elimination, but using
some hacks supported because of the sympy architecture (by the virtue of applying de morgan automatically), this step can
be removed.

4. One of the points in the open issues is regarding stopping automatic evaluation of Xor, ITE, 'Not pushed to literals' etc.

What is the current stance of the community on this? My evaluation of the situation is following: It is quite helpful to the

developer and decreases code complexity as certain things become deterministic (for e.g. if a Not exists then the arg is

either a literal or an (bi)implication) and can lead to some interesting hacks (see above). However automatic eval could

turn out to be quite confusing to the user and introduce a certain lack of obviousness.

The answer to whether automatic evaluation is to be followed or not determines the code for both SemTab and

Tseitin to a certain extent, and also whether I would add it as a part of my proposal.

The whole point of this quite long post was to tell that while I have spent some good time with the code already

and most are either complete or almost so, I would prefer deferring the actual merging to a later date. However, I still

need a customary patch to be merged into master before the deadline, and #2869 will do the job, as it is both small
and independent. Please comment on whether I should consider getting the PRs merged in their current form (in the

spirit of the Zen of SymPy) and improve later or otherwise.

Aaron Meurer

unread,

Mar 18, 2014, 9:02:52 PM3/18/14

to sy...@googlegroups.com

I'm still not really clear how a non-integrated version would work,
but I'm glad you decided on an integrated one (which seems like the
obvious way to do it to me).

>
>>
>> As I noted on your proposal in Melange, what does "implementing
>>
>> MiniSAT" mean? Do you intend to write Python bindings for MiniSAT (if
>> so, you should demonstrate to us that you know C++, and the Python C
>> API)?
>>
>> Or do you really mean, "improve the DPLL algorithm", which is already
>> in SymPy? There are some ideas for this at
>>
>> https://groups.google.com/forum/#!searchin/sympy/muise%7Csort:date/sympy/Cb_uhh3MuE4/k-hSQnxpyv4J
>> (the posts by Christian Muise in particular).
>
>
> Analogous to my Melange reply, I mean improve the current algorithm to use
> the heuristics implemented in MiniSAT. After
> receiving this mail, I went into some heavy duty history hunting and went
> through all the posts by Muise on the mailing list
> as well as on his blog and have got a much better idea on what is already
> present and what more should be added. I will
> update the proposal to better describe my original ideas and some newly
> acquired ones.
> As noted above, I have underestimated the time required for the SAT solver.
> Even after working through the
> community bonding period on the solver, I will need more time to implement
> all the required features, test and benchmark
> the results. Will make changes to the timeline accordingly.
> Side Remark: Inspired from @haz's blog, I will add graphs to my pull
> requests to show the comparison between running time
> of all the algorithms.

That's good. Improving the SAT solver alone could be a GSoC project.

Aaron Meurer

>
>> Well, no one wants to say "I'll be the mentor" for exactly that
>> reason, because then they'll get bugged more often :)
>
>
> Thank You :)
>

> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit

> https://groups.google.com/d/msgid/sympy/CAGypG-O8t%2B7YgaauLy4mK2Fdap387nMtMMgy1a%2B6jDXxZEtWGQ%40mail.gmail.com.

Reply all

Reply to author

Forward