Lists of sets S such that f(S)=1 somewhat efficiently (where should I write this function ?)


Nathann Cohen

Sep 13, 2014, 5:35:54 AM
to Sage devel, Nicolas M. Thiery
Hello everybody !

I just wrote a function I need for myself, and I wonder where I should put it in Sage. Here is the thing:

I have a boolean function F defined on all subsets of a set X. The function is increasing, i.e. F(X)=1 => F(X')=1 for all subsets X' of X.
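(A toy example of such an F, just to fix ideas -- it is of course not my actual function:

F = lambda S: sum(S) <= 10   # on X = range(10); if F(S) is True then F(S') is True for every S' contained in S

Mine is much slower than that.)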

Now I want to compute the hypergraph (or simplicial complex) that this function defines, i.e. all solutions of F(X)=1 (or possibly only the inclusionwise maximal solutions).

My function does that, in such a way that F is called as few times as possible, which is critical in my case because F is ... slow :-P

And so I wondered what exactly I should do with this function, as it seems to be the kind of things that could be useful to other people too.

Do you have any idea where it belongs ?

Thaaaaaaaaaaaaaaaaaaaaaaaanks ! :-)

Nathann

Jeroen Demeyer

Sep 13, 2014, 8:18:42 AM
to sage-...@googlegroups.com
On 2014-09-13 11:35, Nathann Cohen wrote:
> Hello everybody !
>
> I just wrote a function I need for myself, and I wonder where I should
> put it in Sage. Here is the thing:
>
> I have a boolean function F defined on all subsets of a set X. The
> function is increasing, i.e. F(X)=1 => F(X')=1 for all subsets X' of X.
I would call that "decreasing" (smaller sets give larger values of F).

> or possibly only the
> inclusionwise maximal solutions.
Sounds like an interesting problem, do you know of an efficient
algorithm to do this (without simply trying all sets of course)? Have
you looked in the literature?

> And so I wondered what exactly I should do with this function, as it
> seems to be the kind of things that could be useful to other people too.
Certainly.

> Do you have any idea where it belongs ?
I have a "coding theory" feel about this problem because it sounds a bit
like finding the minimal-weight vectors for a code (the sets X
corresponding to the index sets of zero coefficients)

Jeroen.

Dima Pasechnik

Sep 13, 2014, 11:04:23 AM
to sage-...@googlegroups.com
it's quite general; e.g. X is precisely the set of losing coalitions
in a simple game (in cooperative game theory)

Dima


Nathann Cohen

Sep 13, 2014, 6:10:31 PM
to Sage devel
Yoooooooooo !

> Sounds like an interesting problem, do you know of an efficient algorithm to do this (without simply trying all sets of course)? Have you looked in the literature?

I don't believe in "literature" anymore. I don't know a lot about other fields of research, but in graph theory people who say that they work on 'algorithms' apparently have never heard of what a data structure is. All they care about is asymptotic complexity and whether their problem is polynomial/NP-Hard, especially on unheard-of classes of INPUT.

What my implementation does is rather simple:

1) Think of all subsets of X, and say that the (unique) "predecessor" of a set S is S-{max(S)}. This is a graph (in fact a tree rooted at the empty set, since every nonempty set has exactly one predecessor).
2) Start from the empty set, and do a breadth-first search from there (this will explore all sets of size 1, then size 2, then [...])
3) For each set S that you explore, decide whether S already contains a set S' for which f(S') is False. In that case you know that f(S) is False without even calling f. If there is no such set, compute f(S): if it is True, store the set somewhere; if it is False, stop the exploration at S and keep S in the database of "NO" answers.

The good thing here is that it is rather "quick" to test whether your current set S contains a set for which f was False. Just store all your "NO" sets in a binary matrix (i.e. an array of bitsets in Sage), and all you need to do to run the test is check, for each stored "NO" set N, whether the bitset intersection of N and S is equal to N.

There is no magic in the algorithm and it is 25 lines long. It is just a good way to avoid calling f too often when this function is very slow.
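To give an idea, here is a rough Python sketch of it (not my actual code: the real thing uses Sage bitsets, here I just use Python integers as bitmasks and a made-up function name):

def yes_sets_sketch(X, f):
    # All subsets S of X (as frozensets) with f(S) True, assuming that
    # f(S) is True and S' is a subset of S  =>  f(S') is True.
    from collections import deque
    X = list(X)
    n = len(X)
    no_masks = []            # bitmasks of the NO-sets met so far
    yes = []                 # all YES-sets found
    queue = deque([0])       # breadth-first search from the empty set (mask 0)
    while queue:
        mask = queue.popleft()
        # If mask contains a known NO-set, f is False on it: no need to call f.
        if any(no & mask == no for no in no_masks):
            continue
        S = frozenset(X[i] for i in range(n) if (mask >> i) & 1)
        if not f(S):
            no_masks.append(mask)    # remember it, to filter later sets
            continue
        yes.append(S)
        # successors: add one element larger than max(S) (the predecessor rule)
        for i in range(mask.bit_length(), n):
            queue.append(mask | (1 << i))
    return yes

The whole point is that the any(...) test costs a few AND operations per stored NO-set, instead of one call to the slow f.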

Unfortunately it has been running for 10 hours on my computer and I still have not found the set I am looking for :-/

> I have a "coding theory" feel about this problem because it sounds a bit like finding the minimal-weight vectors for a code (the sets X corresponding to the index sets of zero coefficients)

NOooooooo idea about that O_o

Nathann

Nathann Cohen

Sep 13, 2014, 6:51:01 PM
to Sage devel
If you want to see the code, I just finished cleaning it a bit.


Nathann

kcrisman

Sep 13, 2014, 9:22:01 PM
to sage-...@googlegroups.com
> it's quite general; e.g. X is precisely the set of losing coalitions
> in a simple game (in cooperative game theory)

Dima Pasechnik

Sep 14, 2014, 4:45:14 AM
to sage-...@googlegroups.com
On 2014-09-13, Nathann Cohen <nathan...@gmail.com> wrote:
> Yoooooooooo !
>
>> Sounds like an interesting problem, do you know of an efficient algorithm
> to do this (without simply trying all sets of course)? Have you looked in
> the literature?
>
> I don't believe in "literature" anymore. I don't know a lot about other
> fields of research, but in graph theory people who say that they work on
> 'algorithms' apparently never heard of what a data structure is. All they
> care about is asymptotic complexity and whether their problem is
> polynomial/NP-Hard, especially on unheard-of classes of INPUT.

this is only one of the current trends - others argue that algorithms
should be accompanied by implementations, and there is also a lot of work done
on polynomial-time algorithms.
(of course, if you look for an excuse not to check the literature, you can
always find one :-))

Anyhow, one speedup can come from your function being invariant under some
permutations of variables; perhaps you can check this quickly.

Nathann Cohen

Sep 14, 2014, 7:19:56 AM
to Sage devel
Yo !

> this is only one of the current trends - other ones argue that algorithms
> should be accompanied by implementations, and then there is a lot of things done
> on polynomial-time algorithms.

Well, it should at the very least be implemented (with commented code)
indeed. It would also be cool if people cared about the actual
computation times: using bitsets does not change anything
asymptotically but it does make a difference in the running times.

> Anyhow, one speedup can come from your function being invariant under some
> permutations of variables; perhaps you can check this quickly.

Does not apply to my case but there is a great thing about that in
Sage if it interests you: with just one line you can enumerate, from a
group G acting on a set of points, a representative of each orbit of
the action of G on the subsets of size k.

Example:

sage: IntegerVectorsModPermutationGroup(groups.permutation.Cyclic(5),sum=3,max_part=1)
Vectors of length 5 and of sum 3 whose entries is in {0, ..., 1}
enumerated up to the action of Cyclic group of order 5 as a
permutation group
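(And if I remember correctly you can then just iterate over it to get the representatives themselves, e.g.

sage: O = IntegerVectorsModPermutationGroup(groups.permutation.Cyclic(5), sum=3, max_part=1)
sage: list(O)   # one 0/1 vector per orbit of the cyclic shift

but check the doc, I am writing this from memory.)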

Nathann

Nathann Cohen

Sep 14, 2014, 7:41:53 AM
to Sage devel
By the way, no idea about where I should write that in Sage ? :-/

Nathann

P Purkayastha

Sep 14, 2014, 8:20:50 AM
to sage-...@googlegroups.com

Put the function in combinat or sets? Apparently it is not just restricted to coding theory or some other specific structure.


Nathann Cohen

Sep 14, 2014, 8:27:42 AM
to Sage devel
> Put the function in combinat or sets? Apparently it is not just restricted
> to coding theory or some other specific structure.

Well I wanted to put it somewhere in the simplicial complex
constructor, but I wrote to John Palmieri about that and I got no
answer... I don't know... I am pretty sure that if I write it
somewhere on its own it will just be forgotten forever.

Nathann

Dima Pasechnik

Sep 14, 2014, 2:22:56 PM
to sage-...@googlegroups.com
On 2014-09-13, Nathann Cohen <nathan...@gmail.com> wrote:
Can you formalise what you look for?
Perhaps this can be sped up, e.g. by doing
some ILP or something like this?

You can also think of your "NO sets" as a set of SAT clauses of the form
!x_{i_1} || !x_{i_2} || ... || !x_{i_m},
and all of them should hold true.
So you want to find all "maximal length" solutions to this SAT problem.
Perhaps some SAT solvers can do this, I don't know
(by the way, SAT solving is an area where people do care a lot about actually fast
implementations)
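Just to illustrate the encoding (made-up toy data; the solving part assumes e.g. the pycosat package):

import pycosat                                 # assuming pycosat is available

no_sets = [{1, 2}, {2, 3, 4}]                  # hypothetical minimal NO-sets over x_1..x_4
clauses = [[-i for i in s] for s in no_sets]   # !x_1 || !x_2   and   !x_2 || !x_3 || !x_4
print(pycosat.solve(clauses))                  # one satisfying assignment, or "UNSAT"

Getting the maximal ones out of a SAT solver would still need extra work, of course.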

Dima

Nathann Cohen

Sep 14, 2014, 5:02:12 PM
to Sage devel
Yo !

> Can you formalise what you look for?

It is very kind of you to help me in my research, but it is a bit unrelated to this discussion :-P

> Perhaps this can be sped up, e.g. by doing
> some ILP or something like this?

I tried, but my function 'f' is not of the right kind for an LP formulation. It is too slow.

> You can also think of your "NO sets" as a set of SAT clauses of the form
> !x_{i_1} || !x_{i_2} || ... || !x_{i_m},
> and all of them should hold true.

Indeed, but in order to do that I would need to enumerate them all. And this is precisely what the code above does (and even that is too slow).

> So you want to find all "maximal length" solutions to this SAT problem.

I don't know what you mean by 'maximal length'. I am somewhat interested in the maximal sets S such that f(S) is true, but not even all of them. Well, it's a bit complicated to explain.

> Perhaps some SAT solvers can do this, I don't know

They would still need all minimal no-sets, and I can't list them at the moment, there are too many of them. And if I could list them I would not need the SAT solver anyway, for I would be able to enumerate the maximal yes-sets too.

> (by the way, SAT solvers is an area where people do care a lot about actual fast
> implementations)

Indeed, and I would not try to rewrite one from scratch.

Nathann

Travis Scrimshaw

Sep 14, 2014, 9:22:16 PM
to sage-...@googlegroups.com
The SimplicialComplex constructor already does a bit of parsing of the data. So I'd be okay with having it call a function which generates the appropriate data. Although it would probably be better as a method of (the classes returned by) Set or Subsets, since it is an operation on a set (with a specified function).

Best,
Travis

Bruno Grenet

Sep 15, 2014, 3:37:53 AM
to sage-...@googlegroups.com, Travis Scrimshaw
FWIW the problem (or a slight variant of it) is known in complexity
theory as learning of monotone boolean functions. To translate your
problem to this language, you have to consider your sets as subsets of a
common large set with n elements and describe the subsets by n-tuples.
Of course, you might find more negative results than implemented
algorithms in the complexity theory literature.
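Concretely (a made-up toy example, nothing specific to your setting): each subset becomes its characteristic 0/1 vector over the ground set, and your F becomes a monotone function from {0,1}^n to {0,1}:

def characteristic_vector(S, X):
    # X is the common ground set (as a list), S a subset of it
    return tuple(1 if x in S else 0 for x in X)

# characteristic_vector({'a', 'c'}, ['a', 'b', 'c', 'd'])  -->  (1, 0, 1, 0)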

Bruno

Nathann Cohen

Sep 15, 2014, 4:08:25 AM
to Sage devel, Travis Scrimshaw
Yoooooooooo !

> FWIW the problem (or a slight variant of it) is known in complexity theory
> as learning of monotone boolean functions. To translate your problem to this
> language, you have to consider your sets as subsets of a common large set
> with n elements and describe the subsets by n-tuples.

Excellent ! That's exactly what my code does. I will try to look for some interesting things about that now that I know the correct terminology. Thanks !

> Of course, you might
> find more negative results than implemented algorithms in the complexity
> theory literature.

I know what I can expect of the graph theory world, but I do not know those guys much so I will have faith at least for a while. Let's see how it goes :-)

Thanks again !

Nathann

Dima Pasechnik

Sep 15, 2014, 4:24:42 AM
to sage-...@googlegroups.com
On 2014-09-14, Nathann Cohen <nathan...@gmail.com> wrote:
>
>> You can also think of your "NO sets" as a set of SAT clauses of the form
>> !x_{i_1} || !x_{i_2} || ... || !x_{i_m},
>> and all of them should hold true.
>
> Indeed, but in order to do that I would need to enumerate them all.
In this language, your code enumerates true/false assignments to the variables
x_j, so that all these NO clauses hold true.
These NO clauses are just an encoding of your "matrix of NOs" that I understood
you write out completely. But now you write that you can't do this.
Oh well.

Nathann Cohen

Sep 15, 2014, 4:31:53 AM
to Sage devel
Yo !

> In this language, your code enumerates true/false assigments to the variables
> x_j, so that all these NO clauses hold true.
> These NO clauses are just an encoding of your "matrix of NOs" that I understood
> you write out completely. But now you write that you can't do this.
> Oh well.

Yes, but in this formalism they consider the function as a black box,
which is exactly what I am doing when I run the function I mentioned
in this thread.

The matrix that I build in this function does not contain all no-sets.
At first it contains none. And every time the function F is called, we
'learn' either a yes-set (that we store) or a no-set (that we add to
the matrix, in order to filter other sets later). I cannot enumerate
all no-sets and all yes-sets, for together they make up all 2^n subsets.
But the function tries to do its job by only having to meet the
inclusionwise minimal no-sets, which it computes as it goes along.

Nathann

Dima Pasechnik

Sep 15, 2014, 5:05:03 AM
to sage-...@googlegroups.com
I see. By the way, there is an approach to do this using ILP. At some point 
you have a 0-1 LP with NO sets generated so far as inequalities (and
other inequalities that cut out the solutions found so far)
I.e. if {j_1,...,j_m} is a NO-set then the corresponding inequality is
 x_{j_1}+...+x_{j_m}<=m-1, and the objective function is to maximize
sum_j x_j.

If your ILP is infeasible, you're done.

If you find a solution, say, 
x_{j_1}=...=x_{j_m}=1, and the remaining x_k=0,
you add the inequality x_{j_1}+...+x_{j_m}<=m-1 to your list;
(this cuts out this solution from being considered again)

you also test the solution with your f, and if it passes, you store it.

Now start with empty 0-1 LP, and run the ILP solver again and again, until done.

You end up with a list of stored solutions (ordered by size, in fact)

Implementing this would need a fast way to re-start the ILP solver
(which CPLEX and GUROBI should do very well, I suppose)
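In Sage's MILP interface the loop would look roughly like this (an untested sketch with made-up names; X is the ground set, f your slow test, no_sets whatever NO-sets you already know):

from sage.numerical.mip import MixedIntegerLinearProgram, MIPSolverException

def maximal_yes_sets_ilp(X, f, no_sets=()):
    p = MixedIntegerLinearProgram(maximization=True)
    x = p.new_variable(binary=True)
    p.set_objective(p.sum([x[i] for i in X]))
    for s in no_sets:                          # known NO-sets as inequalities
        p.add_constraint(p.sum([x[j] for j in s]) <= len(s) - 1)
    stored = []
    while True:
        try:
            p.solve()
        except MIPSolverException:             # infeasible: we are done
            return stored
        val = p.get_values(x)
        S = frozenset(i for i in X if val[i] > 0.5)
        # cut this solution out so it is not found again (the cut also removes its supersets)
        p.add_constraint(p.sum([x[j] for j in S]) <= len(S) - 1)
        if f(S):                               # test with f; store it if it passes
            stored.append(S)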

Dima



Nathann Cohen

Sep 15, 2014, 5:15:10 AM
to Sage devel
Yo !

> I see. By the way, there is an approach to do this using ILP. At some point
> you have a 0-1 LP with NO sets generated so far as inequalities (and
> other inequalities that cut out the solutions found so far)
> I.e. if {j_1,...,j_m} is a NO-set then the corresponding inequality is
>  x_{j_1}+...+x_{j_m}<=m-1, and the objective function is to maximize
> sum_j x_j.
>
> If your ILP is infeasible, you're done.
>
> If you find a solution, say,
> x_{j_1}=...=x_{j_m}=1, and the remaining x_k=0,
> you add the inequality x_{j_1}+...+x_{j_m}<=m-1 to your list;
> (this cuts out this solution from being considered again)
>
> you also test the solution with your f, and if it passes, you store it.
>
> Now start with empty 0-1 LP, and run the ILP solver again and again, until
> done.
>
> You end up with a list of stored solutions (ordered by size, in fact)
>
> Implementing this would need a fast way to re-start the ILP solver
> (which CPLEX and GUROBI should do very well, I suppose)

It is very kind of you to explain to me how to write an LP.

There are at least two problems with what you say:

1) Make it run in your head with a boolean function f constant to "False". It will enumerate the 2^n no-sets.

2) If you fix it by minimizing instead of maximizing (and by adding a constraint when you find a yes-set, and by adding a constraint when you add a no-set), then you are back to what my implementation does, except that you are solving an LP at each node while it can be done directly with the code I wrote. Faster.

Nathann

Dima Pasechnik

Sep 15, 2014, 5:34:00 AM
to sage-...@googlegroups.com


On Monday, September 15, 2014 10:15:10 AM UTC+1, Nathann Cohen wrote:

> 1) Make it run in your head with a boolean function f constant to "False". It will enumerate the 2^n no-sets.

corner cases are hard, in theory too :-)
You can certainly add a pre-testing by evaluating f on all singletons and pairs, say.
(and this would also speed up things for functions having some NO singletons and pairs quite a bit)

> 2) If you fix it by minimizing instead of maximizing (and by adding a constraint when you find a yes-set, and by adding a constraint when you add a no-set), then you are back to what my implementation does, except that you are solving an LP at each node while it can be done directly with the code I wrote. Faster.

not necessarily faster. Your code completely ignores the polyhedral nature of the problem; e.g. the ILP can potentially terminate quite fast,
due to it finding that the polyhedron defined at some point is already empty, while your code will keep looking for non-existing things...

Nathann Cohen

Sep 15, 2014, 5:40:31 AM
to Sage devel
Yo !

>> 1) Make it run in your head with a boolean function f constant to "False". It will enumerate the 2^n no-sets.
>
> corner cases are hard, in theory too :-)
> You can certainly add a pre-testing by evaluating f on all singletons and pairs, say.
> (and this would also speed up things for functions having some NO singletons and pairs quite a bit)

This is not a corner case. What it tells you is that you do not
enumerate inclusionwise minimal no-sets, and that you will pay for it.
Make it run with lambda x: len(x)<5 on a set X of size 15, same
result.

> not necessarily faster. Your code completely ignores the polyhedral nature of the problem; e.g. the ILP can potentially terminate quite fast,
> due to it finding that the polyhedron defined at some point is already empty, while your code will keep looking for non-existing things...

It will not. And you can think of the "polyhedral nature" of binary
vectors all you want, if you need to enumerate them just do it the
straightforward way, not with an LP solver.

In this special case I don't see how an LP could beat a code where
feasibility is checked by intersection of bitsets. But if you care I
would be glad to see by how much it beats the LP method :-P

Nathann

Dima Pasechnik

Sep 15, 2014, 6:35:35 AM
to sage-...@googlegroups.com


On Monday, September 15, 2014 10:40:31 AM UTC+1, Nathann Cohen wrote:

> This is not a corner case. What it tells you is that you do not
> enumerate inclusionwise minimal no-sets, and that you will pay for it.
> Make it run with lambda x: len(x)<5 on a set X of size 15, same
> result.

enumerating inclusion-wise minimal no-sets is not a remedy: if
f = (lambda x: len(x) < k) for |X| = 2k, you're pretty much
out of luck.

> It will not. And you can think of the "polyhedral nature" of binary
> vectors all you want, if you need to enumerate them just do it the
> straightforward way, not with an LP solver.

we don't need them all, we only need the maximal ones.
LP won't even look at subsets of an already discovered yes-set.

> In this special case I don't see how an LP could beat a code where
> feasibility is checked by intersection of bitsets. But if you care I
> would be glad to see by how much it beats the LP method :-P

your bitsets are not pruned, and eventually you might end up with
too many of them to be efficient. Whereas an ILP solver would discard
redundant inequalities.

As well, as you know, there are combinatorial problems, e.g. finding a maximum clique in a graph,
where ILP might beat the straight combinatorial algorithms. Now make your f report True on a graph clique,
and False on a non-clique, and wait forever for your code to finish...

Nathann Cohen

Sep 15, 2014, 8:24:56 AM
to Sage devel
Yo !

> enumerating inclusion-wise minimal no-sets is not a remedy: if
> f =(lambda x: len(x)<k) for |X|=2k, you're pretty much
> out of luck.

Indeed. But it is much, much, much better in most cases.

> we don't need them all, we only need the maximal ones.
> LP won't even look at subsets of an already discovered yes-set.

I can forget them too with my code. I don't really need to keep them.

> your bitsets are not pruned, and eventually you might end up with
> too many of them to be efficient. Whereas ILP solver would discard
> redundant inequalities.

They cannot be pruned. There is nothing to prune. I only keep the minimal ones.

> As well, as you know, there are combinatorial problems, e.g. finding a
> maximum clique in a graph,
> where ILP might beat the straight combinatorial algorithms. Now make your f
> report True on a graph clique,
> and False on a non-clique, and wait forever for your code to finish...

In this case the list of no-sets is known from the start, and it is
small. And the boolean function can be quickly evaluated. Clearly not
what this function is meant to handle.

Nathann

Nathann Cohen

Sep 15, 2014, 8:25:28 AM
to Sage devel
> In this case the list of no-sets is known from the start, and it is
> small. And the boolean function can be quickly evaluated. Clearly not
> what this function is meant to handle.

The list of *minimal* no-sets. In case you would hold this against me.

Nathann

Dima Pasechnik

Sep 15, 2014, 8:40:23 AM
to sage-...@googlegroups.com
you wanted to know a function f that might be harder for your algorithm vs the ILP,
and I gave you one, as above. I won't tell you it comes from a graph.
(and I implement it to be very slow on small-size subsets :-))

Nathann Cohen

Sep 15, 2014, 9:29:14 AM
to Sage devel
> you wanted to know a function f that might be harded for your algorithm vs ILP,
> and I give you one, as above. I won't tell you it comes from a graph.
> (and I implement it to be very slow on small-size subsets :-))

And I maintain it, but you are not allowed to write a problem-specific LP: you should use the LP that works on all instances. 

If you use the LP that you proposed above on a graph with no edges, you are precisely in the situation where the function is lambda x: len(x) <= 1. And you will spend exponential time and memory to find that the largest clique is a single vertex.

I have not tried, but I am pretty convinced that my function performs better.

nathann

Dima Pasechnik

Sep 15, 2014, 3:18:09 PM
to sage-...@googlegroups.com
IMHO, it's fine to put your code as a constructor in SimplicialComplex.
If you do this, cc me on the ticket, I'll review it.


Nathann Cohen

Sep 16, 2014, 12:42:10 PM
to Sage devel
After a looooot of time spent thinking about that, I found the following page :


I thus created a new file, subsets_hereditary, right next to it, with the function in it :-P

It is all on #16994, which needs a review:

Have fuuuuuuuuuun !

Nathann

On 15 September 2014 21:17, Dima Pasechnik <dim...@gmail.com> wrote:
> IMHO, it's fine to put your code as a constructor in SimplicialComplex.
> If you do this, cc me on the ticket, I'll review it.

