Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

new page on SAT, hard instances, factoring, full solution, links

6 views

Skip to first unread message

Vladimir Z. Nuri

unread,

Mar 12, 1998, 3:00:00 AM3/12/98

Following is a page now available at:
http://www8.pair.com/mnajtiv/sat/sat.html

~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^
"in theory, there's no difference vzn...@netcom.com
between theory and practice, mad scientist research lab
but in practice there is!" http://www8.pair.com/mnajtiv/

===

SAT hard instance generator and solution algorithms

by V.Z.Nuri, <vzn...@netcom.com>

This is the documentation to a few C and Perl utilities and algorithms to
explore and solve the satisfiability problem that were developed in the
period 1992-1998. A hard SAT instance generator is based on factoring
(written in C), and two SAT solvers are based on full resolution (written in
Perl).

* Introduction
* Encoding format
* Hard SAT instances based on factoring
* fact2sat.c
* Data files
* Full Solution of SAT via resolution and basic heuristics
* solve1.pl, solve2.pl
* Open questions
* Credits
* Other links

----------------------------------------------------------------------------

Introduction

The satisfiability problem is in my opinion the most interesting and most
important single "open" problem in computer science. It is "open" in the
sense that no "efficient" (nonexponential) algorithm has been found for all
cases, but neither has a nonoptimal solution been proved impossible.
Furthermore it was the first problem proven NP complete, which means that if
it can be solved "efficently", then a massive class of other problems in the
class NP, many of which have very practical applications, could be solved
efficiently too. (Actually, the whole concept of NP completeness was
invented in Cook's general conversion of Turing machines to SAT.) It appears
to me that the satisfiability problem is a fundamental representation of the
essence of all computer science algorithms, or in less melodramatic terms,
at least an excellent candidate.

These are a few programs I developed to study the satisfiability problem by
relating it to another key "open" fundamental problem in computer science,
that of factoring numbers. My interest in studying the satisfiability
problem required a good "hard instance" generator. Converting factoring
problems to satisfiability form has proven very useful in this regard. If it
turned out that any of the factoring instances were "easy", one would have
an efficient factoring algorithm, something that would be virtually as
important as a good satisfiability algorithm. In this way I have used one
difficult problem (factoring) as a lever and tool against another difficult
one (satisfiability). One key researcher in the field has informed me that
the satisfiability instances are indeed very difficult, and they have the
very attractive property that they are rather "compact" for their
difficulty.

More information on SAT can be found in Introduction to Automata Theory,
Languages, and Computation by Hopcroft and Ullman, an advanced text. See
also the links at the bottom of this document.

Encoding format

There is a basic, compact ASCII encoding format used by the code. Suppose a
satisfiability problem is given as

(A + ~B + C) * (B + ~C) * (~A + C)

In the input file format, variables are numbered (indexed) sequentially
starting from 1, with negative values specifying the variable complement
(negation), and spaces separating the variables in a clause. The above
problem then becomes:

1 -2 3
2 -3
-1 3

Hard SAT instances based on factoring

In theory, converting factoring to satisfiability is straightforward, but in
practice the translation is tricky and requires a lot of careful debugging.
This section gives a brief overview. The code reuses the concept of "integer
variables", or a variable composed of a number of bits, each which is a
binary variable. An arithmetic operation between two integer variables can
be modelled as a series of constraints (or relations) operating on their
respective binary variables. Multiplying two binary numbers is done in the
standard pencil-and-paper form of adding shifted forms of the multiplicand,
depending on the multiplier. For example, to multiply a 5 digit binary
number XXXXX by a 3 digit number YYY, we have the following general format:

XXXXX Y
XXXXX Y
XXXXX Y
-------
..ZZZZZZZ

Each of the 3 rows of the multiplicand is either added or not added
depending on a corresponding bit in the multiplier. This is like a "mask"
operation that is tied to each multiplier bit. The ".." indicates a series
of additional digits to the left to handle whatever carries may remain.

The program iteratively creates the constraints that define the
multiplication problem in the following way. The first "XXXXX" is simply the
multiplicand ANDed with the first bit of the multiplier. That is, a series
of constraints is generated, one for each bit of "XXXXX", that requires that
a new series of binary variables (comprising a new integer variable) are all
equal to zero if the first digit "Y" is zero, otherwise they are the
corresponding bits of "XXXXX" if the "Y" digit is one. This is done for the
second "XXXXX" as well.

Next, a new binary variable is created, such that its bits are the sum of
"XXXXX" and the second, shifted "XXXXX". The bits of the new variable are
constrained one at a time in an additive relationship that is similar to a
"full adder" circuit in electrical engineering, with a carry feeding into
the next full adder. The next variable added works from the prior partial
sum, and the masked result of the second binary digit of the multiplier
"YYYYY". Finally, the multiplication is eventually complete and a series of
constraints is placed on the "..ZZZZZZZ" bits, such that this integer
variable is equal to the number to be factored.

In other words, Z = M1 + M2 + M3 + ..., where M1 = X*(2^0) AND Y1, M2 =
X*(2^1) AND Y2, M3 = X*(2^2), ... In recursive form, Z[n] = Z[n-1] +
X*(2^n)*Y[n] where "Y[n]" is the n'th bit of the integer variable Y, and
each "Z[n]" is an integer variable equivalent to the n'th partial sum.

Neither the X or Y integer variables are constrained to be in any particular
form. Once this complete satisfiability problem is solved, the binary digits
of X and Y will be included in the solution and can be recovered. They
specify the exact factors that, when multiplied together, give Z, i.e. X * Y
= Z, where each of these symbols represents an integer formed by a series of
binary digits.

fact2sat.c

The main "factoring to satisfiability" converter, or hard satisfiability
instance generator is called fact2sat.c (14K, 606 lines source code). I've
tested this code over several years in many situations and can certify its
integrity to an extremely high degree.

The program is called in the form

fact2sat [n] > [outfile]

where [n] is the number to factor and [outfile] is the index form output.
This code creates several log files that describes the format of the output.

The format of the output is as follows: the low numbered variables are the
bits in the solution. Other bits at higher index positions are used to
specify intermediate sums in the multiplication problem.

The code first finds the square root of the number [n] specified on the
command line, and rounds up to the nearest integer. This integer will have a
certain number of bits. The lowest bits in the SAT problem will specify one
factor, and the next lowest bits will specify the other factor. The
respective "widths" of the integer variables (represented by successive
boolean variables) are reported by the program to "stderr". Example:

fact2sat 13 > 13.#
widths: 4 & 3

The solution will be specified in the variables 1-7 (4 + 3 bits). Variables
1-4 will be one of the factors, and variables 5-7 will be the other factor.
Lower numbered bit variables are the least significant bits in the two
integer variables. For more information, see the solver section.

The generator creates a variety of log files that describe the internal
structure of the factoring instance. It describes how individual bits are
allocated into integer variables, and the arithmetic constraints that led to
each clause in the final output. ("out.dat" gives a list of the numbers of
the bit variables that comprise the final product integer variable Z.) The
following files would be generated from this command line:

13 # 2121 5-27-95 9:21a
BITS TXT 2597 5-27-95 9:21a
VARS TXT 641 5-27-95 9:21a
VAR2S TXT 472 5-27-95 9:21a
LINES TXT 15593 5-27-95 9:21a
OPS TXT 2568 5-27-95 9:21a
OUT DAT 22 5-27-95 9:21a

Similarly, these are files that would be associated with factoring the
number 15:

15 # 2120 5-27-95 10:53a
BITS TXT 2597 5-27-95 10:53a
VARS TXT 641 5-27-95 10:53a
VAR2S TXT 472 5-27-95 10:53a
LINES TXT 15592 5-27-95 10:53a
OPS TXT 2567 5-27-95 10:53a
OUT DAT 22 5-27-95 10:53a

Note that "fact2sat" generates slightly redundant clauses that can be
compressed somewhat using various straightforward algorithms. In other
words, the input files are not in minimal form.

The solutions to the final problem may also be given in duplicate
reverse-ordered pairs, i.e. two solutions corresponding to factors (a,b) and
(b,a) may exist. For larger numbers where the number of bits to represent
the square root is much less than the length for the number itself, the
differing integer factor lengths tend to restrict duplicates. In general
duplicates exist when each factor is approximately the same bit-length. See
the section on "Solvers" for comments on factoring-satisfiability
considerations.

Data files

For convenience and verification, I've archived some sample data files here.
The full collection is available as dat.tar.gz, 31K gzipped tar file, 144K
uncompressed. The data is split into two series, one for numbers composed of
two prime factors, and the other as 2^n-1.

KB size in kilobytes
n filename and number to factor
vars number of variables in instance
clauses number of clauses or lines in instance
bits number of binary bits in number to be factored
factors prime factors of number

KB n vars clauses bits factors
2 9 35 159 4 3 3
5 49 70 362 6 7 7
9 169 117 645 8 13 13
16 961 176 1008 10 31 31
24 3721 247 1451 12 61 61
33 16129 330 1974 14 127 127
1 3 12 36 2 3
1 7 17 61 3 7
2 15 35 159 4 3 5
3 31 43 204 5 31
5 63 70 362 6 3 3 7
6 127 81 427 7 127
9 255 117 645 8 3 5 17
11 511 131 730 9 131
16 1023 176 1008 10 3 11 31

Full Solution of SAT via resolution and basic heuristics

To the author's knowledge (which is admittedly sparse), the two algorithms
presented here are original and worthy of consideration within the leading
approaches known in the literature based on both the abstract theoretical
interest of the strategies and their efficient performance. They are
essentially a variation on a greedy, recursive resolution algorithm that
tries to minimize the size of boolean expressions evolved over multiple
iterations to represent remaining variables in the solution. They are
guaranteed to solve the problem completely, i.e. find all solutions if any
exist, or signify no solutions (with empty output) if none exist.

These algorithms are designed to return solutions in "sum of products" form
(disjunctive normal form, DNF) which has either the advantage or
disadvantage, of giving all solutions, not merely one assignment. This means
that some satisfiability problems that might require "exponentially sized"
output to fully enumerate all their solutions (i.e. specifying assignments
to every variable) can in some cases require only "polynomially sized"
output by these programs using the sum-of-products representation instead.
(Note obviously that a polynomial sized output is not a guarantee of a
polynomial time algorithm.) It is of course trivial to enumerate "fully
specified" expressions from expressions giving sum-of-products form if the
former is desired.

Specifying prime-number-based satisfiability instances as input to the
solvers are very useful test case because they will generally lead to only a
single satisfying assignment (the exception is very small numbers for which
the number of bits to represent the square root of the number is
approximately as many as the number itself), and in general they also seem
to be "harder" to solve than factoring "smoother" numbers (i.e. those with
more factors in the decomposition). Similarly, the number of satisfying
assignments can almost be anticipated and specified exactly based on the
decomposition of the number to be factored. This ease in creating a small,
predetermined number of "hard" solutions is an attractive feature not always
readily available from other satisfiability generators. (Creating
satisfiability instances with no solutions by adding a few extra clauses to
a factoring problem is trivial when the factoring conversion algorithm is
understood. The author has not investigated the difficulty of such
instances, and this is all left as an interesting exercise to the reader.)

Factoring primes is generally the most time consuming and memory intensive
factoring operation for the satisfiability solvers. However in some cases
"smooth" numbers require less memory and but more time, or vice versa.

solve1.pl, solve2.pl

(~6K, ~400 lines each)

The solvers were written in Perl 5.0 for rapid prototyping and studying the
complexity of the algorithms. The "solve" logic involves creating a boolean
expression of "sum of products" iteratively or recursively from the
satisfiability "product of sums" form using the method known in the
literature as "resolution", with specialized heuristics to determine what
paths to follow. It tries to minimize the size of this expression at all
times, although the compaction is essentially a greedy formula (using
absorption) that may not find the optimally shortest formula. "solve2" uses
a slightly different branching heuristic.

The output from the solver is a "sum of products" expression giving the
solution. Not all variables may be specified. Each line is a separate clause
in the "sum of products". For example,

solve < 13.#

1 -2 3 4 5 -6 -7 8 -9 10 11 -12 -13 -14 -15 -16 17 18 -19 -20 -21 -22 -23 -24 -25 -26 -27 28 29 -30 -31 -32 -33 -34 -35

During the 13.# generation by "fact2sat", the integer variable lengths were
given as "4 & 3". Hence the factors are, given lsb to msb,

(1) (2) (4) (8)
1 -2 3 4 factor 1, 4 digits
5 -6 -7 factor 2, 3 digits

1 + 0 + 4 + 8 = 13
1 + 0 + 0 = 1

For another example, "fact2sat 15 | solve" gives:

1 2 3 4 5 -6 -7 8 9 10 11 -12 -13 -14 -15 16 17 18 -19 -20 -21 -22 -23 -24 -25 -26 -27 28 29 -30 -31 -32 -33 -34 -35
1 2 -3 -4 5 -6 7 8 9 -10 -11 -12 -13 -14 -15 16 -17 -18 -19 -20 -21 -22 -23 24 25 -26 -27 28 29 -30 -31 -32 -33 -34 -35
1 -2 3 -4 5 6 -7 8 -9 10 -11 12 -13 14 -15 16 17 18 -19 -20 -21 -22 -23 -24 -25 -26 -27 28 29 -30 -31 -32 -33 -34 -35

The lengths are 4 & 3 again, so these solutions give respectively the
factors (15,1), (3,5), (5,3).

Using these programs, the author has been able to factor numbers as large as
63 (70 variables, 362 clauses) in about 15 minutes on a 166 Mhz Pentium
Linux system. A 35 variable, 159 clause formula associated with the number
15 can be solved in about 30 seconds. These are for the first heuristic; the
second heuristic is roughly similar. Of course, performance could be
radically improved (perhaps by several orders of magnitude) by porting the
algorithms to C. The Perl code is for studying the complexity of the
algorithm and is not fast in absolute time. It is not even very efficient
Perl code, opting to use only string and not pointers and complex
structures. The continual reparsing of strings adds a significant overhead.

The two heuristics decide which variable to remove or "resolve" from the
remaining problem. Empirical results and trial-and-error of many other
possibilities yielded these as the best found. The code variations are
implemented in the best() subroutine.

* In the first program, a count of occurences of variables in the
remaining problem is determined. The product of the number of positive
instances of a variable is multiplied by the number of occurences of
the negation, and the variable with the lowest score is utilized.
* In the second program, a count of occurences of the total number of
variables in potential new clauses added from resolution of particular
variables is determined. All possible alternative full resolutions for
a particular variable are considered. The variable with an associated
resolution of new added clauses with the smallest number of variables
in the clauses is utilized.

Open Questions

Some basic research questions are raised by these algorithms. In the
authors' opinion, these are extremely interesting open problems that cut to
the heart of determining the difficulty of SAT and perhaps even of
computational complexity in computer science:

* The very best factoring algorithm could be translated to some kind of
solution for factoring satisfiability instances. Would it suggest a
more general algorithm? How does the complexity of this algorithm for
factoring-satisfiability instances compare with factoring algorithms?
* How often are these algorithms finding the optimally sized (i.e.
smallest) sum-of-products expression for the solution, or how close is
the one they are finding to the true optimum?
* The programs may involve exponential or polynomial time, space, or
output depending on the input. Is there some kind of theoretical
characterization of the regions of problem space that lead to each
situation?
* What is the best that algorithms of this type can achieve, i.e. given
the constraint of sum-of-products representation? In other words, how
large (and time- / space-consuming to compute) is the minimal
sum-of-products solution?
* Are there better algorithms (less time, less memory, smaller solution
representation) to find a more optimal sum-of-products representations?
* Are there more compact general representations of satisfiability
solutions than the sum-of-products form?

(Note as an additional permutation that each of these questions can be
explored for factoring-satisfiability instances or all satisfiability
problems in general.) Some of these questions have been researched to
various degrees in the literature, but most have not been fully resolved.

Credits

I would like to thank W.Naylor for (1) the idea of converting factoring to
satisfiability, and (2) helping me realize how important, fundamental, and
elegant SAT is, particularly how it captures the canonical idea in computer
science of filtering criteria applied to a search space. I would like to
thank B.Selman for working with the "fact2sat" program in an earlier
unrefined stage and for knowledgable opinions on the relative merits of this
factoring-satisfiability formulation for general SAT inquiries. However none
of the contents of these pages have been endorsed by anyone.
----------------------------------------------------------------------------

Daniel A. Jimenez

unread,

Mar 12, 1998, 3:00:00 AM3/12/98

In article <vznuriEp...@netcom.com>,
Vladimir Z. Nuri <vzn...@netcom.com> wrote:
> [stuff deleted]

> * The very best factoring algorithm could be translated to some kind of
> solution for factoring satisfiability instances. Would it suggest a
> more general algorithm?

Probably not. The best algorithms seem to have little to do with the
bit-represention of the number to be factored. Perhaps a better path
would be to look at other NP-complete problems that have had a lot of
work put into algorithms, like TSP, and see if you can get a better SAT
algorithm. In general, though (journal articles to the contrary here and
there) it seems that a highly tweaked Davis-Putnam is pretty hard to beat.

> How does the complexity of this algorithm for
> factoring-satisfiability instances compare with factoring algorithms?

Unfavorably, if my experience is any guide. I have written the same kind
of program, one that converts factoring instances into compact CNF-SAT
instances. With a souped-up satisfiability program called POSIT (by
Jon Freeman (formerly?) at U Penn, this program won the DIMACS challenge
you mentioned), I can factor numbers up to about 32 bits (a few thousand
clauses) in less than one second on a 200MHz Pentium. But when I get to
around 40 bits, it takes a couple of minutes, and any more bits than that
takes is too long for this impatient user to wait. Your CNF-to-DNF algorithm
is doing more work than a satisfiability algorithm like Davis-Putnam
(essentially looking for all solutions instead of just one), which still takes
exponential time, so is less likely to compete with something like the Number
Field Sieve, running in (conjectured) subexponential (but superpolynomial)
time.

> * Are there better algorithms (less time, less memory, smaller solution
> representation) to find a more optimal sum-of-products representations?

My favorite Boolean formula question is similar: "Is there a good algorithm
for the minimum-equivalent-formula problem?" That is, given, say, a CNF,
what is the smallest Boolean formula, DNF or not, that represents the same
function (i.e., has the same truth table)? This is NP^NP-complete, so
I don't expect an answer anytime soon :-)

> * Are there more compact general representations of satisfiability
> solutions than the sum-of-products form?

Yes, binary decision diagrams. BDDs can often represent a Boolean
formula (like your DNFs) that take size exponential in the number of
variables and keep them in a poly-sized representation. Unfortunately,
they can't always do that and conversion from CNF is non-trivial :-(
Of couse, the factoring problem where the number to be factored is
the product of two primes has an O(1) clauses DNF representation, so isn't
a good starting point for questions like that.

One maybe interesting digression: if you restrict your factoring solutions
to be unique by adding the constraint that the factors a, b must come
in order, i.e., additional logic such that a < b, then you move into an
area of SAT not known to be NP-complete. The last paper I read on it,
a couple of years ago, stated that it's still open whether USAT, or unique
satisfiability (deciding instances of SAT guaranteed to have at most one
satisfying assignment) is NP-complete.
--
Daniel Jimenez djim...@dryad.cs.utsa.edu
"I've so much music in my head" -- Maurice Ravel, shortly before his death.
" " -- John Cage

Vladimir Z. Nuri

unread,

Mar 13, 1998, 3:00:00 AM3/13/98

Daniel A. Jimenez (djim...@naiad.cs.utsa.edu) wrote:
[factoring leads to insights in SAT?]
: Probably not. The best algorithms seem to have little to do with the
: bit-represention of the number to be factored.

the general idea of the question is that there must nevertheless be
some such correlation. a number field sieve algorithm can be converted
to some kind of algorithm that operates only on bits. the idea is
that looking at the "route" it takes through the transformation of
the problem to the solution might give one ideas for how to get a good
SAT algorithm. then again, maybe not!!

: Unfavorably, if my experience is any guide. I have written the same kind

: of program, one that converts factoring instances into compact CNF-SAT
: instances. With a souped-up satisfiability program called POSIT (by
: Jon Freeman (formerly?) at U Penn, this program won the DIMACS challenge
: you mentioned),

what did you win? or did pOSIT win?
the award for creating hard instances, or quickly solving
them? p.s. I haven't found good Dimacs web pages on this subject, just
their ftp site, if you know of any please let me know;

I can factor numbers up to about 32 bits (a few thousand
: clauses) in less than one second on a 200MHz Pentium.

one way of looking at the factoring algorithms is that they may using
machine shortcuts for manipulating many bits at a time, such as addition
etc. that the SAT algorithm have to do one bit at a time. however, one
still wonders if this is enough to explain the disparity. it seems not--
that's a huge many orders of magnitude disparity (i.e. I suspect people
can factor 100 bit numbers or so on a pentium or something like that).
in theory, there ought to be a SAT algorithm that solves factoring
instances with the same complexity that the very best factoring
algorithms do.

: takes is too long for this impatient user to wait. Your CNF-to-DNF algorithm

: is doing more work than a satisfiability algorithm like Davis-Putnam
: (essentially looking for all solutions instead of just one), which still takes
: exponential time, so is less likely to compete with something like the Number
: Field Sieve, running in (conjectured) subexponential (but superpolynomial)
: time.

I don't see the logical implication here you are suggesting, that
because NFS takes [x] complexity, my algorithm will take [y] complexity.
the NFS is essentially finding all factors, right?
which is analogous to finding all solutions to the SAT instance.
however, I am certainly not claiming the algorithms I used are all
that fast. I can certainly see a claim that an algorithm that finds
all solutions will in general be slower than davis-putnam.

what would be an example of a subexponential but superpolynomial
function, anyway?

: this s is NP^NP-complete,

not familiar with that class.

: Yes, binary decision diagrams.

yeah, I saw those once on someone's web page. do you know someone
who has investigated it? and has web pages?

: One maybe interesting digression: if you restrict your factoring solutions

: to be unique by adding the constraint that the factors a, b must come
: in order, i.e., additional logic such that a < b, then you move into an
: area of SAT not known to be NP-complete.

not sure what you are saying. last I checked, factoring is not known
to be NP complete. so it seems that all conversions of factoring to
SAT, regardless of additional constraints, are not known to be NP complete.

The last paper I read on it,
: a couple of years ago, stated that it's still open whether USAT, or unique
: satisfiability (deciding instances of SAT guaranteed to have at most one
: satisfying assignment) is NP-complete.

hmmmmmmm. wasn't aware of that. maybe a little loophole in a field in
which it seems there is never any free lunch!

but it is not sufficient to have a<b in that case, you need
to somehow restrict the clauses so you have only one factor pair pop
out. such as factor only numbers that are the product of two large primes.
but what about composite numbers? I can't think of a restriction on the
factors that would give you only one nontrivial factor off the bat.

Alan MacDonald

unread,

Mar 20, 1998, 3:00:00 AM3/20/98

In article <vznuriEp...@netcom.com> Vladimir Z. Nuri,

vzn...@netcom.com writes:
> what would be an example of a subexponential but superpolynomial
> function, anyway?

One nice one is n^(log n). An apparently hard subclass of graph
isomorphism can be solved in this time.

Daniel Jimenez

unread,

Mar 21, 1998, 3:00:00 AM3/21/98

In article <vznuriEq...@netcom.com>,

Vladimir Z. Nuri <vzn...@netcom.com> wrote:
>
>

>someone sent me mail saying that USAT had been proven to
>be NP complete. however, I seem to have misplaced that mail.

I would be interested to hear about this. As recently as 1996 (the date on
Harry Buhrmann's "Six Hypothoses in Search of a Solution" article, the latest
information I have) this had still been unresolved. I thought USAT being
NP-complete implied some sort of catastrophe for the polynomial hierarchy
(although I can't remember where I heard this :-).

>regarding Cook's original proof, I thought he showed that
>the construction of converting a TM to a SAT problem had
>a single solution in the SAT problem. or at least that
>if one could find one solution to the SAT problem, one would
>solve the TM. hence I don't understand why it should be
>a problem for USAT (finding one satisfiability solution)
>to be proven NP complete. apparently I
>am missing some subtlety-- maybe someone could elaborate

USAT is the problem of finding *the* satisfying assignment for a formula
that is guaranteed to have exactly one. This is much different than
general SAT, where there may be anywhere from 1 to 2^(number of variables)
solutions. So it is indeed different.

Think of it this way: suppose you are the Davis-Putnam algorithm, and you
are trying to decide whether to try setting a variable to "true" or "false."
In USAT, you know that exactly one of those choices is right. With a member
of general SAT, both might be right, leading to (?) added complexity. The
question is, does this extra information about the space of solutions help?
--
Daniel Jiménez djim...@bluebonnet.uthscsa.edu

Vladimir Z. Nuri

unread,

Mar 21, 1998, 3:00:00 AM3/21/98

someone sent me mail saying that USAT had been proven to
be NP complete. however, I seem to have misplaced that mail.

regarding Cook's original proof, I thought he showed that

the construction of converting a TM to a SAT problem had
a single solution in the SAT problem. or at least that
if one could find one solution to the SAT problem, one would
solve the TM. hence I don't understand why it should be
a problem for USAT (finding one satisfiability solution)
to be proven NP complete. apparently I
am missing some subtlety-- maybe someone could elaborate

Daniel Jimenez

unread,

Mar 21, 1998, 3:00:00 AM3/21/98

[ Apologies for this late response; neither of the two newsfeeds I use
had Vladimir's latest articles on it; I found this on DejaNews :-( ]

vzn...@netcom.com (Vladimir Z. Nuri) writes:
>Daniel A. Jimenez (djim...@naiad.cs.utsa.edu) wrote:
>[factoring leads to insights in SAT?]

>[stuff deleted]

>: Unfavorably, if my experience is any guide. I have written the same kind
>: of program, one that converts factoring instances into compact CNF-SAT
>: instances. With a souped-up satisfiability program called POSIT (by
>: Jon Freeman (formerly?) at U Penn, this program won the DIMACS challenge
>: you mentioned),
>
>what did you win? or did pOSIT win?
>the award for creating hard instances, or quickly solving
>them? p.s. I haven't found good Dimacs web pages on this subject, just
>their ftp site, if you know of any please let me know;

Jon Freeman won the "Second DIMACS Implementation Challenge" for
quickly solving instances of SAT. His implementation generates several
kinds of "hard" instances (n-queens, graph problems, etc.) of SAT and
accepts arbitrary formulas as well.

Check out http://dimacs.rutgers.edu for more DIMACS info.

>>[djimenez wrote]I can factor numbers up to about 32 bits (a few thousand

>: clauses) in less than one second on a 200MHz Pentium.

Please note that this is in the context of factoring via SAT,
which is in my experience slower than simple trial division.

>one way of looking at the factoring algorithms is that they may using
>machine shortcuts for manipulating many bits at a time, such as addition
>etc. that the SAT algorithm have to do one bit at a time. however, one
>still wonders if this is enough to explain the disparity. it seems not--
>that's a huge many orders of magnitude disparity (i.e. I suspect people
>can factor 100 bit numbers or so on a pentium or something like that).
>in theory, there ought to be a SAT algorithm that solves factoring
>instances with the same complexity that the very best factoring
>algorithms do.

I think moving the factoring problem to the much more general framework of
SAT may destroy or mangle some of the number theoretic information in
the problem, information that algorithms like NFS exploit but that SAT
algorithms can't.

>: takes is too long for this impatient user to wait. Your CNF-to-DNF algorithm
>: is doing more work than a satisfiability algorithm like Davis-Putnam
>: (essentially looking for all solutions instead of just one), which still takes
>: exponential time, so is less likely to compete with something like the Number
>: Field Sieve, running in (conjectured) subexponential (but superpolynomial)
>: time.
>
>I don't see the logical implication here you are suggesting, that
>because NFS takes [x] complexity, my algorithm will take [y] complexity.
>the NFS is essentially finding all factors, right?

Davis-Putnam (DP) finds a single satisfying assignment. My understanding is
that your algorithm converts CNF to DNF, which is equivalent to finding
all satisfying assignments. It is likely that DP is faster than your
algorithm. DP is slower than the Number Field Sieve, since DP = O(2^(n-O(1))),
i.e., exponential, and NFS = something like O(exp(sqrt(log n log log n))),
i.e., subexponential. So it is likely that (in your notation) [x] = o([y]);
that's what I was trying to say.

>what would be an example of a subexponential but superpolynomial
>function, anyway?

See above...

>: this is is NP^NP-complete,

>
>not familiar with that class.

It makes NP-complete look easy. It's what you get when you let a
nondeterministic polynomial time Turing machine use an NP oracle.
See Garey & Johnson, _Computers_and_Intractability_ for more cool
stuff like this.

>: Yes, binary decision diagrams.

>yeah, I saw those once on someone's web page. do you know someone
>who has investigated it? and has web pages?

Yes, it's all over the place. Point your search engine at "binary decision
diagrams" and/or "BDD" and you'll get lots of hits (I got 1240 on AltaVista).
Most papers you will find on it will give a short description of the idea.

>: One maybe interesting digression: if you restrict your factoring solutions
>: to be unique by adding the constraint that the factors a, b must come
>: in order, i.e., additional logic such that a < b, then you move into an
>: area of SAT not known to be NP-complete.
>
>not sure what you are saying. last I checked, factoring is not known
>to be NP complete. so it seems that all conversions of factoring to
>SAT, regardless of additional constraints, are not known to be NP complete.

I am saying that "unique satisfiability is not known to be NP-complete."
I was thinking about this in the context of a general SAT algorithm applied
to factoring. Even if P!=NP, there may be an efficient USAT algorithm that
can solve factoring easily where a general SAT algorithm wouldn't.

> The last paper I read on it,
>: a couple of years ago, stated that it's still open whether USAT, or unique
>: satisfiability (deciding instances of SAT guaranteed to have at most one
>: satisfying assignment) is NP-complete.

>hmmmmmmm. wasn't aware of that. maybe a little loophole in a field in
>which it seems there is never any free lunch!

Right, and hopefully it is a good approach. Of course, USAT falls into
that structural complexity black hole of "conjectured NP-incomplete" problems
like graph-isomorphism and maybe even factoring. NP-incomplete problems
are those problems in NP that have no polynomial time solution but aren't
NP-complete. Nobody can prove USAT or any other problems are NP-incomplete
without first proving P!=NP. But I digress...

>but it is not sufficient to have a<b in that case, you need
>to somehow restrict the clauses so you have only one factor pair pop
>out. such as factor only numbers that are the product of two large primes.
>but what about composite numbers? I can't think of a restriction on the
>factors that would give you only one nontrivial factor off the bat.

Sorry, I was just thinking about instances of factoring where there are
exactly two factors. This is the case in the RSA cryptosystem, which
I understand lures many people (myself included :-) into thinking about
factoring algorithms.

Vladimir Z. Nuri

unread,

Mar 22, 1998, 3:00:00 AM3/22/98

Daniel Jimenez (djim...@bluebonnet.uthscsa.edu) wrote:

: USAT is the problem of finding *the* satisfying assignment for a formula

: that is guaranteed to have exactly one. This is much different than
: general SAT, where there may be anywhere from 1 to 2^(number of variables)
: solutions. So it is indeed different.

I didn't really state my question correctly, but I think it still stands.
maybe someone could elaborate on Cook's original proof. I don't recall that
it would have more than one solution in sat after he converted a TM
to an instance of SAT. that is, as I understand it, he converted not
merely all TMs to SAT (i.e., proving NP completeness) but converted
all TMs to USAT. each instance created from the TM had only a single
solution as I recall (representing the unique flow or total sequence
of the computation of the particular program
represented by the TM)

also, what is a reference on the "six hypotheses" paper? sounds
very interesting;

Claude Chaunier

unread,

Mar 22, 1998, 3:00:00 AM3/22/98

Vladimir Z. Nuri wrote:
> that is, as I understand it, he converted not
> merely all TMs to SAT (i.e., proving NP completeness) but converted
> all TMs to USAT. each instance created from the TM had only a single
> solution as I recall (representing the unique flow or total sequence
> of the computation of the particular program
> represented by the TM)

There isn't a unique flow in general as the TM is non-deterministic.

Here is what is shown in Hopcroft and Ullman: Given any Turing Machine
M in NP and a polynomial p upper bound for its non-deterministic
accepting running time, there is an easy explicit log-space algorithm
which, from any given input string s of length say n, builds a boolean
formula of length O(p(n)^2) with as many satisfiable assignements as
there are possible computations of M accepting s in p(n) steps or less.

Here you see it isn't USAT but only SAT. I don't know as for original
cook's work, but I guess it goes along the same line.

Claude

W.K. van Dam

unread,

Mar 22, 1998, 3:00:00 AM3/22/98

vzn...@netcom.com (Vladimir Z. Nuri) writes:

>also, what is a reference on the "six hypotheses" paper? sounds
>very interesting;

It's by Buhrman, Fortnow and Torenvliet, and you can find it at:
http://www.cs.uchicago.edu/~fortnow/papers/

Cheers,

Wim

JDPeh

unread,

Mar 23, 1998, 3:00:00 AM3/23/98

>
>: USAT is the problem of finding *the* satisfying assignment for a formula
>: that is guaranteed to have exactly one. This is much different than
>: general SAT, where there may be anywhere from 1 to 2^(number of variables)
>: solutions. So it is indeed different.
>

Interestingly, every boolean formula has, uniquely, a fixed, maximum number
of solutions, from 0 to 2^n.

Lane A. Hemaspaandra

unread,

Mar 27, 1998, 3:00:00 AM3/27/98

>>
>>someone sent me mail saying that USAT had been proven to

>>be NP complete....

then they implicitly sent you email saying that NP = coNP (see below).
save that email, especially if it included a proof. ;-}

>USAT is the problem of finding *the* satisfying assignment for a formula
>that is guaranteed to have exactly one. This is much different than

USAT, in standard use, is simply a set and, at that, not one that has
any "promise" built in (but see below).

Some brief pointers/info on unique satisfiability:

1) There are both function and language issues related to this. The
key function one, which was an open issue for quite a while, is: Can
even NP (function-computing) machines find exactly one satisfying
assignment whenever given as input a satisfiable formula?
Hemaspaandra et al. ("Computing Solutions Uniquely Collapses the
Polynomial Hierarchy," L. Hemaspaandra, A. Naik, M. Ogihara, and
A. Selman, SIAM Journal on Computing, V. 25, #4, pp. 697--708, August
1996) recently proved that, unless the polynomial hierarchy collapses,
*no* NP (function-computing) machine can do this. That is, culling
down from lots of solutions to one is not possible in this setting
(unless the polynomial hierarchy collapses). [Shameless plug: It is a
fun paper!]

Cute point: NP (function-computing) machines can easily find (within
the standard notion of multi-valued NP functions) ALL satisfying
assignments for an input formula. ***So this is a setting in which
finding exactly one solution is harder than finding all solutions!***
Yes, that sounds paradoxical, but of course it is not: NP machines
find their many solutions along lots of paths, so one cannot just "get
exactly one by taking only the lexicographically first satisfying
assignment," as a given path has not a clue if the assignment it found
is the lexicographically greatest satisfying assignment (which, after
all, is a coNP-complete issue in general).

2) USAT has a well-known and standard use in the literature. It is
USAT = { F | Boolean formula F has EXACTLY one satisfying assignment}.
USAT is coNP hard with respect to many-one reductions, and (this is
what some of the earlier articles in this thread may be thinking of)
is hard for NP with respect to randomized reductions ("NP is as Easy
as Detecting Unique Solutions, Valiant-Vazirani, TCS, V. 47, 1986).
Given that USAT is coNP-hard and is also in the class DP (which itself
contain NP and is in the polynomial hierarchy), it is easy to see that
USAT is NP-complete iff NP = coNP. (So the email that you received
saying that USAT is NP-complete is worth saving, if it includes a
correct proof.)

3) There is a closely related notion that, very informally, sort of
tries to get at the issue of "suppose we promise to limit our
attention just for those formulas that have 0 or 1 solutions, and we
don't care what garbage happens for others---what about that?" (The
idea of focusing on just a subpart of a set has a long history, see,
e.g., "The Complexity of Promise Problems with Applications to
Public-Key Cryptography," S. Even and A. Selman and Y. Yacobi,
Information and Control, V. 61, 1984.) That is, for any predicate Q,
let USAT_Q be the set that on formulas with 0 or 1 solutions agrees
with USAT, but that for other inputs, y, accepts if and only if Q(y)
holds. This dates back to the above-mentioned Valiant-Vazirani paper,
which in fact shows that for ANY predicate Q, the set USAT_Q is hard
for NP with respect to randomized reductions. Of course, it is
important not to confuse these notions, and some of the previous
emails in this thread sort of seem to be doing that. The root of the
confusion is might be that the Buhrman et al. paper uses "Unique-SAT
\in P" to mean "(\exists Q) [Unique-SAT_Q \in P]"---though they do
explicitly warn the reader (note carefully the last sentence of their
Definition 2.6) that, as a matter of notation, the former statement
should always be read as a macro for the latter statement for the
duration of their paper.

4) Finally, there is the good old class UP, of Valiant. However, it
is an open problem whether UP=NP (equivalently, whether "some UP set
is NP-complete") can be proven to imply the collapse of the polynomial
hierarchy. It is a nice open issue, as is the question of whether
P=UP can be proven to imply the collapse of the polynomial hierarchy.
(Of course, if we assume P=UP and UP=NP *simultaneously*... ;-} )

Happy theorem-proving!

Cheers,
Lane

0 new messages