Black box algorithm solution

Dennis Yurichev

unread,

Aug 29, 2008, 8:32:35 AM8/29/08

to

Hi.

It is possible to find out exact algorithm (in some assembler-like
pseudo-code) of some black box using only input-output pairs? Let's
say, it's very simple algorithm, just about 10 usual C++ lines (or
less), and we have few function arguments and one output. And we may
have up to few millions input-output pairs for our analysis.
Are there some known automated methods to do this? Is Prolog suitable
to? Are there any known solvers written in Prolog?

Chip Eastham

unread,

Aug 29, 2008, 9:58:52 AM8/29/08

to

Hi, Dennis:

For a smallish combinatorial space of inputs vs.
outputs, one could replace the function with a
pre-computed lookup table.

If something concrete is known about the nature
of the function (computation) performed by the
black box, and the problem is one of determining
parameters used in the computation, then perhaps
an analysis can be automated. Prolog might be a
good implementation language, depending on the
nature of the underlying algorithm.

Of course the problem can still be a hard one,
even if the general template of the algorithm
is known. This is the whole point of a public
key/trap-door encryption function.

Do you even know the API of the function being
modelled? I.e. the datatypes of the formal
arguments (inputs) and the one output?

regards, chip

Stephen Horne

unread,

Sep 27, 2008, 4:35:26 PM9/27/08

to

On Fri, 29 Aug 2008 05:32:35 -0700 (PDT), Dennis Yurichev
<Dennis....@gmail.com> wrote:

>It is possible to find out exact algorithm (in some assembler-like
>pseudo-code) of some black box using only input-output pairs?

Assuming it's combinatorial (no state preserved between calls) and
provided that the input types are discrete, finite and known, yes.

In principle, it's no different to specifying a logic gate in terms of
the truth table. The generalised truth table is called a decision
table, and it's fairly common to generate a decision tree from this.
The decision tree basically applies inputs in turn to partition the
various cases until a single solution is found - the trick is in
selecting the order to assess the inputs in.

The ID3 algorithm is a common approach to mapping from a (usually
incomplete) decision table to a decision tree, based on information
theory. At each step, it checks the input whose value can (on average)
be expected to reveal the most information. It has a significant
restriction in the form of the decision tables it can take. ID3 (at
least in the form I've always seen it described) cannot create a
decision tree for an exclusive-OR logic gate.

http://en.wikipedia.org/wiki/ID3_algorithm

It is possible to modify ID3 to handle any combinatorial decision
table, and I've done so in the past, but it's a bit of a pain. I still
have my notes as an 18 page PDF if you're really interested. It needs
to be read with a fair bit of scepticism - when I worked it out, I had
never read up on C4.5 which is a common ID3 optimisation.

With continuous inputs and outputs, a lot can still be done, even
though you'd obviously need an infinite number of sample points to be
100%. Interpolation is usually more reliable than extrapolation, but
even with interpolation there are issues. A simple example is the
appearance of helicopter blades on TV, that sometimes seem to be
standing still or turning slowly backwards due to the sampling
implicit in the video. In DSP, this particular issue is called
aliassing IIRC, but I may be confusing my terminology.

When there is state, the problem is intractible in theory and pretty
difficult in practice even with "well-behaved" systems. Think of it
this way - just because the black box has behaved a certain way up
until now, how do you know that it will continue to do so? How do you
know it isn't counting to the millionth call, or that it doesn't
behave differently every Friday the thirteenth or something?

Also, each test point changes the state, but you have no way to track
the state even for well-behaved systems. It is difficult to test every
possible input that transitions from a given state because you have no
way to determine whether you have genuinely returned to that same
state in order to test alternatives.

You have to start asking questions in terms of degrees of belief. For
that, you need to think in terms of Bayesian inference.

The following has a PDF ebook download that is very much worth
reading.

http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Despite the difficulty/impossibility of this problem, reverse
engineering often involves doing exactly what you describe.

In principle, Prolog should make it very simple to work with decision
tables - you just declare all the cases, then query giving whatever
inputs you know, and Prolog delivers all possible cases. However, I'm
sure doing this for real-world problems and making it work is much
harder. I'm no expert, but if Prolog really achieved the advertised
"just declare what you want" simplicity, there'd be no need for Prolog
experts.

Waldek Hebisch

unread,

Sep 28, 2008, 11:09:44 AM9/28/08

to

No if you have arbitrary algorithm and practically sized set of
input-output pairs. Imagine password checking routine: if you give
it a string, if the string matches password then routine outputs "Yes",
otherwise it outputs "No". Reconstructing algorithm for such
routine is equivalent to guessing password in reasonable number
of trials.

--
Waldek Hebisch
heb...@math.uni.wroc.pl

Stephen Horne

unread,

Sep 28, 2008, 3:13:50 PM9/28/08

to

On Sun, 28 Sep 2008 15:09:44 +0000 (UTC), Waldek Hebisch
<heb...@math.uni.wroc.pl> wrote:

>Reconstructing algorithm for such
>routine is equivalent to guessing password in reasonable number
>of trials.

Password cracking software routinely does precisely that. A
"reasonable number" of trials may potentially be billions even on a
home PC, or many orders of magnitude greater in some contexts,
depending on the time and resources available and the expected benefit
from breaking the password.

The combinatorial explosion obviously implies limits, and those limits
are very quickly reached, but a lot can be done in practice if only
because many interesting black boxes simply aren't that complex.

jca...@gmail.com

unread,

Sep 29, 2008, 5:38:52 AM9/29/08

to

There are automated methods to do this in Prolog. There area that
deals
with it is called Inductive Logic Programming. Its purpose is
precisely
to infer the predicate meaning just having examples of input-output.

For instance, one can learn quick sort just having examples of input/
output like:

qsort([4,5,1,3], [1,3,4,5]).
qsort([6,5,8,2], [2,5,6,5]).
...

Naturally you have to quick some hints as to what predicates should
appear on the body
of the predicate you are trying to infer. Inductive Logic Programming
is a generalization
of Decision Trees (e.g. c4.5) to first order logic.

There are some good ILP systems out there. One of the most used is
Aleph:
http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/aleph.html

It is well documented and you can find some examples of its usage. It
requires YAP Prolog
or SWI.

Hope this as helpful,

Jose Santos