> Hi.
> What we have: we have a black box with some unknown crypto algorithm
> inside.
Invest in a screwdriver.
--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
High performance programmable parallel hardware helps a lot:
http://www.annapmicro.com/news.html
--
Scott McPhillips [VC++ MVP]
I know, but I'm looking for any known methods or ideas.
This sounds somewhat related to Kolmogorov complexity. You may have to
account for the Halting Problem, but there are probably steps you can
take to deal with that.
I guess my first question is how much you know about the algorithm
inside the black box. For example, you know that it is "very compact",
but do you have concrete information about how compact? And do you
know what the algorithm is implemented in terms of? That is, do you
know that it is "very compact" when implemented using x86 instructions?
The reason I ask this is that the compactness of actual code depends
on the instruction set. For example, suppose you have one instruction
set that just has the regular integer operations. Now compare that to
another instruction set that has single instruction for "compute the
SHA1 sum of a string". You can put an arbitrary algorithm into an
instruction, and when you do that, you affect the compactness of the
code written using that instruction set.
So, let's suppose that your black box contains not an algorithm, but
a compact *implementation of an algorithm* in a known instruction set.
Perhaps it's x86, although it doesn't really matter which one it is.
Let's also suppose you have an actual specific upper bound on the
size of the implementation. Say, for example, you know it's 10
instructions long.
If both these things are true, then it will be possible to find the
algorithm. All you have to do is create all possible programs up
to the maximum length, then run those programs and see if they get
the same output as the black box does. As you find that algorithms
get different output, eliminate them.
If you find that you have eliminated all but one algorithm, this
remaining algorithm must be the same as the one in the black box.
This is because there must be at least one algorithm (after all,
the black box contains it, so it does exist), and since the others
had different output, they aren't it. Therefore, the one that has
not been eliminated must be it. This is essentially just the
process of elimination.
The question then becomes whether you can eliminate all but one
algorithm. This is a bit tougher. Algorithms that aren't equivalent
must eventually differ. And they must differ on a finite input.
Since you can go through all possible inputs (starting with the short
ones and moving towards the longer ones), eventually you must reach
a point where algorithms will have different outputs if they are not
in fact equivalent algorithms. Unfortunately, there is no way to
know how long this will take.
Naturally, if you eliminate all algorithms but one, then you are
done, because you know there is at least one algorithm that is
correct, so you would be finished in that case. The problem comes
in where you might have more than one sequence of code which are
equivalent implementations of the same algorithm! Unless you have
a general way to examine a set of sequences of code and determine
whether they are equivalent, I don't see a way around this.
I think this means that, in the general case, this problem cannot
be solved.
- Logan
> I guess my first question is how much you know about the algorithm
> inside the black box. For example, you know that it is "very compact",
> but do you have concrete information about how compact? And do you
> know what the algorithm is implemented in terms of? That is, do you
> know that it is "very compact" when implemented using x86 instructions?
I think, roughly, less than 50 x86 instructions.
The interesting point is that some parts of algorithms may be
eliminated.
For example, two instructions together like "a=a+1; a=a-1;" are
nonsence and, can be such things eliminated before algorithm is
checked for inputs and outputs?
Just interesting, is there any known papers which already have what
I'm looking for..
There are two very compact algorithms out there that it *could* be, and
they are quite common: AES and RC4. Considering that you said that it
looks like a LFSR, and that the code size is about 50 x86 instructions,
I would guess RC4 or some variation of it.
--
Daniel Rudy
Email address has been base64 encoded to reduce spam
Decode email address using b64decode or uudecode -m
Why geeks like computers: look chat date touch grep make unzip
strip view finger mount fcsk more fcsk yes spray umount sleep
Oh no, LFSR is just as an example. Algorithm can be not related to
crypto field. By interest is in consideration of all possible
algorithms using reasonable resources and time.
> By interest is in consideration of all possible
> algorithms using reasonable resources and time.
If you mean that you want to consider all possible algorithms, but only
take reasonable time/space to do it, then give up now.
There is a /much/ more than astronomical number of possible
computations from, say. 100 bytes of input to 100 bytes of output.
There is a more than astronomical number of possible programs which can
be expressed in, say, 150 bytes of i86 binary code. The mapping from
one to the other is not continuous (small changes to the program can
cause large changes to the output, and small changes to the desired
output may require large changes to the program). The mapping is not
only non-continuous, it is also non-computable.
That said, if you are happy with very crude approximations, and are not
trying to recover a target transformation which has been designed to be
difficult/impossible to recover (e.g. any crypto algorithm when
considered together with its key(s)), then you may be able to use
techniques of machine learning. Read about "genetic programming" for
instance:
http://www.genetic-programming.org/
http://en.wikipedia.org/wiki/Genetic_programming
(Warning: both the above links include what stikes me as excessive
levels of hype.)
-- chris
you can learn a lot about this algorithm if you test it with right input data of
small length (one byte, word or dbl. word). If this algorithm is linear (in any
way) then you can expect that result of binary input 010001 is (some sort of)
linear combination or results from inputs 010000 and 000001. This is just one
idea you can use and I don't think you will find some universal solution for
your problem.
But you just said it was crypto. You asked us about how to find out
what it is, so we told you. I gave you a couple of good guesses to try
out on your own.
What is it that you want? You either want to find what the algorithm
is, or not. It's your call.