I've recently come across a broken Blowfish implementation.
A correct Blowfish implementation has the Feistel/round-function:
F(a,b,c,d) = ((S_1[a] + S_2[b]) ^ S_3[c]) + S_4[d]
where a,b,c,d are 8 bits and S_1 to S_4 are the four S-boxes.
The broken Blowfish implementation has, instead:
F(a,b,c,d) = ((S_1[a] + S_1[a]) ^ S_1[a]) + S_1[a]
So only one of the four S-boxes / 8 bit values is actually used
during encryption/decryption, and the range of F is considerably
How weak is this broken Blowfish implementation? I can't
wrap my head around the key schedule so my attempt at
cryptoanalysis has failed. If anyone here can see some
obvious weaknesses I'd like to here about them.
For what it's worth (which is not very much), I don't immediately see
any plausible way to exploit this on the full Blowfish, though I haven't
spent any serious amount of time thinking about this. 16 rounds, with
key-dependent S-boxes, forgives many sins.
The original Blowfish paper writes:
I used four different S-boxes instead of one S-box primarily to avoid
symmetries when different bytes of the input are equal, or when the
32-bit input to function F is a bytewise permutation of another 32-bit
input. I could have used one S-box and made each of the four different
outputs a non-trivial permutation of the single output, but the four
S-box design is faster, easier to program, and seems more secure.
If the four indexes chose values out of the same S-box, a more complex
combining function would be required to eliminate symmetries. I
considered using a more complex combining function in Blowfish (using
modular multiplications, rotations, etc.), but chose not to because
the added complication seemed unnecessary.
This certainly seems like good design, but I don't immediately see any
attack that would be worth worrying about in practice.
If I controlled the broken code and were in charge of defense, I'd
probably schedule a fix to this code at some convenient future version
of the software, to be safe -- but I wouldn't issue an emergency patch.
If I were in charge of attacking such a system, I suppose I'd spend more
time thinking about possible cryptanalytic attacks, but my first instinct
is that I'm not too optimistic this will lead to a practical attack.
If I were attacking the system, I'd probably look for other kinds of
implementation bugs (see, e.g., http://www.schneier.com/blowfish-bug.txt)
as well as other ways to defeat the system security without breaking
the crypto head-on.
Note that these not only use the same s-box, but
the same byte input!
>For what it's worth (which is not very much), I don't immediately see
>any plausible way to exploit this on the full Blowfish, though I haven't
>spent any serious amount of time thinking about this. 16 rounds, with
>key-dependent S-boxes, forgives many sins.
Seems to me that it has horrible differential
cryptanalysis properties; changes to any of the
bytes other than "a" propagate straight through.
Also, I have a feeling that F, as described, has a
very limited range. (But I haven't checked.)
232B EC8F 44C6 C853 D68F E107 E6BF CD2F 1081 A37C
Qualcomm Australia: http://www.qualcomm.com.au
Greg Rose wrote:
> Note that these not only use the same s-box, but
> the same byte input!
Eek. This is terrible! Thanks, Greg. I don't know how I
missed that; I guess I need to read more carefully.
So this looks a severe weakness with serious implications.
Observations: The only non-linearity here comes from applying the S-box
to the a-part of the left side and the right side. The rest is entirely
linear. That means that the ciphertext depends non-linearly on 16 bits
of the plaintext and linearly on 48 bits of the plaintext. Not good.
Also, the F-function does indeed have a reduced range: the low 2 bits
of the output of the F-function will always be zero, so 2 bits of each
side of the plaintext are not affected by any nonlinearity.
Attack 1: The low 2 bits of the left part of the ciphertext will be the
low 2 bits of the left part of the plaintext, xor a fixed 2-bit value
(this fixed 2-bit value is determined solely by the key). Same for
the low 2 bits of the right part of the text. So if you have one known
plaintext, you can derive these 4 bits of key material, and then every
subsequent ciphertext leaks 4 bits of information about the plaintext.
Depending upon how the application uses this cipher, this alone could
potentially be devastating. There are variants of this attack that may
apply even if you do not have any known plaintext but if you have a good
statistical model of those 4 bits of the plaintext.
Attack 2: Suppose that we fix the a-part of the left and right side
of the plaintext (16 bits fixed), and let the remaining 48 bits of
the plaintext vary. Then the a-parts of the left and right side of the
ciphertext will stay the same, and the remaining 48 bits of the ciphertext
will be just the xor of the 48 plaintext bits against some fixed value
that depends only upon the key and the 16 fixed bits of plaintext.
Suppose we see a ciphertext C, whose decryption P is unknown. Suppose we
can find some other known plaintext/ciphertext pair (P',C') such that
the a-parts of C match the a-parts of C' (a 16-bit condition). Then we
can conclude that the a-parts of P match the a-parts of P', so we learn
16 bits of information about the decryption of C. Also, the remaining
48 bits just involve an xor with a constant that is the same for both
(P,C) and (P',C'), so we can recover the remaining 48 bits of P trivially
as well (it's just the 48 bits C xor P' xor C'). So if we can find a
known text-pair whose ciphertext matches C in 16 bit positions, then
we can decrypt C. It follows that if we have a collection of around
2^16 known text pairs, then we'll be able to decrypt most ciphertext we
subsequently see. If we have only a fraction of that many known texts, we
can decrypt a corresponding fraction of the subsequent ciphertexts we see.
Attack 3: Suppose we see a lot more than 2^16 ciphertexts, and we have
some statistical model on the corresponding plaintexts. Then we may be
able to decrypt all of the traffic. For instance, the 16-bit a-parts
of each ciphertext are encrypted under a simple substitution cipher,
so we may be able to use frequency analysis to recover some information
about that. Likewise, we can partition the ciphertext into buckets,
where the ciphertexts in each bucket all have the same 16-bit a-parts.
Then within a bucket, the other 48 bits are "encrypted" simply by xor-ing
with a fixed constant (the same constant for all ciphertexts in a bucket,
but a different constant in each bucket). If the plaintexts follow some
statistical distribution, say, that of English, and if we have enough
ciphertexts in each bucket, it may be able to recover cleartext.
Summary: If I haven't made any mistakes in the analysis above, this
cipher looks broken. If I were a defender in charge of software that
used this cipher, I'd do an urgent investigation to examine the impact:
the impact could be significant and might well warrant an emergency patch,
depending upon the circumstances.
Thanks for the analysis. I have informed the software
vendor of the problem.