Re-rolled Salsa20 function

Paul Rubin

unread,

Sep 26, 2005, 4:42:48 AM9/26/05

to

The unrolled-loop version of salsa20 from DJB's document is the
obvious way to implement it: it doesn't take much code space and it's
extremely fast. But seeing the underlying structure from the pattern
of constants (for me at least) takes some head scratching. By
de-unrolling the loop and using a macro ("XX") to express how
coordinates wrap around in the matrix, we get a less efficient, but
very concise expression of the function (r stands for row and c stands
for column):

#define R(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
typedef unsigned int uint32;

void salsa20_word_specification(uint32 out[16],uint32 in[16])
{
int i, r, c;
uint32 x[16];
int z[] = {7,9,13,18};

for (i = 0;i < 16; ++i) x[i] = in[i];

#define XX(r,c) x[((4*(r)) + ((c)%4)) % 16]
for (i = 20; i > 0; i -= 2) {
for (r=0; r<4; r++)
for (c=0; c<4; c++)
XX(r+c+1,r) ^= R(XX(r+c,r) + XX(r+c+3, r), z[c]);

for (r=0; r<4; r++)
for (c=0; c<4; c++)
XX(r,r+c+1) ^= R(XX(r,r+c) + XX(r,r+c+3), z[c]);
}
#undef XX

for (i = 0;i < 16; ++i) out[i] = x[i] + in[i];
}

Notice how the second half of each double round is the same as the
first half, except the coordinates are swapped.

I'm sure DJB already knows about all this but I think it's pretty cool.

David Wagner

unread,

Sep 26, 2005, 5:44:44 AM9/26/05

to

Paul Rubin wrote:
> for (i = 20; i > 0; i -= 2) {
> for (r=0; r<4; r++)
> for (c=0; c<4; c++)
> XX(r+c+1,r) ^= R(XX(r+c,r) + XX(r+c+3, r), z[c]);
>
> for (r=0; r<4; r++)
> for (c=0; c<4; c++)
> XX(r,r+c+1) ^= R(XX(r,r+c) + XX(r,r+c+3), z[c]);
> }
>

>Notice how the second half of each double round is the same as the
>first half, except the coordinates are swapped.

Interesting. It sounds like we can view each pair of double-rounds
as a single-round, followed by a 'matrix transpose', followed by another
single-round (the same as the first), followed by another matrix
transpose. In other words, we can equivalently view it as 20 iterations
of single-round + transpose, like this:
for (i = 20; i > 0; i--) {

for (r=0; r<4; r++)
for (c=0; c<4; c++)
XX(r+c+1,r) ^= R(XX(r+c,r) + XX(r+c+3, r), z[c]);

XX = transpose(XX);
}
where transpose(XX)(i,j) = XX(j,i). Does this sound right?

If this is right, then I notice that there doesn't seem to be any
round-dependence: all 20 rounds compute the same function. Don't we
have to worry about slide attacks? Has there been any analysis of
security against such attacks? The salsa20 security analysis claims
that slide differentials are hopeless, with no justification; I do
not understand where that statement comes from.

For instance, suppose that U[] is one input, and V[] is another input
derived by applying a single round and transpose to U[]. Compute UU =
salsa20(U) and VV = salsa20(V). Then we'd expect to find the relation
that VV-V is the result of applying a single round and transpose to UU-U.
This is a relationship that a random oracle wouldn't have. It's not
clear to me whether the salsa20 encryption algorithm allows inputs of
this form to be found; I don't see any obvious way to do it.

As another example, suppose that X is such that every word of X is the
value 0x80..0 (i.e., 1<<31). Then we find that a single round leaves
X unchanged. The same is true for the transpose. Thus salsa20(X) = X.
This is an extension of an example shown in the salsa20 spec, which
notes that salsa20(all-zeros) = all-zeros. I think these are the only
two trivial fix-points of this form. I note that the salsa20 encryption
algorithm doesn't allow inputs to be of this form.

Similarly, there is a differential characteristic of probability one for
salsa20: if dX = X ^ X' = (every word is 1<<31), and dY = salsa20(X) ^
salsa20(X') = dX with probability 1. The salsa20 encryption algorithm
doesn't allow inputs compliant with this differential characteristic.

This all makes clear that the salsa20 hash should not be used as a
general-purpose hash function, but it might be perfectly suitable for
its intended use in the salsa20 encryption algorithm (which is all that
has ever been claimed about the salsa20 hash, of course).

Paul Rubin

unread,

Sep 26, 2005, 6:12:07 AM9/26/05

to

d...@taverner.cs.berkeley.edu (David Wagner) writes:
> transpose. In other words, we can equivalently view it as 20 iterations
> of single-round + transpose, like this:
> for (i = 20; i > 0; i--) {
> for (r=0; r<4; r++)
> for (c=0; c<4; c++)
> XX(r+c+1,r) ^= R(XX(r+c,r) + XX(r+c+3, r), z[c]);
>
> XX = transpose(XX);
> }
> where transpose(XX)(i,j) = XX(j,i). Does this sound right?

Yes, I believe so; I noticed the same thing though I didn't test it on
a computer. I didn't write it that way because I'd have had to code
the transpose operation, and I was only looking for a concise spelling
of the salsa20 function and wasn't thinking about cryptanalysis at the
time. The rest of your post is very well taken and I'm interested to
hear what DJB has to say. Certainly those issues with the hash
function should be mentioned in any security analysis. Wow!

Paul Rubin

unread,

Sep 26, 2005, 6:14:58 AM9/26/05

to

d...@taverner.cs.berkeley.edu (David Wagner) writes:
> for (i = 20; i > 0; i--) {
> for (r=0; r<4; r++)
> for (c=0; c<4; c++)
> XX(r+c+1,r) ^= R(XX(r+c,r) + XX(r+c+3, r), z[c]);
>
> XX = transpose(XX);
> }

Of course, simply inserting the round number (a la Skipjack) may be
enough to fix the slide attack:

XX(r+c+1,r) ^= R(XX(r+c,r) + XX(r+c+3, r) + i, z[c]);

You'd probably want i to count upward rather than downward but either
way would have the same effect.

Paul Rubin

unread,

Sep 26, 2005, 6:59:41 AM9/26/05

to

d...@taverner.cs.berkeley.edu (David Wagner) writes:
> XX = transpose(XX);
> }
> where transpose(XX)(i,j) = XX(j,i). Does this sound right?

Yes, I tested it: <http://www.nightsong.com/phr/crypto/salsa20.c>.
The test vector is from DJB's spec.

D. J. Bernstein

unread,

Sep 26, 2005, 7:25:59 AM9/26/05

to

David Wagner wrote:
> In other words, we can equivalently view it as 20 iterations of
> single-round + transpose

Yes. This is mentioned in the Salsa20 design document.

> For instance, suppose that U[] is one input, and V[] is another input
> derived by applying a single round and transpose to U[].

You're confusing Salsa20 with a block cipher. The critical point is that
the 512-bit Salsa20 array is not an input chosen by the attacker; it has
128 bits chosen by the attacker, 256 key bits, and 128 constant bits.

Try reading the definition of the Salsa20 expansion function in the
Salsa20 specification; or Section 6 of the Salsa20 security document; or
the comments on reduced communication in the Salsa20 design document.

---D. J. Bernstein, Professor, Mathematics, Statistics,
and Computer Science, University of Illinois at Chicago

D. J. Bernstein

unread,

Sep 26, 2005, 7:48:43 AM9/26/05

to

Paul Rubin wrote:
> Of course, simply inserting the round number (a la Skipjack) may be
> enough to fix the slide attack:

There is no slide attack on the Salsa20 cipher. There are many more
state words than attacker-controlled inputs. This is discussed in detail
in the Salsa20 security document.

A round counter is one way to introduce a state word that the attacker
doesn't control, but I don't think it's the best design. I see each word
modification as an opportunity for diffusion, and I see each use of a
round counter as a missed opportunity, requiring more modifications (and
therefore less speed) to reach the same security level.

Paul Rubin

unread,

Sep 26, 2005, 8:26:26 AM9/26/05

to

"D. J. Bernstein" <d...@cr.yp.to> writes:
> You're confusing Salsa20 with a block cipher. The critical point is that
> the 512-bit Salsa20 array is not an input chosen by the attacker; it has
> 128 bits chosen by the attacker, 256 key bits, and 128 constant bits.

OK, but I think the doc should be updated to warn people not to use
the Salsa20 primitive as a general purpose hash function. That's not
so obvious to figure out.

D. J. Bernstein

unread,

Sep 26, 2005, 9:37:08 AM9/26/05

to

Paul Rubin wrote:
> OK, but I think the doc should be updated to warn people not to use
> the Salsa20 primitive as a general purpose hash function.

The Salsa20 hash function is a low-level primitive mapping 64-byte
strings to 64-byte strings. Even the most confused reader can't possibly
think that this is meant to be a drop-in replacement for MD5; it doesn't
compress the input!

Anyway, the documentation already (1) points out that the Salsa20 hash
function has visible structure and (2) explains exactly how the Salsa20
encryption function chooses diagonal constants to eliminate all visible
structure. See Section 4 of the Salsa20 security document. See also the
first three examples in Section 8 of the Salsa20 specification.

xmath

unread,

Sep 26, 2005, 9:55:18 AM9/26/05

to

> In other words, we can equivalently view it as 20 iterations
> of single-round + transpose

My altivec implementation actually does precisely that: it uses 4x
vector add+rl+xor to do the round, and 3x vector permute to do the
transpose.

xmath

unread,

Sep 26, 2005, 10:24:26 AM9/26/05

to

D. J. Bernstein wrote:
> Paul Rubin wrote:
> > OK, but I think the doc should be updated to warn people not to use
> > the Salsa20 primitive as a general purpose hash function.
>
> The Salsa20 hash function is a low-level primitive mapping 64-byte
> strings to 64-byte strings. Even the most confused reader can't possibly
> think that this is meant to be a drop-in replacement for MD5; it doesn't
> compress the input!

Just curious, is there a good way to make Salsa20 into a
collision-resistant hash? The simplest thing I'd come up with is:

* append a 0x80 byte to the message
* split message into 16-byte blocks (zero-pad last block)
* for each block, apply salsa20 with
- as key, bytes 4..19 and 44..59 of the previous output
- as nonce, the message block
* message digest is bytes 0..3, 20..43, 60..63 of last output

note that I conveniently leave the 32 key bytes in place which should
ease implementation, and use the 32 non-key bytes as final output which
should make it impossible to calculate H(x, y) from H(x) for any y
(where . is concatenation).

it may be wise to use different constant-words than for encryption, and
some initial key would need to be picked (opportunity to make it a MAC
for those who can't wrap their head around Wegman-Carter? ;-)

would this have any chance of being a secure general-purpose 256-bit
hash? if not, could some other construction work?

- xmath

D. J. Bernstein

unread,

Sep 26, 2005, 12:54:09 PM9/26/05

to

xmath wrote:
> Just curious, is there a good way to make Salsa20 into a
> collision-resistant hash?

Certainly. In particular, it's safe to build a compression function in
the way you suggest, hashing 16 constant bytes, 16 message bytes, and 32
chaining bytes. It's also safe to replace the 16 constant bytes with 16
additional chaining bytes. Either approach produces a conservative hash
function somewhat slower than SHA-256.

More interesting possibilities would break the traditional chaining
structure. Applying Salsa20 to each message block in parallel produces
an output string that seems hard to control and thus easy to compress.
For example, it seems difficult to find collisions in the 64-byte string
Salsa20(m0,c) ^ Salsa20(m1,c^1), where m0, m1 are 48-byte message blocks
and c is a 16-byte constant. Similarly, it seems difficult to find
collisions in the 64-byte string

Salsa20(Salsa20(m0,c) ^ Salsa20(m1,c^1))
^ Salsa20(Salsa20(m2,c) ^ Salsa20(m3,c^1) ^ 1)

where m0, m1, m2, m3 are 48-byte message blocks. The general idea is to
start from the string m0,c,m1,c,m2,c,... and then hash, in parallel,
each 128-byte block b0,b1 to Salsa20(b0) ^ Salsa20(b1^1). The final
64-byte output can be fed through Salsa20 again and truncated to 32
bytes, producing a 256-bit hash with one Salsa20 invocation for each 32
message bytes.

David Wagner

unread,

Sep 26, 2005, 12:56:23 PM9/26/05

to

D. J. Bernstein wrote:
>There is no slide attack on the Salsa20 cipher. There are many more
>state words than attacker-controlled inputs. This is discussed in detail
>in the Salsa20 security document.

I read the discussion in the Salsa20 security document, but I didn't
find it convincing. There was an assertion, without justification,
that the Salsa20 encryption algorithm is not susceptible to slide attacks
due to the way the inputs to the Salsa20 hash are formed. That's a very
plausible claim, but I would have liked to see the analysis demonstrating
the evidence for this claim.

It seems plausible that there exist pairs of inputs that form a slid pair.
The difficulty seems to be in finding them efficiently. However, I
haven't done the careful analysis to see exactly how the details work out.

If I had to guess, I too would guess that Salsa20 encryption is secure
from slide attacks. I'd rather not have to guess, though.

David Wagner

unread,

Sep 26, 2005, 12:58:33 PM9/26/05

to

D. J. Bernstein wrote:

>David Wagner wrote:
>> For instance, suppose that U[] is one input, and V[] is another input
>> derived by applying a single round and transpose to U[].
>
>You're confusing Salsa20 with a block cipher. The critical point is that
>the 512-bit Salsa20 array is not an input chosen by the attacker; it has
>128 bits chosen by the attacker, 256 key bits, and 128 constant bits.

No, I'm not. I understand all that. You deleted the part where I wrote:

It's not clear to me whether the salsa20 encryption algorithm allows
inputs of this form to be found; I don't see any obvious way to do it.

So, I'll say it again. The above is a slide property on the Salsa20
hash function. Due to the way inputs to the Salsa20 hash function are
constructed, I don't see any obvious way to turn this into an attack on
the Salsa20 encryption algorithm -- but that doesn't mean there isn't one.

xmath

unread,

Sep 26, 2005, 1:34:34 PM9/26/05

to

D. J. Bernstein wrote:
> More interesting possibilities would break the traditional chaining
> structure. Applying Salsa20 to each message block in parallel produces
> an output string that seems hard to control and thus easy to compress.
> For example, it seems difficult to find collisions in the 64-byte string
> Salsa20(m0,c) ^ Salsa20(m1,c^1), where m0, m1 are 48-byte message blocks
> and c is a 16-byte constant. Similarly, it seems difficult to find
> collisions in the 64-byte string
>
> Salsa20(Salsa20(m0,c) ^ Salsa20(m1,c^1))
> ^ Salsa20(Salsa20(m2,c) ^ Salsa20(m3,c^1) ^ 1)
>
> where m0, m1, m2, m3 are 48-byte message blocks. The general idea is to
> start from the string m0,c,m1,c,m2,c,... and then hash, in parallel,
> each 128-byte block b0,b1 to Salsa20(b0) ^ Salsa20(b1^1). The final
> 64-byte output can be fed through Salsa20 again and truncated to 32
> bytes, producing a 256-bit hash with one Salsa20 invocation for each 32
> message bytes.

hm, so say the message is split into 48-byte chunks, each formatted
into 64-bit blocks by inserting constants into the diagonal, and I call
those blocks m0, m1, m2, etc.

define F(x) = Salsa20(Salsa20(x))
define C(x,y) = Salsa20(x) ^ Salsa20(y^1)

then if the number of blocks is a power of two, you get:
F(m0)
F(C(m0,m1))
F(C(C(m0,m1),C(m2,m3)))
F(C(C(C(m0,m1),C(m2,m3)),C(C(m4,m5),C(m6,m7))))

that's 2^(n+1) invocations of Salsa20 for every 2^n 48-byte message
blocks, which is one Salsa20 invocation for every 24 message bytes.

so I'm not sure where you get the "one Salsa20 invocation for each 32
message bytes"... did I misunderstand your scheme?

I'd also estimate some might find it somewhat annoying that you need
O(log(k)) memory for hashing a length-k message.

how about plain
Salsa20(m0) ^ Salsa20(m1^1) ^ Salsa20(m2^2) ^ .. ^ Salsa20(mk^k)

this gives one Salsa20 per 48 bytes. As long as k < 2^32, the three
untouched words of the constant should still provide sufficient
protection or not?

or more conservatively, apply Salsa20 on 32-byte message blocks, with
constant *and* separate nonce (block-counter), and XOR those all
together. giving one Salsa20 per 32 bytes.

-xmath

xmath

unread,

Sep 26, 2005, 1:38:19 PM9/26/05

to

xmath wrote:
> how about plain
> Salsa20(m0) ^ Salsa20(m1^1) ^ Salsa20(m2^2) ^ .. ^ Salsa20(mk^k)
>
> this gives one Salsa20 per 48 bytes. As long as k < 2^32, the three
> untouched words of the constant should still provide sufficient
> protection or not?
>
> or more conservatively, apply Salsa20 on 32-byte message blocks, with
> constant *and* separate nonce (block-counter), and XOR those all
> together. giving one Salsa20 per 32 bytes.

both with a final Salsa20 over the whole thing of course

-xmath

xmath

unread,

Sep 26, 2005, 5:37:49 PM9/26/05

to

D. J. Bernstein wrote:
> Either approach produces a conservative hash
> function somewhat slower than SHA-256.

Doesn't look like it to me, at least not on a G4:

'openssl speed sha1' reports:
Doing sha1 for 3s on 1024 size blocks: 211416 sha1's in 2.83s

so that's about 74705 hashes/s on 1K messages

I did a quick AltiVec-implementation of the H(H(m0^0) ^ H(m1^1) ^ ..)
version of salsa20 hashing, and it did 187951 hashes/s on 1K messages

that's 2.5 times faster than SHA-1

The most conservative suggestion does one salsa20 per 16 bytes, so that
would probably be 3x slower hence about 20% slower than SHA-1. That's
still a lot faster than SHA-256 (for which I don't have a benchmark
around at the moment, but I vaguely recall it being 3x slower than
SHA-1)

Note that the non-altivec version is about 40% slower, which would make
the less conservative salsa20 hashing still faster than SHA-1, and at a
higher security level.

-xmath

David Wagner

unread,

Sep 26, 2005, 6:14:07 PM9/26/05

to

D. J. Bernstein wrote:
> For example, it seems difficult to find collisions in the 64-byte string
> Salsa20(m0,c) ^ Salsa20(m1,c^1), where m0, m1 are 48-byte message blocks

> and c is a 16-byte constant. [....]

> The final
> 64-byte output can be fed through Salsa20 again and truncated to 32
> bytes, producing a 256-bit hash with one Salsa20 invocation for each 32
> message bytes.

I'll note one can find collisions in this hash function in approximately
2^87 time and space by using the generalized birthday attack [1]. In such
an attack, one searches for values m0,m1,m0',m1' such that
Salsa20(m0,c) ^ Salsa20(m1,c^1) ^ Salsa20(m0',c) ^ Salsa20(m1',c^1) = 0.
which can be done in something like 2^{256/3} operations.

Such an attack is clearly impractical today, but one might hope for better
security when using a 256-bit hash.

[1] David Wagner, "A Generalized Birthday Problem", CRYPTO 2002.
http://www.cs.berkeley.edu/~daw/papers/genbday.html

D. J. Bernstein

unread,

Sep 26, 2005, 6:21:33 PM9/26/05

to

David Wagner wrote:
> It seems plausible that there exist pairs of inputs that form a slid pair.
> The difficulty seems to be in finding them efficiently.

No. Salsa20 is not a block cipher. There are only 2^128 inputs, so there
are only 2^256 input pairs (i0,i1), so there are only 2^256 differences
i0 - f(i1). There's no reason to believe that _any_ of those 512-bit
differences are small, for any useful definition of ``small.'' For
example, there's no reason to believe that any of them are zero.

Of course, even if some there were some small differences, you wouldn't
have any way to find them. But there's no point in talking about the
difficulty of finding objects that don't exist in the first place.

D. J. Bernstein

unread,

Sep 26, 2005, 7:00:46 PM9/26/05

to

David Wagner wrote:
> The above is a slide property on the Salsa20 hash function.

Are we next going to hear about ``slide properties'' of your computer's
instruction set? Taking your computer's state at time 0, and starting
another computer in the same state at time 1, produces an amazing match
of the second computer's state at time 1000001 with the first computer's
state at time 1000000! Wow!

Question: Why don't these trivial ``slide properties'' lead to an actual
attack? Answer: The attacker doesn't control most of the inputs. Talking
about a slide attack is nonsense if you haven't specified an attacker.

In particular, the Salsa20 hash function---like the computer's
instruction set---is a lower-level primitive not exposed directly to an
attacker. The Salsa20 encryption function doesn't allow the attacker to
control most of the hash-function input. That's why trivial ``slide
properties'' of the hash function aren't attacks.

D. J. Bernstein

unread,

Sep 26, 2005, 7:12:48 PM9/26/05

to

David Wagner wrote:
> I'll note one can find collisions in this hash function in approximately
> 2^87 time and space by using the generalized birthday attack [1].

Incorrect. The intermediate results are 512 bits, not 256 bits, so all
of your exponents need to be doubled.

Perhaps you missed the final Salsa20 invocation before the truncation to
256 bits: ``The final 64-byte output can be fed through Salsa20 again
and truncated to 32 bytes.''

David Wagner

unread,

Sep 26, 2005, 7:43:28 PM9/26/05

to

D. J. Bernstein wrote:
>David Wagner wrote:
>> I'll note one can find collisions in this hash function in approximately
>> 2^87 time and space by using the generalized birthday attack [1].
>
>Incorrect. The intermediate results are 512 bits, not 256 bits, so all
>of your exponents need to be doubled.
>
>Perhaps you missed the final Salsa20 invocation before the truncation to
>256 bits: ``The final 64-byte output can be fed through Salsa20 again
>and truncated to 32 bytes.''

Oops! You're absolutely right. Sorry. I did indeed miss that.
My apologies; everything I said was erroneous. Thanks for correcting
my mistake.

Milan VXdgsvt

unread,

Sep 26, 2005, 10:26:04 PM9/26/05

to

Hello.

I find the design of Salsa20 cipher really refreshing after going
through AES code, but after reading the documentation, a few questions
arised. Perhaps even someone less qualified could explain these to me.

1. I prefer schemes where reusing a nonce leaks "some" information on
the plaintext, instead of leaking (nearly) all information. Would
deriving the nonce from plaintext, e.g. by taking
nonce:=Salsa20(plaintext[0..47]),
be a stupid idea?

2. The diagonal constants are chosen as a human-readable text, so I
guess nearly any choice would be good. It is explained that their role
is to be linearly independent after shift/rotation. Wouldn't it be
better to use a bitstring, where "1" are at positions that are prime,
e.g.
0110101000101... (positions 2,3,5,7,11,13...)
I have in my mind reduced-rounds and reduced-bits variants where the
constants possibly would need some care.

3. It is explained that the bit shifts are chosen because they're "as
good as any". So why not choose (2,4,8,16) instead of (7,9,13,18) ?

Thanks for any explanations, or pointers to documentation.

Milan

xmath

unread,

Sep 27, 2005, 1:44:40 AM9/27/05

to

David Wagner wrote:
> I'll note one can find collisions in this hash function in approximately
> 2^87 time and space by using the generalized birthday attack [1]. In such
> an attack, one searches for values m0,m1,m0',m1' such that
> Salsa20(m0,c) ^ Salsa20(m1,c^1) ^ Salsa20(m0',c) ^ Salsa20(m1',c^1) = 0.
> which can be done in something like 2^{256/3} operations.
>
> Such an attack is clearly impractical today, but one might hope for better
> security when using a 256-bit hash.

Actually, since Salsa20 produces 512-bit output, wouldn't that be 2^171
time? Of course if you directly truncate the output to 256-bit then
you're right, but afaik not if there's a final Salsa20 application
before truncation.

-xmath

PS I mentioned earlier that I vaguely recalled SHA-256 to be 3x slower
than SHA-1.. it seems I must have had a crappy SHA-256 since I just
tested with LTC and on 1K messages it gives 79341 hashes/s for SHA-1,
64651 hashes/s for SHA-256.. still both slower than salsa20-based
hashing on a G4 at least

D. J. Bernstein

unread,

Sep 27, 2005, 1:47:04 AM9/27/05

to

xmath wrote:
> which is one Salsa20 invocation for every 24 message bytes.

Yes; sorry for the miscalculation. An n-way xor would use one Salsa20
invocation for every 48(n-1)/n message bytes. With n=3, for example,
there would be one Salsa20 invocation for every 32 message bytes.

> I'd also estimate some might find it somewhat annoying that you need
> O(log(k)) memory for hashing a length-k message.

A kilobyte of memory to handle a megabyte-long message. Who cares?

xmath

unread,

Sep 27, 2005, 3:06:19 AM9/27/05

to

D. J. Bernstein wrote:
> Yes; sorry for the miscalculation. An n-way xor would use one Salsa20
> invocation for every 48(n-1)/n message bytes. With n=3, for example,
> there would be one Salsa20 invocation for every 32 message bytes.

But is there any reason to use the recursively nested hashing instead
of the faster flat hashing H(H(m0, 0) ^ H(m1, 1) ^ .. ^ H(mk, k)),
unless you have a specific need for a hashing-tree?

> > I'd also estimate some might find it somewhat annoying that you need
> > O(log(k)) memory for hashing a length-k message.
>
> A kilobyte of memory to handle a megabyte-long message. Who cares?

It may be relevant when hashing a continuous stream of data, especially
in constrained devices (small embedded systems, hardware
implementations)

-xmath

D. J. Bernstein

unread,

Sep 27, 2005, 11:52:30 AM9/27/05

to

xmath wrote:
> But is there any reason to use the recursively nested hashing instead
> of the faster flat hashing H(H(m0, 0) ^ H(m1, 1) ^ .. ^ H(mk, k))

http://www.cs.berkeley.edu/~daw/papers/genbday.html broke AdHash.

With a 128-way 512-bit xor the attack takes something like 2^60 time on
a machine with 2^60 bytes of memory; slightly better price-performance
ratio than parallel collision search for a 256-bit hash.

The current attacks don't show any problems with, say, a 6-way 512-bit
xor. A 6-way tree is nearly as fast as a flat structure.

I'm not saying that xor is the best approach---I'd rather use a slightly
more expensive compression function---but the difficulty of breaking xor
is inspirational; I think that the general approach of hashing blocks in
parallel, and then compressing the results, is fundamentally much better
than the Merkle-Damgaard chaining approach.

> unless you have a specific need for a hashing-tree?

It's useful to have a single hash function that works well for all
applications. Code size is important.

David Wagner

unread,

Sep 27, 2005, 1:04:32 PM9/27/05

to

xmath wrote:
>But is there any reason to use the recursively nested hashing instead
>of the faster flat hashing H(H(m0, 0) ^ H(m1, 1) ^ .. ^ H(mk, k)),
>unless you have a specific need for a hashing-tree?

Yeah. The reason is to avoid generalized birthday attacks (which don't
work on DJB's proposal, but do work against this variant). With k>=256,
you can use linear algebra to find a collision in the flat hashing variant
in almost no time at all, I think.

David Wagner

unread,

Sep 27, 2005, 1:06:08 PM9/27/05

to

Milan VXdgsvt wrote:
>1. I prefer schemes where reusing a nonce leaks "some" information on
>the plaintext, instead of leaking (nearly) all information. Would
>deriving the nonce from plaintext, e.g. by taking
> nonce:=Salsa20(plaintext[0..47]),
>be a stupid idea?

This won't prevent nonce re-use, since the first 48 bytes can repeat
even though the rest of the packet doesn't. Indeed, it might *increase*
the incidence of nonce reuse. If you really wanted to be sure to avoid
nonce reuse, you could use nonce := hash(random || plaintext), but that
is likely to be measurably slower.

Milan VXdgsvt

unread,

Sep 27, 2005, 5:52:37 PM9/27/05

to

David Wagner wrote:

> Milan VXdgsvt wrote:
> > nonce:=Salsa20(plaintext[0..47]),

The idea was that even if a nonce is reused, it gives no additional
information over the first appearance, since the result is exactly the
same:

1. Scenario with random nonce that is reused:
Salsa20(nonceA, plainX) xor Salsa20(nonceA, plainY) = plainX xor plainY

2. Scenario where nonce is the same for the same plaintext:
Salsa20(nonceX, plainX) xor Salsa20(nonceX, plainX) = 0

It follows from the idea that if Hash(plainX)=Hash(plainY), it's more
probable that plainX=plainY than it would be if plaintexts were truly
random. I can see that the collisions of nonces are inevitable in the
second scenario as well, but hopefully with less probability and with a
less useful result (plainX has to be quite different from plainY to
have the same hash).

> This won't prevent nonce re-use, since the first 48 bytes can repeat

> even though the rest of the packet doesn't. Indeed, it might increase

You have a point in that the nonce would have to be derived from the
complete plaintext. This makes the nonce computation almost impossible
on streams.

At any rate, I just hate nonces...

Milan

David Wagner

unread,

Sep 28, 2005, 12:18:20 AM9/28/05

to

Milan VXdgsvt wrote:
>David Wagner wrote:
>> Milan VXdgsvt wrote:
>> > nonce:=Salsa20(plaintext[0..47]),
>
>The idea was that even if a nonce is reused, it gives no additional
>information over the first appearance, since the result is exactly the
>same:

[...]

>You have a point in that the nonce would have to be derived from the
>complete plaintext. This makes the nonce computation almost impossible
>on streams.

Yes. Also, there is the issue that in some applications, plaintexts
may repeat, and we don't want to reveal that fact -- so one should add
to the hash of the whole plaintext a random salt as well (as I described
in my previous post).

Milan VXdgsvt

unread,

Sep 28, 2005, 3:42:50 AM9/28/05

to

David Wagner wrote:

> to the hash of the whole plaintext a random salt as well (as I
> described in my previous post).

Well, first, I'm not quite sure what should the lifetime of the salt
should be.

If it's random for every message, then there's no reason to add the
plaintext, since Hash(random+fixed)=random.

If it's fixed for a key, then it adds no benefits I can see, it's just
a different hash.

> Yes. Also, there is the issue that in some applications, plaintexts
> may repeat, and we don't want to reveal that fact -- so one should add

Anyway, I guess I finally understood the purpose of a nonce. How about
using it as a message counter, similar to in-message counter? For new
message (or session), the in-message counter is reset and nonce
increased. For each new key, the nonce and the in-message counters are
reset.
With nonce as a counter, I'm not worried about the chance of reuse any
more. Thanks for explaining these basics to me.

Milan

Message has been deleted