order preserving cksum

sinbad

unread,

Jul 19, 2012, 12:30:06 AM7/19/12

to

i want to know if there is any cksum/hash function
that can be used to apply cksum/hash on a set of data.
for example if have the set { 23,45,67,89 }. if i compute
the cksum/hash, no matter what order i compute it i should get
the same result. does the current cksum functions already
do that ?

Nicolas George

unread,

Jul 19, 2012, 4:25:47 AM7/19/12

to

sinbad , dans le message
<bc570559-7387-4221...@googlegroups.com>, a �crit�:

Be careful. Your subject line is about "order-preserving checksums". That
means: for all input A, B, A <= B implies checksum(A) <= checksum(B). This
is possible, but such a checksum is very weak, because a small change (a
single corrupted byte) in the middle of the input will very probably not be
detected.

Your explanations rather speak of permutation-invariant checksums. That is
possible too, but to design it properly, you should probably be more,
especially about the scale the permutations applies to: byte-level? record
level, and if so, what is the size of a record?

A simple way of achieving such a checksum is to sort the input before
applying any checksum function. But this is costly, there are much more
efficient solutions.

sinbad

unread,

Jul 19, 2012, 8:10:34 AM7/19/12

to

On Thursday, July 19, 2012 1:55:47 PM UTC+5:30, Nicolas George wrote:
> sinbad , dans le message

> <bc570559-7387-4221...@googlegroups.com>, a écrit :
> > i want to know if there is any cksum/hash function
> > that can be used to apply cksum/hash on a set of data.
> > for example if have the set { 23,45,67,89 }. if i compute
> > the cksum/hash, no matter what order i compute it i should get
> > the same result. does the current cksum functions already
> > do that ?
>
> Be careful. Your subject line is about "order-preserving checksums". That
> means: for all input A, B, A <= B implies checksum(A) <= checksum(B). This

> is possible, but such a checksum is very weak, because a small change (a
> single corrupted byte) in the middle of the input will very probably not be
> detected.
>
> Your explanations rather speak of permutation-invariant checksums. That is
> possible too, but to design it properly, you should probably be more,
> especially about the scale the permutations applies to: byte-level? record
> level, and if so, what is the size of a record?
>
> A simple way of achieving such a checksum is to sort the input before
> applying any checksum function. But this is costly, there are much more
> efficient solutions.

yes permutation-invariant checksums are what i am looking for.
any pointers to the efficient solutions ?

thnks
sinbad

Nicolas George

unread,

Jul 19, 2012, 8:29:53 AM7/19/12

to

sinbad , dans le message

<9de0bc4d-7846-44a3...@googlegroups.com>, a �crit�:

> On Thursday, July 19, 2012 1:55:47 PM UTC+5:30, Nicolas George wrote:
>> sinbad , dans le message

>> <bc570559-7387-4221...@googlegroups.com>, a �crit�:

>> > i want to know if there is any cksum/hash function
>> > that can be used to apply cksum/hash on a set of data.
>> > for example if have the set { 23,45,67,89 }. if i compute
>> > the cksum/hash, no matter what order i compute it i should get
>> > the same result. does the current cksum functions already
>> > do that ?

I do not know how you managed to get that, but it is ugly.

> yes permutation-invariant checksums are what i am looking for.
> any pointers to the efficient solutions ?

I already told you, you need to give more details about the resolution of
the permutation you need. You will not use the same algorithm if you really
need to be able to permute any single bytes or if you only permute a few big
fields.

Eric Sosman

unread,

Jul 19, 2012, 9:08:35 AM7/19/12

to

On 7/19/2012 8:10 AM, sinbad wrote:
> On Thursday, July 19, 2012 1:55:47 PM UTC+5:30, Nicolas George wrote:
>> sinbad , dans le message

>> <bc570559-7387-4221...@googlegroups.com>, a �crit :

>> > i want to know if there is any cksum/hash function
>> > that can be used to apply cksum/hash on a set of data.
>> > for example if have the set { 23,45,67,89 }. if i compute
>> > the cksum/hash, no matter what order i compute it i should get
>> > the same result. does the current cksum functions already
>> > do that ?
>>
>> Be careful. Your subject line is about "order-preserving checksums". That
>> means: for all input A, B, A <= B implies checksum(A) <= checksum(B). This
>> is possible, but such a checksum is very weak, because a small change (a
>> single corrupted byte) in the middle of the input will very probably not be
>> detected.
>>
>> Your explanations rather speak of permutation-invariant checksums. That is
>> possible too, but to design it properly, you should probably be more,
>> especially about the scale the permutations applies to: byte-level? record
>> level, and if so, what is the size of a record?
>>
>> A simple way of achieving such a checksum is to sort the input before
>> applying any checksum function. But this is costly, there are much more
>> efficient solutions.
>
> yes permutation-invariant checksums are what i am looking for.
> any pointers to the efficient solutions ?

Break the input into meaningful "chunks," checksum each
chunk independently (using any means you like), and add the
checksums. Since addition is commutative,

combinedCksum(A,B) = cksum(A) + cksum(B)
= cksum(B) + cksum(A)
= combinedCksum(B,A)

You could use XOR instead of addition if you prefer.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Nicolas George

unread,

Jul 19, 2012, 9:59:00 AM7/19/12

to

Eric Sosman , dans le message <ju90sm$r1k$1...@dont-email.me>, a �crit�:

> Break the input into meaningful "chunks," checksum each
> chunk independently (using any means you like), and add the
> checksums.

That is my suggestion when there are chunks. Possibly add a final checksum
round at the end: checksum(checksum(A) + checksum(B)), to muddy things even
more; only if checksum is known to be bijective on an input with the same
size as the output (block ciphers have that property).

> You could use XOR instead of addition if you prefer.

It would work, but it would be much weaker, because in that case,
combined_checksum(ABB) = combined_checksum(A).

Eric Sosman

unread,

Jul 19, 2012, 10:23:41 AM7/19/12

to

On 7/19/2012 9:59 AM, Nicolas George wrote:
> Eric Sosman , dans le message <ju90sm$r1k$1...@dont-email.me>, a �crit :
>> Break the input into meaningful "chunks," checksum each
>> chunk independently (using any means you like), and add the
>> checksums.
>
> That is my suggestion when there are chunks.

I think there must be chunks of some kind, or the O.P. would
not be concerned about the "order" and "permutation." He illustrated
a set of four items; I think the notion of "atomic item" ("chunk")
is inherent in the question.

> Possibly add a final checksum
> round at the end: checksum(checksum(A) + checksum(B)), to muddy things even
> more; only if checksum is known to be bijective on an input with the same
> size as the output (block ciphers have that property).

My vision is imperfect, but I don't see any benefit from the
muddying. If one additional round improves things, why not improve
them even more with two, or with two hundred?

>> You could use XOR instead of addition if you prefer.
>
> It would work, but it would be much weaker, because in that case,
> combined_checksum(ABB) = combined_checksum(A).

(Shrug.) Letting B' be the inverse of B under whatever combiner
is in use, combined_checksum(ABB') = combined_checksum(A). For XOR
B' = B; for ADD B' = -B. Choose your poison.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Nicolas George

unread,

Jul 19, 2012, 12:15:45 PM7/19/12

to

Eric Sosman , dans le message <ju959h$lu5$1...@dont-email.me>, a �crit�:

> I think there must be chunks of some kind, or the O.P. would
> not be concerned about the "order" and "permutation." He illustrated
> a set of four items; I think the notion of "atomic item" ("chunk")
> is inherent in the question.

The chunks can be very small. If he is writing some kind of anagram
generator, a chunk is a single letter. In that case, it may be more
efficient to count the number of each possible letter and hash the result.

> My vision is imperfect, but I don't see any benefit from the
> muddying.

I may be mistaken, but I believe that it is usually recommended to always
_finish_ with a well-cryptanalyzed hash function.

> If one additional round improves things, why not improve
> them even more with two, or with two hundred?

If the final hash is strong, iterating it will not make it stronger (it will
make it more computationally expensive, though, this is used for hashing
passwords).

> (Shrug.) Letting B' be the inverse of B under whatever combiner
> is in use, combined_checksum(ABB') = combined_checksum(A). For XOR
> B' = B; for ADD B' = -B. Choose your poison.

For ADD, B' = invert-checksum(-checksum(B)). Having two chunks with opposite
checksum will happen as a coincidence, no more. On the other hand, having
identical chunks may pretty well be a common occurrence, depending on the
use.

Barry Margolin

unread,

Jul 19, 2012, 4:58:23 PM7/19/12

to

In article <5007fdc1$0$1714$426a...@news.free.fr>,

Nicolas George <nicolas$geo...@salle-s.org> wrote:

> sinbad , dans le message
> <9de0bc4d-7846-44a3...@googlegroups.com>, a �crit�:
> > On Thursday, July 19, 2012 1:55:47 PM UTC+5:30, Nicolas George wrote:
> >> sinbad , dans le message
> >> <bc570559-7387-4221...@googlegroups.com>, a �crit�:
> >> > i want to know if there is any cksum/hash function
> >> > that can be used to apply cksum/hash on a set of data.
> >> > for example if have the set { 23,45,67,89 }. if i compute
> >> > the cksum/hash, no matter what order i compute it i should get
> >> > the same result. does the current cksum functions already
> >> > do that ?
>
> I do not know how you managed to get that, but it is ugly.

It's a recent Googlism.

Go back to the old version of Google Groups, sinbad.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***