Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

noob question about AES & pigeonhole principle

5 views

Skip to first unread message

lisa.cart...@yopmail.com

unread,

Nov 23, 2009, 5:09:47 AM11/23/09

Hi everyone !
This is a noob question.
I've tried using AES recently (using SunJCE), & the result is
surprising to me.
If I try to encrypt a 3 bytes message using a 128 bits key, the
cipherText is 128 bits (16 bytes), & when I decipher it the resulting
message is 3 bytes.
So apparently I can encrypt messages from 1 to 16 bytes, & the cipher
can retrieve my original message size.
Let's calculate the total number of messages of 1 to 16 bytes : 256 +
256² + ... + 256^16 = 3,42e+38 (well known geometric serie :
http://en.wikipedia.org/wiki/Geometric_progression#Geometric_series).
Here is the total number of messages that can be stored in 128 bits :
256^16 = 3,40e+38
So, according to the pigeonhole principle, there should be collision,
right ?
I'm sure I've missed something somewhere, but I can't point it out.
Any help ?

--
John

rossum

unread,

Nov 23, 2009, 5:45:01 AM11/23/09

On Mon, 23 Nov 2009 02:09:47 -0800 (PST),
lisa.cart...@yopmail.com wrote:

>Hi everyone !
>This is a noob question.
>I've tried using AES recently (using SunJCE), & the result is
>surprising to me.
>If I try to encrypt a 3 bytes message using a 128 bits key, the
>cipherText is 128 bits (16 bytes), & when I decipher it the resulting
>message is 3 bytes.

Yes. A cypher is not much good if you do not get the same message
back when you decrypt.

>So apparently I can encrypt messages from 1 to 16 bytes, & the cipher
>can retrieve my original message size.

You can encrypt messages of pretty much any size you want.

>Let's calculate the total number of messages of 1 to 16 bytes : 256 +
>256² + ... + 256^16 = 3,42e+38 (well known geometric serie :
>http://en.wikipedia.org/wiki/Geometric_progression#Geometric_series).
>Here is the total number of messages that can be stored in 128 bits :
>256^16 = 3,40e+38
>So, according to the pigeonhole principle, there should be collision,
>right ?

Your maths is correct.

>I'm sure I've missed something somewhere, but I can't point it out.
>Any help ?

You have indeed missed something. A block cypher takes a block and
reversibly encrypts it to another block. Block cyphers are
permutations of blocks - 128 bit/16 byte blocks in the case of AES.

However, not all messages are exactly 16 bytes long so all messages
are reversibly padded to the next block boundary. Long messages are
split into blocks and the last block gets the padding. Look up
Cryptographic Padding:
http://en.wikipedia.org/wiki/Padding_(cryptography) for details of
padding schemes.

Your short messages are padded to exactly 16 bytes (which is why your
cyphertext was always 16 bytes long). However, not all input blocks
are valid as a short message with padding - the padding is restricted
to a small defined space of possible bit patterns. For example, if we
use PKCS7 padding on a 14 byte message then the input block must end
in the two padding bytes 0x02 0x02 - any other ending will not be
allowed by the padding scheme. In this case only 1 of the 2^16
possible two byte endings of the block are allowed.

This reduces the number of allowed input bit patterns and hence the
cypher is reversible, i.e. collision free.

rossum

lisa.cart...@yopmail.com

unread,

Nov 23, 2009, 6:09:08 AM11/23/09

On 23 nov, 10:45, rossum <rossu...@coldmail.com> wrote:
> Yes. A cypher is not much good if you do not get the same message
> back when you decrypt.

Indeed. ; )

> You have indeed missed something. A block cypher takes a block and
> reversibly encrypts it to another block. Block cyphers are
> permutations of blocks - 128 bit/16 byte blocks in the case of AES.
>
> However, not all messages are exactly 16 bytes long so all messages
> are reversibly padded to the next block boundary. Long messages are
> split into blocks and the last block gets the padding. Look up
> Cryptographic Padding:http://en.wikipedia.org/wiki/Padding_(cryptography) for details of
> padding schemes.
>
> Your short messages are padded to exactly 16 bytes (which is why your
> cyphertext was always 16 bytes long). However, not all input blocks
> are valid as a short message with padding - the padding is restricted
> to a small defined space of possible bit patterns. For example, if we
> use PKCS7 padding on a 14 byte message then the input block must end
> in the two padding bytes 0x02 0x02 - any other ending will not be
> allowed by the padding scheme. In this case only 1 of the 2^16
> possible two byte endings of the block are allowed.
>
> This reduces the number of allowed input bit patterns and hence the
> cypher is reversible, i.e. collision free.
>
> rossum

Yep, I knew about the padding. I've had a look at the various padding
schemes (SunJCE allows NOPADDING - PKCS5PADDING - ISO10126PADDING). I
guess PKCS #5 padding is the same as PKCS #7 padding.
So I was thinking about that : if I have a 128 bits message, how can
the decryptor know whether the last bytes are made of padding or real
data ? So I had a look at my tests, & realized that when I encrypt 128
bits using either PKCS5PADDING or ISO10126PADDING, the cipherText
length is 256 bits, which I hadn't noticed before. If I encrypt 120
bits, the cipherText length is 128 bits. & if I encrypt 128 bits with
NOPADDING, the cipherText length is 128 bits.
So things make perfect sense now.
Thanks for your help rossum. : )

Francois Grieu

unread,

Nov 28, 2009, 8:55:29 AM11/28/09

lisa.cart...@yopmail.com wrote :

> I've had a look at the various padding
> schemes (SunJCE allows NOPADDING - PKCS5PADDING - ISO10126PADDING).
> I guess PKCS #5 padding is the same as PKCS #7 padding.
> So I was thinking about that : if I have a 128 bits message,
> how can the decryptor know whether the last bytes are made of
> padding or real data ?

"The decryptor" (which I'll take as "whatever performs decryption
followed by de-padding") is supposed to know that from the specification
of the padding scheme. Any padding scheme worse consideration makes it
unambiguous how to de-pad at least a validly padded string. If no
padding scheme is specified, the best the decryptor can do is assume
everything is data. Bad but common practice is to guess the padding from
the content of the message, and/or the padding scheme from a few examples.

Most practical padding schemes for B-bit blocks encode a message of
B-bit into two blocks. The pigeonhole principle makes it impossible to
have a padding scheme that reversibly encodes all the messages of at
most 128 bits into a 128-bit cryptogram, since 2^129-1 > 2^128.

There is an amazing variety of padding schemes, and even more actual
un-padding variants if we consider the result of de-padding data which
could not have been produced by the padding scheme.

If we take the example of PKCS#7 V1.5 padding, it is specified as:
<quote>
Some content-encryption algorithms assume the input length is a multiple
of K octets, where K > 1, and let the application define a method for
handling inputs whose lengths are not a multiple of K octets. For such
algorithms, the method shall be to pad the input at the trailing end
with K - (L mod K) octets all having value K - (L mod K), where L is the
length of the input. In other words, the input is padded at the trailing
end with one of the following strings:
01 ï¿½ if L mod K = K-1
02 02 ï¿½ if L mod K = K-2
.
.
.
K K .. K K ï¿½ if L mod K = 0

The padding can be removed unambiguously since all input is padded and
no padding string is a suffix of another. This padding method is
well-defined if and only if K < 256; methods for larger K are an open
issue for further study.
</quote>

Assume that K is known to "the decryptor" and in range [1..255].
This specification leaves it unspecified what to do if either:
- the input to the de-padding is empty;
- it ends with a byte N not in range [1..K];
- N>1 and the last N bytes are not identical.
Different implementations will likely act differently, and this may have
implications on security. For example, the last ambiguity may let the
padding scheme create a "covert channel".

Some padding schemes, e.g. ISO 10126, intrinsically leave room for a
covert channel, regardless of how carefully the de-padding is performed.

A simple and common padding scheme is defined in ISO 9797-1 [a standard
for Message Authentication Codes] as "padding method 2":
"The data string [..] shall be right-padded with a single '1' bit. The
resulting string shall then be right-padded with as few (possibly none)
'0' bits as necessary to obtain a data string whose length (in bits) is
a positive integer multiple of [the block size]."

It has the nice properties that
- any bit string input can be encoded unambiguously;
- any non-empty sequence of blocks can be decoded unambiguously;
- an encoder restricted to byte string input is easily derived.
The later, in conjunction with a "MSbit to LSbit" convention (most usual
in the context) makes the rule "right-pad with a single '80h' byte; then
right-pad with as few (possibly none) '00h' bytes as necessary to obtain
a data string whose length is a multiple of the block size."

Francois Grieu

0 new messages