Encrypting data with a restricted range of values


Peter Gutmann

Jan 23, 1997

I occasionally get mail asking me how to encrypt a set of values within a
restricted range (for example ASCII -> ASCII encryption). A quick response
(which is also portable and easy to implement in something like Visual Basic)
would be:

cipher = ( ( ( plain - BASE ) + ksg() ) % RANGE ) + BASE;

plain = ( ( ( cipher - BASE ) - ksg() ) % RANGE );
while( plain < 0 )
    plain += RANGE;
plain += BASE;

For example for ASCII text, BASE = 32, RANGE = 96. For decimal digits, BASE =
0, RANGE = 10. ksg() is some form of keystream generator like RC4 or DES/OFB.
The fact that the output distribution will be nonuniform because of the mod
operation shouldn't affect the strength. This is a conservative
implementation which should work across languages with differing behaviour
of the mod operator; you can tune it as required for particular cases (for
example use "( plain - BASE ) ^ ( ksg() & RANGE_MASK )" for the
transformation if RANGE is a power of two, with RANGE_MASK = RANGE - 1,
which eliminates the need for the mod operation).
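As a concrete illustration, the scheme above can be sketched in C. The toy
LCG below merely stands in for a real keystream generator like RC4 or
DES/OFB -- that substitution is an assumption for the sake of a
self-contained example, and is of course not secure:

```c
#include <assert.h>
#include <string.h>

#define BASE  32
#define RANGE 96

/* Toy linear-congruential keystream standing in for RC4 or DES/OFB.
 * This is an illustrative assumption only -- an LCG is NOT secure. */
static unsigned lcg_state;
static unsigned ksg(void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (lcg_state >> 16) & 0xFF;        /* one keystream byte, 0..255 */
}

/* The scheme as posted: shift into 0..RANGE-1, add/subtract the
 * keystream mod RANGE, shift back into the printable-ASCII range. */
static int enc_char(int plain)
{
    return ((plain - BASE + (int)ksg()) % RANGE) + BASE;
}

static int dec_char(int cipher)
{
    int plain = ((cipher - BASE) - (int)ksg()) % RANGE;
    while (plain < 0)
        plain += RANGE;
    return plain + BASE;
}
```

Resetting lcg_state to the same value before encryption and before
decryption plays the role of re-keying the keystream generator.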

Has anyone done anything more rigorous than this? This is just an
off-the-cuff response; a quick search through my archives hasn't revealed
any work on restricted-range encryption. The usual response is "encrypt and
then ASCII-encode it", but in some cases this isn't an option due to the
expansion of the ASCII-encoding.

Peter.


John J. G. Savard

Jan 24, 1997

pgu...@cs.auckland.ac.nz (Peter Gutmann) wrote, in part:

>I occasionally get mail asking me how to encrypt a set of values within a
>restricted range (for example ASCII -> ASCII encryption).

and gave, as an example, adding (mod n) the output of a PRNG (properly
scaled, of course) to the string to be encrypted.

>Has anyone done anything more rigorous than this?

Historically, until recently, most encryption was done on an alphabet
of 26 characters or thereabouts, rather than on binary bits. So, there
is certainly a wide selection of algorithms to choose from for
non-binary encryption.

A rotor machine with 95 contacts on each wheel; a Hagelin lug and pin
machine with 94 lugs and seven pinwheels. A Playfair in a 10 x 9
rectangle, with the other five characters given a simple substitution.

One quick-and-dirty method...using a text string as a key, make three
passes. One is a simple Vigenere with the key. The other two, one from
right to left, and the other from left to right, do this: starting
with the third character (in the direction of motion) replace it with
itself PLUS the character preceding it by two places, PLUS the
character in the key pointed to by the character immediately preceding
it. This can be implemented as a simple set of subroutine calls:
encrypt string1 with string2 as key, and decrypt string1 with string2
as key.
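The description above is ambiguous in places (exactly how the preceding
character selects a key character, and the alphabet size), but one plausible
reading can be sketched in C over the 96-character printable-ASCII range used
elsewhere in this thread. The key-indexing rule below (reduce mod the key
length) and the single forward pass shown are assumptions, not Savard's
exact scheme:

```c
#include <assert.h>
#include <string.h>

#define BASE  32
#define RANGE 96

/* One pass, in one reading: starting with the third character, replace
 * each character with itself PLUS the (already replaced) character two
 * places back, PLUS the key character indexed by the character
 * immediately preceding it.  Arithmetic is mod RANGE on BASE-shifted
 * values. */
static void pass_encrypt(char *s, const char *key)
{
    int klen = (int)strlen(key), n = (int)strlen(s);
    for (int i = 2; i < n; i++) {
        int delta = (s[i-2] - BASE) + (key[(s[i-1] - BASE) % klen] - BASE);
        s[i] = (char)(((s[i] - BASE + delta) % RANGE) + BASE);
    }
}

/* The inverse runs from the far end, so that s[i-1] and s[i-2] still
 * hold the values the forward pass saw when it transformed s[i]. */
static void pass_decrypt(char *s, const char *key)
{
    int klen = (int)strlen(key), n = (int)strlen(s);
    for (int i = n - 1; i >= 2; i--) {
        int delta = (s[i-2] - BASE) + (key[(s[i-1] - BASE) % klen] - BASE);
        int v = (s[i] - BASE - delta) % RANGE;
        if (v < 0)
            v += RANGE;
        s[i] = (char)(v + BASE);
    }
}
```

Note the first two characters pass through unchanged in each direction,
which is why Savard combines passes in both directions with a Vigenere pass.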

John Savard


jbri...@juno.com

Jan 24, 1997

In article <5c8sin$p...@scream.auckland.ac.nz>,
pgu...@cs.auckland.ac.nz (Peter Gutmann) wrote:

> Has anyone done anything more rigorous than this? This is just an
> off-the-cuff response, a quick search through my archives hasn't revealed
> any work on restricted-range encryption, the usual response is "encrypt and
> then ASCII-encode it", but in some cases this isn't an option due to the
> expansion of the ASCII-encoding.

My preference is writing a compression routine that changes the restricted
range of ASCII to non-restrictive full binary in such a way that it can be
expanded back to ASCII even when encrypted. The net result is the ability to
use ANY binary encryption routine. If the expansion/compression routine is
designed properly (so as to always distribute through the restricted range),
you don't run into problems with expansion of the ASCII encoding. The
expansion/compression routine should never be general purpose: you are
compressing numerical values, so design the routine for that range; for
ASCII, design it for the wider range and variable letter distribution.

Jeffry J. Brickley
The Programmer's Link Page
http://www.geocities.com/SiliconValley/Pines/3210


Bodo Moeller

Jan 24, 1997

pgu...@cs.auckland.ac.nz (Peter Gutmann):

> I occasionally get mail asking me how to encrypt a set of values
> within a restricted range (for example ASCII -> ASCII encryption).

> A quick response (which is also portable and easy to implement in
> something like Visual Basic) would be:

> cipher = ( ( ( plain - BASE ) + ksg() ) % RANGE ) + BASE;

> plain = ( ( ( cipher - BASE ) - ksg() ) % RANGE );
> while( plain < 0 )
> plain += RANGE;
> plain += BASE;

> For example for ASCII text, BASE = 32, RANGE = 96. For decimal
> digits, BASE = 0, RANGE = 10. ksg() is some form of keystream
> generator like RC4 or DES/OFB. The fact that the output
> distribution will be nonuniform because of the mod operation
> shouldn't affect the strength.

It _does_ affect the strength (unless the number of possible values of
ksg() -- 256, say -- is a multiple of RANGE). For an easy attack
scenario, imagine that the same message has been encrypted a lot of
times with different keys (e.g., each ksg() is a truly random byte).
Then you look at the first character of each ciphertext. All these
letters are encryptions of the first plaintext character. A sequence
(of the form BASE + x, BASE + x + 1, BASE + x + 2, ..., possibly
including the "overflow" from BASE + RANGE - 1 to BASE) consisting of
256 % RANGE characters should have the property that the characters in
it appear more often than the others. (If you do not have enough
independent encryptions, you might have to guess.) x is the first
plaintext character. (And so on for the second etc. characters.)
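The bias is easy to demonstrate deterministically: enumerate all 256
equally likely keystream bytes for one plaintext character and histogram
the resulting ciphertext values. A sketch of the attack's core observation
(for ASCII, 256 % 96 = 64 values are hit three times each, and the
over-represented run starts at the plaintext character):

```c
#include <assert.h>

#define BASE  32
#define RANGE 96

/* For one fixed plaintext character, enumerate all 256 equally likely
 * keystream bytes and count the resulting ciphertext residues.
 * 256 = 2*96 + 64, so the 64 residues starting at the plaintext
 * character are hit 3 times each and the other 32 only twice. */
static void cipher_histogram(int plain, int count[RANGE])
{
    for (int i = 0; i < RANGE; i++)
        count[i] = 0;
    for (int val = 0; val < 256; val++)
        count[((plain - BASE) + val) % RANGE]++;
}
```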

Peter M Cipollone

Jan 25, 1997

Why not run through the usual routine of compressing and encrypting
unrestricted data and then convert to the restricted range of values
afterward, using a base64-ish technique based on whatever range
restriction you have?

Pete


Peter Gutmann

Jan 26, 1997


I originally wrote:

>I occasionally get mail asking me how to encrypt a set of values within a
>restricted range (for example ASCII -> ASCII encryption). A quick response
>(which is also portable and easy to implement in something like Visual Basic)
>would be:
>
> cipher = ( ( ( plain - BASE ) + ksg() ) % RANGE ) + BASE;
>
> plain = ( ( ( cipher - BASE ) - ksg() ) % RANGE );
> while( plain < 0 )
> plain += RANGE;
> plain += BASE;
>
>For example for ASCII text, BASE = 32, RANGE = 96. For decimal digits, BASE
>= 0, RANGE = 10. ksg() is some form of keystream generator like RC4 or
>DES/OFB. The fact that the output distribution will be nonuniform because of
>the mod operation shouldn't affect the strength.

Then Bodo Moeller <Bodo_M...@public.uni-hamburg.de> pointed out:



>It _does_ affect the strength (unless the number of possible values of ksg()
>-- 256, say -- is a multiple of RANGE). For an easy attack scenario, imagine
>that the same message has been encrypted a lot of times with different keys
>(e.g., each ksg() is a truly random byte). Then you look at the first
>character of each ciphertext. All these letters are encryptions of the first
>plaintext character. A sequence (of the form BASE + x, BASE + x + 1, BASE +
>x + 2, ..., possibly including the "overflow" from BASE + RANGE - 1 to BASE)
>consisting of 256 % RANGE characters should have the property that the
>characters in it appear more often than the others. (If you do not have
>enough independent encryptions, you might have to guess.) x is the first
>plaintext character. (And so on for the second etc. characters.)

As a result I propose the following correction:

KSG_RANGE = ( 256 / RANGE ) * RANGE;

do
    val = ksg();
while( val >= KSG_RANGE );

to ensure that the range of values accepted from ksg() is a multiple of
RANGE. The worst-case scenario is when RANGE = 129, when nearly 50% of the
ksg() output will be discarded. A more typical case of RANGE = 96 (ASCII
text) loses 25% of the output, and RANGE = 10 (digits) loses about 2% of
the output.

The full process then becomes:

encrypt:

    do
        val = ksg();
    while( val >= KSG_RANGE );
    cipher = ( ( ( plain - BASE ) + val ) % RANGE ) + BASE;

decrypt:

    do
        val = ksg();
    while( val >= KSG_RANGE );
    plain = ( ( ( cipher - BASE ) - val ) % RANGE );
    while( plain < 0 )
        plain += RANGE;
    plain += BASE;
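Putting the correction together, a runnable C sketch (again with a toy LCG
standing in for the real keystream generator -- an illustrative assumption,
not part of the original posting):

```c
#include <assert.h>
#include <string.h>

#define BASE      32
#define RANGE     96
#define KSG_RANGE ((256 / RANGE) * RANGE)    /* 192 when RANGE = 96 */

/* Toy LCG standing in for RC4 or DES/OFB -- an illustrative
 * assumption only, not a secure keystream generator. */
static unsigned seed;
static unsigned ksg(void)
{
    seed = seed * 1103515245u + 12345u;
    return (seed >> 16) & 0xFF;
}

/* Rejection step: only keystream bytes below KSG_RANGE are accepted,
 * so the accepted values are uniform over a multiple of RANGE. */
static unsigned ksg_uniform(void)
{
    unsigned val;
    do
        val = ksg();
    while (val >= KSG_RANGE);
    return val;
}

static int enc_char(int plain)
{
    return ((plain - BASE + (int)ksg_uniform()) % RANGE) + BASE;
}

static int dec_char(int cipher)
{
    int plain = ((cipher - BASE) - (int)ksg_uniform()) % RANGE;
    while (plain < 0)
        plain += RANGE;
    return plain + BASE;
}
```

Decryption stays in step with encryption because both sides reject exactly
the same keystream bytes.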

A few people have suggested compressing, binary-encrypting, and
ASCII-armouring the data. In my original message I said that the requirements
were for a general-purpose subrange -> subrange encryption system, ie one
which would take an input value within an arbitrary subrange and produce as
output another value within the same subrange. Any form of ASCII-encoding of
binary data doesn't fit these requirements because it doesn't allow the
processing of a single character at a time (not to mention that the encoding
process gets pretty weird for some subranges).

Peter.


yoda

Jan 26, 1997


On 23 Jan 1997 23:34:47 GMT, pgu...@cs.auckland.ac.nz (Peter Gutmann)
wrote:

>I occasionally get mail asking me how to encrypt a set of values within a
>restricted range

[snip]

> cipher = ( ( ( plain - BASE ) + ksg() ) % RANGE ) + BASE;

[snip]

>
>For example for ASCII text, BASE = 32, RANGE = 96. For decimal digits, BASE =
>0, RANGE = 10. ksg() is some form of keystream generator like RC4 or DES/OFB.

[snip]

>The fact that the output distribution will be nonuniform because of the mod
>operation shouldn't affect the strength.

I believe that the strength is affected, but this can easily be avoided
by processing each byte as it comes from ksg() in one of two ways:
1. If the byte is >= RANGE, discard it.
or, if RANGE < 256/2,
2. If the byte < RANGE * ( 256 / RANGE ),
replace the byte with byte / ( 256 / RANGE )
else discard it.
where the divisions above are integer. This will retain the uniformity;
you will just need to run the ksg() a little more.

>
>Has anyone done anything more rigorous than this?

Well, what you have proposed is completely rigorous, and your cipher is
just as strong as the ksg() it uses (after the uniformity correction is
done). If you wish to have a block cipher instead of a stream cipher,
you can make one for this RANGE-restricted class directly. Since so
much more is known about RC4 and DES/OFB, this is perhaps of only
academic interest, but here it is anyway. For instance, this is a way
to generalise Blowfish:

We still use 64-bit blocks, but each byte in the block will be an ASCII
character. For ease of computation, subtract BASE from each byte, so that
the bytes range from 0 to 95 in value. The output will be similarly
shifted down, so after all rounds are complete, shift the bytes in the
block back up in order to get standard ASCII output.

Instead of 4 S-boxes with 256 long-int entries each, fill them with
96 entries. Each entry is still a long, but the following restriction
holds: when you extract the bytes, each of the bytes is between 0 and
95 (it has the form of our blocks, too). One still has 2R+2 P-boxes,
where R is the number of rounds, and each P-box is a long int with the
same restriction as those in the S-boxes. All the ext() functions work
the same as in standard Blowfish.

So we have:

ULA P[2*R+2], S[4][96]

instead of

UL P[2*R+2], S[4][256]

where UL is the unsigned long int of standard Blowfish, and ULA is the new
and unusual type described above, with bytes of shifted ASCII.

The Feistel function is changed in the following fashion. Instead of:

F(A) = ( ( S[0][A1] + S[1][A2] ) ^ S[2][A3] ) + S[3][A4]
A = A ^ P[r], B = B ^ F(A)

where A1, ..., A4 are the 4 bytes in A, we have

F(A) = ( ( ( S[0][A1] "+" S[1][A2] ) "+" S[2][A3] ) "+" S[3][A4] )
A = A "+" P[r], B = B "+" F(A)

where "+" means to add byte-wise, and reduce mod 96, byte-wise. The
round number is r.

We can't use XOR as freely as we could with standard Blowfish, since A ^
B may not be < 96, even if A and B are.

Instead of XOR'ing with the Feistel function, again we add, and reduce
mod 96. When decrypting, we perform subtraction instead of addition,
i.e. we use an overloaded subtraction "-" instead of "+".
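The overloaded "+" and "-" can be sketched as byte-wise mod-96 operations on
a 32-bit word of shifted-ASCII bytes. This is my own sketch of what the
description above calls for, not yoda's code:

```c
#include <assert.h>
#include <stdint.h>

/* The overloaded "+": add the four bytes of two words independently,
 * reducing each mod 96; every byte is assumed to hold a shifted-ASCII
 * value in 0..95.  "-" is its exact inverse, so Feistel rounds built
 * from "+" can be undone with "-". */
static uint32_t add96(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFF;
        uint32_t y = (b >> (8 * i)) & 0xFF;
        r |= ((x + y) % 96) << (8 * i);
    }
    return r;
}

static uint32_t sub96(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFF;
        uint32_t y = (b >> (8 * i)) & 0xFF;
        r |= ((x + 96 - y) % 96) << (8 * i);
    }
    return r;
}
```

As noted, plain XOR is not usable here: 95 ^ 33 == 126, which is outside
0..95, while "+" and "-" always stay inside the range.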

Since 96 < 256, this will be less secure than blowfish, but it will run
at exactly the same speed as blowfish, which is fairly fast, so one
could double or even quadruple the usual number of rounds.

Finally, the key scheduling could be done by using whatever method one
would have used for the traditional blowfish, and then doing the
same transformation described above for your ksg().

yoda

~~~
This PGP signature only certifies the sender and date of the message.
It implies no approval from the administrators (ad...@nym.alias.net).
Date: Sun Jan 26 22:20:01 1997 GMT
From: yo...@nym.alias.net


Jerry Leichter

Jan 27, 1997, to Peter Gutmann

A safer way to do this is (assuming 0 <= plain,cipher < LIM - i.e., BASE
is 0; a non-zero BASE is just a translation, which makes the formulas
wordier):

cipher = int(plain * (ksg() / KSG_LIM) * LIM)

where ksg() produces an integer in the range 0 <= val < KSG_LIM, and the
arithmetic inside the "int" conversion is floating point. Since the
division is floating point, this can be re-written as:

cipher = int(plain * ksg() * (LIM/KSG_LIM))

- i.e., the division is a constant operation.

On many modern machines, FP arithmetic is fast enough that you can do
this exactly as written. In fact, a loop of such encryptions will
overlap use of the FP and integer resources on many machines, giving you
even better performance than a naive analysis would indicate. (You can
make the first multiplication either integer or float with the same
results. Or, you can even use a ksg() that returns an FP value in
[0,1); that's all the division by KSG_LIM is doing, but the same effect
can often be gotten more efficiently by hacking the bits of the FP
representation directly.)

You can often do even better. For example, suppose ksg() returns a
non-negative 16-bit signed integer, so KSG_LIM is 32768, and we're
encrypting unsigned bytes, so LIM = 256. Then you can get exactly the
same effect as the FP arithmetic above by using:

(plain * ksg() * LIM) >> 15

where all the operations are unsigned and arithmetic is assumed to be
(at least) 32 bit. (You may really need to do this in assembler for
maximum performance; a C compiler won't know that the multiplications
can't possibly overflow, so it may generate extra code to do the unsigned
multiplies.)

If ksg() returns any possible n-bit value on an n-bit machine, then the
division amounts to keeping the *top* bits of a 2n-bit product. Again,
this is often a very cheap operation.
-- Jerry

jbri...@juno.com

Jan 27, 1997

In article <5cfo71$6...@scream.auckland.ac.nz>,
pgu...@cs.auckland.ac.nz (Peter Gutmann) wrote:

> I originally wrote:
>
> A few people have suggested compressing, binary-encrypting, and
> ASCII-armouring the data. In my original message I said that the requirements
> were for a general-purpose subrange -> subrange encryption system, ie one
> which would take an input value within an arbitrary subrange and produce as
> output another value within the same subrange. Any form of ASCII-encoding of
> binary data doesn't fit these requirements because it doesn't allow the
> processing of a single character at a time (not to mention that the encoding
> process gets pretty weird for some subranges).
>
Okay, that's a pretty tough limitation if you want any type of security,
but how about this:

    Generate a text replacement table
        (one-for-one replacement over the restricted range).
    Shuffle the table based on the given "key"
        (a unique shuffle for each character of the key).
    while more characters
        look up the character in the table
        replace it in the cipher text
        shuffle the replacement table based on the character
    next character
This will give you a dynamic encryption structure (not all T's will be
replaced with G's), allowing for a little more security. How difficult your
algorithm is to analyse will depend on the complexity of the shuffle and the
limited range. Obviously, an ASCII text range offers a little more
flexibility than a numeric-only range. However, I do not believe that you
can achieve difficulty anywhere near that of a good binary encryption
algorithm....
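A minimal C sketch of this idea follows. The particular keyed swap used for
the shuffles is my own arbitrary choice, since the post leaves the shuffle
unspecified, and it is for illustration only; a scheme this simple offers
little real security:

```c
#include <assert.h>
#include <string.h>

#define BASE  32
#define RANGE 96

/* Dynamic substitution: a one-for-one table over the restricted range,
 * shuffled from the key, then re-shuffled after every character.  The
 * inverse table is maintained alongside so decryption can mirror every
 * swap exactly. */
typedef struct {
    unsigned char fwd[RANGE];   /* plaintext index -> ciphertext index */
    unsigned char inv[RANGE];   /* ciphertext index -> plaintext index */
} SubTable;

static void swap_entries(SubTable *t, int i, int j)
{
    unsigned char a = t->fwd[i], b = t->fwd[j];
    t->fwd[i] = b;  t->fwd[j] = a;
    t->inv[b] = (unsigned char)i;
    t->inv[a] = (unsigned char)j;
}

static void table_init(SubTable *t, const char *key)
{
    for (int i = 0; i < RANGE; i++)
        t->fwd[i] = t->inv[i] = (unsigned char)i;
    for (int k = 0; key[k]; k++)                  /* one shuffle per key char */
        for (int i = 0; i < RANGE; i++)
            swap_entries(t, i, (i + key[k] * 7 + k) % RANGE);
}

static int enc_char(SubTable *t, int plain)
{
    int c = t->fwd[plain - BASE] + BASE;
    swap_entries(t, plain - BASE, (c - BASE + 13) % RANGE);  /* data-driven re-shuffle */
    return c;
}

static int dec_char(SubTable *t, int cipher)
{
    int p = t->inv[cipher - BASE] + BASE;
    swap_entries(t, p - BASE, (cipher - BASE + 13) % RANGE); /* mirror the same swap */
    return p;
}
```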

Jeffry J. Brickley
http://www.geocities.com/SiliconValley/Pines/3210


The Programmer's Link Page

Bryan G. Olson

Jan 27, 1997

Peter Gutmann wrote:
>

> As a result I propose the following correction:
>
> KSG_RANGE = ( 256 / RANGE ) * RANGE;
>
> do
> val = ksg();
> while( val >= KSG_RANGE );
>
> to ensure that the ksg() value used is a multiple of RANGE. The worst-case
> scenario is when RANGE = 129, when nearly 50% of the ksg() output will be
> discarded. A more typical case when RANGE = 96 (ASCII text) loses 25% of the
> output, and RANGE = 10 (digits) loses 2% of the output.

Arguably, it wastes 58% of the key generator's output when RANGE is 10,
since ksg() produces a much larger range than needed.

Suppose ksg() output is expensive, and we need to waste as little key
as possible, but we still require a uniform distribution. This is
related to the problem of producing a random number in [0..RANGE)
from a random bit source, when RANGE is not a power of 2.

The obvious solution is to find the smallest power of two at least as
large as RANGE, call it 2^m, generate a random number in [0..2^m),
call it "rand", and throw it away and try again if rand >= RANGE. We can
save on random bits if we observe that if rand >= RANGE, then rand is
uniformly distributed in [RANGE..2^m), and we can use that entropy.

Instead of ksg(), I'll use rand_bit(), which produces the next bit
of keystream ( 0 or 1 ). Here's some example code which is not
tested:

int RandInRange( int RANGE )
{
    int rand = 0;
    int mrange = 1;   /* rand will always be uniformly
                         distributed in [0..mrange) */
    for(;;)
    {
        while( mrange < RANGE )
        {
            mrange = 2*mrange ;
            rand = 2*rand + rand_bit() ;
        }
        if( rand < RANGE )
            return rand ;

        mrange = mrange - RANGE ;
        rand = rand - RANGE ;
    }
}

If we really need to use the minimal number of random bits, we
should consider the entire plaintext of length n as a single number of
n digits, and generate a single key in the range [0..RANGE^n).
I conjecture that this solution is optimal in the average number
of bits generated. (The worst case is still infinite.)
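A testable version of the idea, with a bit-counting source, shows the
entropy recycling really does use fewer bits than naive power-of-two
rejection. The LCG bit source is an assumption standing in for the
keystream; any uniform bit source works:

```c
#include <assert.h>

/* Instrumented bit source: a seeded LCG standing in for the keystream
 * generator (an assumption), counting how many bits each method uses. */
static unsigned long bitsrc;
static long bits_used;

static int rand_bit(void)
{
    bitsrc = bitsrc * 1103515245UL + 12345UL;
    bits_used++;
    return (int)((bitsrc >> 16) & 1);
}

/* Entropy-recycling sampler: rand is uniform in [0..mrange) at all
 * times; on rejection, the leftover value is kept rather than thrown
 * away. */
static int rand_in_range(int range)
{
    int rand = 0, mrange = 1;
    for (;;) {
        while (mrange < range) {
            mrange *= 2;
            rand = 2 * rand + rand_bit();
        }
        if (rand < range)
            return rand;
        mrange -= range;
        rand -= range;
    }
}

/* Naive sampler for comparison: draw a whole power-of-two block of
 * bits and retry from scratch on rejection. */
static int rand_naive(int range)
{
    int m = 1;
    while (m < range)
        m *= 2;
    for (;;) {
        int r = 0;
        for (int b = m; b > 1; b /= 2)
            r = 2 * r + rand_bit();
        if (r < range)
            return r;
    }
}
```

For RANGE = 10, naive rejection averages 4 * 16/10 = 6.4 bits per value,
while the recycling version averages roughly 4.5.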

--Bryan

Bryan G. Olson

Jan 27, 1997

Jerry Leichter wrote:

> A safer way to do this is (assuming 0 <= plain,cipher < LIM - i.e., BASE
> is 0; a non-zero BASE is just a translation, which makes the formulas
> wordier):
>
> cipher = int(plain * (ksg() / KSG_LIM) * LIM)
>
> where ksg() produces a integer in the range 0 <= val < KSG_LIM, and the

> arithmetic inside the "int" conversion is floating point. [...]

This re-introduces the defect which Peter corrected in the preceding
post. Given a perfect ksg(), the output is not uniformly distributed
over the LIM possible values. If KSG_LIM is large compared to LIM,
and the number of floating point values between 0.5 and 1 is also
large compared to LIM, then the distribution will be close to uniform,
but still not usually perfect, and we would have to generate much more
running key than we really need.

--Bryan

scott_nelson

Jan 27, 1997

In article <5cfo71$6...@scream.auckland.ac.nz>, pgu...@cs.auckland.ac.nz says...

>
>I originally wrote:
>
>>I occasionally get mail asking me how to encrypt a set of values within a
>>restricted range (for example ASCII -> ASCII encryption).
[snip]

>
>A few people have suggested compressing, binary-encrypting, and
>ASCII-armouring the data. In my original message I said that the requirements
>were for a general-purpose subrange -> subrange encryption system, ie one
>which would take an input value within an arbitrary subrange and produce as
>output another value within the same subrange.
>

A general purpose subrange -> subrange system?

1. Convert the symbol set to bits via an appropriate Huffman tree
2. Convert the bits to the desired cipher symbol set via another
appropriate Huffman tree.
3. Encrypt.
4. Convert the cipher symbols to bits using an appropriate
Huffman tree (probably the same one used in step 2.)
5. Convert the bits to the original symbol set via an appropriate
Huffman tree (probably the same one used in step 1.)

If, however, you would also like to add the limitation that the
cipher text must be the same number of symbols as the plain text,
this won't work, as the cipher text will usually be smaller.

How about:
Generate N simple substitution ciphers of the symbol set.
Encipher each symbol with one of the N ciphers, selected by a CSPRNG.

Or:
Encrypt using any method you like.
If the cipher text is out of range, encrypt again.
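That last suggestion is the trick later formalised as cycle-walking: because
the cipher is a permutation, re-encrypting out-of-range values must
eventually land back in the subrange, and the restricted map is itself a
permutation of the subrange. A toy sketch, with a trivial affine byte
permutation standing in for a real cipher (that substitution is an
assumption for illustration only):

```c
#include <assert.h>

#define BASE  32
#define RANGE 96   /* keep values in [32, 128) */

/* Toy byte permutation standing in for a real cipher (any bijection
 * on 0..255 works; 205 is the multiplicative inverse of 5 mod 256). */
static int perm(int x)      { return (x * 5 + 7) & 0xFF; }
static int perm_inv(int y)  { return ((y - 7 + 256) * 205) & 0xFF; }

/* Cycle-walking: re-apply the permutation until the value lands back
 * in the subrange.  The walk terminates because the cycle through x
 * must return to the subrange, and the restricted map is itself a
 * bijection on the subrange. */
static int cw_encrypt(int x)
{
    do
        x = perm(x);
    while (x < BASE || x >= BASE + RANGE);
    return x;
}

static int cw_decrypt(int y)
{
    do
        y = perm_inv(y);
    while (y < BASE || y >= BASE + RANGE);
    return y;
}
```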

-o--------------------------------------------------------------
_M_ Pseudo One Time Pads provide pseudo security.
O
/|\
/ \ Scott Nelson, Patron Saint of Hats <sc...@helsbreth.org>

Paul Leyland

Jan 28, 1997

pgu...@cs.auckland.ac.nz (Peter Gutmann) writes:

> I occasionally get mail asking me how to encrypt a set of values within a
> restricted range (for example ASCII -> ASCII encryption). A quick response
> (which is also portable and easy to implement in something like Visual Basic)

> Has anyone done anything more rigorous than this? This is just an
> off-the-cuff response, a quick search through my archives hasn't
> revealed any work on restricted-range encryption, the usual response
> is "encrypt and then ASCII-encode it", but in some cases this isn't an
> option due to the expansion

It's been observed that RC4 doesn't have to work on 8-bit quantities
and, in particular, a standard deck of playing cards gives a
52-character alphabet. The jokers are used for the pointers into the
permuted deck.

Are playing cards exportable from the United States?

--
Paul Leyland <p...@oucs.ox.ac.uk> | Hanging on in quiet desperation is
Oxford University Computing Services | the English way.
13 Banbury Road, Oxford, OX2 6NN, UK | The time is gone, the song is over.
Tel: +44-1865-273200 Fax: 273275 | Thought I'd something more to say.
PGP KeyID: 0xCE766B1F

Jerry Leichter

Jan 28, 1997, to bgo...@ix.netcom.com

That's true, but consider the practical application: it's unlikely that LIM
is more than 256. Even if you use single-precision IEEE FP, you've got 24
bits of precision. All numbers between .5 and 1 will have an (actual)
exponent of -1, so there are 2^24 of them - much larger than 256. It's
not clear exactly how to measure the bias, but I don't think it's large
- and certainly smaller than in the naive modular approach.

In any case, as I pointed out later in the posting, in practical cases,
you generally will take KSG_LIM to be a power of two. In that case, you
can easily make the arithmetic exact, and the distribution will be flat.
(Problems will arise when LIM is so large that you can't do the exact
higher-precision arithmetic cheaply. In that case, yes, this method is
a poor choice.)
-- Jerry

Bryan G. Olson

Jan 28, 1997

Jerry Leichter wrote:
>
> | > A safer way to do this is (assuming 0 <= plain,cipher < LIM - i.e.,
> | >BASE is 0; a non-zero BASE is just a translation, which makes the
> | >formulas wordier):
> | >
> | > cipher = int(plain * (ksg() / KSG_LIM) * LIM)
> | >
> | > where ksg() produces a integer in the range 0 <= val < KSG_LIM,
> | > and the arithmetic inside the "int" conversion is floating point.
> | >[...]
> |
> | This re-introduces the defect which Peter corrected in the preceding
> | post. Given a perfect ksg(), the output is not uniformly distributed
> | over the LIM possible values. If KSG_LIM is large compared to LIM,
> | and the number of floating point values between 0.5 and 1 is also
> | large compared to LIM, then the distribution will be close to uniform,
> | but still not usually perfect, and we would need to generate much more
> | running key than we really need.
>
> That's true, but consider practical application: It's unlikely that LIM
> is more than 256. Even you use single-precision IEEE FP, you've got 24
> bits of precision.

You missed the other condition: KSG_LIM must be large compared to
LIM. What good is having lots of floating point numbers if you don't
generate them approximately uniformly? The problem is going from a
uniform distribution over KSG_LIM possibilities to a uniform
distribution over LIM possibilities. Going through the floating point
representation can only hurt.

> In any case, as I pointed out later in the posting, in practical cases,
> you generally will take KSG_LIM to be a power of two. In that case, you
> can easily make the arithmetic exact, and the distribution will be flat.

If LIM divides a power of KSG_LIM, of course it's trivial. If not, to
get a truly flat distribution (assuming a flat distribution over the
KSG_LIM range) requires a re-try algorithm (and infinite time in the
worst case).

--Bryan

jbri...@juno.com

Jan 28, 1997

>
> Are playing cards exportable from the United States?

Only if a computer shuffles them TOO well!! ;)

scott_nelson

Jan 28, 1997

In article <854446...@ulf.mali.sub.org>, Bodo_M...@public.uni-hamburg.de says...
>It is easy to see that if we want "cipher" to have a uniform
>distribution (for fixed "plain" and uniformly distributed ksg()), we
>cannot simply use some function cipher = f(plain, ksg())
. . .
>
>So either we could throw away some bytes from ksg() in order to obtain
>a key stream that is uniformly distributed in 0 .. 233
>or we can concatenate several values from ksg() to produce a very large
>integer, [and use its mod LIM value] [e.g. integer size = 1024 bits]

Bias is approximately equal to 1/<large_integer>, so <large_integer>
doesn't really need to be so huge; 32 bits would suffice for most
applications. But rather than concatenating and then dividing, you
can calculate
plain = ( plain + ksg() + 1 ) mod LIM
at each step, which eliminates the need for bignums entirely.

Bodo Moeller

Jan 28, 1997

Jerry Leichter <leic...@smarts.com>:

> A safer way to do this is (assuming 0 <= plain,cipher < LIM - i.e., BASE
> is 0; a non-zero BASE is just a translation, which makes the formulas
> wordier):

> cipher = int(plain * (ksg() / KSG_LIM) * LIM)

> where ksg() produces a integer in the range 0 <= val < KSG_LIM, and the
> arithmetic inside the "int" conversion is floating point.

This does not seem to make too much sense. Are you aware that this
scheme results in 0 <= cipher < LIM*LIM rather than 0 <= cipher < LIM?
And if plain == 0, we also have cipher == 0, etc. Maybe what you
really meant is

cipher = (plain + int((ksg() / KSG_LIM) * LIM)) MOD LIM,

but that still will not produce uniform output and thus does not
provide the level of security we want to achieve.

It is easy to see that if we want "cipher" to have a uniform
distribution (for fixed "plain" and uniformly distributed ksg()), we
cannot simply use some function cipher = f(plain, ksg()): When there
are 256 possible values for ksg() and 26 possible values for cipher,
it is not possible to have the same number of "ksg()"s for each
"cipher".

So either we could throw away some bytes from ksg() in order to obtain
a key stream that is uniformly distributed in 0 .. 233 (that solves
our problem because 234 == 9 * 26, and the bytes from this thinned-out
key stream can be taken MOD 26 to produce a uniform stream of values
in 0 .. 25); or we can concatenate several values from ksg() to
produce a very large integer (e.g. 128 bytes (= 1024 bits)), convert
it to base-26 (in this example: 218 base-26 digits), discard some of
the most significant base-26 digits of that (maybe about 28) and use
the remainder (which is a nearly uniformly distributed sequence of
values in 0 .. 25; in the example we have 190 values) for the
encryption.

The latter scheme has the obvious disadvantage that the software must
include the necessary bignum functions, while the first scheme could be
implemented in a tiny program. However, the latter might be faster (if
ksg() is slow), and it has an upper limit on the running time.

Bodo Moeller

Jan 29, 1997

Scott Nelson:
> Bodo_M...@public.uni-hamburg.de:

[...]


>> So either we could throw away some bytes from ksg() in order to obtain
>> a key stream that is uniformly distributed in 0 .. 233

>> or we can concatenate several values from ksg() to produce a very large
>> integer, [and use its mod LIM value] [e.g. integer size = 1024 bits]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^

That's not what I wrote (at least not what I meant). My suggestion
was: Convert this bignum to base-26 and use several of the least
significant base-26 digits, but throw away some of the most
significant base-26 digits. In other words, you take the bignum, use
its MOD 26 value, then divide by 26 (discarding the fraction), use the
MOD 26 value of the result, again divide by 26, and so on.

This can be repeated until you have e.g. 190 values when you have
started with 128 samples from ksg(): Conversion of a 128 byte number
to base 26 will result in a number with 218 or so base-26 digits. The
most significant digits of this are biased, so we don't want to use
them and might decide to use only 218 - 28 = 190 of them. This 28 is,
of course, somewhat arbitrary, but I suggested that value because 28
base-26 digits are equivalent to nearly 128 binary digits which could
be considered a "round" number.
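A scaled-down sketch of this extraction, with 7 keystream bytes packed into
a 64-bit integer instead of 128 bytes into a bignum (the 2-digit safety
margin below is an arbitrary stand-in for the 28 digits above):

```c
#include <assert.h>
#include <stdint.h>

/* Pack 7 keystream bytes into one integer, then peel off base-26
 * digits from the least-significant end.  56 bits is about 11.9
 * base-26 digits; we keep only 9 and discard the biased top digits. */
static int extract_base26(const unsigned char ks[7], int out[9])
{
    uint64_t acc = 0;
    for (int i = 0; i < 7; i++)
        acc = (acc << 8) | ks[i];
    for (int i = 0; i < 9; i++) {
        out[i] = (int)(acc % 26);    /* next nearly-uniform value in 0..25 */
        acc /= 26;
    }
    return 9;
}
```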

Bodo Moeller
<3moe...@informatik.uni-hamburg.de>

R Fleming

Jan 30, 1997

Paul Leyland <p...@sable.ox.ac.uk> wrote:

[...]


> It's been observed that RC4 doesn't have to work on 8-bit quantities
> and, in particular, a standard deck of playing cards gives a
> 52-character alphabet. The jokers are used for the pointers into the
> permuted deck.
>

> Are playing cards exportable from the United States?

Yes, because the apparatus isn't "easily convertible" to cryptography;
it's painfully slow to add the Queen of Hearts to the Jack of Spades
modulo 52. Of course, if you replace the combining operation with a lookup
table, and include that with the deck, they might get _really_
suspicious!!

I vaguely recall reading of some war-time censorship restriction which
required packs of cards to be shuffled before export, in case the
permutation concealed information. As another exercise for the reader,
devise efficient algorithms for conversion between a permutation of n
objects, and an integer in [0,..,n!-1].
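For the exercise, the standard answer is the factorial number system
(Lehmer code); a small C sketch:

```c
#include <assert.h>

/* Conversion between a permutation of n objects and an integer in
 * [0..n!-1] via the factorial number system (Lehmer code). */
static long perm_to_int(const int *perm, int n)
{
    long code = 0;
    for (int i = 0; i < n; i++) {
        int smaller = 0;             /* rank of perm[i] among the remainder */
        for (int j = i + 1; j < n; j++)
            if (perm[j] < perm[i])
                smaller++;
        code = code * (n - i) + smaller;
    }
    return code;
}

static void int_to_perm(long code, int *perm, int n)
{
    int used[64] = {0};              /* sketch: assumes n <= 64 */
    long f = 1;
    for (int i = 2; i < n; i++)
        f *= i;                      /* f = (n-1)! */
    for (int i = 0; i < n; i++) {
        long d = code / f;           /* the d-th (0-based) unused value comes next */
        code %= f;
        for (int v = 0, k = 0; v < n; v++)
            if (!used[v] && k++ == (int)d) {
                perm[i] = v;
                used[v] = 1;
                break;
            }
        if (i < n - 1)
            f /= (n - 1 - i);
    }
}
```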

R Fleming

Jan 30, 1997

bgo...@ix.netcom.com wrote:

[...]


> as possible, but we still require a uniform distribution. This is
> related to the problem of producing a random number in [0..RANGE)
> from a random bit source, when RANGE is not a power of 2.
>
> The obvious solution is to[...]

Of course, the Pigeonhole Principle shows us that a perfect solution -
with no discards, _and_ no bias - is impossible, but we can get
arbitrarily close.

The most efficient solutions (ie with proportionately the fewest discards)
are to find coprime integers n, m such that RANGE^^m ~< 2^^n, ie a closest
rational approximation to the irrational log_2(RANGE). This is closely
related to dividing the musical octave into harmonious fifths, the useful
fact that 2^^10 ~= 10^^3, and other cool number theory stuff.

Anyway, you express log_2(RANGE) as a continued fraction, and terminate
the series when it is sufficiently accurate. Turn the truncated fraction
back into an ordinary rational, n/m. Now generate n random bits, and
discard the resulting number only if it is larger than RANGE^^m - 1.
Otherwise, convert it into an m digit number base RANGE, and output m
values uniformly distributed in [0,..,RANGE-1].

Here is some rough code for RANGE = 10, using the 1024~=1000
approximation; on average, it uses 3.41 bits to create a decimal deviate,
which is pretty economical.

/* Returns one uniformly distributed decimal digit, assuming randbit()
   returns uniformly distributed random bits. */
static int s[ 3 ];      /* digits left over from the last batch */
static int flag = 0;    /* how many of them remain */

int rand_digit( void )
    {
    int x, i;

    if( flag > 0 )          /* there are still digits from last time */
        return s[ --flag ];

    do {                    /* draw 10 bits, keep only x in [0..999] */
        x = 0;
        for( i = 0; i < 10; i++ )
            x = x << 1 | randbit();
        }
    while( x > 999 );

    for( i = 0; i < 3; i++ )
        {                   /* split x into three decimal digits */
        s[ i ] = x % 10;
        x /= 10;
        }
    flag = 2;
    return s[ 2 ];
    }

A very similar procedure can be developed for any PRNG output range too,
based on rational approximations to log_OUTRANGE(RANGE).
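Rather than expanding the continued fraction by hand, good (n, m) pairs can also be found by brute force with exact integer arithmetic. A sketch under the simple generate-n-bits-and-discard scheme described above (`best_block` is a name of my choosing, not from the thread):

```python
from math import inf

def best_block(range_, max_m):
    """For each block size m up to max_m, find the smallest n with
    2**n >= range_**m, and return the (n, m) pair minimizing the
    expected number of random bits per output symbol when a whole
    n-bit draw is discarded on rejection."""
    best_cost, best_pair = inf, None
    for m in range(1, max_m + 1):
        target = range_ ** m
        n = (target - 1).bit_length()       # smallest n with 2**n >= target
        # expected attempts = 2**n / target, n bits each, m symbols out
        cost = n * (2 ** n / target) / m
        if cost < best_cost:
            best_cost, best_pair = cost, (n, m)
    return best_pair, best_cost
```

For range_ = 10 and m up to 3 this finds the 10-bits-per-3-digits block at about 3.41 bits per digit; allowing m up to 60 it finds the convergent n/m = 196/59 at about 3.34.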

Bryan G. Olson

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

R Fleming wrote:

> bgo...@ix.netcom.com wrote:
> > the problem of producing a random number in [0..RANGE)
> > from a random bit source, when RANGE is not a power of 2.

[...]


> The most efficient solutions (ie with proportionately the fewest discards)
> are to find coprime integers n, m such that RANGE^^m ~< 2^^n, ie a closest
> rational approximation to the irrational log_2(RANGE).

> Anyway, you express log_2(RANGE) as a continued fraction, and terminate


> the series when it is sufficiently accurate. Turn this into a regular
> rational, n/m. Now generate n random bits, and discard the resulting
> number only if it is larger than RANGE^^m - 1. Otherwise, convert it into
> an m digit number base RANGE, and output m values uniformly distributed
> in [0,..,RANGE-1].

I don't think you can assume that you get to choose how many
digits of output are required. If asked to produce a long
sequence of random outputs in [0..RANGE) you might want to
break it up into pieces of a convenient size by generating
numbers in [0..RANGE^m), where you can choose m. But my method
seems to do better by taking m to be the total number of outputs
required.

I have not found a case where generating a selection in
[0..RANGE1) and one in [0..RANGE2) takes fewer bits than
generating a single choice from [0..RANGE1*RANGE2). I also haven't
shown that no such case exists; it certainly does exist if you
don't use the optimization.

(The optimization is that if you have a uniform choice in x from
[0..MRANGE) and need a uniform choice in [0..RANGE) with
MRANGE>RANGE, then if x>=RANGE, don't just discard it; use
x-RANGE as a uniform choice from [0..MRANGE-RANGE). )

> Here is some rough code for RANGE = 10, using the 1024~=1000
> approximation; on average, it uses 3.41 bits to create a decimal deviate,

> which is pretty economical. [...]

But not optimal. The method I suggested uses an average of
10.151267 bits for a uniform selection in [0..1000), for 3.38376
bits per decimal digit. And if you generate up to higher powers
of 1000, it does even better.

Range Average bits Ave per digit
[0..1000) 10.151 3.383
[0..1000^2) 20.256 3.376
[0..1000^3) 30.314 3.368
[0..1000^4) 40.416 3.368
[0..1000^5) 50.342 3.356

Your method is a reasonable, practical choice; wasting a few
hundredths of a bit is not likely to be a problem.
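The figures in the table can be reproduced with a short recurrence that models Bryan's method: each pass doubles the leftover range up past the target, accepts with probability target/M, and otherwise carries the leftover M - target (and its entropy) into the next pass. A Python sketch of that bookkeeping (`expected_bits` is my name for it):

```python
def expected_bits(target, tol=1e-15):
    """Expected number of random bits consumed by the double-and-reuse
    rejection method for one uniform draw from [0..target)."""
    total = 0.0
    prob = 1.0      # probability mass still in the rejection loop
    r = 1           # leftover range carried into this pass
    while prob > tol:
        n_bits = 0
        m = r
        while m < target:       # double until the range covers target
            m *= 2
            n_bits += 1
        total += prob * n_bits
        prob *= (m - target) / m    # rejection probability this pass
        r = m - target
        if r == 0:
            break               # exact hit: target is a power of 2
    return total
```

The rejection probability is always below 1/2 (since m < 2*target), so the residual mass shrinks geometrically and the loop terminates quickly; expected_bits(1000) comes out to about 10.1513 and expected_bits(1000**2) to about 20.256, matching the table.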

Since I forgot to declare two variables in my function, here is
an improved version. I conjecture it is optimal in the average
number of random bits needed.

/* Return a random choice from the uniform distribution of integers
   in [0..target_range), assuming rand_bit() returns uniformly
   distributed random bits. */
int RandInRange( int target_range )
    {
    int rand = 0 ;
    int current_range = 1 ;     /* INVARIANT: rand is uniformly
                                   distributed in [0..current_range) */
    for( ;; )
        {
        while( current_range < target_range )
            {
            current_range *= 2 ;
            rand = 2 * rand + rand_bit() ;
            /* Invariant still true */
            }
        if( rand < target_range )
            return rand ;

        /* rand is uniform in [target_range..current_range) */
        current_range -= target_range ;
        rand -= target_range ;
        /* Invariant true again */
        }
    }

--Bryan

H. Peter Anvin

unread,
Feb 2, 1997, 3:00:00 AM2/2/97
to

Followup to: <-30019719...@mg4-52.its.utas.edu.au>
By author: ,@,., (R Fleming)
In newsgroup: sci.crypt

>
> I vaguely recall reading of some war-time censorship restriction which
> required packs of cards to be shuffled before export, in case the
> permutation concealed information. As another exercise for the reader,
> devise efficient algorithms for conversion between a permutation of n
> objects, and an integer in [0,..,n!-1].
>

Now that is *really* brilliant, wouldn't you say... all the decks are in
random order, so at the top of box X of shipment Y every day there is
the deck with new information... it seems more intelligent to require
that the cards be sorted, e.g. in bridge order.

-hpa
--
This space intentionally has nothing but text explaining why this
space has nothing but text explaining that this space would otherwise
have been left blank, and would otherwise have been left blank.

