Proposal for a non-periodic CPRNG (WARNING: CROSSPOST!)

Sebastian Garth

unread,

Apr 28, 2012, 10:39:21 PM4/28/12

to

I'd like to present a new variation of the standard Shrinking LFSR Generator (SLG) that, with just a few slight modifications, achieves both non-periodicity and a significantly improved level security. An SLG is of course currently considered impervious to cryptoanalysis except for the case where the characteristic polynomial is known (ie: public). So the first modification proposed is to have the polynomial change somehow over time. In this case we'll simply increment the polynomial's binary representation whenever the LFSR's internal state has completed a complete cycle (which will be of course some length on [2, (2^N)-1], where N is the degree of the polynomial). The next modification is to dynamically increase the size of the LFSR whenever an overflow of the polynomial's binary representation is imminent. As an arbitrary convention (and better suggestions are certainly welcome here), the internal state of the LFSR is set to the seed and the binary representation of the polynomial is set to seed + 1. That's basically it!

In summary, the new design should demonstrate the following properties:

1) Non-periodicity;
2) Security; Implied by the current consensus that the only weakness of the SLG is the "known polynomial predicament".
3) Memory efficiency; This is implied by the fact that we are merely incrementing the polynomial's binary representation, and that for any given N the number of characteristic polynomials with maximal period ((2^N)-1) is T(N) where T is Euler's totient function, so of course the result is a very slow rate of growth indeed. To illustrate the point, if we were to imagine running the generator non-stop for a trillion years on today's computers, the amount of working memory needed would be much less so than what would be required to store the information contained in this very sentence!
4) Low algorithmic complexity; This is of course somewhat of a subjective statement, but just as an example, a very lightly optimized proof-of-concept program written by the author was capable of producing about 3 million output bits per second. A highly optimized version might have produced 2 or 3 times as much, I imagine.
5) Capable of producing "sufficient" randomness; Implied by what research has been done on SLG's as well as the empirical evidence (albeit meager) that has been gathered thus far by the author. Up to this point, the testing I have conducted essentially amounts to running the output of the generator through the ENT tool and several file compressors. In the latter case, compression levels consistently amounted to 2% or less for streams up to 3000 bytes and 0% for anything larger, suggesting a high degree of entropy. A fair representation of the type of results obtained from the ENT tool are as follows:

[quote]
Entropy = 7.999800 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 277.12, and randomly
would exceed this value 16.31 percent of the times.

Arithmetic mean value of data bytes is 127.4699 (127.5 = random).
Monte Carlo value for Pi is 3.146724587 (error 0.16 percent).
Serial correlation coefficient is 0.000509 (totally uncorrelated = 0.0).
[/quote]

6) Bit-width neutrality; There is no need for carefully chosen "magic constants" which depend so delicately on machine word size and other such factors, and the input seed can be any arbitrary value whatsoever, even one whose magnitude is several thousand bits or what have you.
7) Simplicity; The algorithm is arguably trivial to understand, easy to implement, and overall analysis of the algorithm should invariably prove to be a relatively straight-forward task.

A proof-of-concept implementation of the algorithm written in the C programming language can be found here (http://tinyurl.com/moar-zip). I would sincerely appreciate any comments relating to the algorithm, the source code, or anything else that may be relevant to the discussion here.

Phoenix

unread,

Apr 29, 2012, 8:44:16 AM4/29/12

to

On 29 Abr, 02:39, Sebastian Garth <sebastianga...@gmail.com> wrote:

Hi Sebastian.

Keep it up. Only three things:

1. 3 million output bits per second, seems to me very slowly.
2. The ENT program alone is not enough to make cryptanalysis.
4. Try to make a description of the scheme in pseudo code.

Sebastian Garth

unread,

Apr 29, 2012, 10:41:29 AM4/29/12

to

On Sunday, April 29, 2012 8:44:16 AM UTC-4, Phoenix wrote:
> 1. 3 million output bits per second, seems to me very slowly.

As I said, the program could be heavily optimized to improve the speed significantly, I think. Even so, 3*10^9 BPS isn't really slow, per se. For some applications (eg: real-time simulations) it might be, but for many (eg: generating crypto keys) it should be sufficiently efficient even without modification.

On Sunday, April 29, 2012 8:44:16 AM UTC-4, Phoenix wrote:
> 2. The ENT program alone is not enough to make cryptanalysis.

Strictly speaking, the ENT program isn't capable of supporting any claims whatsoever about the cryptographic quality of the algorithm; it only serves as an indicator as to the quality of randomness produced.

> 4. Try to make a description of the scheme in pseudo code.

Yes, that's probably a very good idea. To be sure, my initial description was somewhat vague. I'll work on that as soon as I recover from my current state of mental exhaustion (this whole idea came to me less than a week ago, and since then I've done nothing but write code, run tests, etc).

unruh

unread,

Apr 29, 2012, 11:06:56 AM4/29/12

to

Please edit your line lengths.

On 2012-04-29, Sebastian Garth <sebasti...@gmail.com> wrote:
> I'd like to present a new variation of the standard Shrinking LFSR Generator (SLG) that, with just a few slight modifications, achieves both non-periodicity and a significantly improved level security. An SLG is of course currently considered impervious to cryptoanalysis except for the case where the characteristic polynomial is known (ie: public). So the first modification proposed is to have the polynomial change somehow over time. In this case we'll simply increment the polynomial's binary representation whenever the LFSR's internal state has completed a complete cycle (which will be of course some length on [2, (2^N)-1], where N is the degree of the polynomial). The next modification is to dynamically increase the size of the LFSR whenever an overflow of the polynomial's binary representation is imminent. As an arbitrary convention (and better suggestions are certainly welcome here), the internal state of the LFSR is set to the seed and the binary representation of the polynomial is set to seed + 1. That's basically it!

The assumption is that the algorithm is known. Ie, someone will either
read teh source or reverse engineer the program so that they know
exactly what the algorithm is. Remember, that the receiver has to know
the algorithm to decrypt it, and as a result that has to be public.
Now, you could have the algorithm depend on the key to get around that.
But since you must have only a finite number of algorithms, the attacker
can just try them all for each chunck of the code.

>
> In summary, the new design should demonstrate the following properties:
>
> 1) Non-periodicity;
> 2) Security; Implied by the current consensus that the only weakness of the SLG is the "known polynomial predicament".

And 2(b) Since the crypto algorithm must be known for the reciever to
decrypt it, the algorithm must be assumed to be known as well.(code
leaks out if more than one person knows it)

> 3) Memory efficiency; This is implied by the fact that we are merely incrementing the polynomial's binary representation, and that for any given N the number of characteristic polynomials with maximal period ((2^N)-1) is T(N) where T is Euler's totient function, so of course the result is a very slow rate of growth indeed. To illustrate the point, if we were to imagine running the generator non-stop for a trillion years on today's computers, the amount of working memory needed would be much less so than what would be required to store the information contained in this very sentence!
> 4) Low algorithmic complexity; This is of course somewhat of a subjective statement, but just as an example, a very lightly optimized proof-of-concept program written by the author was capable of producing about 3 million output bits per second. A highly optimized version might have produced 2 or 3 times as much, I imagine.
> 5) Capable of producing "sufficient" randomness; Implied by what research has been done on SLG's as well as the empirical evidence (albeit meager) that has been gathered thus far by the author. Up to this point, the testing I have conducted essentially amounts to running the output of the generator through the ENT tool and several file compressors. In the latter case, compression levels consistently amounted to 2% or less for streams up to 3000 bytes and 0% for anything larger, suggesting a high degree of entropy. A fair representation of the type of results obtained from the ENT tool are as follows:

Sssorry, but those are like evaluating a car by kicking the tires. The
fact that the wheel does not fall off under that test is hardly an
adequate cryptographic test.

Sebastian Garth

unread,

Apr 29, 2012, 3:15:05 PM4/29/12

to

On Sunday, April 29, 2012 11:06:56 AM UTC-4, unruh wrote:
> The assumption is that the algorithm is known. Ie, someone will either
> read teh source or reverse engineer the program so that they know
> exactly what the algorithm is. Remember, that the receiver has to know
> the algorithm to decrypt it, and as a result that has to be public.
> Now, you could have the algorithm depend on the key to get around that.
> But since you must have only a finite number of algorithms, the attacker
> can just try them all for each chunck of the code.

To be clear, the only thing that must be kept secret is the input seed. That is of course a perfectly reasonable requirement.

On Sunday, April 29, 2012 11:06:56 AM UTC-4, unruh wrote:
> Sssorry, but those are like evaluating a car by kicking the tires. The
> fact that the wheel does not fall off under that test is hardly an
> adequate cryptographic test.

The cryptographic properties are briefly (although perhaps too vaguely) described in item #2. SLG's have been extensively studied, and as I have said before, the only real weakness in them is the fact that if the attacker knows the characteristic polynomial, they can possibly deduce the input seed using sophisticated techniques (see the work of Mihaljević and others). This algorithm overcomes such a conundrum by allowing the polynomial to be kept secret (as it is derived directly from the input seed itself), and besides that the polynomial is constantly changing, making cryptanalysis that much more difficult.

Sebastian Garth

unread,

Apr 29, 2012, 5:32:21 PM4/29/12

to

My initial description of the algorithm was admittedly quite vague, so I'm going to go over it again in a little more detail, in hopes that it may clear up any undue confusion.

But first some terminology. "CP" refers to a "characteristic polynomial over the finite field GF(2)". "RN" refers to a "representation of a CP as a binary number". So for example, the CP x^11 + x^9 + x^4 + 1 has an RN counterpart of 10100001000 (ie: 1288 in decimal). "LS" refers to the RN of the LFSR's internal state. "SC" refers to a copy of some previous state of the LS.

So to recap, the entire system basically consists of a CP, an LS, and an SC. Now here's how the algorithm works. We start with the input seed of arbitrary length and copy it to both the LS and SC. Note that if the seed is zero (which is a forbidden state for our LFSR) then it simply gets incremented before the copy step. We then set our CP to LS + 1. For each cycle of the LFSR we then do the following:
(1) Compare LS with SC. If they are equal, set the boolean flag "SAVE" to "true" and increment the RN of our CP. If the increment overflows, increase the bit-width of the LFSR by one machine word and adjust the LFSR to account for the carry.
(2) "Feed" the LS into CP per standard LFSR operation.
(3) IF "SAVE" is set to "true" copy LS to SC.

So SC serves to detect that our LFSR has produced a maximum length sequence for that particular CP (remember, much like a counter, an LFSR's CP never produces the same number twice unless it is about to repeat the entire sequence).

Anyway, several of these cycles (at least two) will run during the output-bit-determination step (ie: the "self-shrinking generator").

In conclusion, these very simple modifications yield a mechanism that demonstrates some truly remarkable properties indeed. Namely, an inexhaustible supply of high-quality random bits delivered at reasonably efficient output rates.

lur...@gmail.com

unread,

Apr 30, 2012, 1:13:12 AM4/30/12

to

On Sunday, April 29, 2012 10:06:56 AM UTC-5, unruh wrote:
> Please edit your line lengths.
>

Mac since their beginning in 1984 did not use line lengths as simplistic HTML does not either. In fact, there is no default easy way to do what you request. Macs never had "word-wrap." Format variations over the years still show up...WTShaw

Greg Rose

unread,

Apr 30, 2012, 1:39:19 AM4/30/12

to

In article <11019480.134.1335762792066.JavaMail.geo-discussion-forums@vbkv21>,

Funnily enough, RFC 977, Network News Transfer
Protocol, which is what we're using now, talks
about, you know, typing return occasionally.
It really isn't difficult.

Greg.
--

orz

unread,

Apr 30, 2012, 1:43:03 PM4/30/12

to

I replied in the other thread (the one on sci.crypt.random-numbers),
but this one seems more popular so I may as well post here too.

The gist is, there are several problems:

An infinite period doesn't actually add any value.

LFSRs with that kind of self-shrinking applied fail general purpose
statistical tests. I tested two LFSR/GFSRs of very different sizes,
with and without shrinking, on the PractRand core battery of tests:
xorshift64
description: a 64 bit "xorshift" LFSR from one of Marsaglias papers
performance without shrinking: failed after 8 MB
performance with shrinking: failed after 32 MB
mt19937_unhashed
description: the mersenne twister GFSR, but without the
"tempering" (aka output hashing)
performance without shrinking: failed after 32 GB
performance with shrinking: failed after 1 TB

Shrinking RNGs are very slow; 3 million bits per second is nothing
compared to, say, the 4 to 6 billion bits per second per core that the
high quality CSPRNG "HC-256" produces on my computer. Optimization
can help significantly, but shrinking remains slower than alternatives
no matter how much optimizing is done.

The use of a full cycles worth of each LFSR CP produces patterns of
uniformity that allows the boundary between one CP and the next to be
detected, at least when the underlieing LFSR CPs have maximal
period.

Some means of filtering out lower qualiy CPs is required, as some
randomly chosen CPs can be very very bad.

The linear nature of the sequence of CPs chosen creates problems if
you assume that enough output will be used to force CP changes,
particularly with respect to seeding.

lur...@gmail.com

unread,

May 2, 2012, 2:23:14 AM5/2/12

to

For some, it is not necessary ever unless for two to get a double space. There are three control characters that are used in line formats, two different ones that are used to end each and every line on MS Programs including Explorer. To do as you say means reverting to a habit which was long ago broken by many and never formed by many more. Your RFC is obsolete and irrelevant. The ongoing use of archaic formatting and any legacy of it is the cropped tail attempting to wag the dog.

I once did us a dedicated news reader too, but that is no longer convenient with Lord Google calling the shots. It does seem that the new Google is a mixed bag but former ways are obscure these days, even the recently departed Old Google, RIP. History stumbles forward...WTShaw
>
> Greg.
> --

Sebastian Garth

unread,

May 2, 2012, 1:51:02 PM5/2/12

to

To keep things manageable, I'll go ahead and move the remainder of the discussion to sci.crypt.random-numbers (something I should have done in the first place, yes!). I've already posted a response to Orz's comments there.