
Jun 24, 2002, 3:44:18 AM

Alain Picard wrote:

>

> Fellow lispers,

>

> I needed a simple, in-lisp (i.e. no FFI) encryption for an

> application, so I wrote the following. Perhaps it will be

> of some use to someone else.

>

> I'm posting this here because I don't have a public FTP site, and

> I'm not too sure where would be a good place to send this to.

> Certainly, either the CCLAN or CCLOCC (or both!) people are welcome

> to incorporate this into their offerings, if they desire.

>

> My thanks to Pierre Mai for posting an excellent MD5 implementation

> in lisp, from which I learned many things which went into this code.

I could offer to include it on my "Cryptography in Common Lisp" site
at http://www.dataheaven.de if you want.

It is rather minimalistic but it will certainly grow in the future. I've
contributed my "in-lisp" RSA implementation, math utilities and my binding
to CL-SSL.

Some months ago I worked on a version of ARC4 (a stream cipher) but I have
to look it up on my hard disk first.

I remember there were implementations of RIPEMD and SHA too but I don't

remember the details.

ciao,

Jochen

Jun 24, 2002, 11:40:37 AM

A Common Lisp implementation of ARC4 can be found at

http://www.xs4all.nl/~cg/ciphersaber/

best regards

Roland Kaufmann

Jun 25, 2002, 9:47:40 AM

Jochen Schmidt <j...@dataheaven.de> writes:

> I could offer you to include it to my "Cryptography in Common Lisp" site

> on http://www.dataheaven.de if you want.

This is great :-)

I just implemented both blowfish and RSA myself (mostly because I

needed something small and light which could be cross-platform

portable without having to depend on external libraries), so I don't

need any of them right now, but whenever I might find some spare time,

I'll have a look and see if I can contribute anything at all to what

you guys have already done!

--

(espen)

Jun 25, 2002, 5:27:29 PM

Espen Vestre wrote:

I really would be interested in seeing your modular exponentiation function

as this is the major bottleneck in my RSA implementation.

I think a fast, low-level-tuned modular exponentiation function would be a
good enhancement to CL's numeric function repertoire. Many cryptographic
algorithms make use of it. It is possible to implement it fast by using
techniques like "Montgomery multiplication", which enables fast modular
exponentiation.

Jun 25, 2002, 7:28:31 PM

In article <afan3a$k3m$1...@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<j...@dataheaven.de> wrote:

(defun expt-mod-n (a x n)
  (let ((result 1))
    (while (> x 0)
      (if (oddp x)
          (setf result (mod (* result a) n)))
      (setf x (ash x -1))
      (setf a (mod (* a a) n)))
    result))

Jun 26, 2002, 3:53:39 AM

Jochen Schmidt <j...@dataheaven.de> writes:

> I really would be interested in seeing your modular exponentiation function

> as this is the major bottleneck in my RSA implementation.

It's not fast. It's the main bottleneck in my implementation, too.

> I think a fast lowlevel tuned modular exponentiation function would be a

> good enhancement for CLs numeric function repertoire. Many cryptographic

> algorithms make use of that. It is possible to implement it fast by using

> things like "montgomery multiplication" which enables fast modular

> exponentiation.

I whole-heartedly agree!

--

(espen)

Jun 26, 2002, 4:28:23 AM

Erann Gat wrote:

> (defun expt-mod-n (a x n)
>   (let ((result 1))
>     (while (> x 0)
>       (if (oddp x)
>           (setf result (mod (* result a) n)))
>       (setf x (ash x -1))
>       (setf a (mod (* a a) n)))
>     result))

I rewrote it using (loop while (> x 0) ...)

because while is not part of ANSI CL.

I also added declarations.

(defun expt-mod-n (a x n)
  (declare (optimize (speed 3) (safety 0)))
  (let ((result 1))
    (loop while (> x 0)
          do (if (oddp x)
                 (setf result (mod (* result a) n)))
             (setf x (ash x -1))
             (setf a (mod (* a a) n)))
    result))

I compared it to mine:

(defun expt-mod (n exponent modulus)
  "As (mod (expt n exponent) modulus), but more efficient."
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
  (loop with result = 1
        for i of-type fixnum from 0 below (integer-length exponent)
        for sqr = n then (mod (* sqr sqr) modulus)
        when (logbitp i exponent)
          do (setf result (mod (* result sqr) modulus))
        finally (return result)))

Of course the result of INTEGER-LENGTH can be bigger than a fixnum, but not
for the cryptographic algorithms I used it for. A more general CL
implementation could test the ranges of its parameters first and then
choose the best (and, more importantly, a correct) implementation.

The difference in LispWorks 4.2.6 was significant, with mine being faster.

Here is my reasoning for why this is so: I think that the use of LOGBITP
and INTEGER-LENGTH enables the implementation to reduce consing here. The
ASH will allocate a new bignum per cycle in your code, while incrementing
the bit counter i in mine will not.

There are two optimizations I see which would make modular exponentiation
faster. One is reusing allocated bignums: the result does not have to get
allocated over and over, and the same goes for sqr. The result, sqr, and
the intermediate results are never bigger than twice the size of the
modulus. It would be nice if one could preallocate two "bignum buffers"
(for sqr and the result) and then let the arithmetic operations use those
buffers. I'm not sure how far dataflow optimization can deduce such reuse
of bignums, or whether any implementation does something like that.

The other optimization would be to use Montgomery representation for the
numbers. This would mean a conversion step at the beginning and then using
Montgomery multiplication instead of (mod (* ...) modulus). Once the
result is computed, one converts it back from Montgomery representation to
normal representation. This allows one to leave out the MODs.
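To make the shape of such a function concrete, here is a minimal portable
sketch of Montgomery-based exponentiation. All function names are mine,
and the portable LOGAND/ASH arithmetic merely stands in for the low-level
bignum operations a tuned version would use:

```lisp
;; Sketch only: Montgomery exponentiation for an odd modulus N.
;; We pick R = 2^k > N; REDC(T) computes T * R^-1 mod N without
;; ever dividing by N.
(defun egcd (a b)
  "Extended Euclid: returns (values g u v) with g = a*u + b*v."
  (if (zerop b)
      (values a 1 0)
      (multiple-value-bind (g u v) (egcd b (mod a b))
        (values g v (- u (* (floor a b) v))))))

(defun mont-redc (tt n n-prime k r-mask)
  "Montgomery reduction: T * R^-1 mod N, assuming 0 <= T < N*R."
  (let* ((m (logand (* (logand tt r-mask) n-prime) r-mask))
         (u (ash (+ tt (* m n)) (- k))))
    (if (>= u n) (- u n) u)))

(defun expt-mod-montgomery (a x n)
  "A^X mod N for odd N, via Montgomery representation."
  (let* ((k (integer-length n))
         (r (ash 1 k))
         (r-mask (1- r)))
    (multiple-value-bind (g inv) (egcd n r)
      (declare (ignore g))
      (let ((n-prime (mod (- inv) r))     ; -N^-1 mod R
            (am (mod (* a r) n))          ; A in Montgomery form
            (result (mod r n)))           ; 1 in Montgomery form
        (loop while (> x 0)
              do (when (oddp x)
                   (setf result (mont-redc (* result am) n n-prime k r-mask)))
                 (setf am (mont-redc (* am am) n n-prime k r-mask))
                 (setf x (ash x -1)))
        ;; convert the result back to normal representation
        (mont-redc result n n-prime k r-mask)))))
```

Whether this beats a plain (mod (* ...) n) loop in a given implementation
depends entirely on how cheap bignum LOGAND and ASH are there.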

Jun 26, 2002, 12:32:37 PM

In article <afbtqb$aju$1...@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt

<j...@dataheaven.de> wrote:


> I rewrote it using (loop while (> x 0) ...)

> because while is not part of ANSI CL.

Yes, but it should be ;-)

Indeed. In MCL the difference is a factor of 10. Zowie!

> Here is my reasoning why this is so:

> I think that the use of logbitp and integer-length enables the

> implementation to reduce consing here. The ASH will allocate a new bignum

> per cycle in your code while increasing the bitcounter i in mine will not.

Sounds plausible.

E.

Jun 26, 2002, 12:42:09 PM

In article <864rftp...@gondolin.local.net>, Alain Picard

<apicard+die...@optushome.com.au> wrote:


> ;;;; Blowfish Constants.

> ;;

> ;; These values taken from constants.txt,

> ;; at counterpane.com, in the description of blowfish.

> ;; They're actually the digits of pi, in Hex.

> ;;

Here's a wizzy piece of code that will allow you to compute these values

instead of having them clutter up your source. (It's also less

error-prone.)

(defun compute-pi-hex (n &aux (p 0) r)
  (dotimes (i n)
    (incf p (- (/ 4 (+ 1 (* 8 i)))
               (/ 2 (+ 4 (* 8 i)))
               (/ 1 (+ 5 (* 8 i)))
               (/ 1 (+ 6 (* 8 i)))))
    (multiple-value-setq (r p) (truncate p 16))
    (format t "~X" r)
    (if (= (mod i 8) 1) (princ #\space))
    (setf p (* p 16))))

FWIW, here's a decimal pi-computer as well:

(defun compute-pi-decimal (n &aux (p 0) r)
  (dotimes (i n)
    (incf p (/ (- (/ 4 (+ 1 (* 8 i)))
                  (/ 2 (+ 4 (* 8 i)))
                  (/ 1 (+ 5 (* 8 i)))
                  (/ 1 (+ 6 (* 8 i))))
               (expt 16 i))))
  (dotimes (i n)
    (multiple-value-setq (r p) (truncate p 10))
    (format t "~X" r)
    (if (= (mod i 10) 1) (princ #\space))
    (setf p (* p 10))))

E.

Jun 27, 2002, 4:05:47 AM

g...@jpl.nasa.gov (Erann Gat) writes:

> In article <864rftp...@gondolin.local.net>, Alain Picard

> <apicard+die...@optushome.com.au> wrote:

>

> > ;;;; Blowfish Constants.

> > ;;

> > ;; These values taken from constants.txt,

> > ;; at counterpane.com, in the description of blowfish.

> > ;; They're actually the digits of pi, in Hex.

> > ;;

>

> Here's a wizzy piece of code that will allow you to compute these values

> instead of having them clutter up your source. (It's also less

> error-prone.)

>

[SNIP]

Thanks, but no thanks. If Bruce Schneier made a mistake in the constants,

at least he and I (and all other implementors in the world) are making

the same mistake. :-)

Ob. conjecture: I'm guessing that PI is used to initialize the s-boxes
because there was long-standing suspicion in the cryptographic community
that the NSA had diddled the s-boxes of DES (presumably to give themselves
a back door). Bruce probably thought that using PI only gives God a
back door.

Ob lisp: I've metered my code a bit more, and CMU is > 120X faster
than LispWorks. Those kernel:32bit-logical-xor functions sure do
make a difference. I've been reading the fast modular exponentiation
thread with interest; there doesn't seem to be an API in LW to do
this 32-bit diddling, and even if there were, I suspect all the return
values would still have to be coerced to bignums, sort of defeating the
purpose. I'm starting to think CMUCL is pretty cool.
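For readers wondering what that looks like in source form: the portable
shape is just a declared LOGXOR. The declarations below are standard CL;
whether the compiler open-codes the XOR on machine words (as CMUCL can) or
still boxes the result is entirely implementation-specific, and the
function name here is illustrative:

```lisp
;; Illustrative only: with these declarations CMUCL can compile LOGXOR
;; down to a single machine-word XOR; other implementations may still
;; cons a boxed result.
(defun xor-32 (a b)
  (declare (type (unsigned-byte 32) a b)
           (optimize (speed 3) (safety 0)))
  (logxor a b))
```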

Jun 27, 2002, 12:49:17 PM

In article <86n0thn...@gondolin.local.net>, Alain Picard

<apicard+die...@optushome.com.au> wrote:


> g...@jpl.nasa.gov (Erann Gat) writes:

>

> > In article <864rftp...@gondolin.local.net>, Alain Picard

> > <apicard+die...@optushome.com.au> wrote:

> >

> > > ;;;; Blowfish Constants.

> > > ;;

> > > ;; These values taken from constants.txt,

> > > ;; at counterpane.com, in the description of blowfish.

> > > ;; They're actually the digits of pi, in Hex.

> > > ;;

> >

> > Here's a wizzy piece of code that will allow you to compute these values

> > instead of having them clutter up your source. (It's also less

> > error-prone.)

> >

> [SNIP]

>

> Thanks, but no thanks. If Bruce Schneier made a mistake in the constants,

> at least he and I (and all other implementors in the world) are making

> the same mistake. :-)

Well, you can check the algorithm against Bruce's constants once to

convince yourself that he got them right, then use the algorithm to

protect yourself against future typos and bit-rot. But whatever.

E.

Jun 28, 2002, 4:01:05 AM

g...@jpl.nasa.gov (Erann Gat) writes:

> > Thanks, but no thanks. If Bruce Schneier made a mistake in the constants,

> > at least he and I (and all other implementors in the world) are making

> > the same mistake. :-)

>

> Well, you can check the algorithm against Bruce's constants once to

> convince yourself that he got them right, then use the algorithm to

> protect yourself against future typos and bit-rot. But whatever.

...but isn't the use of pi in Blowfish pretty random anyway (just
used as a useful non-repeating pattern)?

--

(espen)

Jun 28, 2002, 4:13:02 AM

Espen Vestre <espen@*do-not-spam-me*.vestre.net> writes:

> ...but isn't the usage of pi in blowfish pretty random anyway (just

> used as a useful non-repeating pattern)?

Of course it is (*), here's a quote from B. Schneier:

I chose the digits of pi as the initial subkey table for two

reasons: because it is a random sequence not related to the

algorithm, and because it could either be stored as part of the

algorithm or derived when needed. There is nothing sacred about pi;

any string of random bits--digits of e, RAND tables, output of a

random number generator--will suffice.

(http://www.counterpane.com/bfsverlag.html)

(*) I apologize for posting before I had checked my bookmarks.

--

(espen)

Jun 28, 2002, 10:40:27 PM

In article <afbtqb$aju$1...@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt

<j...@dataheaven.de> wrote:


> The other optimization would be to use Montgomery representation for the
> numbers. This would mean a conversion step at the beginning and then
> using Montgomery multiplication instead of (mod (* ...) modulus). Once
> the result is computed, one converts it back from Montgomery
> representation to normal representation. This allows one to leave out
> the MODs.

In C, it is certainly true that the Montgomery representation speeds up

modular exponentiation dramatically. In Common Lisp, it's more murky.

Montgomery representation lets you do modular multiplies with shifts

instead of divides for the modulus operation. But in Common Lisp, shifts

of bignums typically cons, and may not be much faster than divides.

It depends on your implementation.

Other optimizations are worth trying too. For example, the Karatsuba

formulation is a good way to optimize multiplication of large numbers in

general, regardless of modulo operations (Montgomery only applies to

modular multiplication). Karatsuba is elegantly recursive and

substitutes two additions for one of the four needed sub-multiplications

at each step. (Do a Google search on Karatsuba for more information).

Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]

operations. CMUCL already uses Karatsuba internally for large bignum

multiplications, but other CLs may not.

Squaring is another avenue for optimization. It's possible to formulate

squaring to be faster than general multiplication, and since

exponentiation requires squaring, speeding up squaring speeds up

exponentiation.

See http://members.tripod.com/careybloodworth/karatsuba.htm

for a good description of how to go one step further and combine the

Karatsuba reformulation trick with squaring, which makes squaring faster

still.

While it's possible to code all these optimizations in Lisp--and it's

an enlightening exercise to do so--such an approach will probably not

help the speed of your code much unless you're using a very smart

compiler that knows how to avoid excess consing.

If you have LAP available, however, they could make a real difference.

--

Shannon Spires

Jun 29, 2002, 6:02:37 AM

Shannon Spires wrote:

> In article <afbtqb$aju$1...@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt

> <j...@dataheaven.de> wrote:

>

>> The other optimization would be to use Montgomery representation for the
>> numbers. This would mean a conversion step at the beginning and then
>> using Montgomery multiplication instead of (mod (* ...) modulus). Once
>> the result is computed, one converts it back from Montgomery
>> representation to normal representation. This allows one to leave out
>> the MODs.

>

> In C, it is certainly true that the Montgomery representation speeds up

> modular exponentiation dramatically. In Common Lisp, it's more murky.

> Montgomery representation lets you do modular multiplies with shifts

> instead of divides for the modulus operation. But in Common Lisp, shifts

> of bignums typically cons, and may not be much faster than divides.

> It depends on your implementation.

Yes, of course. I actually meant implementing Montgomery multiplication
using the low-level means of particular implementations (for example,
access to the bignum vector).

> Other optimizations are worth trying too. For example, the Karatsuba

> formulation is a good way to optimize multiplication of large numbers in

> general, regardless of modulo operations (Montgomery only applies to

> modular multiplication). Karatsuba is elegantly recursive and

> substitutes two additions for one of the four needed sub-multiplications

> at each step. (Do a Google search on Karatsuba for more information).

> Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]

> operations. CMUCL already uses Karatsuba internally for large bignum

> multiplications, but other CLs may not.

>

> Squaring is another avenue for optimization. It's possible to formulate

> squaring to be faster than general multiplication, and since

> exponentiation requires squaring, speeding up squaring speeds up

> exponentiation.

Hm... this may be interesting too; I remember reading something about
that in the Handbook of Applied Cryptography.

> See http://members.tripod.com/careybloodworth/karatsuba.htm

> for a good description of how to go one step further and combine the

> Karatsuba reformulation trick with squaring, which makes squaring faster

> still.

>

> While it's possible to code all these optimizations in Lisp--and it's

> an enlightening exercise to do so--such an approach will probably not

> help the speed of your code much unless you're using a very smart

> compiler that knows when how to avoid excess consing.

Yes, as I said, one has to solve the consing problem first. A
WITH-BIGNUM-BUFFERS macro which enables the compiler to reuse bignum
buffers could be a good idea (combined with dataflow analysis in the body
of the macro).

> If you have LAP available, however, they could make a real difference.

What's LAP?

Jun 30, 2002, 4:50:16 PM

In article <afk0f1$3s$1...@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt

<j...@dataheaven.de> wrote:


> Shannon Spires wrote:

> > Squaring is another avenue for optimization. It's possible to formulate

> > squaring to be faster than general multiplication, and since

> > exponentiation requires squaring, speeding up squaring speeds up

> > exponentiation.

>

> Hm... this may be interesting too; I remember reading something about
> that in the Handbook of Applied Cryptography.

Here's why. Pardon me if parts of this are too elementary, but I want

to be clear for the benefit of readers who might not have studied

multiplication algorithms.

Notice that if you think of bignum multiplication as

polynomial multiplication (which is what scalar multiplication

really is at core), you could express (* A B) as the two-digit

polynomials

A2r + A1

times

B2r + B1

where r is your base or digit size. For computation, it's usually most

convenient to make r a power of 2. (When one writes an ordinary decimal

number, r is just ten and it and the "+" signs are implicit.)

So multiplication is then of course

A2B2r^2 + A1B2r + B1A2r + A1B1 [1]

which is four sub-multiplies (A2B2, A1B2, B1A2, and A1B1) plus some

additions plus some shifts if we choose r to be a power of 2.

Thus our sub-multiplications (presumably the most expensive

sub-operations) are O[n^2], where n is the number of digits. One could

make r smaller and increase the number of digits; the multiplication is

still O[n^2]. But I chose two digits for reasons that will become clear.

Notice that each of these four multiplications could be reformulated

exactly the same way with a new r that is half the length of the old r.

Thus multiplication itself is inherently recursive. This buys you no

performance; it's just an interesting observation at this point,

especially when you contrast it with the manifestly non-recursive

algorithm we were all taught for multiplication in grade school.

Now let's examine squaring by the same mechanism.

(* A A) becomes

A2r + A1

times

A2r + A1

which is

A2A2r^2 + A1A2r + A1A2r + A1A1

or

A2^2r^2 + 2A1A2r + A1^2 [2]

aha! Only 3 multiplications now, and one fewer addition. Thus squaring

is already faster than multiplication.

But we're just getting started.

By now, your recursion detector should be lit up bright red, and you've

noticed that two of these multiplications are recursive calls to our

square routine itself. The third is a general-purpose multiply that we

cannot do anything about (just yet). So not only do we need 3/4 as many

sub-multiplies, but two-thirds of _those_ need only 3/4 as many

sub-sub-multiplies, etc.

Thus, without any Karatsuba tricks, we can formulate squaring

recursively to be faster than general multiplication. Providing we can

do so without the recursion itself adding too much overhead.

The key further insights of the Karatsuba mechanism are that

a) You can reformulate [1] above to have one fewer multiplication than

it has now, and if you implement it recursively, each sub-multiply gets

the same advantage. Now, recursively expressing general multiplication

_does_ buy you something. To wit:

A2B2(r^2+r) + (A2-A1)(B1-B2)r + A1B1(r+1) [3]

b) You can reformulate [2] above to make all _three_ sub-multiplies into

recursive square calls, which makes things even faster than either [2]

or [3] if you're squaring:

A2^2(r^2+r) - ((A2-A1)^2)r + A1^2(r+1) [4]

Karatsuba imparts overhead of its own in the form of additional

additions, shifts, and recursive calls. Reminiscent of Quicksort, it

typically pays off only on fairly large numbers and one switches

to ordinary multiplications in the leaf nodes of the recursion tree.
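As a concrete (if deliberately naive) rendering of [3] on Lisp integers,
with r = 2^k and a guessed cutoff below which we fall back to the built-in
multiply (the function name and cutoff value are mine):

```lisp
;; Sketch of [3]: A2B2(r^2+r) + (A2-A1)(B1-B2)r + A1B1(r+1), r = 2^k.
;; Below the cutoff we just use the built-in *, as in the leaf nodes
;; of the recursion tree described above.
(defparameter *karatsuba-cutoff* 512)   ; in bits; the threshold is a guess

(defun karatsuba* (a b)
  "Karatsuba product of two integers A and B."
  (if (or (minusp a) (minusp b))
      ;; the middle term (A2-A1)(B1-B2) can go negative, so handle signs
      (* (signum a) (signum b) (karatsuba* (abs a) (abs b)))
      (let ((bits (max (integer-length a) (integer-length b))))
        (if (< bits *karatsuba-cutoff*)
            (* a b)
            (let* ((k (ash (1+ bits) -1))     ; split at half the bits
                   (mask (1- (ash 1 k)))
                   (a1 (logand a mask)) (a2 (ash a (- k)))
                   (b1 (logand b mask)) (b2 (ash b (- k)))
                   (hi  (karatsuba* a2 b2))                 ; A2B2
                   (lo  (karatsuba* a1 b1))                 ; A1B1
                   (mid (karatsuba* (- a2 a1) (- b1 b2))))  ; middle term
              (+ (ash hi (* 2 k))
                 (ash (+ hi lo mid) k)
                 lo)))))))
```

A portable version like this mostly demonstrates the recursion; as noted
above, actually beating the built-in * usually requires access to the
bignum representation itself.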

The potential speed boost of squaring is why I wish Common Lisp had
included #'square in the spec, although it probably only makes much
difference to cryptographers. Perhaps even better, implementers might
consider including a compiler macro to detect it in the case of
(* foo foo).

> > If you have LAP available, however, they could make a real difference.

>

> Whats LAP?

Lisp Assembly Programming. Some CL implementations (e.g. MCL and OpenMCL,
among others) allow you to write assembler using Lisp notation.

--

Shannon Spires

Jul 1, 2002, 10:00:17 AM

>>>>> "Jochen" == Jochen Schmidt <j...@dataheaven.de> writes:

>> Other optimizations are worth trying too. For example, the Karatsuba

>> formulation is a good way to optimize multiplication of large numbers in

>> general, regardless of modulo operations (Montgomery only applies to

>> modular multiplication). Karatsuba is elegantly recursive and

>> substitutes two additions for one of the four needed sub-multiplications

>> at each step. (Do a Google search on Karatsuba for more information).

>> Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]

>> operations. CMUCL already uses Karatsuba internally for large bignum

>> multiplications, but other CLs may not.

No, CMUCL doesn't use Karatsuba. I wrote one, but it was never
incorporated into the sources, mostly because it only speeds up
multiplication if the numbers are 512 or 1024 bits or more. That was
as fast as I could get the Lisp implementation to go with
CMUCL-specific stuff.

Ray

Jul 1, 2002, 2:02:15 PM

Raymond Toy wrote:

While your post was certainly informative, I only wanted to note that it
was not me who wrote the paragraph you quoted; it was Shannon.

Jul 1, 2002, 2:40:29 PM

>>>>> "Jochen" == Jochen Schmidt <j...@dataheaven.de> writes:

Jochen> Raymond Toy wrote:

>>>>>>> "Jochen" == Jochen Schmidt <j...@dataheaven.de> writes:

>>

>> >> Other optimizations are worth trying too. For example, the

>> >> Karatsuba formulation is a good way to optimize multiplication of

>> >> large numbers in general, regardless of modulo operations

>> >> (Montgomery only applies to modular multiplication). Karatsuba is

>> >> elegantly recursive and substitutes two additions for one of the

>> >> four needed sub-multiplications at each step. (Do a Google search

>> >> on Karatsuba for more information). Karatsuba can multiply in

>> >> O[N^1.585] operations instead of O[N^2] operations. CMUCL already

>> >> uses Karatsuba internally for large bignum multiplications, but

>> >> other CLs may not.

>>

>> No, CMUCL doesn't use Karatsuba. I wrote one, but it was never

>> incoporated into the sources, mostly because it only speeds up

>> multiplication if the numbers are 512 or 1024 bits or more. That was

>> a fast as I could get the Lisp implementation to go with

>> CMUCL-specific stuff.

Jochen> While your post was certainly informative I only wanted to note that it was

Jochen> not me who wrote the paragraph you quoted - It was Shannon.

Sorry, I obviously snipped the wrong part. Mea culpa.

Ray
