I am aware that in the reference implementation of Dilithium the large A matrix is calculated and stored in full before each of the three phases: key generation, signing and verification. This implies a large overall stack requirement (~60k) to hold the entire matrix.
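A back-of-envelope check on that ~60k figure (my own arithmetic, not taken from the reference code): A is a k x l matrix of degree-255 polynomials, each with 256 coefficients held as 32-bit words after expansion. The (k, l) pairs below are the standard Dilithium parameter sets; the highest level accounts for the quoted stack cost.

```python
# (k, l) dimensions of A for the three Dilithium parameter sets
PARAMS = {"Dilithium2": (4, 4), "Dilithium3": (6, 5), "Dilithium5": (8, 7)}
N, COEFF_BYTES = 256, 4  # polynomial degree and int32 coefficient size

# bytes needed to hold the whole of A, per parameter set
sizes = {name: k * l * N * COEFF_BYTES for name, (k, l) in PARAMS.items()}
for name, b in sizes.items():
    print(f"{name}: full A = {b} bytes; one element = {N * COEFF_BYTES} bytes")
```

So the full matrix costs 16KB at the lowest level and ~56KB at the highest, while a single element is only 1KB, which is where the on-the-fly saving comes from.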
However, in the key generation and (more importantly) verification phases, the elements of A can be calculated “on the fly” as needed. This greatly reduces the memory requirement (~20k), and as far as I can see there is no performance downside in doing this. The case for signing is a little different – see https://eprint.iacr.org/2020/1278 for a discussion.
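To make the on-the-fly idea concrete, here is a minimal Python sketch of a matrix-vector product t = A*s in which each A[i][j] is derived from the seed rho, used once, and discarded, so only 1KB of A is ever live. It follows the shape of Dilithium's ExpandA (SHAKE-128 rejection sampling of 23-bit values mod q), but is deliberately simplified: schoolbook multiplication in the plain domain rather than the NTT domain, and a fixed generous XOF output buffer. It is an illustration of the data flow, not the reference algorithm.

```python
import hashlib

Q = 8380417   # Dilithium modulus
N = 256       # polynomial degree

def expand_element(rho: bytes, i: int, j: int) -> list:
    """Derive A[i][j] from seed rho by rejection sampling on
    SHAKE-128(rho || j || i). Simplified sketch of ExpandA."""
    stream = hashlib.shake_128(rho + bytes([j, i])).digest(3 * N * 2)
    coeffs, pos = [], 0
    while len(coeffs) < N and pos + 3 <= len(stream):
        t = int.from_bytes(stream[pos:pos + 3], "little") & 0x7FFFFF  # 23 bits
        pos += 3
        if t < Q:                  # rejection step keeps samples uniform mod q
            coeffs.append(t)
    return coeffs

def poly_mul_add(acc, a, s):
    """Schoolbook multiply-accumulate in Z_q[x]/(x^N + 1); the reference
    code would do this in the NTT domain instead."""
    for d1, c1 in enumerate(a):
        for d2, c2 in enumerate(s):
            d = d1 + d2
            if d < N:
                acc[d] = (acc[d] + c1 * c2) % Q
            else:                  # x^N = -1, so wrap with a sign flip
                acc[d - N] = (acc[d - N] - c1 * c2) % Q
    return acc

def matvec_on_the_fly(rho: bytes, s: list, k: int) -> list:
    """Compute t = A*s one matrix element at a time: each A[i][j] is
    generated, consumed, and then overwritten on the next iteration."""
    t = []
    for i in range(k):
        acc = [0] * N
        for j in range(len(s)):
            a_ij = expand_element(rho, i, j)   # 1KB live, never the full A
            poly_mul_add(acc, a_ij, s[j])
        t.append(acc)
    return t
```

The point of the sketch is that nothing in the computation of row i needs any element of A outside the current (i, j) pair, which is why key generation and verification can get by with element-at-a-time expansion.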
Nevertheless the general view seems to be that Dilithium requires the A matrix to be fully stored for all phases. For example https://csrc.nist.gov/CSRC/media/Events/third-pqc-standardization-conference/documents/accepted-papers/atkins-requirements-pqc-iot-pqc2021.pdf cites the perceived large stack requirement as a reason not to recommend Dilithium. And it's a fair point – many IoT nodes would struggle to accommodate that amount of stack memory.
And a recent paper, which describes a highly optimized implementation of Dilithium, again seems to revert to the larger stack requirement for key generation and verification – https://eprint.iacr.org/2022/112
So what am I missing? Is there some optimization that arises from pre-calculating and storing the whole of A?
Mike Scott
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/CAE158z8%2BHshBUvwbfv6cfxJkUeqHRmL7D0oVfVj%3D%3D-H%3DXWfhGg%40mail.gmail.com.