I am aware that in the reference implementation of Dilithium the large A matrix is calculated and stored in full before each of the three phases: key generation, signing and verification. This implies a large overall stack requirement (~60k) to hold the entire matrix.
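A back-of-envelope check on that ~60k figure (my own arithmetic, not taken from the reference code): A is a k x l matrix of degree-255 polynomials, each with 256 coefficients held as 32-bit words after expansion. The (k, l) pairs below are the standard Dilithium parameter sets; the highest level accounts for the quoted stack cost.

```python
# (k, l) dimensions of A for the three Dilithium parameter sets
PARAMS = {"Dilithium2": (4, 4), "Dilithium3": (6, 5), "Dilithium5": (8, 7)}
N, COEFF_BYTES = 256, 4  # polynomial degree and int32 coefficient size

# bytes needed to hold the whole of A, per parameter set
sizes = {name: k * l * N * COEFF_BYTES for name, (k, l) in PARAMS.items()}
for name, b in sizes.items():
    print(f"{name}: full A = {b} bytes; one element = {N * COEFF_BYTES} bytes")
```

So the full matrix costs 16KB at the lowest level and ~56KB at the highest, while a single element is only 1KB, which is where the on-the-fly saving comes from.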
However, in the key generation and (more importantly) verification phases, the elements of A can be calculated “on the fly” as needed. This greatly reduces the memory requirement (~20k), and as far as I can see there is no performance downside in doing this. The case for signing is a little different – see https://eprint.iacr.org/2020/1278 for a discussion.
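To make the on-the-fly idea concrete, here is a minimal Python sketch of a matrix-vector product t = A*s in which each A[i][j] is derived from the seed rho, used once, and discarded, so only 1KB of A is ever live. It follows the shape of Dilithium's ExpandA (SHAKE-128 rejection sampling of 23-bit values mod q), but is deliberately simplified: schoolbook multiplication in the plain domain rather than the NTT domain, and a fixed generous XOF output buffer. It is an illustration of the data flow, not the reference algorithm.

```python
import hashlib

Q = 8380417   # Dilithium modulus
N = 256       # polynomial degree

def expand_element(rho: bytes, i: int, j: int) -> list:
    """Derive A[i][j] from seed rho by rejection sampling on
    SHAKE-128(rho || j || i). Simplified sketch of ExpandA."""
    stream = hashlib.shake_128(rho + bytes([j, i])).digest(3 * N * 2)
    coeffs, pos = [], 0
    while len(coeffs) < N and pos + 3 <= len(stream):
        t = int.from_bytes(stream[pos:pos + 3], "little") & 0x7FFFFF  # 23 bits
        pos += 3
        if t < Q:                  # rejection step keeps samples uniform mod q
            coeffs.append(t)
    return coeffs

def poly_mul_add(acc, a, s):
    """Schoolbook multiply-accumulate in Z_q[x]/(x^N + 1); the reference
    code would do this in the NTT domain instead."""
    for d1, c1 in enumerate(a):
        for d2, c2 in enumerate(s):
            d = d1 + d2
            if d < N:
                acc[d] = (acc[d] + c1 * c2) % Q
            else:                  # x^N = -1, so wrap with a sign flip
                acc[d - N] = (acc[d - N] - c1 * c2) % Q
    return acc

def matvec_on_the_fly(rho: bytes, s: list, k: int) -> list:
    """Compute t = A*s one matrix element at a time: each A[i][j] is
    generated, consumed, and then overwritten on the next iteration."""
    t = []
    for i in range(k):
        acc = [0] * N
        for j in range(len(s)):
            a_ij = expand_element(rho, i, j)   # 1KB live, never the full A
            poly_mul_add(acc, a_ij, s[j])
        t.append(acc)
    return t
```

The point of the sketch is that nothing in the computation of row i needs any element of A outside the current (i, j) pair, which is why key generation and verification can get by with element-at-a-time expansion.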
Nevertheless the general view seems to be that Dilithium requires the A matrix to be fully stored for all phases. For example https://csrc.nist.gov/CSRC/media/Events/third-pqc-standardization-conference/documents/accepted-papers/atkins-requirements-pqc-iot-pqc2021.pdf cites the perceived large stack requirement as a reason not to recommend Dilithium. And it's a fair point – many IoT nodes would struggle to accommodate that amount of stack memory.
And a recent paper, which describes a highly optimized implementation of Dilithium, again seems to revert to the larger stack requirement for key generation and verification – https://eprint.iacr.org/2022/112
So what am I missing? Is there some optimization that arises from pre-calculating and storing the whole of A?
Mike Scott
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/CAE158z8%2BHshBUvwbfv6cfxJkUeqHRmL7D0oVfVj%3D%3D-H%3DXWfhGg%40mail.gmail.com.