Preliminary Neon implementation results for NTRU and NTRU Prime

Matthias Kannwischer

Nov 3, 2021, 10:53:18 PM
to pqc-forum, Bo-Yin Yang, Vincent Hwang, Judy Chen, Shang-Yi Yang
Dear pqc-forum,

we would like to announce some preliminary Neon implementation results for NTRU and NTRU Prime targeting the Cortex-A72.
Previous work by Nguyen and Gaj [1] uses Toom-Cook for NTRU polynomial multiplication. We replace the big x small polynomial multiplication with an NTT-based multiplication to achieve superior performance.
In addition, we have optimized the constant-time sorting.
Similar techniques apply to NTRU Prime as well. Technical details are to follow soon.
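
To illustrate the idea of an NTT-based big x small multiplication, here is a minimal Python sketch. It is not the Neon code and does not reflect its parameter choices. It assumes the NTRU ring Z_q[x]/(x^n - 1), a big operand with coefficients in [0, q), and a small operand with coefficients in {-1, 0, 1}. The prime `P = 998244353`, the generator `G = 3`, and the function names are illustrative choices for this sketch. Because the true integer product coefficients are bounded by n(q - 1) in absolute value, which is far below P/2 for the NTRU parameter sets, computing the product modulo P and lifting to signed integers recovers it exactly before reducing modulo x^n - 1 and q.

```python
P = 998244353  # NTT-friendly prime 119 * 2**23 + 1 (illustrative choice)
G = 3          # a primitive root modulo P


def ntt(a, invert=False):
    """Iterative radix-2 NTT modulo P; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly layers.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def mul_big_small(big, small, n, q):
    """Return big * small in Z_q[x]/(x^n - 1) using an NTT modulo P."""
    size = 1
    while size < 2 * n - 1:  # room for the full linear convolution
        size <<= 1
    fa = ntt([c % P for c in big] + [0] * (size - n))
    fb = ntt([c % P for c in small] + [0] * (size - n))
    prod = ntt([x * y % P for x, y in zip(fa, fb)], invert=True)
    # Lift from [0, P) to signed integers; exact since |coeff| < P / 2.
    prod = [c - P if c > P // 2 else c for c in prod]
    # Reduce modulo x^n - 1 (wrap around) and modulo q.
    res = [0] * n
    for i, c in enumerate(prod[:2 * n - 1]):
        res[i % n] = (res[i % n] + c) % q
    return res


def mul_schoolbook(big, small, n, q):
    """Quadratic-time reference multiplication in Z_q[x]/(x^n - 1)."""
    res = [0] * n
    for i, a in enumerate(big):
        for j, b in enumerate(small):
            res[(i + j) % n] = (res[(i + j) % n] + a * b) % q
    return res
```

Comparing `mul_big_small` against `mul_schoolbook` on random inputs with n = 509 and q = 2048 is a quick sanity check. A real implementation would use a smaller modulus, vectorized butterflies, and constant-time modular reductions.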

Here are our current results in cycles. They may still improve and be extended to more parameter sets of NTRU Prime.
We used gcc 10.2.0, SUPERCOP, and a Raspberry Pi 4b (Cortex-A72) for benchmarking.

| scheme          | key gen    | encaps  | decaps  |
| --------------  | ---------- | ------- | ------- |
| ntruhps2048509  | 4,394,483  | 69,383  | 133,632 |
| ntruhps2048677  | 7,640,198  | 102,285 | 197,110 |
| ntruhps4096821  | 11,080,278 | 129,605 | 264,192 |
| ntruhrss701     | 8,174,290  | 83,460  | 221,184 |
| sntrup653       | 6,953,610  | 169,455 | 193,538 |
| sntrup761       | 7,971,864  | 185,508 | 212,981 |

For comparison, previous work on NTRU by Nguyen and Gaj using Toom-Cook multiplication [1] achieved the following:

| scheme          | key gen    | encaps  | decaps  |
| --------------  | ---------- | ------- | ------- |
| ntruhps2048509  | 4,463,227  | 130,550 | 134,141 |
| ntruhps2048677  | 7,922,450  | 199,899 | 216,929 |
| ntruhps4096821  | 11,276,921 | 246,794 | 265,220 |
| ntruhrss701     | 8,262,872  | 96,206  | 245,290 |

(We rebenchmarked using gcc 10.2.0 and SUPERCOP, so the numbers differ slightly from those in the paper.)

For NTRU, we see a vastly reduced cost for encaps (13-49% fewer cycles). For decaps, the reduction is smaller (0-10% fewer cycles) because of a big x big multiplication, for which we are still using Toom-Cook.
We will post the details as soon as possible.
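
For readers who want to reproduce the percentages quoted above, here is a small Python sketch that derives them from the two tables. The cycle counts are copied from the tables in this post. The dictionary layout and the helper name `reduction_percent` are only illustrative.

```python
# (encaps, decaps) cycle counts copied from the two tables in this post.
ntt_based = {
    "ntruhps2048509": (69_383, 133_632),
    "ntruhps2048677": (102_285, 197_110),
    "ntruhps4096821": (129_605, 264_192),
    "ntruhrss701": (83_460, 221_184),
}
toom_cook = {
    "ntruhps2048509": (130_550, 134_141),
    "ntruhps2048677": (199_899, 216_929),
    "ntruhps4096821": (246_794, 265_220),
    "ntruhrss701": (96_206, 245_290),
}


def reduction_percent(old_cycles, new_cycles):
    """Percentage of cycles saved relative to the old count."""
    return 100.0 * (old_cycles - new_cycles) / old_cycles


# Map each scheme to its (encaps, decaps) reduction in percent.
reductions = {
    name: tuple(
        reduction_percent(old, new)
        for old, new in zip(toom_cook[name], ntt_based[name])
    )
    for name in toom_cook
}

for name, (encaps, decaps) in reductions.items():
    print(f"{name}: encaps {encaps:.1f}% fewer, decaps {decaps:.1f}% fewer")
```

The encaps reductions come out at roughly 13% to 49% and the decaps reductions at roughly 0.4% to 10%, matching the ranges stated above.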

Thank you,
Ting-Yuan Chen, Vincent Hwang, Matthias Kannwischer, Bo-Yin Yang, and Nick Shang-Yi Yang

[1] https://csrc.nist.gov/CSRC/media/Events/third-pqc-standardization-conference/documents/accepted-papers/nguyen-optimized-software-gmu-pqc2021.pdf 