In this comment we suggest some improvement for multiplication operation.
NTRU Prime algorithm uses the combined method of multiplication (Toom and Karatsuba) to multiply polynomials, which doesn’t take into account the special form of polynomials (1, -1, 0). This method is realized in function rq_mult.
A special type of polynomial may be used for multiplication, as we’ve investigated. Two arrays may be defined for a polynomial: an array with idices of positive elements and an array with indices of negative elements. The processing of arrays is parallelized.
Our method, like NTRU Prime, uses AVX2 commands, the critical code is written in assembler.
For parameters and keys that were generated by the NTRU Prime algorithm (q = 4591; p = 761; t = 125) on Linux, we got an acceleration of about 1.5 times when executing the operation of multiplying polynomials, compared to the function rq_mult.
Processor: Intel (R) Core (TM) i5-4440 CPU @3.1 GHz, Memory: 8GB
Best regards,
I. Gorbenko, E. Kachko, M. Yesina, O. Akolzina
Ukraine
--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+unsubscribe@list.nist.gov.
Visit this group at https://groups.google.com/a/list.nist.gov/group/pqc-forum/.
Execution time is defined by total number of non-zero elements, this quantity is set by algorithm (t parameter). Time doesn’t depend on coefficients indices.
We didn't investigate the form (f1*f2)+f3.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
Visit this group at https://groups.google.com/a/list.nist.gov/group/pqc-forum/.
--
>> My understanding is that the authors measured their own implementation of these algorithms, not using state-of-the-art implementation techniques for this CPU.
We used NTRU Prime optimized code (AVX + assembler inserts) for speed measurement.
>> My understanding is that the claimed bottom line, "acceleration of about 1.5 times", is actually comparing a 4-core implementation to a 1-core implementation. This use of 4 cores has worse throughput, energy consumption, etc. than simply running separate computations on separate cores.
Yes, but both functions were performed on a 4-core processor, we do not see the possibility of effective vectorizing your algorithm, as the size of data being multiplied is gradually decreasing.
Thank you for recommendation, we’ll do an experiment on time and indices independence.
Best regards,
I. Gorbenko, E. Kachko, M. Yesina, O. Akolzina
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+unsubscribe@list.nist.gov.
More precise research results
1. For 10,000 keys’ maximum dispersion of all measurements is
-5.19676<=e<=6.62797(%).
The key numbers for which the minimum and maximum values were obtained when the measurements were repeated didn't match.
2. For 100 keys, the maximum dispersion is -4.142 <= e <= 6.06054 (%).
3. For 100 keys and 100 times measurements, the maximum dispersion of minima, min: -0.745778 <= e <= 0.732426 (%), maxima dispersion from the average maximum, max: -0.891563 <= e <= 0.977177 (%). Thus, the measurement error can be up to 3%.
4 At the moment, the function rand (standard C library) was used to generate the numbers of polynomials coefficients with nonzero values. In future we suppose to use for this purpose more reliable generator and perform a full statistical analysis of the results in order to identify the possibility of obtaining a key, taking into account the time difference in measurements.
Thank You!
Dear William Whyte!
Thank you very much for your positive evaluation of our work!
In accordance with your recommendations, we continued our work in the field of polynomial multiplication function for NTRU-similar algorithms that use the properties of a small polynomial and computation time for them doesn’t depend on this polynomial.
The following results were obtained.
Processor Intel (R), Core (TM) i5-4440 CPU @ 3.1 Ghz.
Programming languages: C, Assembler, 64 bits
Small polynomial representation in the form of nonzero coefficients set is:
The polynomial for NTRUPrime is (N = 761, q = 4591, d = 125)
Precalculations is 1540 tacts, depend only on the second (open) factor. The more often can be done in advance.
Multiplication is 12856 cycles.
The deviation from the mean doesn’t exceed 2.12% and no more than the measurement error.
Small polynomial representation is in the form A1 * A2 + A3
The polynomial: N = 743, D1 = 11, D2 = 11, D3 = 15, q = 2048
Multiplication is 10809 cycles. There is no precomputation.
The execution time deviation from the mean is ± 1.8559 and no more than the measurement error
Best regards,
I. Gorbenko, E. Kachko, M. Yesina, O. Akolzina
--
Dear William Whyte!
Thank you very much for your positive evaluation of our work!
In accordance with your recommendations, we continued our work in the field of polynomial multiplication function for NTRU-similar algorithms that use the properties of a small polynomial and computation time for them doesn’t depend on this polynomial.The execution time deviation from the mean is ± 1.8559 and no more than the measurement error
Best regards, I. Gorbenko, E. Kachko, M. Yesina, O. Akolzina.
Hello,
I have a question about the performance information on these chips for the tests that are being conducted. Are these chips on patched machines for the CPU vulnerabilities that have been found in the most recent dates? Those performance hits will be very significant.
I am speaking in regards to Spectre, Meltdown and the variations of the side-channel attacks that have been discovered since the initial findings as well as TLBleed.
Pertinent information would be whether or not it is a newer CPU that was designed to be protected from such attacks or patched after the discovery.
Thanks,
Kenny Herold, GWAPT
Principal Security Consultant
Telephone: +1 (612) 432 0479 | Email: ke...@odinseye.net | Web: www.odinseye.net
Address: P.O. Box 1440, Minnetonka, MN 55345 USA
--