The short answer is no. The version of SNOVA submitted to round 2 uses AVX2 shuffle instructions (_mm256_shuffle_epi8) to vectorize the multiplication.
Next week at the 6th PQC Standardization Conference we will present benchmark results for a faster version of SNOVA that uses regular 16-bit multiplications (l=4 only). This version can be found in the oddqsrc directory at https://github.com/PQCLAB-SNOVA/SNOVA. You will probably want to use the minor update, which is available at https://github.com/vacuas/SNOVA.
While this version uses only C statements, the compiler will actually vectorize the code into the instruction behind _mm256_mullo_epi16 (vpmullw) on x86. Interestingly, when using gcc 15.2.1, the plain-C version is faster than the version using explicit AVX2 instructions.
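To illustrate the kind of plain-C loop that a compiler can turn into 16-bit vector multiplies, here is a minimal sketch. It is not taken from the SNOVA sources: the function name mul_mod_q and the modulus Q = 19 are placeholders chosen for illustration, and whether a given compiler vectorizes the loop depends on its version and flags (for example -O3 -mavx2 with gcc).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder odd modulus for illustration only; not taken from the SNOVA sources. */
#define Q 19u

/* out[i] = a[i] * b[i] mod Q, assuming every input element is already < Q.
 * The product is then below Q * Q = 361, so it fits in 16 bits and a
 * 16-bit low multiply gives the exact result. */
void mul_mod_q(const uint16_t *restrict a, const uint16_t *restrict b,
               uint16_t *restrict out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying to avoid signed-int promotion issues. */
        uint16_t prod = (uint16_t)((uint32_t)a[i] * b[i]);
        out[i] = (uint16_t)(prod % Q);
    }
}
```

The restrict qualifiers tell the compiler the arrays do not overlap, which usually makes auto-vectorization easier.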
Jan Adriaan Leegwater
SNOVA team
The question is simple. Have we used the carry-less multiplication instructions available on x86 and ARM chips? Arithmetic in F_2 and F_16 can easily be accelerated by these instructions.
These instructions were intended for GCM and XTS, but can in fact be applied to any scheme that uses binary polynomials.
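For readers unfamiliar with the idea, the sketch below shows in portable C what a carry-less multiply computes and how it yields a product in F_16. It is a software illustration only: the helper names clmul4 and gf16_mul are hypothetical, and the reduction polynomial x^4 + x + 1 is an assumption, not something stated in this thread. Hardware instructions such as PCLMULQDQ on x86 or PMULL on ARM perform the same XOR-based product on 64-bit operands.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less (XOR-based) product of two 4-bit binary polynomials.
 * The result has degree at most 6, so it fits in 7 bits. */
static uint8_t clmul4(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 4; i++) {
        if ((b >> i) & 1u)
            r ^= (uint8_t)(a << i);
    }
    return r;
}

/* Multiply in GF(16), assuming the reduction polynomial x^4 + x + 1 (0x13). */
uint8_t gf16_mul(uint8_t a, uint8_t b)
{
    assert(a < 16 && b < 16);
    uint8_t r = clmul4(a, b);
    /* Clear bits 6..4 by XORing in shifted copies of the reduction polynomial. */
    for (int i = 6; i >= 4; i--) {
        if ((r >> i) & 1u)
            r ^= (uint8_t)(0x13u << (i - 4));
    }
    return r;
}
```

For example, gf16_mul(2, 8) multiplies x by x^3, giving x^4, which reduces to x + 1, that is 3.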
Thanks.