Dear John,
I certainly share the ultimate goal of having refined estimates for
those costs; however, my assessment is that the state of the art is
still not stable, and in that light such an effort may soon be obsolete.
My approach for the last few years has been to take the problem from
the other end: improve those algorithms until we get stuck. And in fact,
your estimate for SVP via BGJ1-sieving is already obsolete and
invalidated by experiments (see paragraph (*) below).
<digression>
I wish to use this opportunity to recall the context and rationale of
the `lower bounds' introduced in NewHope [1], for which I claim no
strong theoretical foundation. They were chosen to be both simple and
convincingly on the safe side, accounting for an educated guess at the
progress to come.
Previous works based their estimates on a table from [2, Table 1]
for enumeration. I have always been a bit frustrated that this table was
very hard to reproduce, and I put a lot of effort into making this state
of the art more easily reproducible (contributing a pruner to FPLLL).
Eventually, I gave up on enumeration because I figured out a number of
potential practical optimizations of sieving that I guessed would
outperform enumeration. I am still working on pushing this line further,
but the latest records [5] should already convince us all that SVP-160
costs quite a bit less than the 100 * 2^79.9 CPU operations given
in [2, Table 1].
Of course one could also rely on the lower bound of [2, Table 4], but
I would consider 100 * 2^28.8 CPU operations for SVP-160 to be too
far on the safe side when compared to practice. Furthermore, the
difficulty of reproduction and extrapolation remains.
</digression>
Getting back to evaluating the cost of sieving, the lower bounds
introduced in NewHope [1] did indeed ignore a number of factors.
The most significant ones are:
A: the cost of inner products
B: the fact that one needs to reduce vectors many times, not just once
C: the gap between exact and asymptotic spherical cap / wedge volumes
Factor A has since been tackled using the XOR-POPCNT trick [7, 4, 5].
Factor B has since been tackled using progressive sieving [8, 4, 5].
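
For readers unfamiliar with the former, here is a minimal sketch of the
idea behind the XOR-POPCNT pre-filter as I understand it; the sketch
length K, the threshold, and all names below are illustrative choices of
mine, not those of [7] or of the G6K code.

# Minimal, illustrative sketch of an XOR-POPCNT pre-filter (factor A):
# each vector carries a K-bit SimHash recording the signs of its inner
# products with K fixed random directions; a cheap XOR + popcount on two
# hashes decides whether the exact inner product is worth computing.
# K and THRESHOLD are arbitrary illustrative values.
import random

K = 256          # sketch length in bits
THRESHOLD = 96   # accept a pair when the Hamming distance is this small

def make_directions(dim, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(K)]

def simhash(v, directions):
    """Pack the signs of <v, d_i> into a single K-bit integer."""
    h = 0
    for i, d in enumerate(directions):
        if sum(x * y for x, y in zip(v, d)) > 0:
            h |= 1 << i
    return h

def worth_full_inner_product(h1, h2):
    """A small Hamming distance between sketches hints at a small angle
    between the vectors (a very large one, at a small angle with -v),
    so the exact inner product is only computed for such pairs."""
    dist = bin(h1 ^ h2).count("1")
    return dist <= THRESHOLD or dist >= K - THRESHOLD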
I can only encourage you to study factor C in more depth. My guess is
the same as Thijs': what is relevant are caps and wedges, not the
average sizes of spherical codes. In fact, the issue may be even more
subtle: so far people have reasoned over the sphere, but it may be more
relevant to consider balls, as the assumption that all vectors have
the same length is convenient for asymptotic analysis, but not accurate
in practice.
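
To give a feel for factor C in the simplest case (caps only; wedges, the
pairwise analogue, and all polynomial factors are ignored), here is a
small numerical comparison of my own between the exact fraction of the
sphere covered by a cap of angle pi/3 and the asymptotic
sin^n(pi/3) = 2^{-0.2075...n}. It only shows the order of magnitude of
the gap, and is not claimed to be the relevant quantity for any
particular sieve.

# Exact vs asymptotic volume of a spherical cap of angle pi/3.
# Standard identity: the fraction of S^{n-1} within angle theta <= pi/2
# of a fixed point is (1/2) * I_{sin^2(theta)}((n-1)/2, 1/2), where I is
# the regularized incomplete beta function.
from math import sin, pi, log2
from scipy.special import betainc

def cap_fraction_exact(n, theta):
    return 0.5 * betainc((n - 1) / 2.0, 0.5, sin(theta) ** 2)

def cap_fraction_asymptotic(n, theta):
    return sin(theta) ** n   # = 2^{-0.2075...*n} for theta = pi/3

for n in (60, 80, 100, 128):
    exact = cap_fraction_exact(n, pi / 3)
    asym = cap_fraction_asymptotic(n, pi / 3)
    print("n = %3d   exact: 2^%6.1f   asymptotic: 2^%6.1f"
          % (n, log2(exact), log2(asym)))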
(*) Lastly, I did not find any trace of the `dimensions for free'
[4, 5] being accounted for in your scripts. This feature may explain why
G6K-BGJ1 [3] solves exact SVP-100 within 2^42 CPU operations, while your
table [6] for BGJ1 predicts 2^56.1 operations of an unspecified unit.
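
For scale, below is my back-of-the-envelope recollection of the
asymptotic `dimensions for free' estimate from [4], namely
d * ln(4/3) / ln(d/(2*pi*e)); please check it against the paper before
relying on it. The constant 0.349 and the absence of lower-order terms
are simplifications of mine, not the model of [4] or of G6K.

# Rough size of the 'dimensions for free' effect, using (my recollection
# of) the asymptotic estimate d * ln(4/3) / ln(d / (2*pi*e)) from [4].
# The exponent saving assumes a sieve running in time 2^{c * dim} with
# c = 0.349 (bgj1-style), purely for illustration.
from math import log, e, pi

def dims_for_free(d):
    return d * log(4.0 / 3.0) / log(d / (2 * pi * e))

for d in (100, 120, 140, 160):
    d4f = dims_for_free(d)
    print("d = %3d   d4f ~ %4.1f   exponent saving ~ %4.1f bits"
          % (d, d4f, 0.349 * d4f))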
I am looking forward to seeing your work eventually lead to an accurate
model that matches the repeatable experiments from the literature. In the
PS below, please find some comments on tuning BDGL for practice, so as
to avoid studying the naive version of this algorithm.
Nevertheless, I would still not recommend using those fixed estimates
directly for choosing the parameters of a to-be-standardized scheme.
Indeed, a desired quality for a cryptographic standard is to hold up to
its security claims for more than 18 months, and I do not think that we
are stuck yet when it comes to practical speed-ups for those algorithms.
For example, an improvement factor of at least O(n) on time and memory
of sieving is known to apply to symmetric lattices of dimension n, but
it remains to tweak BKZ (and G6K) in such a way that it preserves some
subgroup of symmetries in its blocks. This sounds more than plausible to
me, though it is not an open coding project of mine at the moment.
Best of luck with this refined study,
-- Leo Ducas
[1] https://eprint.iacr.org/2015/1092
[2] https://www.iacr.org/archive/asiacrypt2011/70730001/70730001.pdf
[3] https://eprint.iacr.org/2015/1128
[4] https://eprint.iacr.org/2017/999
[5] https://eprint.iacr.org/2019/089
[6] https://github.com/jschanck/sieve-tables/blob/master/b.3496.csv
[7] https://eprint.iacr.org/2014/788
[8] https://eprint.iacr.org/2018/079
[9] https://github.com/jschanck/sieve-tables/blob/master/b.2075.csv

PS:
When applying such reasoning to BDGL [3], I should also warn that this
algorithm lends itself to quite some tuning (and so does BGJ1, to a
lesser extent). When we wrote [3], sieving was so slow in practice that
we focused on the simplest version achieving this asymptotic speed. In
practice, it is for example possible:
- to use fewer filters per trial but retry, so as to keep the memory
complexity at 2^{0.2075n + o(n)}.
- to use weaker filters so that buckets contain more than O(1) vectors,
so as to amortize the cost of bucketing by testing more pairs (a
deliberately naive sketch of this is given below).
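
Here is a deliberately naive sketch of that second point, written by me
for illustration only; it is not the bgj1 or BDGL code, and the
threshold ALPHA is an arbitrary tuning knob. One pass over the list
populates a bucket around a random direction, and that O(N) cost is then
amortized over roughly |bucket|^2 / 2 pair tests, so weakening the
filter trades bucketing cost against the quality of the pairs tested.

# Naive illustration of amortizing bucketing cost: a random direction
# defines a spherical-cap filter; weakening it (smaller ALPHA) puts more
# vectors in the bucket, so the single pass over the list L is amortized
# over ~|bucket|^2 / 2 pair tests.  Not the bgj1/BDGL implementation.
import random

ALPHA = 0.3   # threshold on the normalized inner product (tuning knob)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(v):
    return dot(v, v) ** 0.5

def bucketing_pass(L, dim, rng):
    """One pass: bucket vectors close to a random direction, then try to
    reduce all pairs inside the bucket."""
    d = [rng.gauss(0, 1) for _ in range(dim)]
    dn = norm(d)
    bucket = [v for v in L if abs(dot(v, d)) / (norm(v) * dn) > ALPHA]
    reductions = []
    for i in range(len(bucket)):
        for j in range(i + 1, len(bucket)):
            u, v = bucket[i], bucket[j]
            if dot(u, v) > 0:
                s = [a - b for a, b in zip(u, v)]
            else:
                s = [a + b for a, b in zip(u, v)]
            if dot(s, s) < max(dot(u, u), dot(v, v)):
                reductions.append(s)
    return reductions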