Questions regarding pruning coefficients, their determination and the stability of the pruner.


barak zackay

May 24, 2018, 10:56:31 AM
to fplll-devel
Hi all,
I am Barak Zackay, an astrophysics postdoc at the Institute for Advanced Study in Princeton.

First, I want to express my deep appreciation for the fplll development team; it has made using lattice algorithms much more fun.

Second, I have a few questions regarding the Pruning stage:

Background for the questions:
I have a non-general lattice in which I am trying to find the shortest vector.
Detecting the shortest (non-trivial) vector would constitute a detection of a pulsar in gamma-rays, and would be a "real-world" application of lattice reduction algorithms outside cryptography (Yey!).
One of the non-general properties of the lattice is the structure of the diagonal of R in the QR decomposition of the reduced basis: it is flat at ~1 for many dimensions and then, ~70-90 dimensions before the end (depending on the situation), it plunges linearly (R[x,x]**2 ~ 1/x**2).
For the dimensions where R[x,x] ~ 1, R is nearly diagonal, so finding/verifying the shortest vector is nearly trivial.

Also, I have good reason to believe that in many of my cases there is an ensemble (~10**3-10**4) of short vectors, any member of which would immediately constitute a solution to the problem.
I may also encounter 10**2-10**4 "trivial" vectors that, even though they are short, will not be considered solutions to the problem.
Last, I may have other "indicators", external to the lattice, of the correctness of a specific vector I recover from the lattice.

Questions:
1) The fact that I don't have a general R makes me think that perhaps I want to determine the pruning coefficients myself (for the final lattice enumeration, not for the BKZ reduction that precedes it). What units are these coefficients in? I saw that they are all between 0 and 1; are they probabilities? Multiplicative bounds? What are they?
Is this documented somewhere?

2) I tried using the pruner to determine the coefficients before enumeration, but I saw that the optimization process may not return when I use dimension > 120.
Do you have any idea why? How can I bypass this problem? [I can attach an example lattice if required.]
Example code is attached.

3) As I mentioned, the dimensions apart from the last ~90 coordinates of the lattice I am using are all "trivial". Is there a way to flag to the pruning code that the coefficients for these dimensions are "trivial" as well? Is there an obvious way to extend a set of pruning coefficients determined on the non-trivial part of the lattice to the trivial part?

4) When I am able to use the pruner, on inspection of the detailed cost I see that at some stages there are large, discontinuous jumps in the cost that seem wrong (I expect something that rises exponentially, flattens a bit, and then decreases exponentially). The jumps I refer to happen ~10 dimensions after the exponential drop, and the cost reaches 4 times the highest peak that came before. This seems highly sub-optimal; any ideas why this should happen?

5) Does the detailed cost already account for the success probability? (I suspect yes; it was relatively fixed when I used succ_p ~ 0.5 and succ_p ~ 10**-4.)


Thanks!
(And again, my deep thanks and appreciation to the development team.)

Barak

Example code that makes the pruner not return:
The lattice dimension was 210 and start_index was 90 [start_index=100 succeeds].

import fpylll
import fpylll.fplll.pruner

def dfs_pruning_fpylll(lattice, squared_length, start_index,
                       nr_solutions=10**4, success_prob=1e-4):
    # array2integer_matrix was a local helper in the original snippet;
    # IntegerMatrix.from_matrix should build the same object.
    A = fpylll.IntegerMatrix.from_matrix([[int(x) for x in row] for row in lattice])
    M = fpylll.GSO.Mat(A)

    # Making sure the basis is reduced
    LLL_obj = fpylll.LLL.Reduction(M, delta=0.99)
    LLL_obj()
    n = len(lattice)

    gso_r = [M.get_r(i, i) for i in range(n)]
    enum = fpylll.Enumeration(M, nr_solutions=nr_solutions,
                              strategy=fpylll.EvaluatorStrategy.BEST_N_SOLUTIONS)

    # Does this parameter change anything?
    preproc_cost = 2 ** 20

    # devising a pruning strategy
    pr0 = fpylll.fplll.pruner.prune(squared_length, preproc_cost, success_prob,
                                    [gso_r[start_index:]],
                                    descent_method="gradient", float_type="long double")

(Later an enumeration follows.)



Leo Ducas

May 25, 2018, 1:15:19 AM
to fplll-devel
Dear Barak,


First, I want to express my deep appreciation for the fplll development team; it has made using lattice algorithms much more fun.

Thank you very much.
 
I have a non-general lattice in which I am trying to find the shortest vector.

Can you specify what is meant by ``non-general''?

 
Detecting the shortest (non-trivial) vector would constitute a detection of a pulsar in gamma-rays,
and would be a "real-world" application of lattice reduction algorithms outside cryptography (Yey!)

Yey indeed!
 
For the dimensions where R[x,x] ~ 1, R is nearly diagonal, so finding/verifying the shortest vector is nearly trivial.

Hum, again, can you be more specific? More generally, and for all that follows, it would be nice to have a minimal piece of code to reproduce all the issues you mention.
 
Questions:
1) The fact that I don't have a general R makes me think that perhaps I want to determine the pruning coefficients myself (for the final lattice enumeration, not for the BKZ reduction that precedes it). What units are these coefficients in? I saw that they are all between 0 and 1; are they probabilities? Multiplicative bounds? What are they?
Is this documented somewhere?

Most of the documentation is available in the source code:
https://github.com/fplll/fplll/blob/master/fplll/pruner.h

 

2) I tried using the pruner to determine the coefficients before enumeration, but I saw that the optimization process may not return when I use dimension > 120.
Do you have any idea why? How can I bypass this problem? [I can attach an example lattice if required.]
Example code is attached.

It may just be slow; or the pruner could be buggy in some way, or stuck in a loop due to numerical instability, or... You can try using a weaker descent algorithm (e.g. just the GREEDY method), but your enumeration will be less powerful. You can also increase the numerical precision, but that would be slower.
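Concretely, that would look something like this (a sketch, untested, reusing the prune call from your snippet; the accepted descent_method and float_type values may differ across fpylll versions):

# Weaker but cheaper/more robust descent: GREEDY instead of gradient descent.
pr_greedy = fpylll.fplll.pruner.prune(
    squared_length, preproc_cost, success_prob, [gso_r[start_index:]],
    descent_method="greedy", float_type="long double")

# Same descent method, higher numerical precision (slower).
pr_precise = fpylll.fplll.pruner.prune(
    squared_length, preproc_cost, success_prob, [gso_r[start_index:]],
    descent_method="gradient", float_type="mpfr")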
 

3) As I mentioned, the dimensions apart from the last ~90 coordinates of the lattice I am using are all "trivial". Is there a way to flag to the pruning code that the coefficients for these dimensions are "trivial" as well? Is there an obvious way to extend a set of pruning coefficients determined on the non-trivial part of the lattice to the trivial part?

In principle it should be possible to exploit that, indeed. Unfortunately, I don't think this can be precisely captured by the pruning we implemented... The optimal solution may require dedicated code. How large are the instances you have in mind, and how many ``trivial'' vectors do they have?
 

4) When I am able to use the pruner, on inspection of the detailed cost I see that at some stages there are large, discontinuous jumps in the cost that seem wrong (I expect something that rises exponentially, flattens a bit, and then decreases exponentially). The jumps I refer to happen ~10 dimensions after the exponential drop, and the cost reaches 4 times the highest peak that came before. This seems highly sub-optimal; any ideas why this should happen?

Would you share a plot?
 

5) Does the detailed cost already account for the success probability? (I suspect yes; it was relatively fixed when I used succ_p ~ 0.5 and succ_p ~ 10**-4.)

No, the cost is given for a single enumeration. The cost factor (cost/success) is computable by the user.
 
(And again, my deep thanks and appreciation to the development team.)

Much appreciated.
 
Example code that makes the pruner not return:
The lattice dimension was 210 and start_index was 90 [start_index=100 succeeds].

LLL_obj = fpylll.LLL.Reduction(M, delta=0.99)
LLL_obj()

May I suggest running stronger pre-processing after LLL? BKZ-60 maybe (using the default strategies of fplll/fpylll).
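In fpylll, that could look like the following (a sketch; block size 60 is only a suggestion):

from fpylll import BKZ

params = BKZ.Param(block_size=60,
                   strategies=BKZ.DEFAULT_STRATEGY,  # default pruning/preprocessing strategies
                   flags=BKZ.AUTO_ABORT)
A = BKZ.reduction(A, params)  # run before constructing the GSO object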

# Does this parameter change anything?
preproc_cost = 2 ** 20

Yes, it can. The pruner was devised to optimise (Enum_cost + Preproc_cost)/(success_proba).
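That is, the quantity being minimised is the expected total cost over repeated randomised attempts, roughly (a sketch of the objective, not of the implementation):

# One attempt = one preprocessing + one pruned enumeration; it succeeds with
# probability success_prob, so the expected number of attempts is 1/success_prob:
expected_total_cost = (enum_cost + preproc_cost) / success_prob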
 
Best
-- Leo

barak zackay

May 25, 2018, 11:06:43 AM
to Leo Ducas, fplll-devel
Dear Leo,
thanks for the reply.
Attached are the plots of the diagonal of R and of the detailed cost for the coefficients that were derived for it.
I have also included all the parameters required to reproduce the pruner not returning.

I still do not understand what the pruning coefficients mean. I am positive that this does not appear in the code documentation.
How are they used by the enumeration algorithm to determine which vectors pass and which fail?
How is the cost of a coefficient set computed in advance? [In a different problem I had in the past, I could not solve this analytically and had to resort to a quite complicated randomization procedure.]
Reading code is much easier when you know the basic definitions and the meaning of things.

For your questions:

"
In principle it should be possible to exploit that, indeed. Unfortunately, I don't think this can be precisely captured by the pruning we implemented... The optimal solution may require dedicated code. How large are the instances you have in mind, and how many ``trivial'' vectors do they have?
"
Actually there is an important use case for me in which I would want to utilize up to 10000 trivial dimensions. 
This would be just to verify a solution out of the enumeration tree. 
An important use case for me in the future would be to get ~10**10-10**12 candidates from the non-trivial part of the lattice, and verify all of them on the 10000 dimensions.
Do you have any idea on how it would be possible to obtain such functionality from the Enum class?
[For example, if I could enumerate externally over the values of the (let's say) first 5 coordinates (the bottom of the enumeration tree), I could let the Enum class enumerate the rest and return ~10**4-10**5 candidates from each instance, which is a very manageable memory size. Writing this, I think I know how to do it: I just need to set a CVP target and use all dimensions but the last few :) A rough sketch follows.]
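A rough sketch of what I mean (the helper names are hypothetical, and I have not checked the exact keyword set of Enumeration.enumerate in my fpylll version):

# Enumerate externally over the first few coordinates; for each choice,
# let fpylll enumerate the remaining block as a CVP instance.
n_outer = 5
for outer in outer_coordinate_choices:         # hypothetical outer loop
    t = induced_target(outer)                  # hypothetical: target induced by the fixed coordinates
    enum = fpylll.Enumeration(M, nr_solutions=10**5,
                              strategy=fpylll.EvaluatorStrategy.BEST_N_SOLUTIONS)
    solutions = enum.enumerate(0, n - n_outer, squared_length, 0, target=t)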

(These 10000 dimensions should be thought of as equations with a substantially higher probability of being wrong. They are not so useful for the lattice enumeration part, but they can successfully verify a candidate solution. They would be used in situations where the high-probability equations do not contain enough information to find the shortest vector in the lattice, but do contain enough information to reduce the problem to a full-blown brute-force enumeration over everything that passes the "good equations".)

"
No, cost is given for a single enumeration. The cost factor (cost/success) is computable by the user.
"
Then I don't understand: I use the last 110 coordinates of the R that appears below, and when I reduce the success probability from 0.5 to 10**-4,
the cost only declines by a factor of ~8. I think I can prove that the cost should decline by at least a factor of 0.5 * 10**4 in this case.

"
Yes, it can. The pruner was devised to optimise (Enum_cost + Preproc_cost)/(success_proba).
"
So the enumeration strategy also knows its probability of success? What if I set it from the outside? Would it change?

Thanks!
Barak

Attachments:
----------------
If you want to reproduce the failure of the pruner, I figured that the only thing you need from me is the diagonal of R:
The target squared radius is:
target_squared_length = 6.130221594365229e+31 * n_enum_dimensions

n_enum_dimensions = 110 is the largest diagonal for which the pruner returns (it comes back in less than a second).
n_enum_dimensions = 120 already makes the pruner never return.

I feed the pruner with r_diagonal[-n_enum_dimensions:]

r_diagonal = np.array([  5.76786184e+14,  -6.08278424e+14,   1.97857233e+15,
         2.60735354e+15,  -2.59593594e+15,  -2.67093182e+15,
         2.67054522e+15,   2.29357302e+15,   2.46340666e+15,
        -2.57728176e+15,   2.49599066e+15,  -7.60910002e+15,
        -7.07882143e+16,   7.16221900e+16,  -7.16528730e+16,
        -7.10860387e+16,  -7.19950957e+16,  -7.10484776e+16,
         7.19830514e+16,   7.16086059e+16,  -7.01097880e+16,
        -7.15496153e+16,   7.20900268e+16,   6.98361496e+16,
         7.12785861e+16,  -6.81995994e+16,   7.22566228e+16,
         7.18765633e+16,   7.06417545e+16,  -7.15781172e+16,
         7.17205029e+16,  -7.21642889e+16,   7.13094185e+16,
        -7.12648828e+16,  -7.09315976e+16,  -7.01943383e+16,
        -7.13391831e+16,   7.17943244e+16,  -6.89905131e+16,
        -7.23094335e+16,  -7.17259435e+16,   7.04047960e+16,
         7.10989711e+16,   7.01336816e+16,   7.03816698e+16,
         7.13178468e+16,   7.09682360e+16,  -7.00474101e+16,
        -6.97353488e+16,  -7.12748154e+16,   7.01993848e+16,
         6.99954035e+16,  -7.14294597e+16,   7.20501044e+16,
        -7.14007932e+16,  -7.10695072e+16,   7.16914206e+16,
        -7.04180155e+16,   7.06359066e+16,   7.13759731e+16,
         7.14637283e+16,   7.14045290e+16,   7.17178617e+16,
         6.96950884e+16,   6.81403530e+16,   7.12255094e+16,
        -7.06095489e+16,   7.13363750e+16,   7.03844940e+16,
        -7.00586061e+16,   6.96037923e+16,   7.12250440e+16,
         6.84152384e+16,   7.12104895e+16,   6.59353345e+16,
        -7.10045114e+16,   6.90203055e+16,   7.03409494e+16,
         6.90055026e+16,  -6.98986151e+16,  -7.24395655e+16,
        -6.85554536e+16,   7.19408424e+16,  -6.86284542e+16,
         6.67139575e+16,  -7.25962681e+16,   6.91730152e+16,
        -6.85369308e+16,   6.56570936e+16,  -6.87893883e+16,
         7.22974479e+16,   6.94628093e+16,  -6.70210526e+16,
         6.58003215e+16,  -6.46537132e+16,   6.73897070e+16,
         7.06060613e+16,  -6.76022665e+16,  -6.41640282e+16,
        -7.25995702e+16,   6.11255751e+16,   6.96628992e+16,
         7.24510940e+16,  -7.14108045e+16,   7.06301217e+16,
        -6.95193070e+16,  -7.09956863e+16,   7.12038645e+16,
         7.10730788e+16,  -6.69108878e+16,  -7.04737636e+16,
         6.99219739e+16,  -6.76741845e+16,   6.82369756e+16,
        -8.17539245e+16,  -8.18140094e+16,   9.07353302e+16,
        -9.01055027e+16,   7.89282801e+16,  -7.88984419e+16,
        -8.47658527e+16,   9.18905512e+16,   8.54884428e+16,
        -5.50549631e+16,  -7.96513007e+16,  -8.86342547e+16,
         7.67310894e+16,  -8.49671029e+16,   8.29673617e+16,
         5.60996199e+16,  -4.01955356e+16,  -4.62291275e+16,
        -6.02546099e+16,   6.16958110e+16,   5.45712247e+16,
        -6.52936721e+16,   5.63476821e+16,   5.44474022e+16,
         4.33660769e+16,  -5.50265672e+16,   6.36869172e+16,
        -5.98555379e+16,  -5.62932640e+16,   6.22108693e+16,
        -3.92684880e+16,   5.79469549e+16,   3.13449864e+16,
         3.28175660e+16,  -5.74410232e+16,   3.05952466e+16,
        -5.58525801e+16,  -2.27776725e+16,  -5.48616093e+16,
         5.26860897e+16,   4.15402498e+16,  -4.41710945e+16,
        -5.14190923e+16,  -3.81590222e+16,  -5.17110701e+16,
        -3.45228631e+16,  -2.59139627e+16,   3.74460488e+16,
        -3.40081691e+16,  -2.48727706e+16,  -4.60221018e+16,
        -3.21823361e+16,  -2.28384823e+16,  -1.02391905e+16,
        -2.26699475e+16,   3.42808355e+16,   1.91754259e+16,
         1.50283469e+16,  -2.33465977e+16,   1.75374474e+16,
         2.38769162e+16,   2.85921760e+16,  -1.89745346e+16,
        -2.62786425e+16,   7.60930907e+15,   1.74090221e+16,
         2.23634297e+16,   2.50220124e+16,  -1.07848518e+16,
        -1.29309125e+16,  -4.19202837e+15,  -2.08891634e+16,
        -3.23948749e+15,  -1.22349527e+16,  -1.50026461e+16,
        -1.77421078e+16,  -1.27341412e+16,  -1.16448090e+16,
        -1.81560877e+16,   1.58806654e+16,  -1.20470454e+16,
        -9.12296667e+15,  -5.49187956e+15,  -1.17695843e+15,
         9.46119891e+15,   5.74057325e+15,   4.58373133e+15,
        -9.38499364e+15,  -2.00236937e+15,  -2.81880354e+15,
         8.48984212e+15,   3.70058175e+15,   2.84615429e+14,
        -2.13190000e+15,   8.66225213e+14,   4.53141387e+15,
         3.24828278e+15])
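
A self-contained call that reproduces the hang (one assumption on my side: the pruner expects squared Gram-Schmidt norms, hence the squaring of the signed diagonal above; numpy is assumed imported as np):

import fpylll.fplll.pruner

n_enum_dimensions = 120  # 110 returns in under a second; 120 never returns
gso_r = (r_diagonal[-n_enum_dimensions:] ** 2).tolist()
target_squared_length = 6.130221594365229e+31 * n_enum_dimensions

pr = fpylll.fplll.pruner.prune(target_squared_length, 2 ** 20, 1e-4, [gso_r],
                               descent_method="gradient", float_type="long double")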


detailed_cost.png
r_diagonal.png

Leo Ducas

May 25, 2018, 2:46:37 PM
to fplll-devel
Dear Barak,

the meaning of pruning coefficients is as in the paper:
https://pdfs.semanticscholar.org/b66e/9b89704c7fb8d0447b3f6229bd4911b1805d.pdf

That is, if the squared enum radius is R^2, only vectors whose projection onto the last
coordinates i...n has squared length less than c_i * R^2 will be considered.
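
In pseudocode, the test applied to a partial solution during enumeration is (an illustration, not fplll's actual code):

def passes_pruning(proj_sq_lengths, c, R_sq):
    # proj_sq_lengths[k-1]: squared length of the projection of the current
    # partial solution onto the last k coordinates; c: pruning coefficients.
    return all(p <= c_k * R_sq for p, c_k in zip(proj_sq_lengths, c))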

The success probability is computed according to heuristics, and is supposed to be the
probability that the returned vector is indeed the shortest. Another `metric' is available
to compute/optimize the expectation of the number of returned points as a function of R
and of the pruning coefficients c_i. The ``probability space'' is the input basis. So, if the success
probability is too small, one is supposed to restart from scratch, by applying a random
unimodular transformation to the basis and LLL/BKZ-reducing it again before starting a
new enumeration.
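
The intended usage is therefore a restart loop along these lines (a sketch; try_enumeration is a placeholder for one pruned enumeration attempt, and the density value is illustrative):

from fpylll import LLL
from fpylll.algorithms.bkz import BKZReduction

while not try_enumeration(A):                               # placeholder: one pruned enumeration
    BKZReduction(A).randomize_block(0, A.nrows, density=3)  # random unimodular transform
    LLL.reduction(A)                                        # re-reduce before the next attempt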

The shape of the detailed cost function you show is indeed extremely pathological, but
I must also say that the shape of the input basis seems rather unusual compared to what we
typically target: your lattice does not look at all like a random lattice, and we
have certainly not tested pruning beyond random lattices. As this is a heuristic technique,
it is plausible that pruning is not really adapted to your input.

As you mention considering lattices with up to 10000 dimensions, I am starting to
wonder whether lattice algorithms can really solve the problem you are interested in. Indeed,
even computing the Gram-Schmidt orthogonalization of such a large lattice sounds
terribly costly, and it is a fundamental routine for all lattice algorithms. Yet, at the
same time you refer to these dimensions as trivial, so I suspect they could be brought
out of the picture somehow. Again, I need to better understand what precisely your
problem is.

To give you an idea, you can reasonably hope to run LLL up to dimension 1000 (with patience
and care), and enumeration up to dimension 100 (for typical random lattices and for an
enumeration radius close to the ``Gaussian heuristic'').

Maybe we should rethink all this a bit from scratch. You seem to have a search problem
that you translate into a short vector problem in a lattice. Could you describe this initial
problem and explain how you translate it into an SVP/CVP problem?

Best
-- Leo

barak zackay

Jun 1, 2018, 10:52:40 AM
to Leo Ducas, fplll-devel
Dear Leo,

Regarding the reduction to SVP, I can assure you it is well motivated; I appreciate your offer to reexamine it.
I suggest that we take this offline, as it is yet-unpublished work, and I think it is probably not so interesting to the fplll development forum.
It will also take some time, but I will share a draft with you once the details are finalized (the current version is already working for very interesting use cases).

A small comment regarding the LLL dimension:
Indeed, I do not consider running LLL on bases of dimension more than a few hundred.
The reason is that the lattice reduction essentially cannot use more than ~75 vectors in a non-trivial fashion.
This is because there are trivial unit vectors in the lattice, whose role is to reduce a coordinate mod 1, and the noise is expected to settle at a constant value per coordinate; hence, at some n, the shortest possible vectors in a subspace of dim > n are all trivial.
In the lattice shown earlier, the number 75 arises as 75 ~ 12 * (100^(30/75)), where 10^30 is roughly the number of non-trivial vectors in the lattice. When the problem at hand has much more entropy, this can reach ~90 non-trivial vectors.
=> There is no reason to reduce 1000 dimensions using LLL.

[A small comment:
Knowing that all the vectors at the start of the matrix are orthogonal to each other may allow me to substantially improve over LLL in this context, as I can choose the vector closest to the non-trivial horizon from a big set of vectors.
This is definitely a direction I will want to explore in the future.
For now, with fplll's enumeration I can probably recover all the lattice solutions I need, so I will hold off on such a radical development.]

Barak





