Main goal of this message: Unify notation and terminology for the
systems under discussion, to provide a framework for evaluating claims
regarding what's the same, what's different, etc. This message is
organized around a case study, namely a puzzling claim of a dividing
line between Ding's patented work and newer compressed-LPR proposals.
The unification in
https://cr.yp.to/papers.html#latticeproofs is
narrower in that it's limited to round-2 NISTPQC submissions (although
it also covers Quotient NTRU) and in that it focuses on the aspects of
the cryptosystems that arise in security proofs. To the extent that
there's an overlap in the coverage, the notation is synchronized, and
follows the known idea of using ECDH-like notation for noisy DH.
Vadim Lyubashevsky writes (email dated 2 Dec 2020 20:47:11 +0100):
> No version of the LPR paper presents "reconciliation".
Let's say Alice sends A = aG+e, and Bob sends B = Gb+d, where lowercase
letters are small quantities. Alice computes aB = aGb+ad. Bob computes
Ab = aGb+eb. Can we agree that "noisy DH" is a good name for this?
At this point Alice and Bob have computed something that's _similar_ but
not _identical_. People who say "reconciliation" in the noisy-DH context
are referring to an extra step of Alice and Bob computing the _same_
secret with the help of further communication.
The normal reconciliation pattern is that Bob sends C = Ab+M+c, where M
is in a sufficiently limited set to allow error correction: for example,
each position in M is restricted to the set {0,floor(q/2)}. Alice now
finds C-aB = M+c+eb-ad, and corrects errors to find M. Bob also finds M.
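Here's a minimal Python sketch of this noisy-DH-plus-reconciliation pattern in R_q = F_q[x]/(x^n+1). The parameters (n = 256, q = 12289, noise coefficients in {-1,0,1}) are toy choices for illustration, not anyone's proposed parameters; the helper names are mine.

```python
# Toy sketch of noisy DH plus reconciliation in Z_q[x]/(x^n+1).
# Parameters and noise distributions are illustrative assumptions only.
import random

n, q = 256, 12289  # hypothetical toy parameters

def mul(f, g):
    """Multiply two elements of Z_q[x]/(x^n+1) (negacyclic convolution)."""
    out = [0] * n
    for i, fi in enumerate(f):
        if fi == 0:
            continue
        for j, gj in enumerate(g):
            k = i + j
            if k < n:
                out[k] = (out[k] + fi * gj) % q
            else:
                out[k - n] = (out[k - n] - fi * gj) % q  # x^n = -1
    return out

def add(f, g): return [(x + y) % q for x, y in zip(f, g)]
def sub(f, g): return [(x - y) % q for x, y in zip(f, g)]
def small():   return [random.randrange(-1, 2) for _ in range(n)]

random.seed(1)
G = [random.randrange(q) for _ in range(n)]

a, e = small(), small()
A = add(mul(a, G), e)            # Alice sends A = aG+e
b, d = small(), small()
B = add(mul(G, b), d)            # Bob sends B = Gb+d

M = [random.choice([0, q // 2]) for _ in range(n)]  # positions in {0,floor(q/2)}
c = small()
C = add(add(mul(A, b), M), c)    # Bob sends C = Ab+M+c

# Alice: C-aB = M+c+eb-ad; error-correct each position back to {0,floor(q/2)}.
# |c+eb-ad| <= 2n+1 = 513 per position, far below q/4, so decoding succeeds.
noisy = sub(C, mul(a, B))
M_alice = [0 if round(2 * v / q) % 2 == 0 else q // 2 for v in noisy]
assert M_alice == M              # Alice and Bob now agree exactly
```

(The ring here is commutative, so aG vs. Gb is only a notational distinction; it matters in module/matrix settings.)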
Example 1: Take the variables in R_q = F_q[x]/(x^n+1) where n is a power
of 2, and take C as Ab+M+c.
This cryptosystem, with some further restrictions on distributions etc.,
appeared in the revised version of the LPR paper (not with the name
"reconciliation", I agree) and in talk slides as early as April 2010.
This is pretty much always what people mean by "the LPR cryptosystem",
modulo the question of which number fields are considered; I'm focusing
here on the commonly used case of power-of-2 cyclotomic fields.
Example 2: Special case of Example 1, with the extra requirement of C
being in a more restricted set, so that C can be sent in less space.
This is what I call "compressed LPR". The LPR cryptosystem was in the
revised version of the LPR paper, but compressed LPR wasn't; see below
for further discussion of the history. (Another compression mechanism,
orthogonal to what I'm covering here, is to send, e.g., only 256
coefficients of C, or 256 sums of coefficients of C.)
Example 3: Special case of Example 2, where C is a deterministic
function of Ab+M: e.g., C is obtained by rounding Ab+M to a more
restricted set.
Example 4: Special case of Example 3, where M is the output of rounding
-Ab to have each position in {0,floor(q/2)}. This puts each position of
Ab+M between about -q/4 and q/4.
For definiteness let's say that C is then obtained by rounding each
position of Ab+M to {-ceil(q/8),ceil(q/8)}. The difference c = C-Ab-M
is then between about -q/8 and about q/8.
Notice that each position of the shared secret M is now communicating
one bit of information about Ab, namely whether that position of Ab is
closer to 0 or to q/2. One can also tweak the M range to communicate
more/fewer/different bits of Ab, as the next example illustrates.
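The Example 4 rounding is easy to check exhaustively per position. The sketch below uses a toy modulus, and the helper names (center, round_M) are mine, not from any paper:

```python
# Per-position check of Example 4's deterministic rounding (toy odd q).
# M is the rounding of -Ab to {0,floor(q/2)}; C rounds Ab+M further.
from math import ceil

q = 12289
half = q // 2                      # floor(q/2)
eighth = ceil(q / 8)               # ceil(q/8)

def center(x):                     # representative of x mod q in (-q/2, q/2]
    x %= q
    return x - q if x > q // 2 else x

def round_M(v):                    # round -v to the nearer of {0, floor(q/2)}
    return 0 if abs(center(-v)) <= q // 4 else half

for v in range(q):                 # v = one position of Ab
    M = round_M(v)
    # each position of Ab+M lands between about -q/4 and q/4:
    assert abs(center(v + M)) <= q // 4 + 1
    # C rounds Ab+M to {-ceil(q/8), ceil(q/8)}; the difference c stays small:
    C = eighth if center(v + M) >= 0 else -eighth
    assert abs(C - center(v + M)) <= q // 8 + 1
```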
Example 5: Another special case of Example 3, with the following extra
rules:
* Choose each position of M in {0,1} to limit each position of
Ab+M to _even_ numbers.
* Round Ab+M to C where each position is 0 or 2 floor(q/4), using
an _even_ difference c between about -q/4 and about q/4.
* Choose _even_ d and e.
Now decoding M from M+c+eb-ad is simply reducing mod 2. The secret M
obtained in the end is exactly the secret Ab mod 2.
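A per-position sketch of this mod-2 decoding, with a toy odd q; the even noise t below stands in for c+eb-ad, which is even because c, d, e are all chosen even:

```python
# Per-position sketch of Example 5's decoding (toy odd q, my naming).
import random

q = 12289

def center(x):                    # representative of x mod q in (-q/2, q/2]
    x %= q
    return x - q if x > q // 2 else x

random.seed(2)
for _ in range(1000):
    v = random.randrange(q)       # one position of Ab
    M = v % 2                     # the unique bit making v+M even: M = Ab mod 2
    assert (v + M) % 2 == 0
    t = 2 * random.randrange(-(q // 8), q // 8)  # small even stand-in for c+eb-ad
    # Alice sees M+c+eb-ad mod q; since |M+t| < q/2, the centered
    # representative is the true integer, and decoding is reduction mod 2:
    assert center((M + t) % q) % 2 == M
```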
(Readers coming at all this from an FHE background will recognize the
difference between Example 5 and Example 4 as being analogous to the
difference between 2009 DGHV and 2000 Cohen/2003 Regev. But pointing to
inefficient prior art won't win a patent case; see below.)
I've chosen this progression of examples to first make C deterministic
and then make M deterministic. Some people prefer C being randomized,
for example by first adding a random error and then rounding, but
independently of this one can still make M deterministic, so that M
communicates information about Ab as in Examples 3, 4, and 5. One can
also make A and/or B deterministic; these choices are independent.
All of these examples have the following features:
(P) There's a univariate polynomial quotient ring R_q.
(A) Alice sends a noisy DH element A = aG+e in the ring.
(B) Bob sends a noisy DH element B = Gb+d in the ring.
(C) Bob sends C = Ab+M+c in the ring.
(D) Alice recovers the shared secret M.
A+B are noisy DH, allowing Alice and Bob to _approximately_ agree on a
shared secret in the ring. There was already a 2009 publication of a
P+A+B system.
C+D, in the context of A+B, is reconciliation, _exactly_ agreeing on a
shared secret in the ring. As far as I know, the first publication of a
P+A+B+C+D system was in April 2010, the aforementioned talk re LPR.
All of the compressed-LPR examples, everything starting from Example 2,
also have the following feature:
(S) Bob squeezes C into less space than an R_q element.
Example 1, original LPR, doesn't have this feature. As far as I know,
the first publication of a P+A+B+C+D+S system, and the first publication
of a compressed-LPR system, was the 2012 Ding cryptosystem, which is
essentially Example 5.
How do we stop _all_ P+A+B+C+D+S systems from being covered by Ding's
patent? Here are some ideas:
* The non-novelty/obviousness argument: i.e., the argument that the
claimed invention was already in the prior art, or at least that
the differences are such that the claimed invention "as a whole"
was obvious to people of ordinary skill in the art.
If you spend time reading through patent cases then you'll see that
this is one of the main points of dispute, but that even glaringly
obvious ideas have a considerable chance of being ruled unobvious
in court. The basic problem here is that patents come to court with
a presumption of validity (both novelty and unobviousness), and
overcoming this presumption requires "clear and convincing"
evidence. The "clear and convincing" rule applies even when the
patent office didn't consider the literature in question; see
_Microsoft v. i4i_, 564 U.S. 91 (2011).
Yes, this system is tilted in favor of patent holders. Welcome to
the real world.
In the case of Ding's patent, we aren't talking about the most
obvious ideas. The plaintiff's lawyer will pull out one expert
witness after another saying that efficiency improvements now
claimed in retrospect to be obvious weren't obvious at the time;
and will then pull out the big gun, 2014 Peikert. How is the
defendant's lawyer supposed to argue that saving space compared to
LPR was obvious in 2012 to people of ordinary skill in the art,
given that 2014 Peikert claimed that saving space compared to LPR
was new, the result of an "innovation" in 2014 Peikert?
If someone can find a publication before April 2012 that saves
space compared to LPR, doing not just P+A+B+C+D but also S,
fantastic! But so far each allegedly important piece of prior art
that I've seen is missing something. In analyses of patent threats,
it's a gigantic mistake to gloss over the difference between
"there's prior art for X+Y" and "there's prior art for X and
there's prior art for Y".
* The non-infringement argument. This time it's the defendant's
lawyer trying to argue that there's a "substantial" difference
between the patent claims and what the defendant is doing. The
defendant is starting in trouble at this point: a key is being
exchanged, and there's a rounding mechanism saving space compared
to LPR, so what exactly is the "substantial" difference from what's
claimed in the patent?
"Substantial" and "unobvious" are different legal standards (even
though the analyses overlap), and they're starting from different
baselines---the prior art _plus_ the patent, vs. the prior art
_before_ the patent. Furthermore, courts do _not_ presume
non-infringement; whichever side has the preponderance of evidence
regarding infringement wins.
For an academic paper, throwing around a few sentences, even a footnote,
can suffice to make the reader believe that the paper has some important
distinction from prior work. For a patent lawsuit, the plaintiff and
defendant typically spend millions, and every word is scrutinized by all
sides:
* A claimed distinction that isn't crystal clear will fail.
* A claimed distinction that isn't _correct_ will fail.
* A claimed distinction that's clear and correct, but that isn't more
"substantial" than any of the other clear correct distinctions that
have been rejected in patent cases, will also fail.
Efficiency improvements are easy to understand and are constantly ruled
"substantial" and "unobvious", but this is exactly what's in Ding's
favor: his system squeezes C into less space than the prior art. It's
normal in patent cases for defendants to try to avoid a patented
efficiency improvement by interpolating between the prior art and the
efficiency improvement, and it's normal for the patentee to win.
> In a *key exchange scheme* using "reconciliation" (see e.g.
> Jintai's talk
> https://csrc.nist.gov/CSRC/media/Presentations/Ding-Key-Exchange/
> images-media/DING-KEY-EXCHANGE-April2018.pdf where the word reconciliation is
> explicitly used so there can be no confusion as to what it means),
I see nothing in the talk spelling out the level of generality of the
word "reconciliation". The examples of "reconciliation" mentioned in the
talk seem consistent with what I wrote above regarding how the word is
used in this context.
A patent case will settle on definitions of the words in the allegedly
infringed patent claim, and then the plaintiff's lawyers will go through
the details of matching up the defendant's device to the limitations in
the claim, and explaining why any differences are not "substantial".
Mere differences in _terminology_ are not "substantial"; at best they
force the plaintiff's lawyers to spend more time stripping away the
differences and showing what the defendant is in fact doing.
So it's not as if avoiding the word "reconciliation" is going to save
anybody from the 2010 patent (or the 2012 patent, which doesn't even use
the word). I find the word useful in clarifying what's going on. Clarity
is important so that readers can efficiently see what the technology is
actually doing, in particular as a starting point for evaluating the
patent threats.
> the users
> end up with a random shared key that neither party explicitly chose.
I understand that you're proposing to use the word "reconciliation"
differently from what I said above, but I'm unable to figure out what
you think it means. (Again, a claimed distinction that isn't crystal
clear will fail in court.) For example:
* If Bob takes M as RNG output, is he "explicitly" choosing M?
* Does it matter whether the RNG is a true RNG or a PRNG?
* Is Bob still "explicitly" choosing M if he passes RNG output
through his own hash function? Does it matter what kind of hash
function this is?
* Can the hash function also take external inputs? How about Alice's
public key?
* Is Bob choosing M as bits from Ab not an example of "explicitly"
choosing M? Why not? (This is what Examples 4 and 5 do, and you
want those to not qualify as "explicitly" chosen, right?)
The general theme of these questions is directly related to typical
practices in NISTPQC submissions and in applications.
Beyond my questions about what this "explicitly" dividing line is
supposed to mean, I'm unable to figure out why "reconciliation" is
supposed to be a sensible name for this line. The normal English usage
of "reconciliation" fits the idea of having Alice and Bob communicate so
as to come to exact agreement, whereas it doesn't seem to have anything
to do with the extent to which they _chose_ the outputs. (Financial
auditors reconciling accounts are taking data from other people, but
they're still engaging in a reconciliation process.)
Also, since it seems helpful to be able to refer to the process of Alice
and Bob turning their aGb+ad and aGb+eb into an exactly shared secret,
what would you suggest calling this process, if not "reconciliation"?
> In a public key encryption scheme, the sender chooses the message that
> he wants both parties to have.
I agree that the PKE definition takes a message as input and
communicates this message. This differs from the data flow in the KEM
definition, and in various other key-exchange definitions.
Each of the examples above can be turned into a PKE or into a KEM, with
the addition of various labels and steps that I haven't described (e.g.,
labeling (B,C) as "ciphertext"), and choices that I haven't described
(e.g., for each variable that isn't otherwise constrained, whether it's
generated randomly or as a function of another variable).
I agree that the choices made in turning Example 4 or 5 into a PKE or a
KEM have to be consistent with the deterministic choice of M. Also, the
choices made in turning Example 3 into a PKE or a KEM have to be
consistent with the deterministic choice of C.
> Of course one can trivially convert a key exchange scheme into an
> encryption scheme (by adding an xor with the message), but this is not
> what's happening in Kyber/Saber.
I don't see how the difference here is less "trivial" than the
conversion that you're calling "trivial". Let's go step by step through
the details.
Given any of the above procedures to share a secret---let me relabel the
secret as X here to avoid confusion---Bob can also append X xor U to the
ciphertext where U is the actual user message, at which point Alice
finds U. You're calling this transformation "trivial".
Let's take q as a power of 2 to simplify notation, and let's relabel the
0,1 bits in X and U as positions {0,q/2} in elements of R_q, so X xor U
is the same as U-X. The whole transformation is still "trivial", right?
It's not a question of how the bits in X and U are labeled?
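A two-line check of this relabeling, with a toy power-of-2 modulus:

```python
# With q a power of 2 and bits relabeled as positions {0, q/2},
# xor coincides with subtraction mod q. Toy modulus for illustration.
q = 1024
pos = lambda bit: bit * (q // 2)
for x in (0, 1):
    for u in (0, 1):
        assert pos(x ^ u) == (pos(u) - pos(x)) % q
```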
In Example 4, X already had positions {0,q/2} without any relabeling.
Let's apply this transformation, and call the result Example 6:
* Alice sends A = aG+e.
* Bob sends (Gb+d,Ab+X+c,U-X), with the above restrictions on X and
c.
* Alice subtracts a(Gb+d) from Ab+X+c to obtain X+c+eb-ad, rounds to
obtain X, and adds U-X to obtain U.
The incorporation of U-X into the ciphertext, to send U instead of just
X, is "trivial", right?
Since X and U have positions in {0,q/2}, rounding to obtain X and then
adding U-X is the same as first adding U-X and then rounding. Let's call
the result Example 7:
* Alice sends A = aG+e.
* Bob sends (Gb+d,Ab+X+c,U-X), with the above restrictions on X and
c.
* Alice adds U-X to Ab+X+c, subtracts a(Gb+d) to obtain U+c+eb-ad,
and rounds to obtain U.
Is this not also a "trivial" step? It's not as _generic_ as the previous
transformation, but does this make it _nontrivial_?
At this point there's nothing using Ab+X+c and U-X separately, so we
might as well simply have Bob do the addition. Let's call this Example
8:
* Alice sends A = aG+e.
* Bob sends (Gb+d,Ab+U+c).
* Alice subtracts a(Gb+d) from Ab+U+c to obtain U+c+eb-ad, and rounds
to obtain U.
But wait a minute. How is this different from Example 2? Exactly which
of these examples are covered by "reconciliation"? Was the supposedly
important distinction between "reconciliation" and not "reconciliation"
crossed by something as _trivial_ as having Bob add two R_q elements
instead of sending them to Alice to add?
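The collapse from Example 6 to Example 8 can be checked mechanically: decoding X and then adding U-X gives the same U as having Bob fold U into the ciphertext himself. The sketch below uses toy parameters (n = 64, q = 8192 a power of 2 as above, noise in {-1,0,1}), with X taken uniformly from {0,q/2} positions rather than via Example 4's deterministic rounding; these are illustrative assumptions:

```python
# Sketch of the Example 6 -> Example 8 collapse. Toy parameters only.
import random

n, q = 64, 8192     # q a power of 2

def mul(f, g):      # multiplication in Z_q[x]/(x^n+1)
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k, s = (i + j, 1) if i + j < n else (i + j - n, -1)
            out[k] = (out[k] + s * fi * gj) % q
    return out

add = lambda f, g: [(x + y) % q for x, y in zip(f, g)]
sub = lambda f, g: [(x - y) % q for x, y in zip(f, g)]
small = lambda: [random.randrange(-1, 2) for _ in range(n)]
rnd = lambda f: [0 if v < q // 4 or v >= 3 * q // 4 else q // 2 for v in f]

random.seed(3)
G = [random.randrange(q) for _ in range(n)]
a, e, b, d, c = small(), small(), small(), small(), small()
A = add(mul(a, G), e)                      # Alice sends A = aG+e
B = add(mul(G, b), d)                      # Bob sends Gb+d
U = [random.choice([0, q // 2]) for _ in range(n)]   # user message
X = [random.choice([0, q // 2]) for _ in range(n)]   # shared-secret mask

# Example 6: Bob sends (B, Ab+X+c, U-X); Alice decodes X, then adds U-X.
C6 = add(add(mul(A, b), X), c)
X_rec = rnd(sub(C6, mul(a, B)))            # X+c+eb-ad, rounded back to X
U6 = add(sub(U, X), X_rec)

# Example 8: Bob adds U himself and sends (B, Ab+U+c).
C8 = add(add(mul(A, b), U), c)
U8 = rnd(sub(C8, mul(a, B)))               # U+c+eb-ad, rounded

assert U6 == U8 == U                       # same U either way
```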
Patent courts are continually faced with conflicting narratives. Lawyers
on one side hype differences between the patent and what the defendant
is doing, while downplaying differences from the prior art. Lawyers on
the other side downplay differences between the patent and what the
defendant is doing, while hyping differences from the prior art. This
drives the _choices_ of what each side labels as "trivial", "obvious",
etc.; these choices are challenged by the other side, and the court
processes then dig into the details.
> You can see that there is a fundamental
> difference between the two approaches
If there's a "fundamental" difference, then I would expect each of my
questions to be very easily answered. We would all see a crystal-clear
definition of the dividing line, and we would all be able to check that
the dividing line implies these answers, and then we could start
_hoping_ that this dividing line would hold up in court.
Right now I'm having trouble seeing how the dividing line meets even
minimal scientific standards of clarity.
> in the fact that there is slight bias in the shared key in Ding's scheme
Now I'm even more puzzled.
I agree that there's a slight bias in the shared secret in the first
scheme Ding published. But one can easily remove the bias by tweaking
Ding's scheme, for example by being careful about the exact ranges of
variables in Example 4. Are you saying that this tweak crosses the
"fundamental" line between a "key exchange scheme" using
"reconciliation" and a "public key encryption scheme" not using
"reconciliation"?
2014 Peikert specifically says it's unbiased. Does this mean it isn't
"key exchange" and isn't using "reconciliation"?
Also, people normally hash shared secrets for various reasons, one of
those reasons being to remove biases. Does replacing the shared secret M
with H(M) cross this "fundamental" line?
The only way I can see the defendant trying to use this is through
attacking the words "similar rounding" in the patent. Here's how I'd
expect the procedures to play out in court:
* There will be preliminary arguments about what the words in the
patent mean---including what qualifies as "similar" rounding. Each
side proposes definitions. In the U.S. (since 1996), the battle
between definitions is resolved by the judge rather than the jury,
which makes it somewhat more predictable.
* Of course the defendant's lawyers will _want_ the minimum possible
interpretation: namely, nothing beyond the specific rounding
schemes that Ding gave as examples. But they have zero hope of
getting this: this would effectively eliminate the "similar" claim,
and courts have a general rule against interpreting text in a way
that makes it content-free.
* Meanwhile the plaintiff's lawyers will _want_ the maximum possible
interpretation---without bumping into the prior art---so they'll
propose broad definitions that focus on user-visible features of
the rounding mentioned in the patent: "similar" rounding is
rounding that "allows us to get a common key" and has good
"communication and computation efficiency". There's no rule
stopping them from getting this.
* There will then be a dispute about what "good" means specifically
for the rounding. The plaintiff's lawyers (knowing that they need
to avoid covering LPR) will propose "reduces space", and maybe the
defendant's lawyers will counter-propose "produces only one bit of
information", which will be a tough sell---what's supposed to be so
special about one bit?
* The defendant's lawyers will propose much narrower definitions of
"similar" rounding, trying to cut it down by adding one dividing
line after another. For example, the defendant's lawyers will say
that rounding isn't "similar" unless it's biased.
* The judge will ask what "biased" means, and why this is supposed to
be important for the patent. The plaintiff's lawyers will ask how
the details of the defendant's argument that the rounding is
"biased" are supposed to be reconciled (ahem) with the patent
saying "We can also choose q to be even positive number and things
need slight modification."
* In the end the judge will select the simpler, and much broader,
interpretation proposed by the plaintiff's lawyers.
There could have been questions from the patent examiner about words in
the patent and how they relate to prior art, which could have resulted
in narrowing the patent claims; the judge would then follow that
narrowing.
However, I would expect any questions about "similar" rounding to have
forced more precise claim wording, and none of the prior art that I've
seen would have triggered such questions. Of course, it would be good to
download the patent history and read through it for anything helpful.
> Also notice that Jintai has a Ring-LWE encryption scheme (page 14 of https://
>
> patentimages.storage.googleapis.com/53/08/b7/b93d5b6b131e46/US9246675.pdf)
> which is like LPR and *does not* (unless I am reading something wrong) do any
> rounding / compression - so it just outputs two elements D1,D2 (which could be
> matrices over some ring).
It's certainly common sense to ask why the patent didn't give a
compressed version of this example. Structurally, one can try to use
this question in court in an inequivalence argument, the same way that
the following questions support unobviousness arguments:
* If the LPR cryptosystem was already obvious in February 2010, then
why did the original May 2010 Springer version of the LPR paper
highlight a bigger, slower cryptosystem?
* If it was already obvious in April 2012 that one can compress LPR,
then why wasn't this in the April 2012 revision of the LPR paper?
As the Supreme Court put it long ago:
But it is plain from the evidence, and from the very fact that it was
not sooner adopted and used, that it did not, for years, occur in
this light to even the most skillful persons. It may have been under
their very eyes, they may almost be said to have stumbled over it;
but they certainly failed to see it, to estimate its value, and to
bring it into notice. ... Now that it has succeeded, it may seem very
plain to any one that he could have done it as well. This is often
the case with inventions of the greatest merit. It may be laid down
as a general rule, though perhaps not an invariable one, that if a
new combination and arrangement of known elements produce a new and
beneficial result, never attained before, it is evidence of invention.
Regarding "beneficial", new levels of efficiency are easy for courts to
understand, so it's hard to imagine getting anywhere with the whole idea
of killing the 2010 and 2012 patents by claiming analogies to older
cryptosystems that are indisputably less efficient. This would be true
even if the plaintiff's lawyers _didn't_ have the gift of 2014 Peikert
claiming that nothing before 2014 Peikert was smaller than LPR.
For the same reason, if there's something that isn't literally within a
patent claim, and if it's possible to construct an efficiency argument
saying that the differences are "substantial", then the court will
listen. But how would this work for the patents at issue here?
The relevant claims of the 2012 patent say "key exchange". What we see
in KEMs, TLS, etc. is exchanging a key. Any deviations from the claims
are regarding smaller points, such as details of the rounding method,
and then the equivalence analysis asks whether _those_ differences are
"substantial". I'm not seeing how efforts to distinguish "key exchange"
from "encryption" are relevant here.
For the 2010 patent, the literal scope is broader (a "cryptographic
method" for "communicating a confidential piece of information" etc.).
I've sent a separate message comparing LPR point by point to the claim
limitations; the only difference requiring an equivalence analysis was
X^n-1 vs. X^n+1. I've also sent a separate message regarding the idea
that switching from LPR to Kyber avoids "three central requirements" in
the patent; two parts of this idea are simply false, and I explained how
easy it will be for the lawyers to work around the third part, even in a
fantasy world without the doctrine of equivalents.
---Dan