Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[digest] 2024 Week 5

13 views
Skip to first unread message

IACR ePrint Archive

unread,
Feb 4, 2024, 10:29:11 PMFeb 4
to
## In this issue

1. [2024/114] Mask Conversions for d+1 shares in Hardware, with ...
2. [2024/115] Accelerating BGV Bootstrapping for Large $p$ Using ...
3. [2024/116] On the practical CPAD security of “exact” and ...
4. [2024/117] Breaking HWQCS: a code-based signature scheme from ...
5. [2024/118] Data Privacy Made Easy: Enhancing Applications with ...
6. [2024/119] R3PO: Reach-Restricted Reactive Program Obfuscation ...
7. [2024/120] K-Waay: Fast and Deniable Post-Quantum X3DH without ...
8. [2024/121] An acceleration of the AKS prime identification ...
9. [2024/122] SPRITE: Secure and Private Routing in Payment ...
10. [2024/123] Memory Checking Requires Logarithmic Overhead
11. [2024/124] Perceived Information Revisited II: Information- ...
12. [2024/125] New self-orthogonal codes from weakly regular ...
13. [2024/126] Monte Carlo Tree Search for automatic differential ...
14. [2024/127] Attacks Against the INDCPA-D Security of Exact FHE ...
15. [2024/128] Non-Binding (Designated Verifier) Signature
16. [2024/129] Finite Key OTP Functionality: Ciphers That Hold Off ...
17. [2024/130] HADES: Automated Hardware Design Exploration for ...
18. [2024/131] Practical Post-Quantum Signatures for Privacy
19. [2024/132] SimpleFT: A Simple Byzantine Fault Tolerant Consensus
20. [2024/133] Optimizing Implementations of Boolean Functions
21. [2024/134] Byzantine Fault Tolerance with Non-Determinism, ...
22. [2024/135] A Closer Look at the Belief Propagation Algorithm ...
23. [2024/136] Secure Transformer Inference Made Non-interactive
24. [2024/137] Sleepy Consensus in the Known Participation Model
25. [2024/138] Correction Fault Attacks on Randomized CRYSTALS- ...
26. [2024/139] Efficient Arithmetic in Garbled Circuits
27. [2024/140] Efficient ECDSA-based Adaptor Signature for Batched ...
28. [2024/141] Secure Statistical Analysis on Multiple Datasets: ...
29. [2024/142] GradedDAG: An Asynchronous DAG-based BFT Consensus ...
30. [2024/143] Scalable Collaborative zk-SNARK: Fully Distributed ...
31. [2024/144] Efficient (3,3)-isogenies on fast Kummer surfaces
32. [2024/145] Practical Batch Proofs of Exponentiation
33. [2024/146] Computing Orientations from the Endomorphism Ring ...
34. [2024/147] Prime Masking vs. Faults - Exponential Security ...
35. [2024/148] Preliminary Cryptanalysis of the Biscuit Signature ...
36. [2024/149] Evict+Spec+Time: Exploiting Out-of-Order Execution ...
37. [2024/150] SALSA FRESCA: Angular Embeddings and Pre-Training ...
38. [2024/151] Improving Linear Key Recovery Attacks using Walsh ...
39. [2024/152] Equivalence of Generalised Feistel Networks
40. [2024/153] Revisiting the Slot-to-Coefficient Transformation ...
41. [2024/154] Broadcast Encryption using Sum-Product ...
42. [2024/155] Fully Homomorphic Encryption on large integers
43. [2024/156] Homomorphic sign evaluation using functional ...
44. [2024/157] Delphi: sharing assessments of cryptographic ...
45. [2024/158] HiSE: Hierarchical (Threshold) Symmetric-key Encryption
46. [2024/159] Logstar: Efficient Linear* Time Secure Merge
47. [2024/160] LightDAG: A Low-latency DAG-based BFT Consensus ...
48. [2024/161] zkMatrix: Batched Short Proof for Committed Matrix ...
49. [2024/162] Zero-Knowledge Proofs of Training for Deep Neural ...

## 2024/114

* Title: Mask Conversions for d+1 shares in Hardware, with Application to Lattice-based PQC
* Authors: Quinten Norga, Jan-Pieter D'Anvers, Suparna Kundu, Ingrid Verbauwhede
* [Permalink](https://eprint.iacr.org/2024/114)
* [Download](https://eprint.iacr.org/2024/114.pdf)

### Abstract

The conversion between arithmetic and Boolean mask representations (A2B & B2A) is a crucial component for side-channel resistant implementations of lattice-based cryptography.
In this paper, we present a first- and high-order masked, unified hardware implementation which can perform both A2B & B2A conversions. We optimize the operation on several layers of abstraction, applicable to any protection order.
First, we propose novel higher-order algorithms for the secure addition and B2A operation. This is achieved through, among others, an improved method for repeated masked modular reduction and through the X2B operation, which can be viewed as a conversion from any type of additive masking to its Boolean representation. This allows for the removal of a full secure addition during B2A post-processing.
Compared to prior work, our $B2A_q$ requires 51/46/45 % less fresh randomness at first through third protection order when implemented in software or hardware.

Secondly, on the circuit level, we successfully introduce half-cycle data paths and demonstrate how careful, manual masking is a superior approach for masking highly non-linear operations and providing first- and high-order security.
Our techniques significantly reduce the high latency and fresh randomness overhead, typically introduced by glitch-resistant masking schemes and universally composable gadgets, including HPC3 by Knichel et al. presented at CCS 2022. Compared to state-of-the-art algorithms and masking techniques, our unified and high-throughput hardware implementation requires up to 89/84/86 % fewer clock cycles and 78/71/55 % fewer fresh random bits.

We show detailed performance results for first-, second- and third-order protected implementations on FPGA. Our proposed algorithms are proven secure in the glitch extended probing model and their implementations are validated via practical lab analysis using the TVLA methodology. We experimentally show that both our first- and second-order masked implementation is hardened against univariate and multivariate attacks using 100 million traces, for each mode of operation.



## 2024/115

* Title: Accelerating BGV Bootstrapping for Large $p$ Using Null Polynomials Over $\mathbb{Z}_{p^e}$
* Authors: Shihe Ma, Tairong Huang, Anyu Wang, Xiaoyun Wang
* [Permalink](https://eprint.iacr.org/2024/115)
* [Download](https://eprint.iacr.org/2024/115.pdf)

### Abstract

The BGV scheme is one of the most popular FHE schemes for computing homomorphic integer arithmetic.
The bootstrapping technique of BGV is necessary to evaluate arbitrarily deep circuits homomorphically.
However, the BGV bootstrapping performs poorly for large plaintext prime $p$ due to its digit removal procedure exhibiting a computational complexity of at least $O(\sqrt{p})$.
In this paper, we propose optimizations for the digit removal procedure with large $p$ by leveraging the properties of null polynomials over the ring $\mathbb{Z}_{p^e}$.
Specifically, we demonstrate that it is possible to construct low-degree null polynomials based on two observations of the input to the digit removal procedure:
1) the support size of the input can be upper-bounded by $(2B+1)^2$; 2) the size of the lower digits to be removed can be upper-bounded by $B$.
Here $B$ can be controlled within a narrow interval $[22,23]$ in our parameter selection, making the degree of these null polynomials much smaller than $p$ for large values of $p$.
These low-degree null polynomials can significantly reduce the polynomial degrees during homomorphic digit removal, thereby decreasing both running time and capacity consumption.
Theoretically, our optimizations reduce the computational cost of extracting a single digit from $O(\sqrt{pe})$ (by Chen and Han) or $O(\sqrt{p}\sqrt[4]{e})$ (by Geelen et al.) to $\min(2B+1,\sqrt{\lceil e/t\rceil(2B+1)})$ for some $t\ge 1$.
We implement and benchmark our method on HElib with $p=17,127,257,8191$ and $65537$.
With our optimized digit removal, we achieve a bootstrapping throughput $1.38\sim151$ times that in HElib, with the speedup increasing with the value of $p$.
For $p=65537$, we accelerate the digit removal step by 80 times and reduce the bootstrapping time from more than 12 hours to less than 14 minutes.



## 2024/116

* Title: On the practical CPAD security of “exact” and threshold FHE schemes and libraries
* Authors: Marina Checri, Renaud Sirdey, Aymen Boudguiga, Jean-Paul Bultel, Antoine Choffrut
* [Permalink](https://eprint.iacr.org/2024/116)
* [Download](https://eprint.iacr.org/2024/116.pdf)

### Abstract

In their 2021 seminal paper, Li and Micciancio presented a passive attack against the CKKS approximate FHE scheme and introduced the notion of CPAD security. The current status quo is that this line of attacks does not apply to ``exact'' FHE. In this paper, we challenge this statu quo by exhibiting a CPAD key recovery attack on the linearly homomorphic Regev cryptosystem which easily generalizes to other xHE schemes such as BFV, BGV and TFHE showing that these cryptosystems are not CPAD secure in their basic form. We also show that existing threshold variants of BFV, BGV and CKKS are particularily exposed to CPAD attackers and would be CPAD-insecure without smudging noise addition after partial decryption. Finally we successfully implement our attack against several mainstream FHE libraries and discuss a number of natural countermeasures and discuss their consequences in terms of FHE practice, security and efficiency. The attack itself is quite practical as it typically takes less than an hour on an average laptop PC, requiring a few thousand ciphertexts as well as up to around a million evaluations/decryptions, to perform a full key recovery.



## 2024/117

* Title: Breaking HWQCS: a code-based signature scheme from high weight QC-LDPC codes
* Authors: Alex Pellegrini, Giovanni Tognolini
* [Permalink](https://eprint.iacr.org/2024/117)
* [Download](https://eprint.iacr.org/2024/117.pdf)

### Abstract

We analyse HWQCS, a code based signature scheme presented at ICISC 2023, which uses quasi-cyclic low density parity check codes (QC-LDPC). The scheme introduces high Hamming weight errors and signs each message using a fresh ephemeral secret key rather than using only one secret key, so to avoid known attacks on QC-LDPC signature schemes.
In this paper, we show that the signatures of HWQCS leak substantial information concerning the ephemeral keys and formally describe this behaviour. Furthermore, we show that for each security level, we can exploit the leakage to efficiently reconstruct partial secret data from very few signatures, and finally mount a universal forgery attack.



## 2024/118

* Title: Data Privacy Made Easy: Enhancing Applications with Homomorphic Encryption
* Authors: Charles Gouert, Nektarios Georgios Tsoutsos
* [Permalink](https://eprint.iacr.org/2024/118)
* [Download](https://eprint.iacr.org/2024/118.pdf)

### Abstract

Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption.

The first approach, called Walrus, is suitable for arithmetic-intensive applications with limited depth and applications with high throughput requirements. Walrus provides an intuitive programming interface and handles parameterization automatically by analyzing the application and gathering statistics such as homomorphic noise growth to derive a parameter set tuned specifically for the application. We provide an in-depth example of this approach in the form of a neural network inference as well as guidelines for using Walrus effectively.

Conversely, the second approach (HELM) takes existing HDL designs and converts them to the encrypted domain for secure outsourcing on powerful cloud servers. Unlike Walrus, HELM supports FHE backends and is well-suited for complex applications. At a high level, HELM consumes netlists and is capable of performing logic gate operations homomorphically on encryptions of individual bits. HELM incorporates both CPU and GPU acceleration by taking advantage of the inherent parallelism provided by Boolean circuits. As a case study, we walk through the process of taking an off-the-shelf HDL design in the form of AES-128 decryption and running it in the encrypted domain with HELM.



## 2024/119

* Title: R3PO: Reach-Restricted Reactive Program Obfuscation and its Application to MA-ABE
* Authors: Kaartik Bhushan, Sai Lakshmi Bhavana Obbattu, Manoj Prabhakaran, Rajeev Raghunath
* [Permalink](https://eprint.iacr.org/2024/119)
* [Download](https://eprint.iacr.org/2024/119.pdf)

### Abstract

In recent breakthrough results, novel use of garbled circuits yielded constructions for several primitives like Identity-Based Encryption (IBE) and 2-round secure multi-party computation, based on standard assumptions in public-key cryptography. While the techniques in these different results have many common elements, these works did not offer a modular abstraction that could be used across them.

Our main contribution is to introduce a novel notion of obfuscation, called Reach-Restricted Reactive Program Obfuscation (R3PO) that captures the essence of these constructions, and exposes additional capabilities. We provide a powerful composition theorem whose proof fully encapsulates the use of garbled circuits in these works.
As an illustration of the potential of R3PO, and as an important contribution of independent interest, we present a variant of Multi-Authority Attribute-Based Encryption (MA-ABE) that can be based on (single-authority) CP-ABE in a blackbox manner, using only standard cryptographic assumptions (e.g., DDH). This is in stark contrast to the existing constructions for MA-ABE, which rely on the random oracle model and/or support only limited policy classes.



## 2024/120

* Title: K-Waay: Fast and Deniable Post-Quantum X3DH without Ring Signatures
* Authors: Daniel Collins, Loïs Huguenin-Dumittan, Ngoc Khanh Nguyen, Nicolas Rolin, Serge Vaudenay
* [Permalink](https://eprint.iacr.org/2024/120)
* [Download](https://eprint.iacr.org/2024/120.pdf)

### Abstract

The Signal protocol and its X3DH key exchange core are regularly used by billions of people in applications like WhatsApp but are unfortunately not quantum-secure. Thus, designing an efficient and post-quantum secure X3DH alternative is paramount. Notably, X3DH supports asynchronicity, as parties can immediately derive keys after uploading them to a central server, and deniability, allowing parties to plausibly deny having completed key exchange. To satisfy these constraints, existing post-quantum X3DH proposals use ring signatures (or equivalently a form of designated-verifier signatures) to provide authentication without compromising deniability as regular signatures would. Existing ring signature schemes, however, have some drawbacks. Notably, they are not generally proven secure in the quantum random oracle model (QROM) and so the quantum security of parameters that are proposed is unclear and likely weaker than claimed. In addition, they are generally slower than standard primitives like KEMs.

In this work, we propose an efficient, deniable and post-quantum X3DH-like protocol that we call K-Waay, that does not rely on ring signatures. At its core, K-Waay uses a split-KEM, a primitive introduced by Brendel et al. [SAC 2020], to provide Diffie-Hellman-like implicit authentication and secrecy guarantees. Along the way, we revisit the formalism of Brendel et al. and identify that additional security properties are required to prove a split-KEM-based protocol secure. We instantiate split-KEM by building a protocol based on the Frodo key exchange protocol relying on the plain LWE assumption: our proofs might be of independent interest as we show it satisfies our novel unforgeability and deniability security notions. Finally, we complement our theoretical results by thoroughly benchmarking both K-Waay and existing X3DH protocols. Our results show even when using plain LWE and a conservative choice of parameters that K-Waay is significantly faster than previous work.



## 2024/121

* Title: An acceleration of the AKS prime identification algorithm
* Authors: Stephen Meredith Williams
* [Permalink](https://eprint.iacr.org/2024/121)
* [Download](https://eprint.iacr.org/2024/121.pdf)

### Abstract

In its standard form, the AKS prime identification algorithm is deterministic and polynomial time but too slow to be of practical use. By dropping its deterministic attribute, it can be accelerated to an extent that it is practically useful, though still much slower than the widely used Miller-Rabin-Selfridge-Monier (MRSM) algorithm based on the Fermat Little Theorem or the Solovay-Strassen algorithm based on the Euler Criterion. The change made, in the last stage of AKS, is to check a modular equation not for a long sequence of values but for a single one. Another change is to reduce, arbitrarily, the size of the parameter $r$ giving the degree of the polynomial used for the last stage.



## 2024/122

* Title: SPRITE: Secure and Private Routing in Payment Channel Networks
* Authors: Gaurav Panwar, Roopa Vishwanathan, George Torres, Satyajayant Misra
* [Permalink](https://eprint.iacr.org/2024/122)
* [Download](https://eprint.iacr.org/2024/122.pdf)

### Abstract

Payment channel networks are a promising solution to the scalability challenge of blockchains and are designed for significantly increased transaction throughput compared to the layer one blockchain. Since payment channel networks are essentially decentralized peer-to-peer networks, routing transactions is a fundamental challenge. Payment channel networks have some unique security and privacy requirements that make pathfinding challenging, for instance, network topology is not publicly known, and sender/receiver privacy should be preserved, in addition to providing atomicity guarantees for payments. In this paper, we present an efficient privacy-preserving routing protocol, SPRITE, for payment channel networks that supports concurrent transactions. By finding paths offline and processing transactions online, SPRITE can process transactions in just two rounds, which is more efficient compared to prior work. We evaluate SPRITE’s performance using Lightning Net- work data and prove its security using the Universal Composability framework. In contrast to the current cutting-edge methods that achieve rapid transactions, our approach significantly reduces the message complexity of the system by 3 orders of magnitude while maintaining similar latencies.



## 2024/123

* Title: Memory Checking Requires Logarithmic Overhead
* Authors: Elette Boyle, Ilan Komargodski, Neekon Vafa
* [Permalink](https://eprint.iacr.org/2024/123)
* [Download](https://eprint.iacr.org/2024/123.pdf)

### Abstract

We study the complexity of memory checkers with computational security and prove the first general tight lower bound.

Memory checkers, first introduced over 30 years ago by Blum, Evans, Gemmel, Kannan, and Naor (FOCS '91, Algorithmica '94), allow a user to store and maintain a large memory on a remote and unreliable server by using small trusted local storage. The user can issue instructions to the server and after every instruction, obtain either the correct value or a failure (but not an incorrect answer) with high probability. The main complexity measure of interest is the size of the local storage and the number of queries the memory checker makes upon every logical instruction. The most efficient known construction has query complexity $O(\log n/\log\log n)$ and local space proportional to a computational security parameter, assuming one-way functions, where $n$ is the logical memory size. Dwork, Naor, Rothblum, and Vaikuntanathan (TCC '09) showed that for a restricted class of ``deterministic and non-adaptive'' memory checkers, this construction is optimal, up to constant factors. However, going beyond the small class of deterministic and non-adaptive constructions has remained a major open problem.

In this work, we fully resolve the complexity of memory checkers by showing that any construction with local space $p$ and query complexity $q$ must satisfy
$$ p \ge \frac{n}{(\log n)^{O(q)}} \;. $$
This implies, as a special case, that $q\ge \Omega(\log n/\log\log n)$ in any scheme, assuming that $p\le n^{1-\varepsilon}$ for $\varepsilon>0$. The bound applies to any scheme with computational security, completeness $2/3$, and inverse polynomial in $n$ soundness (all of which make our lower bound only stronger). We further extend the lower bound to schemes where the read complexity $q_r$ and write complexity $q_w$ differ. For instance, we show the tight bound that if $q_r=O(1)$ and $p\le n^{1-\varepsilon}$ for $\varepsilon>0$, then $q_w\ge n^{\Omega(1)}$. This is the first lower bound, for any non-trivial class of constructions, showing a read-write query complexity trade-off.

Our proof is via a delicate compression argument showing that a ``too good to be true'' memory checker can be used to compress random bits of information. We draw inspiration from tools recently developed for lower bounds for relaxed locally decodable codes. However, our proof itself significantly departs from these works, necessitated by the differences between settings.



## 2024/124

* Title: Perceived Information Revisited II: Information-Theoretical Analysis of Deep-Learning Based Side-Channel Attacks
* Authors: Akira Ito, Rei Ueno, Naofumi Homma
* [Permalink](https://eprint.iacr.org/2024/124)
* [Download](https://eprint.iacr.org/2024/124.pdf)

### Abstract

In conventional deep-learning-based side-channel attacks (DL-SCAs), an attacker trains a model by updating parameters to minimize the negative log-likelihood (NLL) loss function. Although a decrease in NLL improves DL-SCA performance, the reasons for this improvement remain unclear because of the lack of a formal analysis. To address this open problem, this paper explores the relationship between NLL and the attack success rate (SR) and conducts an information-theoretical analysis of DL-SCAs with an NLL loss function to solve open problems in DL-SCA. To this end, we introduce a communication channel for DL-SCAs and derive an inequality that links model outputs to the SR. Our inequality states that mutual information between the model output and intermediate value, which is named the latent perceived information (LPI), provides an upper bound of the SR of a DL-SCA with a trained neural network. Subsequently, we examine the conjecture by Ito et al. on the relationship between the effective perceived information (EPI) and SR and clarify its valid conditions from the perspective of LPI. Our analysis results reveal that a decrease in NLL correlated with an increase in LPI, which indicates that the model capability to extract intermediate value information from traces is enhanced. In addition, the results indicate that the LPI bounds the SR from above, and a higher upper bound of the SR could directly improve the SR if the selection function satisfies certain conditions, such as bijectivity. Finally, these theoretical insights are validated through attack experiments on neural network models applied to AES software and hardware implementations with masking countermeasures.



## 2024/125

* Title: New self-orthogonal codes from weakly regular plateaued functions and their application in LCD codes
* Authors: Melike Çakmak, Ahmet Sınak, Oğuz Yayla
* [Permalink](https://eprint.iacr.org/2024/125)
* [Download](https://eprint.iacr.org/2024/125.pdf)

### Abstract

A linear code is considered self-orthogonal if it is contained within its dual code. Self-orthogonal codes have applications in linear complementary dual codes, quantum codes and so on. The construction of self-orthogonal codes from functions over finite fields has been studied in the literature. In this paper, we generalize the construction method given by Heng et al. (2023) to weakly regular plateaued functions. We first construct several families of ternary self-orthogonal codes from weakly regular plateaued unbalanced functions. Then we use the self-orthogonal codes to construct new families of ternary LCD codes. As a consequence, we obtain (almost) optimal ternary self-orthogonal codes and LCD codes. Moreover, we propose two new classes of $p$-ary linear codes from weakly regular plateaued unbalanced functions over the finite fields of odd characteristics.



## 2024/126

* Title: Monte Carlo Tree Search for automatic differential characteristics search: application to SPECK
* Authors: Emanuele Bellini, David Gerault, Matteo Protopapa, Matteo Rossi
* [Permalink](https://eprint.iacr.org/2024/126)
* [Download](https://eprint.iacr.org/2024/126.pdf)

### Abstract

The search for differential characteristics on block ciphers is a difficult combinatorial problem. In this paper, we investigate the performances of an AI-originated technique, Single Player Monte-Carlo Tree Search (SP-MCTS), in finding good differential characteristics on ARX ciphers, with an application to the block cipher SPECK. In order to make this approach competitive, we include several heuristics, such as the combination of forward and backward searches, and achieve significantly faster results than state-of-the-art works that are not based on automatic solvers.
We reach 9, 11, 13, 13 and 15 rounds for SPECK32, SPECK48, SPECK64, SPECK96 and SPECK128 respectively. In order to build our algorithm, we revisit Lipmaa and Moriai's algorithm for listing all optimal differential transitions through modular addition, and propose a variant to enumerate all transitions with probability close (up to a fixed threshold) to the optimal, while fixing a minor bug in the original algorithm.



## 2024/127

* Title: Attacks Against the INDCPA-D Security of Exact FHE Schemes
* Authors: Jung Hee Cheon, Hyeongmin Choe, Alain Passelègue, Damien Stehlé, Elias Suvanto
* [Permalink](https://eprint.iacr.org/2024/127)
* [Download](https://eprint.iacr.org/2024/127.pdf)

### Abstract

A new security model for fully homomorphic encryption (FHE), called INDCPA-D security and introduced by Li and Micciancio [Eurocrypt'21], strengthens INDCPA security by giving the attacker access to a decryption oracle for ciphertexts for which it should know the underlying plaintexts. This includes ciphertexts that it (honestly) encrypted and those obtained from the latter by evaluating circuits that it chose. Li and Micciancio singled out the CKKS FHE scheme for approximate data [Asiacrypt'17] by giving an INDCPA-D attack on it and (erroneously) claiming that INDCPA-D security and INDCPA security coincide for FHEs on exact data.

We correct the widespread belief according to which INDCPA-D attacks are specific to approximate homomorphic computations. Indeed, the  equivalency formally proved by Li and Micciancio assumes that the schemes are not only exact but have a negligible probability of incorrect decryption. However, almost all competitive implementations of exact FHE schemes give away strong correctness by analyzing correctness heuristically and allowing noticeable probabilities of incorrect decryption. 

We exploit this imperfect correctness  to mount efficient indistinguishability and key-recovery attacks against all major exact FHE schemes.  We illustrate their strength by concretely breaking the default BFV implementation of OpenFHE and simulating an attack for the default parameter set of the CGGI implementation of TFHE-rs (the attack is too expensive to be run on commodity desktops, because of the cost of CGGI bootstrapping). Our attacks extend to threshold versions of the exact FHE schemes, when the correctness is similarly loose.



## 2024/128

* Title: Non-Binding (Designated Verifier) Signature
* Authors: Ehsan Ebrahimi
* [Permalink](https://eprint.iacr.org/2024/128)
* [Download](https://eprint.iacr.org/2024/128.pdf)

### Abstract

We argue that there are some scenarios in which
plausible deniability might be desired for a digital signature
scheme. For instance, the non-repudiation property of conventional
signature schemes is problematic in designing an Instant
Messaging system (WPES 2004). In this paper, we formally
define a non-binding signature scheme in which the Signer
is able to disavow her own signature if she wants, but, the
Verifier is not able to dispute a signature generated by the
Signer. That is, the Signer is able to convince a third party
Judge that she is the owner of a signature without disclosing
her secret information. We propose a signature scheme that
is non-binding and unforgeable. Our signature scheme is post-quantum secure if the underlying cryptographic primitives are post-quantum secure. In addition, a modification to our nonbinding
signature scheme leads to an Instant Messaging system
that is of independent interest.



## 2024/129

* Title: Finite Key OTP Functionality: Ciphers That Hold Off Attackers Smarter Than Their Designers
* Authors: Gideon Samid
* [Permalink](https://eprint.iacr.org/2024/129)
* [Download](https://eprint.iacr.org/2024/129.pdf)

### Abstract

The prevailing ciphers rely on the weak assumption that their attacker is not smarter than expected by their designers. The resultant crypto ecology favors the cryptographic powerhouses, and hinders cyber freedom, cyber privacy and cyber democracy. This weakness can be remedied by using the gold standard of cryptography -- One Time Pad, OTP. Alas, it comes with a prohibitive cost of a key as long as the message it encrypts. When the stakes are high enough users pay this high price because OTP is immunized against smarter and better equipped attackers. Claude Shannon has shown that this size imposition on the key is non-negotiable in the context he analyzed. Alas, changing the context, one could achieve OTP equivalence. Three simple changes are introduced: (i) make the size of the key an integral part of the secret, (ii) every finite message is encrypted with an arbitrary part of the key, (iii) allow for open-ended dilution of the contents-bearing bits of the ciphertext, with content-devoid bits, which don't confuse the intended recipient, but impose an open-ended cryptanalytic barrier before the attacker. A-priori a cryptanalyst is facing a set of messages each of them deemed plausible to be the one hidden in the ciphertext. If the ciphertext is Finite Key OTP compliant then membership in this set will not change after an exhaustive cryptanalytic processing of the ciphertext. This constitutes functional equivalence with OTP. OTP functionality with a shared finite key creates a path to digital freedom, digital privacy and digital democracy.



## 2024/130

* Title: HADES: Automated Hardware Design Exploration for Cryptographic Primitives
* Authors: Fabian Buschkowski, Georg Land, Jan Richter-Brockmann, Pascal Sasdrich, Tim Güneysu
* [Permalink](https://eprint.iacr.org/2024/130)
* [Download](https://eprint.iacr.org/2024/130.pdf)

### Abstract

While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances is still a mostly manual, tedious, and intuition-driven process. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly without developers being able to weigh all design options.

This immediately raises the necessity for tool-assisted Design Space Exploration (DSE) for efficient and secure cryptographic hardware. For this, we present the progressive HADES framework, offering a customizable, extendable, and streamlined DSE for efficient and secure cryptographic hardware accelerators. This tool exhaustively traverses the design space driven by security requirements, rapidly predicts user-defined performance metrics, e.g., area footprint or cycle-accurate latency, and instantiates the most suitable candidate in a synthesizable Hardware Description Language (HDL).

We demonstrate the capabilities of our framework by applying our proof-of-concept implementation to a wide-range selection of state-of-the-art symmetric and PQC schemes, including the ChaCha20 stream cipher and the designated PQC standard Kyber, for which we provide the first set of arbitrary-order masked hardware implementations.



## 2024/131

* Title: Practical Post-Quantum Signatures for Privacy
* Authors: Sven Argo, Tim Güneysu, Corentin Jeudy, Georg Land, Adeline Roux-Langlois, Olivier Sanders
* [Permalink](https://eprint.iacr.org/2024/131)
* [Download](https://eprint.iacr.org/2024/131.pdf)

### Abstract

The transition to post-quantum cryptography has been an enormous challenge and effort for cryptographers over the last decade, with impressive results such as the future NIST standards. However, the latter has so far only considered central cryptographic mechanisms (signatures or KEM) and not more advanced ones, e.g., targeting privacy-preserving applications. Of particular interest is the family of solutions called blind signatures, group signatures and anonymous credentials, for which standards already exist, and which are deployed in billions of devices. Such a family does not have, at this stage, an efficient post-quantum counterpart although very recent works improved this state of affairs by offering two different alternatives: either one gets a system with rather large elements but a security proved under standard assumptions or one gets a more efficient system at the cost of ad-hoc interactive assumptions or weaker security models. Moreover, all these works have only considered size complexity without implementing the quite complex building blocks their systems are composed of. In other words, the practicality of such systems is still very hard to assess, which is a problem if one envisions a post-quantum transition for the corresponding systems/standards.

In this work, we propose a construction of so-called signature with efficient protocols (SEP), which is the core of such privacy-preserving solutions. By revisiting the approach by Jeudy et al. (Crypto 2023) we manage to get the best of the two alternatives mentioned above, namely short sizes with no compromise on security. To demonstrate this, we plug our SEP in an anonymous credential system, achieving credentials of less than 80 KB. In parallel, we fully implemented our system, and in particular the complex zero-knowledge framework of Lyubashevsky et al. (Crypto'22), which has, to our knowledge, not be done so far. Our work thus not only improves the state-of-the-art on privacy-preserving solutions, but also significantly improves the understanding of efficiency and implications for deployment in real-world systems.



## 2024/132

* Title: SimpleFT: A Simple Byzantine Fault Tolerant Consensus
* Authors: Rui Hao, Chenglong Yi, Weiqi Dai, Zhaonan Zhang
* [Permalink](https://eprint.iacr.org/2024/132)
* [Download](https://eprint.iacr.org/2024/132.pdf)

### Abstract

Although having been popular for a long time, Byzantine Fault Tolerance (BFT) consensus under the partially-synchronous network is denounced to be inefficient or even infeasible in recent years, which calls for a more robust asynchronous consensus. On the other hand, almost all the existing asynchronous consensus are too complicated to understand and even suffer from the termination problem. Motivated by the above problems, we propose SimpleFT in this paper, which is a simple asynchronous consensus and is mainly inspired by the simplicity of the Bitcoin protocol. With a re-understanding of the Bitcoin protocol, we disassemble the life cycle of a block into three phases, namely proposal, dissemination, and confirmation. Corresponding to these phases, we devise or introduce the sortition algorithm, reliable broadcast algorithm, and quorum certificate mechanism in SimpleFT, respectively. To make full use of the network resources and improve the system throughput, we further introduce the layered architecture to SimpleFT, which enables multiple blocks to be confirmed at the same height. Comprehensive analysis is made to validate the correctness of SimpleFT and various experiments are conducted to demonstrate its efficient performance.



## 2024/133

* Title: Optimizing Implementations of Boolean Functions
* Authors: Meltem Sonmez Turan
* [Permalink](https://eprint.iacr.org/2024/133)
* [Download](https://eprint.iacr.org/2024/133.pdf)

### Abstract

Symmetric cryptography primitives are constructed by iterative applications of linear and nonlinear layers. Constructing efficient circuits for these layers, even for the linear one, is challenging. In 1997, Paar proposed a heuristic to minimize the number of XORs (modulo 2 addition) necessary to implement linear layers. In this study, we slightly modify Paar’s heuristics to find implementations for nonlinear Boolean functions, in particular to homogeneous Boolean functions. Additionally, we show how this heuristic can be used to construct circuits for generic Boolean functions with small number of AND gates, by exploiting affine equivalence relations.



## 2024/134

* Title: Byzantine Fault Tolerance with Non-Determinism, Revisited
* Authors: Sisi Duan, Yue Huang
* [Permalink](https://eprint.iacr.org/2024/134)
* [Download](https://eprint.iacr.org/2024/134.pdf)

### Abstract

The conventional Byzantine fault tolerance (BFT) paradigm requires replicated state machines to execute deterministic operations only. In practice, numerous applications and scenarios, especially in the era of blockchains, contain various sources of non-determinism. Despite decades of research on BFT, we still lack an efficient and easy-to-deploy solution for BFT with non-determinism—BFT-ND, especially in the asynchronous setting.
We revisit the problem of BFT-ND and provide a formal and asynchronous treatment of BFT-ND. In particular, we design and implement Block-ND that insightfully separates the task of agreeing on the order of transactions from the task of agreement on the state: Block-ND allows reusing existing BFT implementations; on top of BFT, we reduce the agreement on the state to multivalued Byzantine agreement (MBA), a somewhat neglected primitive by practical systems. Block-ND is completely asynchronous as long as the underlying BFT is asynchronous.
We provide a new MBA construction significantly faster than existing MBA constructions. We instantiate Block-ND in both the partially synchronous setting (with PBFT, OSDI 1999) and the purely asynchronous setting (with PACE, CCS 2022). Via a 91-instance WAN deployment on Amazon EC2, we show that Block-ND has only marginal performance degradation compared to conventional BFT.



## 2024/135

* Title: A Closer Look at the Belief Propagation Algorithm in Side-Channel-Assisted Chosen-Ciphertext Attacks
* Authors: Kexin Qiao, Siwei Sun, Zhaoyang Wang, Zehan Wu, Junjie Cheng, An Wang, Liehuang Zhu
* [Permalink](https://eprint.iacr.org/2024/135)
* [Download](https://eprint.iacr.org/2024/135.pdf)

### Abstract

The implementation security of post-quantum cryptography (PQC) algorithms has emerged as a critical concern with the PQC standardization process reaching its end. In a side-channel-assisted chosen-ciphertext attack, the attacker builds linear inequalities on secret key components and uses the belief propagation (BP) algorithm to solve. The number of inequalities leverages the query complexity of the attack, so the fewer the better. In this paper, we use the PQC standard algorithm Kyber512 as a study case to construct bilateral inequalities on key variables with substantially narrower intervals using a side-channel-assisted oracle. The number of such inequalities required to recover the key with probability 1 utilizing the BP algorithm is reduced relative to previous unilateral inequalities. Furthermore, we introduce strategies aimed at further refining the interval of inequalities. Diving into the BP algorithm, we discover a measure metric named JSD-metric that can gauge the tightness of an inequality. We then develop a heuristic strategy and a machine learning-based strategy to utilize the JSD-metrics to contract boundaries of inequalities even with fewer inequalities given, thus improving the information carried by the system of linear inequalities. This contraction strategy is at the algorithmic level and has the potential to be employed in all attacks endeavoring to establish a system of inequalities concerning key variables.



## 2024/136

* Title: Secure Transformer Inference Made Non-interactive
* Authors: Jiawen Zhang, Jian Liu, Xinpeng Yang, Yinghao Wang, Kejia Chen, Xiaoyang Hou, Kui Ren, Xiaohu Yang
* [Permalink](https://eprint.iacr.org/2024/136)
* [Download](https://eprint.iacr.org/2024/136.pdf)

### Abstract

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server.

In this paper, we propose NEXUS the first non-interactive protocol for secure transformer inference, where the client is only required to submit an encrypted input and await the encrypted result from the server. Central to NEXUS are two innovative techniques: SIMD ciphertext compression/decompression, and SIMD slots folding. Consequently, our approach achieves a speedup of 2.8$\times$ and a remarkable bandwidth reduction of 368.6$\times$, compared to the state-of-the-art solution presented in S&P '24.



## 2024/137

* Title: Sleepy Consensus in the Known Participation Model
* Authors: Chenxu Wang, Sisi Duan, Minghui Xu, Feng Li, Xiuzhen Cheng
* [Permalink](https://eprint.iacr.org/2024/137)
* [Download](https://eprint.iacr.org/2024/137.pdf)

### Abstract

We study sleepy consensus in the known participation model, where replicas are aware of the minimum number of awake honest replicas. Compared to prior works that almost all assume the unknown participation model, we provide a fine-grained treatment of sleepy consensus in the known participation model and show some interesting results. First, we present a synchronous atomic broadcast protocol with $5\Delta+2\delta$ expected latency and $2\Delta+2\delta$ best-case latency, where $\Delta$ is the bound on network delay and $\delta$ is the actual network delay. In contrast, the best-known result in the unknown participation model (MMR, CCS 2023) achieves $14\Delta$ latency, more than twice the latency of our protocol. Second, in the partially synchronous network (the value of $\Delta$ is unknown), we show that without changing the conventional $n \geq 3f+1$ assumption, one can only obtain a secure sleepy consensus by making the stable storage assumption (where replicas need to store intermediate consensus parameters in stable storage). Finally, still in the partially synchronous network but not assuming stable storage, we prove the bounds on $n \geq 3f+2s+1$ without the global awake time (GAT) assumption (all honest replicas become awake after GAT) and $n \geq 3f+s+1$ with the GAT assumption, where $s$ is the maximum number of honest replicas that may become asleep simultaneously. Using these bounds, we transform HotStuff (PODC 2019) into a sleepy consensus protocol via a timeoutQC mechanism and a low-cost recovery protocol.



## 2024/138

* Title: Correction Fault Attacks on Randomized CRYSTALS-Dilithium
* Authors: Elisabeth Krahmer, Peter Pessl, Georg Land, Tim Güneysu
* [Permalink](https://eprint.iacr.org/2024/138)
* [Download](https://eprint.iacr.org/2024/138.pdf)

### Abstract

After NIST’s selection of Dilithium as the primary future standard for quantum-secure digital signatures, increased efforts to understand its implementation security properties are required to enable widespread adoption on embedded devices. Concretely, there are still many open questions regarding the susceptibility of Dilithium to fault attacks. This is especially the case for Dilithium’s randomized (or hedged) signing mode, which, likely due to devastating implementation attacks on the deterministic mode, was selected as the default by NIST.
This work takes steps towards closing this gap by presenting two new key-recovery fault attacks on randomized/hedged Dilithium. Both attacks are based on the idea of correcting faulty signatures after signing. A successful correction yields the value of a secret intermediate that carries information on the key. After gathering many faulty signatures and corresponding correction values, it is possible to solve for the signing key via either simple linear algebra or lattice-reduction techniques. Our first attack extends a previously published attack based on an instruction-skipping fault to the randomized setting. Our second attack injects faults in the matrix A, which is part of the public key. As such, it is not sensitive to side-channel leakage and has, potentially for this reason, not seen prior analysis regarding faults.
We show that for Dilithium2, the attacks allow key recovery with as little as 1024 and 512 faulty signatures, respectively, with each signature generated by injecting a single targeted fault. We also demonstrate how our attacks can be adapted to circumvent several popular fault countermeasures with a moderate increase in the computational runtime and the number of required faulty signatures. These results are verified using both simulated faults and clock glitches on an ARM-based microcontroller.
The presented attacks demonstrate that also randomized Dilithium can be subject to diverse fault attacks, that certain countermeasures might be easily bypassed, and that potential fault targets reach beyond side-channel sensitive operations. Still, many further operations are likely also susceptible, implying the need for increased analysis efforts in the future.



## 2024/139

* Title: Efficient Arithmetic in Garbled Circuits
* Authors: David Heath
* [Permalink](https://eprint.iacr.org/2024/139)
* [Download](https://eprint.iacr.org/2024/139.pdf)

### Abstract

Garbled Circuit (GC) techniques usually work with Boolean circuits. Despite intense interest, efficient arithmetic generalizations of GC were only known from heavy assumptions, such as LWE.

We construct arithmetic garbled circuits from circular correlation robust hashes, the assumption underlying the celebrated Free XOR garbling technique. Let $\lambda$ denote a computational security parameter, and consider the integers $\mathbb{Z}_m$ for any $m \geq 2$. Let $\ell = \lceil \log_2 m \rceil$ be the bit length of $\mathbb{Z}_m$ values. We garble arithmetic circuits over $\mathbb{Z}_m$ where the garbling of each gate has size $O(\ell \cdot \lambda)$ bits. Constrast this with Boolean-circuit-based arithmetic, requiring $O(\ell^2\cdot \lambda)$ bits via the schoolbook multiplication algorithm, or $O(\ell^{1.585}\cdot \lambda)$ bits via Karatsuba's algorithm.

Our arithmetic gates are compatible with Boolean operations and with Garbled RAM, allowing to garble complex programs of arithmetic values.



## 2024/140

* Title: Efficient ECDSA-based Adaptor Signature for Batched Atomic Swaps
* Authors: Binbin Tu, Min Zhang, Yu Chen
* [Permalink](https://eprint.iacr.org/2024/140)
* [Download](https://eprint.iacr.org/2024/140.pdf)

### Abstract

Adaptor signature is a novel cryptographic primitive which ties together the signature and the leakage of a secret value. It has become an important tool for solving the scalability and interoperability problems in the blockchain. Aumayr et al. (Asiacrypt 2021) recently provide the formalization of the adaptor signature and present a provably secure ECDSA-based adaptor signature, which requires zero-knowledge proof in the pre-signing phase to ensure the signer works correctly. However, the number of zero-knowledge proofs is linear with the number of participants.

In this paper, we propose efficient ECDSA-based adaptor signature schemes and give security proofs based on ECDSA. In our schemes, the zero-knowledge proofs in the pre-signing phase can be generated in a batch and offline. Meanwhile, the online pre-signing algorithm is similar to the ECDSA signing algorithm and can enjoy the same efficiency as ECDSA. In particular, considering specific verification scenarios, such as (batched) atomic swaps, our schemes can reduce the number of zero-knowledge proofs in the pre-signing phase to one, independent of the number of participants. Last, we conduct an experimental evaluation, demonstrating that the performance of our ECDSA-based adaptor signature reduces online pre-signing time by about 60% compared with the state-of-the-art ECDSA-based adaptor signature.



## 2024/141

* Title: Secure Statistical Analysis on Multiple Datasets: Join and Group-By
* Authors: Gilad Asharov, Koki Hamada, Dai Ikarashi, Ryo Kikuchi, Ariel Nof, Benny Pinkas, Junichi Tomida
* [Permalink](https://eprint.iacr.org/2024/141)
* [Download](https://eprint.iacr.org/2024/141.pdf)

### Abstract

We implement a secure platform for statistical analysis over multiple organizations and multiple datasets. We provide a suite of protocols for different variants of JOIN and GROUP-BY operations. JOIN allows combining data from multiple datasets based on a common column. GROUP-BY allows aggregating rows that have the same values in a column or a set of columns, and then apply some aggregation summary on the rows (such as sum, count, median, etc.). Both operations are fundamental tools for relational databases. One example use case of our platform is in data marketing in which an analyst would join purchase histories and membership information, and then obtain statistics, such as "Which products were bought by people earning this much per annum?"

Both JOIN and GROUP-BY involve many variants, and we design protocols for several common procedures. In particular, we propose a novel group-by-median protocol that has not been known so far. Our protocols rely on sorting protocols, and work in the honest majority setting and against malicious adversaries. To the best of our knowledge, this is the first implementation of JOIN and GROUP-BY protocols secure against a malicious adversary.



## 2024/142

* Title: GradedDAG: An Asynchronous DAG-based BFT Consensus with Lower Latency
* Authors: Xiaohai Dai, Zhaonan Zhang, Jiang Xiao, Jingtao Yue, Xia Xie, Hai Jin
* [Permalink](https://eprint.iacr.org/2024/142)
* [Download](https://eprint.iacr.org/2024/142.pdf)

### Abstract

To enable parallel processing, the Directed Acyclic Graph (DAG) structure is introduced to the design of asynchronous Byzantine Fault Tolerant (BFT) consensus protocols, known as DAG-based BFT. Existing DAG-based BFT protocols operate in successive waves, with each wave containing three or four Reliable Broadcast (RBC) rounds to broadcast data, resulting in high latency due to the three communication steps required in each RBC. For instance, Tusk, a state-of-the-art DAG-based BFT protocol, has a good-case latency of 7 communication steps and an expected worst latency of 21 communication steps.

To reduce latency, we propose GradedDAG, a new DAG-based BFT consensus protocol based on our adapted RBC called Graded RBC (GRBC) and the Consistent Broadcast (CBC), with each wave consisting of only one GRBC round and one CBC round. Through GRBC, a replica can deliver data with a grade of 1 or 2, and a non-faulty replica delivering the data with grade 2 can ensure that more than 2/3 of replicas have delivered the same data. Meanwhile, through CBC, data delivered by different non-faulty replicas must be identical. In each wave, a block in the GRBC round will be elected as the leader. If a leader block has been delivered with grade 2, it and all its ancestor blocks can be committed. GradedDAG offers a good-case latency of 4 communication steps and an expected worst latency of 7.5 communication steps, significantly lower than the state-of-theart. Experimental results demonstrate GradedDAG’s feasibility and efficiency.



## 2024/143

* Title: Scalable Collaborative zk-SNARK: Fully Distributed Proof Generation and Malicious Security
* Authors: Xuanming Liu, Zhelei Zhou, Yinghao Wang, Bingsheng Zhang, Xiaohu Yang
* [Permalink](https://eprint.iacr.org/2024/143)
* [Download](https://eprint.iacr.org/2024/143.pdf)

### Abstract

The notion of collaborative zk-SNARK is introduced by Ozdemir and Boneh (USENIX 2022), which allows multiple parties to jointly create a zk-SNARK proof over distributed secrets (also known as the witness).
This approach ensures the privacy of the witness, as no corrupted servers involved in the proof generation can learn anything about the honest servers' witness.
Later, Garg et al. continued the study, focusing on how to achieve faster proof generation (USENIX 2023).
However, their approach requires a powerful server that is responsible for the most resource-intensive computations and communications during the proof generation.
This requirement results in a scalability bottleneck, making their protocols unable to handle large-scale circuits.

In this work, we address this issue by lifting a zk-SNARK called Libra (Crypto 2019) to a collaborative zk-SNARK and achieve a fully distributed proof generation, where all servers take roughly the same portion of the total workload.
Further, our protocol can be adapted to be secure against a malicious adversary by incorporating some verification mechanisms.
With 128 consumer machines and a 4Gbps network, we successfully generate a proof for a data-parallel circuit containing $2^{23}$ gates in merely 2.5 seconds and take only 0.5 GB memory for each server. This represents a $19\times$ speed-up, compared to a local Libra prover.
Our benchmark further indicates an impressive 877$\times$ improvement in running time and a 992$\times$ enhancement in communication compared to the implementation in previous work. Furthermore, our protocol is capable of handling larger circuits, making it scalable in practice.



## 2024/144

* Title: Efficient (3,3)-isogenies on fast Kummer surfaces
* Authors: Maria Corte-Real Santos, Craig Costello, Benjamin Smith
* [Permalink](https://eprint.iacr.org/2024/144)
* [Download](https://eprint.iacr.org/2024/144.pdf)

### Abstract

We give an alternative derivation of (N,N)-isogenies between fast Kummer surfaces which complements existing works based on the theory of theta functions. We use this framework to produce explicit formulae for the case of N = 3, and show that the resulting algorithms are more efficient than all prior (3,3)-isogeny algorithms.



## 2024/145

* Title: Practical Batch Proofs of Exponentiation
* Authors: Charlotte Hoffmann, Pavel Hubáček, Svetlana Ivanova
* [Permalink](https://eprint.iacr.org/2024/145)
* [Download](https://eprint.iacr.org/2024/145.pdf)

### Abstract

A Proof of Exponentiation (PoE) allows a prover to efficiently convince a verifier that $y=x^e$ in some group of unknown order. PoEs are the basis for practical constructions of Verifiable Delay Functions (VDFs), which, in turn, are important for various higher-level protocols in distributed computing. In applications such as distributed consensus, many PoEs are generated regularly, motivating protocols for secure aggregation of batches of statements into a few statements to improve the efficiency for both parties. Rotem (TCC 2021) recently presented two such generic batch PoEs.

In this work, we introduce two batch PoEs that outperform both proposals of Rotem and we evaluate their practicality. First, we show that the two batch PoEs of Rotem can be combined to improve the overall efficiency by at least a factor of two. Second, we revisit the work of Bellare, Garay, and Rabin (EUROCRYPT 1998) on batch verification of digital signatures and show that, under the low order assumption, their bucket test can be securely adapted to the setting of groups of unknown order. The resulting batch PoE quickly outperforms the state of the art in the expected number of group multiplications with the growing number of instances, and it decreases the cost of batching by an order of magnitude already for hundreds of thousands of instances. Importantly, it is the first batch PoE that significantly decreases both the proof size and complexity of verification. Our experimental evaluations show that even a non-optimized implementation achieves such improvements, which would match the demands of real-life systems requiring large-scale PoE processing.

Finally, even though our proof techniques are conceptually similar to Rotem, we give an improved analysis of the application of the low order assumption towards secure batching of PoE instances, resulting in a tight reduction, which is important when setting the security parameter in practice.



## 2024/146

* Title: Computing Orientations from the Endomorphism Ring of Supersingular Curves and Applications
* Authors: Jonathan Komada Eriksen, Antonin Leroux
* [Permalink](https://eprint.iacr.org/2024/146)
* [Download](https://eprint.iacr.org/2024/146.pdf)

### Abstract

This work introduces several algorithms related to the computation of orientations in endomorphism rings of supersingular elliptic curves. This problem boils down to representing integers by ternary quadratic forms, and it is at the heart of several results regarding the security of oriented-curves in isogeny-based cryptography.
Our main contribution is to show that there exists efficient algorithms that can solve this problem for quadratic orders of discriminant $n$ up to $O(p^{4/3})$. Our approach improves upon previous results by increasing this bound from $O(p)$ to $O(p^{4/3})$ and removing some heuristics.
We introduce several variants of our new algorithm and provide a careful analysis of their asymptotic running time (without heuristic when it is possible). The best proven asymptotic complexity of one of our variant is $O(n^{3/4}/p)$ in average. The best heuristic variant has a complexity of $O(p^{1/3})$ for big enough $n$.
We then introduce several results regarding the computation of ideals between oriented orders. The first application of this is a simplification of the known reduction from vectorization to computing the endomorphism ring, removing the assumption on the factorization of the discriminant. As a second application, we relate the problem of computing fixed-degree isogenies between supersingular curves to the problem of computing orientations in endomorphism rings, and we show that for a large range of degree $d$, our new algorithms improve on the state-of-the-art, and in important special cases, the range of degree $d$ for which there exist a polynomial-time algorithm is increased. In the most special case we consider, when both curves are oriented by a small degree endomorphism, we show heuristically that our techniques allow the computation of isogenies of any degree, assuming they exist.



## 2024/147

* Title: Prime Masking vs. Faults - Exponential Security Amplification against Selected Classes of Attacks
* Authors: Thorben Moos, Sayandeep Saha, François-Xavier Standaert
* [Permalink](https://eprint.iacr.org/2024/147)
* [Download](https://eprint.iacr.org/2024/147.pdf)

### Abstract

Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then either exploits some knowledge about the position of the injected fault or about its value. The latter class of attacks, which can be applied without ever obtaining faulty outputs, such as Statistical Ineffective Fault Attacks (SIFA), then either exploits a dependency between the effectiveness of the fault injection and the value to be faulted (e.g., an LSB stuck-at-0 only affecting odd numbers), denoted as SIFA-1, or a conditional propagation of a faulted value based on a sensitive intermediate (e.g., multiplication of a faulted value by 0 prevents propagation), denoted as SIFA-2. The aptitude of additive masking schemes, which were designed to prevent side-channel analysis, to also thwart fault attacks is typically assumed to be limited. Common fault models, such as toggle/bit-flip, stuck-at-0 or stuck-at-1 survive the recombination of Boolean shares well enough for generic attacks to succeed. More precisely, injecting a fault into one or multiple Boolean shares often results in the same, or at least a predictable, error appearing in the sensitive variable after recombination. In this work, we show that additive masking in prime-order fields breaks such relationships, causing frequently exploited biases to decrease exponentially in the number of shares. As a result, prime masking offers surprisingly strong protection against generic statistical attacks, which require a dependency between the effectiveness of an injected fault and the secret variable that is manipulated, such as SIFA-1. Operation-dependent statistical attacks, such as SIFA-2 and Fault Template Attacks (FTA), may still be performed against certain prime-field structures, even if they are masked with many shares. Yet, we analyze the corresponding cases and are able to provide specific guidelines on how to avoid vulnerabilities either at the cipher design or implementation level by making informed decisions about the primes, non-linear mappings and masked gadgets used. Since prime-field masking appears to be one of the rare instances of affordable countermeasures that naturally provide sound protection against sidechannel analysis and certain fault injection attacks, we believe there is a strong incentive for developing new ciphers to leverage these advantages.



## 2024/148

* Title: Preliminary Cryptanalysis of the Biscuit Signature Scheme
* Authors: Charles Bouillaguet, Julia Sauvage
* [Permalink](https://eprint.iacr.org/2024/148)
* [Download](https://eprint.iacr.org/2024/148.pdf)

### Abstract

Biscuit is a recent multivariate signature scheme based on the MPC-in-the-Head paradigm. It has been submitted to the NIST competition for additional signature schemes. Signatures are derived from a zero-knowledge proof of knowledge of the solution of a structured polynomial system. This extra structure enables efficient proofs and compact signatures. This short note demonstrates that it also makes these polynomial systems easier to solve than random ones. As a consequence, the original parameters of Biscuit failed to meet the required security levels and had to be upgraded.



## 2024/149

* Title: Evict+Spec+Time: Exploiting Out-of-Order Execution to Improve Cache-Timing Attacks
* Authors: Shing Hing William Cheng, Chitchanok Chuengsatiansup, Daniel Genkin, Dallas McNeil, Toby Murray, Yuval Yarom, Zhiyuan Zhang
* [Permalink](https://eprint.iacr.org/2024/149)
* [Download](https://eprint.iacr.org/2024/149.pdf)

### Abstract

Speculative out-of-order execution is a strategy of masking execution latency by allowing younger instructions to execute before older instructions. While originally considered to be innocuous, speculative out-of-order execution was brought into the spotlight with the 2018 publication of the Spectre and Meltdown attacks. These attacks demonstrated that microarchitectural side channels can leak sensitive data accessed by speculatively executed instructions that are not part of the normal program execution. Since then, a significant effort has been vested in investigating how microarchitectural side channels can leak data from speculatively executed instructions and how to control this leakage. However, much less is known about how speculative out-of-order execution affects microarchitectural side-channel attacks.

In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits, hence by measuring the execution time of code, an attacker can determine if a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code.

Based on the observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-table-based implementation of AES. We further show an Evict+Spec+Time attack on an S-box-based implementation of AES, recovering the key with as little as 14389 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.



## 2024/150

* Title: SALSA FRESCA: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors
* Authors: Samuel Stevens, Emily Wenger, Cathy Yuanchen Li, Niklas Nolte, Eshika Saxena, Francois Charton, Kristin Lauter
* [Permalink](https://eprint.iacr.org/2024/150)
* [Download](https://eprint.iacr.org/2024/150.pdf)

### Abstract

Learning with Errors (LWE) is a hard math problem underlying post-quantum cryptography (PQC) systems for key exchange and digital signatures, recently standardized by NIST. Prior work [Wenger et al., 2022; Li et al., 2023a;b] proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods—better pre-processing, angular embeddings and model pre-training—to improve these attacks, speeding up preprocessing by 25× and improving model sample efficiency by 10×. We demonstrate for the first time that pre-training improves and reduces
the cost of ML attacks on LWE. Our architecture improvements enable scaling to larger-dimension LWE problems: this work is the first instance of ML attacks recovering sparse binary secrets in dimension n = 1024, the smallest dimension used in practice for homomorphic encryption applications of LWE where sparse binary secrets are proposed.



## 2024/151

* Title: Improving Linear Key Recovery Attacks using Walsh Spectrum Puncturing
* Authors: Antonio Flórez-Gutiérrez, Yosuke Todo
* [Permalink](https://eprint.iacr.org/2024/151)
* [Download](https://eprint.iacr.org/2024/151.pdf)

### Abstract

In some linear key recovery attacks, the function which determines the value of the linear approximation from the plaintext, ciphertext and key is replaced by a similar map in order to improve the time or memory complexity at the cost of a data complexity increase. We propose a general framework for key recovery map substitution, and introduce Walsh spectrum puncturing, which consists of removing carefully-chosen coefficients from the Walsh spectrum of this map. The capabilities of this technique are illustrated by describing improved attacks on reduced-round Serpent (including the first 12-round attack on the 192-bit key variant), GIFT-128 and NOEKEON, as well as the full DES.



## 2024/152

* Title: Equivalence of Generalised Feistel Networks
* Authors: Patrick Derbez, Marie Euler
* [Permalink](https://eprint.iacr.org/2024/152)
* [Download](https://eprint.iacr.org/2024/152.pdf)

### Abstract

This paper focuses on equivalences between Generalised Feistel Networks (GFN) of type-II. We introduce a new definition of equivalence which captures the concept that two GFNs are identical up to re-labelling of the inputs/outputs, and give a procedure to test this equivalence relation. Such two GFNs are therefore cryptographically equivalent for several classes of attacks. It induces a reduction of the space of possible GFNs: the set of the $(k!)^2$ possible even-odd GFNs with $2k$ branches can be partitioned into $k!$ different classes.

This result can be very useful when looking for an optimal GFN regarding specific computationally intensive properties, such as the minimal number of active S-boxes in a differential trail. We also show that in several previous papers, many GFN candidates are redundant as they belong to only a few classes. Because of this reduction of candidates, we are also able to suggest better permutations than the one of WARP: they reach 64 active S-boxes in one round less and still have the same diffusion round that WARP. Finally, we also point out a new family of permutations with good diffusion properties.



## 2024/153

* Title: Revisiting the Slot-to-Coefficient Transformation for BGV and BFV
* Authors: Robin Geelen
* [Permalink](https://eprint.iacr.org/2024/153)
* [Download](https://eprint.iacr.org/2024/153.pdf)

### Abstract

Numerous algorithms in homomorphic encryption require an operation that moves the slots of a ciphertext to the coefficients of a different ciphertext. We describe an FFT-like method for decomposing this slot-to-coefficient transformation (and its inverse) for BGV and BFV. The proposed method is specific to power-of-two cyclotomic rings and can handle both fully and sparsely packed slots. Previously, such a method was only known for non-power-of-two cyclotomic rings.

Our algorithm admits more freedom in the complexity-depth trade-off than prior works. Moreover, it brings down the computational complexity of the slot-to-coefficient transformation from a linear to a logarithmic number of FHE operations in the best case, which is shown via a detailed complexity analysis. We also provide a proof-of-concept implementation in the Magma bootstrapping library.



## 2024/154

* Title: Broadcast Encryption using Sum-Product decomposition of Boolean functions
* Authors: Aurélien Dupin, Simon Abelard
* [Permalink](https://eprint.iacr.org/2024/154)
* [Download](https://eprint.iacr.org/2024/154.pdf)

### Abstract

The problem of Broadcast Encryption (BE) consists in broadcasting an encrypted message to a large number of users or receiving devices in such a way that the emitter of the message can control which of the users can or cannot decrypt it.

Since the early 1990's, the design of BE schemes has received significant interest and many different concepts were proposed. A major breakthrough was achieved by Naor, Naor and Lotspiech (CRYPTO 2001) by partitioning cleverly the set of authorized users and associating a symmetric key to each subset. Since then, while there have been many advances in public-key based BE schemes, mostly based on bilinear maps, little was made on symmetric cryptography.

In this paper, we design a new symmetric-based BE scheme, named $\Sigma\Pi$BE, that relies on logic optimization and consensual security assumptions. It is competitive with the work of Naor et al. and provides a different tradeoff: the bandwidth requirement is significantly lowered at the cost of an increase in the key storage.



## 2024/155

* Title: Fully Homomorphic Encryption on large integers
* Authors: Philippe Chartier, Michel Koskas, Mohammed Lemou, Florian Méhats
* [Permalink](https://eprint.iacr.org/2024/155)
* [Download](https://eprint.iacr.org/2024/155.pdf)

### Abstract

At the core of fully homomorphic encryption lies a procedure to refresh the ciphertexts whose noise component has grown too big. The efficiency of the so-called bootstrap is of paramount importance as it is usually regarded as the main bottleneck towards a real-life deployment of fully homomorphic crypto-systems. In two of the fastest implementations so far, the space of messages is limited to binary integers. If the message space is extended to the discretized torus $T_{p_i}$ or equivalently to $Z_{p_i}$ with values of $p_i$ large as compared to the dimension of the quotient ring in which the operations are realised, the bootstrap delivers incorrect results with far too high probability. As a consequence, the use of a residue numeral system to address large integers modulo $p=p_1 \times \ldots \times p_\kappa$ would be of limited interest in practical situations without the following remedy: rather than increasing the polynomial degree and thus the computational cost, we introduce here a novel and simple technique (hereafter referred to as ``collapsing") which, by grouping the components of the mask, attenuates both rounding errors and computational costs, and greatly helps to sharpen the correctness of the bootstrap. We then rigorously estimate the probability of success as well as the output error and determine practical parameters to reach a given correctness threshold.



## 2024/156

* Title: Homomorphic sign evaluation using functional bootstrapping with a RNS representation of integers
* Authors: Philippe Chartier, Michel Koskas, Mohammed Lemou, Florian Méhats
* [Permalink](https://eprint.iacr.org/2024/156)
* [Download](https://eprint.iacr.org/2024/156.pdf)

### Abstract

In the context of fully-homomorphic-encryption, we consider the representation of large integers by their decomposition over a product of rings (through the Chinese Remainder Theorem) and introduce a new algorithm for the determination of the sign solely through the knowledge of ring-components. We then prove that our algorithm delivers a correct result with a very high probability.



## 2024/157

* Title: Delphi: sharing assessments of cryptographic assumptions
* Authors: Jeroen van de Graaf, Arjen K. Lenstra
* [Permalink](https://eprint.iacr.org/2024/157)
* [Download](https://eprint.iacr.org/2024/157.pdf)

### Abstract

Almost all practical cryptographic protocols are based on computational or ad-hoc assumptions. Assessing the strengths of these assumptions is therefore a key factor in evaluating the risks of the systems using them. As a service to (and by) cryptographic researchers and practitioners, we propose to create Delphi, a public database where researchers document their opinions and beliefs about the strengths of the most important assumptions. We believe this effort will be of great value when deciding which cryptographic primitives to keep or start using.



## 2024/158

* Title: HiSE: Hierarchical (Threshold) Symmetric-key Encryption
* Authors: Pousali Dey, Pratyay Mukherjee, Swagata Sasmal, Rohit Sinha
* [Permalink](https://eprint.iacr.org/2024/158)
* [Download](https://eprint.iacr.org/2024/158.pdf)

### Abstract

Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides scalable and decentralized solution for symmetric encryption by ensuring that the secret-key stays distributed at all times. They avoid having a single point of attack or failure, while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature which enables a “more privileged” client to encrypt records in bulk by interacting only once with the key servers, while decryption must be performed individually for each record, potentially by a “less privileged” client. However, typical enterprises collect or generate data once and query it several times over its lifecycle in various data processing pipelines; i.e., enterprise workloads are often decryption heavy! ATSE does not meet the bar for this setting because of linear interaction / computation (in the number of records to be decrypted) – our experiments show that ATSE provides a sub-par throughput of a few hundred records / sec.

We observe that a large class of queries read a subsequence of records (e.g. a time window) from the database. With this access structure in mind, we build a new TSE scheme which allows for both encryption and decryption with flexible granularity, in that a client’s interactions with the key servers is at most logarithmic in the number of records. Our idea is to employ a binary-tree access structure over the data, where only one interaction is needed to decrypt all ciphertexts within a sub-tree, and thus only log-many for any arbitrary size sub-sequence. Our scheme incorporates ideas from binary-tree encryption by Canetti et al. [Eurocrypt 2003] and its variants, and carefully merges that with Merkle-tree commitments to fit into the TSE setting. We formalize this notion as hierarchical threshold symmetric-key encryption (HiSE), and argue that our construction satisfies all essential TSE properties, such as correctness, privacy and authenticity with respect to our definition. Our analysis relies on a well-known XDH assumption and a new assumption, that we call $\ell$-masked BDDH, over asymmetric bilinear pairing in the programmable random oracle model. We also show that our new assumption does hold in generic group model.

We provide an open-source implementation of HiSE. For practical parameters, we see 65$\times$ improvement in latency and throughput over ATSE. HiSE can decrypt over 6K records / sec on server-grade hardware, but the logarithmic overhead in HiSE’s encryption (not decryption) only lets us encrypt up to 3K records / sec (about 3-4.5$\times$ slowdown) and incurs roughly 500 bytes of ciphertext expansion per record – while reducing this penalty is an important future work, we believe HiSE can offer an acceptable tradeoff in practice.



## 2024/159

* Title: Logstar: Efficient Linear* Time Secure Merge
* Authors: Suvradip Chakraborty, Stanislav Peceny, Srinivasan Raghuraman, Peter Rindal
* [Permalink](https://eprint.iacr.org/2024/159)
* [Download](https://eprint.iacr.org/2024/159.pdf)

### Abstract

Secure merge considers the problem of combining two sorted lists into a single sorted secret-shared list. Merge is a fundamental building block for many real-world applications. For example, secure merge can implement a large number of SQL-like database joins, which are essential for almost any data processing task such as privacy-preserving fraud detection, ad conversion rates, data deduplication, and many more.

We present two constructions with communication bandwidth and rounds tradeoff. Logstar, our bandwidth-optimized construction, takes inspiration from Falk and Ostrovsky (ITC, 2021) and runs in $O(n\log^*n)$ time and communication with $O(\log n)$ rounds. In particular, for all conceivable $n$, the $\log^*n$ factor will be equal to the constant $2$ and therefore we achieve a near-linear running time. Median, our rounds-optimized construction, builds on the classic parallel median-based merge approach of Valiant (SIAM J. Comput., 1975), and requires $O(n \log^c n)$, $1<c<2$, communication with $O(\log \log n)$ rounds.

We introduce two additional constructions that merge input lists of different sizes. SquareRootMerge, merges lists of sizes $n^{\frac{1}{2}}$ and $n$, and runs in $O(n)$ time and communication with $O(\log n)$ rounds. CubeRootMerge is inspired by Blunk et al.'s (2022) construction and merges lists of sizes $n^{\frac{1}{3}}$ and $n$. It runs in $O(n)$ time and communication with $O(1)$ rounds.

We optimize our constructions for concrete efficiency. Today, concretely efficient secure merge protocols rely on standard techniques such as GMW or generic sorting. These approaches require a $O(n \log n)$ sized circuit of $O(\log n)$ depth. In contrast, our constructions are efficient and achieve superior asymptotics. We benchmark our constructions and obtain significant improvements. For example, Logstar reduces bandwidth costs $\approx 3.3\times$ and Median reduces rounds $\approx2.43\times$.



## 2024/160

* Title: LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast
* Authors: Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo, Rui Hao, Xia Xie, Hai Jin
* [Permalink](https://eprint.iacr.org/2024/160)
* [Download](https://eprint.iacr.org/2024/160.pdf)

### Abstract

To improve the throughput of Byzantine Fault Tolerance (BFT) consensus protocols, the Directed Acyclic Graph (DAG) topology has been introduced to parallel data processing, leading to the development of DAG-based BFT consensus. However, existing DAG-based works heavily rely on Reliable Broadcast (RBC) protocols for block broadcasting, which introduces significant latency due to the three communication steps involved in each RBC. For instance, DAGRider, a representative DAG-based protocol, exhibits a best latency of 12 steps, considerably higher than non-DAG protocols like PBFT, which only requires 3 steps. To tackle this issue, we propose LightDAG, which replaces RBC with lightweight broadcasting protocols such as Consistent Broadcast (CBC) and Plain Broadcast (PBC). Since CBC and PBC can be implemented in two and one communication steps, respectively, LightDAG achieves low latency.

In our proposal, we present two variants of LightDAG, namely LightDAG1 and LightDAG2, each providing a trade-off between the best latency and the expected worst latency. In LightDAG1, every block is broadcast using CBC, which exhibits a best latency of 5 steps and an expected worst latency of 14 steps. Since CBC cannot guarantee the totality property, we design a block retrieval mechanism in LightDAG1 to assist replicas in retrieving missing blocks. LightDAG2 utilizes a combination of PBC and CBC for block broadcasting, resulting in a best latency of 4 steps and an expected worst latency of $12(t+1)$ steps, where $t$ represents the number of actual Byzantine replicas. Since a Byzantine replica may equivocate through PBC, LightDAG2 prohibits blocks from directly referencing contradictory blocks. To ensure liveness, we propose a mechanism to identify and exclude Byzantine replicas if they engage in equivocation attacks. Extensive experiments have been conducted to evaluate LightDAG, and the results demonstrate its feasibility and efficiency.



## 2024/161

* Title: zkMatrix: Batched Short Proof for Committed Matrix Multiplication
* Authors: Mingshu Cong, Tsz Hon Yuen, Siu Ming Yiu
* [Permalink](https://eprint.iacr.org/2024/161)
* [Download](https://eprint.iacr.org/2024/161.pdf)

### Abstract

Matrix multiplication is a common operation in applications like machine learning and data analytics. To demonstrate the correctness of such an operation in a privacy-preserving manner, we propose zkMatrix, a zero-knowledge proof for the multiplication of committed matrices. Among the succinct non-interactive zero-knowledge protocols that have an $O(\log n)$ transcript size and $O(\log n)$ verifier time, zkMatrix stands out as the first to achieve $O(n^2)$ prover time and $O(n^2)$ RAM usage for multiplying two $n \times n$ matrices. Significantly, zkMatrix distinguishes itself as the first zk-SNARK protocol specifically designed for matrix multiplication. By batching multiple proofs together, each additional matrix multiplication only necessitates $O(n)$ group operations in prover time.



## 2024/162

* Title: Zero-Knowledge Proofs of Training for Deep Neural Networks
* Authors: Kasra Abbaszadeh, Christodoulos Pappas, Dimitrios Papadopoulos, Jonathan Katz
* [Permalink](https://eprint.iacr.org/2024/162)
* [Download](https://eprint.iacr.org/2024/162.pdf)

### Abstract

A zero-knowledge proof of training (zkPoT) enables a party to prove that they have correctly trained a committed model based on a committed dataset without revealing any additional information about the model or the dataset. An ideal zkPoT should offer provable security and privacy guarantees, succinct proof size and verifier runtime, and practical prover efficiency. In this work, we present Kaizen, a zkPoT targeted for deep neural networks (DNNs) that achieves the above ideals all at once. In particular, our construction enables a prover to iteratively train their model by the (mini-batch) gradient-descent algorithm where the number of iterations need not be fixed in advance; at the end of each iteration, the prover generates a commitment to the trained model attached with a succinct zkPoT, attesting to the correctness of the entire training process. The proof size and verifier time are independent of the iteration number.

Kaizen relies on two essential building blocks to achieve both prover efficiency and verification succinctness. First, we construct an optimized GKR-style (sumcheck-based) proof system for the gradient-descent algorithm with concretely efficient prover cost; this scheme allows the prover to generate a proof for each iteration of the training process. Then, we recursively compose these proofs across multiple iterations to attain succinctness. As of independent interests, we propose a framework for recursive composition of GKR-style proofs and techniques, such as aggregatable polynomial commitment schemes, to minimize the recursion overhead.

Benchmarks indicate that Kaizen can handle a large model of VGG-$11$ with $10$ million parameters and batch size $16$. The prover runtime is $22$ minutes (per iteration), which is $\mathbf{43\times}$ faster than generic recursive proofs, while we further achieve at least $\mathbf{224 \times}$ less prover memory overhead. Independent of the number of iterations and, hence, the size of the dataset, the proof size is $1.36$ megabytes, and the verifier runtime is only $103$ milliseconds.
0 new messages