Analysis of Replacement Cycling Attacks Risks on L2s (beyond LN)


Antoine Riard

May 23, 2024, 12:09:25 AM
to Bitcoin Development Mailing List
Hi,

This is a follow-up detailing the non-lightning bitcoin use-cases affected by replacement cycling attacks, mostly from the denial-of-service angle (cf. "All your mempool are belong to us" - bitcoin-dev 2023).

Excerpt from the original public disclosure:

> From my understanding the following list of Bitcoin protocols and
> applications could be affected by new denial-of-service vectors under some
> level of network mempools congestion. Neither tests or advanced review of
> specifications (when available) has been conducted for each of them:
> - on-chain DLCs
> - coinjoins
> - payjoins
> - wallets with time-sensitive paths
> - peerswap and submarine swaps
> - batch payouts
> - transaction "accelerators"
>
> Inviting their developers, maintainers and operators to investigate how
> replacement cycling attacks might disrupt their in-mempool chain of
> transactions, or fee-bumping flows at the shortest delay.

This post also intends to outline a common template that could be useful for future cross-layer security issues arising in the bitcoin ecosystem. Such a template could be leveraged by any skilled person involved in handling a cross-layer security issue.

(To be understood: even without any tangible involvement of the present author, there are enough other folks in this ecosystem with the skillset and _the guts_ to conduct such a process in a reasonable fashion in the future).

## Replacement Cycling Attack (a quick reminder)

The attacker's goal in a replacement cycling attack is to delay the confirmation of a HTLC-timeout on the outgoing link of a routing node long enough to enable an off-chain double-spend of a HTLC-preimage on the incoming link.

The attack scenario works as follows:
- Assume the Mallory - Alice - Mallet channel topology
- Mallory forwards a HTLC of 1 BTC to Mallet through Alice
- This HTLC expires at chain tip + 100 on the outgoing link and at chain tip + 140 on the incoming link (Alice's point of view)
- Mallet receives the HTLC on the Alice - Mallet link and does not settle it
- At chain tip + 100, Alice broadcasts commitment tx + HTLC-timeout tx
- Mallet replaces Alice's HTLC-timeout tx with a HTLC-preimage tx
- Mallet then replaces the HTLC-preimage with a conflicting double-spend
- Mallet repeats this trick until the chain tip reaches tip + 140
- At chain tip + 140, Mallory broadcasts a HTLC-timeout to double-spend the incoming link
- In parallel, Mallet broadcasts a HTLC-preimage to double-spend the forwarding link
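
The cycling step in the list above can be illustrated with a toy model. This is a minimal sketch, not how any real node implements replacement: `ToyMempool`, the rule "a strictly higher absolute fee replaces all conflicts", the outpoint labels, the fee values and the assumption that Mallet uses a fresh unrelated coin for each round are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: frozenset  # outpoints spent by this transaction
    fee: int           # absolute fee in satoshis


class ToyMempool:
    """Toy model: one spender per outpoint, a strictly higher absolute fee replaces."""

    def __init__(self):
        self.txs = {}

    def submit(self, tx):
        conflicts = [t for t in self.txs.values() if t.inputs & tx.inputs]
        if conflicts and tx.fee <= sum(t.fee for t in conflicts):
            return False  # does not outbid what it would evict
        for t in conflicts:
            del self.txs[t.txid]
        self.txs[tx.txid] = tx
        return True

    def spender_of(self, outpoint):
        return next((t.txid for t in self.txs.values() if outpoint in t.inputs), None)


HTLC_OUTPUT = "alice_commitment:htlc"  # hypothetical outpoint label


def cycle_once(mempool, height, fee):
    """One round: Alice's HTLC-timeout is replaced, then the replacement is cycled out."""
    mallet_utxo = f"mallet_utxo_{height}"  # assumption: a fresh unrelated coin per round
    timeout = Tx(f"htlc_timeout_{height}", frozenset({HTLC_OUTPUT}), fee)
    preimage = Tx(f"htlc_preimage_{height}", frozenset({HTLC_OUTPUT, mallet_utxo}), fee + 1_000)
    double_spend = Tx(f"double_spend_{height}", frozenset({mallet_utxo}), fee + 2_000)
    for tx in (timeout, preimage, double_spend):
        assert mempool.submit(tx), tx.txid
    # After the round, nothing in the toy mempool spends the HTLC output anymore.
    return mempool.spender_of(HTLC_OUTPUT)


mempool = ToyMempool()
# Outgoing link expires at tip + 100, incoming link at tip + 140 (Alice's point of view).
results = [cycle_once(mempool, height, fee=5_000) for height in range(100, 140)]
print(all(spender is None for spender in results))
```

In this model the HTLC-preimage evicts the HTLC-timeout because both spend the HTLC output, and the double-spend evicts the HTLC-preimage because both spend Mallet's coin. Neither of the two HTLC transactions is left in the mempool at the end of a round.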

This is a rough summary of one of the simplest scenarios; for further details, refer back to the original public disclosure cited above.

## Conditions of Attacks Exploitation

From my understanding, protocols and applications with a subset of the following characteristics can be affected by a replacement cycling attack.

a) Shared-UTXO spendings. Two or more distinct users each own at least one spending path in a redeem script encumbering a single coin.

b) Join-UTXO spendings. Two or more distinct users each contribute a coin spend or destination outputs to a common transaction. Each user can commit more than one coin to the common transaction.

c) Pre-signed transactions. The group of users pre-signs a chain of transactions to execute the protocol steps during an interactive phase. After this phase, any user can broadcast the transactions at any time, without further interactivity.

d) Absolute / Relative Timelocks. The set of pre-signed transactions might be encumbered by relative (nSequence) or absolute timelocks (nLockTime).

If you combine b) + c) you have things like coinjoins. If you combine a) + c) + d) you have things like lightning. Usually, the first class of things has been designated as multi-party applications, the second class as contracting protocols (e.g on the effects of mempool policy changes).
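
The two combinations above can be written down as a small lookup. This is only a sketch of the taxonomy: the `Trait` names and the `classify` helper are hypothetical labels for characteristics a) to d), not an established classification scheme.

```python
from enum import Flag, auto


class Trait(Flag):
    SHARED_UTXO = auto()  # a) several users own a spending path on a single coin
    JOIN_UTXO = auto()    # b) several users contribute coins or outputs to one transaction
    PRESIGNED = auto()    # c) pre-signed chain of transactions
    TIMELOCKS = auto()    # d) nSequence or nLockTime encumbrances


MULTI_PARTY = Trait.JOIN_UTXO | Trait.PRESIGNED                      # b) + c)
CONTRACTING = Trait.SHARED_UTXO | Trait.PRESIGNED | Trait.TIMELOCKS  # a) + c) + d)


def classify(traits):
    """Return the class names whose required characteristics are all present."""
    classes = []
    if (traits & MULTI_PARTY) == MULTI_PARTY:
        classes.append("multi-party application")
    if (traits & CONTRACTING) == CONTRACTING:
        classes.append("contracting protocol")
    return classes


coinjoin = Trait.JOIN_UTXO | Trait.PRESIGNED
lightning = Trait.SHARED_UTXO | Trait.PRESIGNED | Trait.TIMELOCKS
print(classify(coinjoin), classify(lightning))
```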

This distinction mostly matters in terms of security models. All of them seem to present some vector of transaction or package malleability.

## Time-value Denial-of-Service Risks

Leveraging transaction-relay and mempool mechanisms to trigger a time-value denial-of-service in a target application or protocol phase has already been considered many times in the past.

E.g reaching hypothetical replacement limits to DoS payment channel participants (cf. "Anti DoS for tx replacement" - bitcoin-dev 2013) or DoSing a multi-party transaction by opting out of replacement with a double-spend (cf. "On Mempool Funny Games against Multi-Party Funded Transactions" - lightning-dev 2021).

Under current mempool rules (i.e the ones deployed on 99% of the network over the last years), replacement cycling opens a new generic way to trigger a denial-of-service in a Bitcoin application or protocol flow and paralyze its execution.

This can constitute a prolonged denial-of-service of the targeted application / protocol, or a waste of the on-chain time-value of the coins consumed by the application / protocol. Here again, risk exposure is a function of the concrete combination of characteristics of the application / protocol.

Some protocols have lightweight anti-DoS measures to alleviate this denial-of-service vector. E.g in lightning, after 2016 blocks, participants in a payment channel can forget the funding transaction (BOLT2).

## Time-value Denial-of-Service Risks: The Lightning One-Link Case

Let's see a concrete example of a time-value DoS triggered by a replacement cycling.

The public disclosure of the replacement cycling attack was mostly centered on loss of funds risks affecting HTLC forwarding over Lightning routing nodes. Independently, a replacement cycling attack can be leveraged to provoke a denial-of-service between a Lightning routing node and an end-node on a spoke link.

The attack works in the following fashion (offered HTLC on outgoing link), as it was not fully fleshed out in the disclosure communications:
- Alice and Bob are lightning nodes, they share a funded chan
- Alice forwards a HTLC to Bob for further routing to Caroll
- Bob forwards the HTLC to Caroll and gets the HTLC preimage
- Bob withholds settlement on the Alice - Bob link until the chain tip height reaches `cltv_expiry`
- Alice broadcasts a HTLC-timeout to recover her funds
- Bob engages in a replacement cycling by repeatedly rebroadcasting the HTLC-preimage and double-spending it

Alice is stuck with HTLC funds that she cannot recover on-chain. While Bob pays a replacement penalty every time this happens, there might be a scaling effect from targeting many HTLC-timeouts with a single HTLC-preimage (`option_anchors_zero_fee_htlc_tx`).

It should be noted that in matters of offered HTLC expiration on an outgoing link, each lightning implementation has its own logic, as this is not something standardized (e.g ldk's `LATENCY_GRACE_PERIOD_BLOCKS`).

It is left as an open question how an attacker can economically benefit from this denial-of-service.

## Loss of Funds Risks

As was exposed during the public disclosure of the replacement cycling attack, it can be leveraged to steal users' funds from lightning payment channels, as one affected protocol.

By extension, it can affect any other contracting protocol (characteristics a. + c. + d.). In those protocols (e.g lightning or swaps), the protocol semantics are driven by absolute / relative timelocks initialized in a set of pre-signed transactions and finalized by the chain tip height or epoch time.

The underlying funds security is conditional on the time-sensitive broadcast and inclusion of the pre-signed transactions to execute an off-chain state. Failing to fulfill this time-sensitive requirement can lead to loss of funds.

Generally, loss of funds risks affecting multi-party applications / contracting protocols still depend on the use of "short duration" relative / absolute timelocks.
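
To make the role of timelock duration concrete, here is a rough back-of-the-envelope sketch. It assumes the attacker bears a constant cost per block of cycling (`cost_per_block_sat` is a hypothetical parameter, not a measured figure) and reuses the 1 BTC HTLC and the tip + 100 / tip + 140 expiries from the lightning scenario above.

```python
def exposure_window(outgoing_expiry, incoming_expiry):
    """Blocks during which the HTLC-timeout must stay unconfirmed for the theft to work."""
    if incoming_expiry <= outgoing_expiry:
        raise ValueError("incoming expiry must be later than outgoing expiry")
    return incoming_expiry - outgoing_expiry


def attack_is_profitable(htlc_value_sat, cost_per_block_sat, window_blocks):
    """Compare the targeted amount with the total cost of cycling over the whole window."""
    return htlc_value_sat > cost_per_block_sat * window_blocks


window = exposure_window(outgoing_expiry=100, incoming_expiry=140)
# 1 BTC HTLC, hypothetical cost of 10,000 sat per block of cycling.
large_htlc = attack_is_profitable(100_000_000, 10_000, window)
# Same cost, but a 300,000 sat HTLC.
small_htlc = attack_is_profitable(300_000, 10_000, window)
print(window, large_htlc, small_htlc)
```

Under these assumptions, lengthening the window raises the attacker's total cost linearly, which is why longer timelocks reduce (without eliminating) the loss of funds exposure.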

## Second-Layers and Use-Cases

We're further surveying deployed second-layers and use-cases either affected by time-value DoS or loss of funds risks.

(Transaction-relay techniques like "transaction accelerators" have been excluded from the initially published list of potentially affected second-layers, as they are neither a multi-party application nor a contracting protocol).

On-chain DLC (contracting protocol): a funding transaction locks funds in a 2-of-2. A subsequent pair of contract execution transactions encodes the DLC result from the oracle's contribution. There can be a refund transaction under timelocks (model: cf. "dlcspecs" - github 2020).

On-chain DLC risks: loss of funds _only if the oracle gets it wrong_. Time-value DoS risk on the funding transaction, or on the refund in case of timelock misselection.

Coinjoin (multi-party application): a single joint transaction with contributions from N inputs (model: cf. "Coinjoin: Bitcoin privacy for the real world" - bitcointalk.org 2013)

Coinjoin risks: no loss of funds risks. Time-value DoS risk, if a fee-bumping of the joint transaction can be done by any user.

Payjoin (multi-party application): a single joint transaction with contributions from N inputs owned by a single user paying another user (model: cf. "improving privacy using pay-to-endpoint" - blockstream blog 2018).

Payjoin risks: no loss of funds risks. Time-value DoS risk, if a fee-bumping of the joint transaction can be done by any user.

Wallet with time-sensitive paths (contracting protocols): a user locks up funds with a set of pre-signed transactions. Each pre-signed transaction can have unique spending conditions and/or send to another user (model: cf. "bip65 op_checklocktimeverify" - bips 2014).

Wallet with time-sensitive paths risks: loss of funds risk _only if there is a spend path to a third-party with divergent interest and timelock misselection_. Time-value DoS risk _only if there is a spend to a third-party with divergent interest and timelock misselection_.

Peerswap and submarine swaps (contracting protocol): a funding transaction locks funds in a 2-of-2. A swap can be spent by 3 subsequent transactions (invoice, coop, csv) to settle the state of the swap positively or negatively (model: cf. "peerswap" - element github 2022).

Peerswap and submarine swaps risks: loss of funds risk in case of timelock misselection. Time-value DoS risk.

Batch payouts (multi-party application): a single joint transaction with contributions from N inputs owned by a single user paying N users (model: cf. "scaling bitcoin using payment batching" - bitcoin optech 2021).

Batch payouts risks: no loss of funds risks. Time-value DoS risk, if a fee-bumping of the joint transaction can be done by any user.

For all those second-layers and use-cases risks identification, I think a replacement cycling attack is plausible, independently of the level of network mempools congestion.

On this front, thanks for the insights and observations from the folks who participated in the initial security-handling around February 2023. All names have already been listed in the initial email.

## Conclusion

Transaction-relay jamming can be characterized as a protocol counterparty or application participant interfering with the relay of transactions. If the transactions are time-sensitive per the protocol semantics, this interference can constitute a loss of funds risk. If the transactions are only collaboratively built, this interference can constitute a time-value DoS risk. Replacement cycling attacks constitute one variant of this class of attacks, of which pinning is the other well-known variant.

Additionally, in the context of this class of attacks arising from the interfacing of bitcoin applications and protocols with the base-layer transaction-relay network and its mempool rules, it is worth underlining some observations concerning the security-issue handling process.

Firstly, there is the difficulty of correctly diagnosing which specific bitcoin software is potentially affected. Establishing a relevant diagnosis means not only saying what is affected, but also stating the type of risk exposure (e.g plain loss of funds, fee griefing, bandwidth denial-of-service) affecting each specific software.

Secondly, once the diagnosis is done, there is the curative phase where mitigation patches are developed and included in the codebase. Each codebase is unique (e.g has its own language) and can have its own usual release schedule, which determines the rate at which a mitigation patch can disseminate across its crowd of active users.

Furthermore, in a decentralized ecosystem where each full-node can run its own configuration of mempool policy rules on a wide variety of hardware hosts, not all mitigation strategies are equally viable. Considerations of the same kind have already been weighed in the past, e.g on the occasion of CVE-2021-31876 (replacement inheritance defect in bitcoin core).

Don't trust, verify. All mistakes and opinions are my own.

Cheers,
Antoine

/dev/fd0

May 24, 2024, 6:59:28 AM
to Bitcoin Development Mailing List
Hi Antoine,

Does this also affect coinswap? If yes, what are the risks involved?

/dev/fd0
floppy disk guy

Antoine Riard

May 25, 2024, 4:56:59 AM
to Bitcoin Development Mailing List
Hi /dev/fd0

From my understanding of coinswap (model: cf. "Detailed protocol design for routed multi-transaction CoinSwap"), the contract transaction can be spent by either Alice's timeout or Bob's preimage. Belcher's coinswap (a contracting protocol) does not strictly restrict the second-stage transaction spending the contract transaction.

Let's say you have Caroll -> Alice -> Bob as a routed coinswap topology. Bob can broadcast a contract transaction, get it confirmed, then engage in a replacement cycling attack by leveraging a child transaction spending the preimage path (it only needs Bob's private key), then continuously replacing this child with a conflicting spend of a UTXO unrelated to the coinswap. At expiration of the relative timelock on the C-A link, Caroll claws back the swapped UTXO with the timeout path.

If this transaction flow is the correct one, coinswap suffers from loss of funds and denial-of-service risks, like lightning. Scaling up timelocks or monitoring the local mempool for the preimage might be imperfect, yet practical, mitigations hardening against exploitation.
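
As an illustration of the mempool monitoring mitigation, here is a minimal sketch. It assumes a SHA256 hashlock with a 32-byte preimage, as in lightning (the hash function used by a given coinswap implementation may differ), and a hypothetical input shape where each mempool transaction is given as a `(txid, witness_items)` pair. `find_preimage` and `scan_mempool` are illustrative helpers, not part of any existing node API.

```python
import hashlib


def find_preimage(payment_hash, witness_items):
    """Return the first 32-byte witness item whose SHA256 equals payment_hash, else None."""
    for item in witness_items:
        if len(item) == 32 and hashlib.sha256(item).digest() == payment_hash:
            return item
    return None


def scan_mempool(payment_hash, mempool_txs):
    """mempool_txs: iterable of (txid, [witness items as bytes]) pairs (hypothetical shape)."""
    for txid, witness_items in mempool_txs:
        preimage = find_preimage(payment_hash, witness_items)
        if preimage is not None:
            return txid, preimage
    return None


# Usage with made-up data: the second transaction reveals the preimage.
secret = bytes(range(32))
payment_hash = hashlib.sha256(secret).digest()
observed = [
    ("tx_a", [b"\x01" * 64]),
    ("tx_b", [b"\x02" * 71, secret]),
]
print(scan_mempool(payment_hash, observed))
```

The mitigation stays imperfect because the preimage transaction may only transit briefly through the mempools, and a given node is not guaranteed to see it before it is cycled out.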

Best,
Antoine
