lnd v0.16.3 - Release Candidate 1


Olaoluwa Osuntokun

May 30, 2023, 5:49:17 PM
to lnd
Hi y'all,

We've just tagged the first release candidate for the upcoming v0.16.3
release:
https://github.com/lightningnetwork/lnd/releases/tag/v0.16.3-beta.rc1.

As can be seen from the release notes
(https://github.com/lightningnetwork/lnd/blob/master/docs/release-notes/release-notes-0.16.3.md),
this is a very small release that contains another set of optimizations for
the mempool reconciliation logic, a bug fix for macaroon regeneration, and two
fixes for potential spurious force close errors.

## Mempool Reconciliation Optimizations

Providing some context on the first fix: in v0.16.1, we introduced some new
mempool watching logic to allow lnd to extract a preimage spend directly
from the mempool. This was meant to help mitigate some pinning related
attack vectors, as once we find the preimage in the mempool, we can settle
back instantly. Right around the time we added this logic, the mempool
quickly ballooned to new heights. This latest round of optimizations uses a
more efficient data structure to track the mempool spends that need to be
reconciled periodically.
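To illustrate the kind of structure involved, here's a minimal sketch
(assumed names and layout, not lnd's actual implementation) of indexing
mempool spends by outpoint, so each reconciliation pass is a constant-time
lookup per watched output rather than a scan over every mempool transaction:

```go
package mempoolsketch

import (
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
)

// spendIndex is a hypothetical index mapping each outpoint spent by a
// mempool transaction to the txid of the spending transaction.
type spendIndex map[wire.OutPoint]chainhash.Hash

// add records all the inputs of a newly seen mempool transaction.
func (s spendIndex) add(tx *wire.MsgTx) {
	txid := tx.TxHash()
	for _, txIn := range tx.TxIn {
		s[txIn.PreviousOutPoint] = txid
	}
}

// remove drops a transaction's inputs once it confirms or is evicted
// from the mempool.
func (s spendIndex) remove(tx *wire.MsgTx) {
	for _, txIn := range tx.TxIn {
		delete(s, txIn.PreviousOutPoint)
	}
}

// spendingTx returns the txid of the mempool transaction spending op, if
// any. This is the hot path during periodic reconciliation.
func (s spendIndex) spendingTx(op wire.OutPoint) (chainhash.Hash, bool) {
	txid, ok := s[op]
	return txid, ok
}
```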

One question for anyone reading this that has an lnd node hooked up to a
bitcoind node on mainnet: how long does the initial load and reconciliation
take on your machine? Including any relevant hardware/machine specs would
also be useful.

Here's a log sample from one of my nodes (it's a very beefy machine tho,
Intel® Core™ i9-13900K, 16 cores, 32 threads):
```
2023-05-09 14:44:40.170 [DBG] LNWL: Loaded mempool spends in 35.204195178s
2023-05-09 14:44:40.170 [INF] LNWL: Started polling mempool for new bitcoind transactions via RPC.
2023-05-26 18:12:38.682 [TRC] LNWL: Reconciled mempool spends in 265.320254ms
```

This is with a default 100 MB mempool.

### Param Modifications for Nodes w/ Meager Hardware

For those having issues with the new logic, I recommend updating to the rc,
or applying the patch in isolation if it's causing dire problems with your
node. One thing to note is that if that last log line takes _longer_ than 1
minute, then you should increase the `bitcoind.txpollinginterval` config
setting to something greater than 1m (the current default).
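In an lnd.conf using the RPC polling backend, that would look something like
the following (the 2m value here is just an illustration, pick whatever keeps
your reconciliation time comfortably under the interval):

```
[Bitcoind]
bitcoind.txpollinginterval=2m
```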

We know of ways we can achieve similar defense-in-depth security protection
with more efficient API calls. However, these require very recent versions
of bitcoind (24 onwards), so we'll need to gate this behind the appropriate
version detection logic.

FWIW, btcd isn't affected by this issue as it has first-class APIs to detect
spends in the mempool for an RPC caller, so we don't need to maintain and
scan with our own index.

## Force Close Bug Fixes

This release contains two changes that should help to reduce both spurious
force closes and the dreaded _cascade_ force close.

The first fix is that we'll now properly disconnect the _peer_ connection if
we think the other side is stalled (hasn't replied with a revoke and ack).
Before, we'd just stop the link (processing new channel state machine
messages) in the hopes that it was a TCP issue, so the read/write timeout
would kick in and disconnect the TCP connection. However, it's possible that
the other side _eventually_ responds, but we don't process the message as
we've stopped the link. There are some other details here as well, such as
being able to cancel back the incoming link's messages so that it can
proceed independently, but this change should help with this particular
case.
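Roughly, the behavioral change looks like the sketch below (hypothetical
names, heavily simplified from lnd's actual link and peer code): on timeout
we now tear down the transport connection itself, not just the link, so a
reconnect and channel reestablish can retransmit the missing message.

```go
package linksketch

import (
	"errors"
	"time"
)

// Peer is a stand-in for the connection to the remote node.
type Peer interface {
	// Disconnect tears down the underlying transport connection.
	Disconnect(err error)
}

// Link is a stand-in for the per-channel state machine.
type Link interface {
	// Stop halts processing of new channel updates on this link.
	Stop()
}

// waitForRevokeAndAck waits for the remote party to respond to our
// commitment signature with a revoke_and_ack.
func waitForRevokeAndAck(link Link, peer Peer, gotRevAck <-chan struct{},
	timeout time.Duration) {

	select {
	case <-gotRevAck:
		// The remote side responded in time, nothing to do.
		return

	case <-time.After(timeout):
		// The remote side appears stalled. Stop the link as before,
		// but also disconnect the peer so the connection is torn
		// down and a fresh connection (with retransmission via
		// channel reestablish) can take place.
		link.Stop()
		peer.Disconnect(errors.New("peer stalled: no revoke_and_ack"))
	}
}
```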

The second fix, which may be the culprit of the cascade force closes some
have seen, is related to our HTLC sweeping logic. The sweeper has a concept
of "negative yield", so it won't sweep an HTLC if doing so costs more than
the HTLC amount itself. However, this didn't factor in the fact that we can
now batch these HTLC sweeps into one transaction (negative yield in
isolation may be a positive yield in aggregate). We'll now ensure that if we
go to chain, we take the whole group into account to properly resolve the
HTLC. Ensuring we always try to resolve this HTLC also means that once
that's complete, things can be properly cancelled back.
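As a back-of-the-envelope illustration (the fee rate and vsize numbers below
are made up for the example, not taken from the sweeper), the fixed
transaction overhead is paid once per batch, so the marginal cost of adding
one more HTLC input can sit well below its value even when a standalone
sweep of that HTLC wouldn't be worth it:

```go
package main

import "fmt"

// Illustrative numbers only: an assumed fee rate and rough vsizes for an
// HTLC input spend and the shared transaction overhead (version, locktime,
// output, ...).
const (
	feeRateSatPerVByte  = 50
	htlcInputVSize      = 150
	sharedOverheadVSize = 60
)

// isolatedSweepCost is the fee to sweep a single HTLC in its own tx.
func isolatedSweepCost() int {
	return feeRateSatPerVByte * (htlcInputVSize + sharedOverheadVSize)
}

// batchedMarginalCost is the extra fee to add one more HTLC input to an
// existing batched sweep tx, since the overhead is already paid for.
func batchedMarginalCost() int {
	return feeRateSatPerVByte * htlcInputVSize
}

func main() {
	htlcValue := 8_000 // sats

	// Standalone: 10,500 sats of fees for an 8,000 sat HTLC, negative yield.
	fmt.Printf("isolated sweep: cost=%d sats, yield=%d sats\n",
		isolatedSweepCost(), htlcValue-isolatedSweepCost())

	// Batched: only 7,500 sats of marginal fees, positive yield.
	fmt.Printf("batched sweep:  marginal cost=%d sats, yield=%d sats\n",
		batchedMarginalCost(), htlcValue-batchedMarginalCost())
}
```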

In addition to the above, we're also looking into a number of other
go-to-chain heuristics more grounded in the unit economics of the
chain/network. This may mean things like cancelling small value incoming
HTLCs back early if we feel they'll be heavily contested, or cancelling
them back even earlier in the pipeline, before we even go to chain. These
options need a lot more analysis though, as done incorrectly, a node can be
left on the hook for very large HTLCs.

With all that said, we recommend that all nodes update to the final release!

-- Laolu