Hi Alex,
Thanks for starting this thread!
Just to provide some wider context to the list:
* Today the cold node connects to the signer (hot) node.
* We're looking to support a model where the signer node connects to the
cold node.
* This is useful, as it then allows things like mobile nodes to act as the
  signer and connect out to the watch-only node. The reverse isn't always
  possible, as the mobile node doesn't have a public listening port.
Re that last point, it's also possible to leverage an LNC-like connection to
still have the cold node connect to the mobile node. However, for this to
work well things need to be super snappy, which argues for the mobile node
connecting out to the cold node, as once it's up, it can immediately
initiate the handshake.
As we pursue this line of thinking, we'll also want to examine to what
extent we can _speed up_ the restart time of a node. Today with bbolt,
things can take some time, but with sqlite things are _much_ faster, and
start-up time is basically instant. With that out of the way, the other
aspect to examine would be how long it takes to establish new persistent
connections to peers.
> For the initial implementation, the watch-only node would wait for the
> signer connection in order to start operation, and shut down/"crash"
> (similar to now), or at least drop all peer connections, when the signer
> is disconnected.
Yeah, this is the model I had in mind. IMO it's much simpler for the daemon
to simply shut down or go into a "safe mode" once the remote signer
connection is dropped. Ideally, in this restricted mode, the main daemon can
also respond to some basic RPCs, like `GetInfo` or `WalletBalance`, so a
monitoring tool (or w/e) with a restricted macaroon can still check in on
the node.
So at a high level, any time the remote signer connection is dropped, the
main daemon needs to _immediately_ drop all active connections and revert to
this same mode.
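As a sketch of how that restricted mode could be enforced at the RPC layer
(the `safeMode` flag, the whitelist, and the interceptor below are all
hypothetical, not existing lnd code):

```go
package main

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// safeMode flips to true the moment the remote signer connection drops.
var safeMode atomic.Bool

// safeModeRPCs is a hypothetical whitelist of read-only RPCs that stay
// usable without a signer, e.g. for a monitoring tool holding a restricted
// macaroon.
var safeModeRPCs = map[string]struct{}{
	"/lnrpc.Lightning/GetInfo":       {},
	"/lnrpc.Lightning/WalletBalance": {},
}

// safeModeInterceptor fails every call outside the whitelist while the
// signer is unreachable.
func safeModeInterceptor(ctx context.Context, req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (interface{}, error) {

	if safeMode.Load() {
		if _, ok := safeModeRPCs[info.FullMethod]; !ok {
			return nil, status.Error(codes.Unavailable,
				"remote signer offline: node is in safe mode")
		}
	}

	return handler(ctx, req)
}
```

Wired in via `grpc.UnaryInterceptor(safeModeInterceptor)` (plus a streaming
twin), that gives the "respond to `GetInfo`, reject everything else"
behavior without touching each individual handler.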
> This would let the node run in a "safe mode" to participate in gossip
> without a signer, and allow the signer to connect/disconnect to the node
> for receiving/sending/routing money as needed.
Yep, I had that same thought: it's technically _possible_ for it to retain
the brontide p2p connections, as after the initial handshake (auth), the
connection object has the shared secret, so it can continue to
encrypt/decrypt messages.
Related to the comment above about snappy restarts: if the daemon is able to
hang on to a few "gossip only" connections while the signer is down, then it
can ensure that by the time the signer is back, we've synced all the latest
gossip.
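To make that concrete: brontide (BOLT 8's Noise_XK transport) only needs the
node's long-term key during the handshake; afterwards each side just holds
symmetric ChaCha20-Poly1305 state. A simplified, hypothetical model of that
post-handshake state (not lnd's actual brontide types):

```go
package main

import (
	"encoding/binary"

	"golang.org/x/crypto/chacha20poly1305"
)

// gossipConn models what a connection holds once the Noise handshake
// completes: a symmetric key and its nonce counter. Note there is no
// private key here -- that lives on the signer, and is only needed to
// authenticate brand new handshakes.
type gossipConn struct {
	sendKey   []byte // 32-byte key derived from the handshake
	sendNonce uint64 // per-message counter, mixed into the AEAD nonce
}

// encrypt seals the next outbound message using only the symmetric state,
// mirroring how an authenticated p2p connection can keep flowing while the
// signer is offline.
func (c *gossipConn) encrypt(msg []byte) ([]byte, error) {
	aead, err := chacha20poly1305.New(c.sendKey)
	if err != nil {
		return nil, err
	}

	// 96-bit nonce with a 64-bit little-endian counter, as in the
	// BOLT 8 transport.
	var nonce [chacha20poly1305.NonceSize]byte
	binary.LittleEndian.PutUint64(nonce[4:], c.sendNonce)
	c.sendNonce++

	return aead.Seal(nil, nonce[:], msg, nil), nil
}
```

Since no private key appears in that struct, a "gossip only" connection can
keep encrypting/decrypting messages while the signer is offline.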
In terms of architecture, the way things work today is that:
* The signer node implements the Signer gRPC service.
* On start-up, the cold node connects to the signer node.
* If it can't reach the signer node, then it just crashes.
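(For reference, that outbound connection is what the watch-only node's
`remotesigner` options configure today; roughly, with placeholder
host/paths:)

```
[remotesigner]
remotesigner.enable=true
remotesigner.rpchost=signer.example.com:10009
remotesigner.macaroonpath=/path/to/signer.custom.macaroon
remotesigner.tlscertpath=/path/to/signer/tls.cert
```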
What I think we want is instead something like:
* On start-up, the cold node connects out to what it _thinks_ is the
  signer node.
* This is instead a gRPC proxy that'll wait until the signer node
  makes an inbound connection, and will then proxy the messages back and
  forth (basically an io.Copy, but for gRPC; see the sketch after this
  list).
* The cold node inherits a global context.Context that's linked to the
actual inbound gRPC connection. If this is ever cancelled, then things
go back to that safe mode.
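To make the "io.Copy, but for gRPC" part concrete, here's a minimal sketch
of the forwarding core, using the usual transparent-proxy trick of a
passthrough codec so the proxy shuttles opaque frames and never needs the
Signer protos (all names below are hypothetical):

```go
package main

// rawFrame carries an undecoded gRPC message: with a passthrough codec
// installed, the proxy moves bytes and never parses them.
type rawFrame struct {
	payload []byte
}

// rawCodec satisfies gRPC's encoding.Codec interface, but just moves bytes.
type rawCodec struct{}

func (rawCodec) Marshal(v interface{}) ([]byte, error) {
	return v.(*rawFrame).payload, nil
}

func (rawCodec) Unmarshal(data []byte, v interface{}) error {
	v.(*rawFrame).payload = data
	return nil
}

func (rawCodec) Name() string { return "raw" }

// grpcStream is the subset of grpc.ServerStream/grpc.ClientStream the
// proxy needs.
type grpcStream interface {
	SendMsg(m interface{}) error
	RecvMsg(m interface{}) error
}

// forward pumps frames from src to dst until src is exhausted -- the
// io.Copy analogue. Run it once in each direction to proxy a full
// bi-directional stream between the cold node and the signer.
func forward(src, dst grpcStream) error {
	for {
		frame := &rawFrame{}
		if err := src.RecvMsg(frame); err != nil {
			return err
		}
		if err := dst.SendMsg(frame); err != nil {
			return err
		}
	}
}
```

The remaining piece is pairing the two legs: the cold node's calls can land
in a `grpc.UnknownServiceHandler`, while the signer's inbound leg would
likely be a long-lived bi-directional stream the proxy multiplexes onto;
cancellation of that inbound leg is what would trip the global
`context.Context` above.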
This should allow for minimal-ish changes, as the behavior of the cold node
is more or less the same. It _thinks_ it has an actual connection, but it
should hit some sort of health check endpoint to ensure the signer is
actually there before it tries to do anything that requires it.
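If that probe uses the standard gRPC health checking protocol, the cold
node's side could be as small as this (the `signrpc.Signer` service name is
just an assumed placeholder):

```go
package main

import (
	"context"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// signerHealthy probes the (possibly proxied) signer connection before any
// operation that actually needs a signature.
func signerHealthy(ctx context.Context, conn *grpc.ClientConn) bool {
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	resp, err := healthpb.NewHealthClient(conn).Check(
		ctx, &healthpb.HealthCheckRequest{Service: "signrpc.Signer"},
	)
	if err != nil {
		return false
	}

	return resp.Status == healthpb.HealthCheckResponse_SERVING
}
```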
> we believe the signer->watch-only node connectivity method is more secure
> than the current method.
That's really interesting; can you elaborate on the security model that led
to that conclusion? Is it that the signer node then doesn't actually need a
listening port?
-- Laolu