Re: Coda development (rpc2 handshake / instance authentication)

u-m...@aetey.se

May 6, 2016, 4:08:56 AM
On Thu, May 05, 2016 at 11:25:36AM -0400, Jan Harkes wrote:
> On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
> > they verify that the other party belongs to the correct realm,
> > but this might happen to be a different server in the same realm. I guess
> > mixing the server id into the handshake would eliminate this uncertainty.
>
> Eh? Server ids should not be exposed like that to begin with.
>
> Aside from that a client isn't trying to connect to a server, it is
> trying to bind to a volume. If you get connected to the wrong server
> (how in the world is that even a thing that would 'happen'?) it wouldn't
> be able to bind to the volume anyway and so the end result is the same
> without needing to put serverids in the handshake.
>
> A client should have no need to know a server id, ever.

I guess you are thinking about things which are unrelated.

Think of a "server" as of an "RPC2 server" (f.i. update?),
the "server id" is the idea of the client about
"which service instance I am to talk to".

There are indeed situations where the outcome by intent does not depend
on which service instance a client is talking to (auth2). Then the
"server id" could be considered "empty".

This is not the case for other services. The server instances
generally are not equal e.g. for update the scm is special, for
resolution the coordinator is special.

I would not dare to analyze _all_ cases including possibly unknown future
ones and be sure that talking to a wrong instance never ever can lead
to a problem.

IOW as soon as we _assume_ that a client is talking to a certain service
instance, let us better _ensure_ that this is the case, not trust the ip.

This is a very basic thing to do and does not look expensive,
what would be the reason to oppose this?

Regards,
Rune

Jan Harkes

May 6, 2016, 9:19:57 AM
On Fri, May 06, 2016 at 10:07:35AM +0200, u-m...@aetey.se wrote:
> On Thu, May 05, 2016 at 11:25:36AM -0400, Jan Harkes wrote:
> > On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
> > > they verify that the other party belongs to the correct realm,
> > > but this might happen to be a different server in the same realm. I guess
> > > mixing the server id into the handshake would eliminate this uncertainty.
> >
> > Eh? Server ids should not be exposed like that to begin with.
> >
> > Aside from that a client isn't trying to connect to a server, it is
> > trying to bind to a volume. If you get connected to the wrong server
> > (how in the world is that even a thing that would 'happen'?) it wouldn't
> > be able to bind to the volume anyway and so the end result is the same
> > without needing to put serverids in the handshake.
> >
> > A client should have no need to know a server id, ever.
>
> I guess you are thinking about things which are unrelated.
>
> Think of a "server" as of an "RPC2 server" (f.i. update?),
> the "server id" is the idea of the client about
> "which service instance I am to talk to".

We are talking about RPC2, which is a messaging protocol between clients
and servers that relies on shared secrets to get a common session key.

If client A wants to connect to server B and somehow finds out that it
is supposed to connect to address X port Y, and whoever 'picks up the
phone' at that address uses the correct shared secret to complete the
handshake, then there is no use for the instance id.

Because if A somehow got connected to something that isn't B, then there
is no reliable way to resolve the issue of 'how do I connect to B'. And
no, disconnecting and reconnecting until we randomly hit the right
instance id is not a solution. This is not an RPC2 issue, this is an
issue with the application and adding that instance id in the RPC2
handshake is not going to solve it.

> This is not the case for other services. The server instances
> generally are not equal e.g. for update the scm is special, for

Only because of how some Coda server updates are stored and propagated.
They could be stored in etcd or zookeeper or a mysql database for all I
care and things would be very different. The problem is that this backend
detail shouldn't even have to propagate out to clients, there should be
no need for cpasswd to know that the auth2 daemon on 'server5681' is any
more or less special than any of the other auth2 daemons. The auth2
daemons should be aware how updates are propagated, so they could
delegate the password change request to the correct place.

That is how every other system that uses something like PAXOS or RAFT to
choose a master/coordinator handles such situations. Not by having
clients just reconnect to a random server in the hope to hit the one
that is special. How would a client know that instance '1', or maybe
'0' happens to be the special one, if there is a PAXOS style master
selection where any of the servers could be the read/write replica and
they can even revote and pick a new one in case the current master is
lost.

> resolution the coordinator is special.

Yes it is, and how is it chosen? The client connects to a random replica
for a volume and by sheer luck that server is the coordinator for the
resolution. Or maybe it wasn't luck after all, maybe any of the servers
can become the coordinator for the duration of a resolution and it is
whichever server the client connected to.

> I would not dare to analyze _all_ cases including possibly unknown future
> ones and be sure that talking to a wrong instance never ever can lead
> to a problem.

There is no such thing as a wrong instance, and if you think the client
application could have a better idea than the server instances I've got
some bad news for you.

Jan

u-m...@aetey.se

May 6, 2016, 10:32:50 AM
On Fri, May 06, 2016 at 09:18:55AM -0400, Jan Harkes wrote:
> We are talking about RPC2, which is a messaging protocol between clients
> and servers that relies on shared secrets to get a common session key.
>
> If client A wants to connect to server B and somehow finds out that it
> is supposed to connect to address X port Y, and whoever 'picks up the
> phone' at that address uses the correct shared secret to complete the
> handshake, then there is no use for the instance id.

Exactly, I would like a small change in rpc2 to be able to make use of it.

An alternative would be for protocols on top of rpc2 to begin the
conversation with the server party identifying itself.

> Because if A somehow got connected to something that isn't B, then there
> is no reliable way to resolve the issue of 'how do I connect to B'.

This is not the issue I am worried about, but rather "prevent
A from talking to B's brother C when A intended to talk to B".

(The issue of an unavailable service is the one Coda is meant to deal
with gracefully)

> > I would not dare to analyze _all_ cases including possibly unknown future
> > ones and be sure that talking to a wrong instance never ever can lead
> > to a problem.
>
> There is no such thing as a wrong instance, and if you think the client
> application could have a better idea than the server instances I've got
> some bad news for you.

:)

For me, if the instance is not the one the rpc2 client meant to contact,
it is wrong. There is nothing in the code which prevents this.

You insist that when this happens, it is guaranteed to be harmless.

I do not see any guarantee for being harmless and I doubt you
have analyzed the system exhaustively, to be able to say _never_.

Note, I do not say that there _will_ be any harm, but I want to be sure
there will not be.

You are right that a peer instance check is not included in the rpc2
functionality. This does not seem to be hard to add.

Do you feel this would be expensive or risky? What would be the downsides,
besides the corresponding API extension (adding a "server instance id"
argument)?

Regards,
Rune

Jan Harkes

May 6, 2016, 11:49:52 AM
On Fri, May 06, 2016 at 04:31:25PM +0200, u-m...@aetey.se wrote:
> On Fri, May 06, 2016 at 09:18:55AM -0400, Jan Harkes wrote:
> > We are talking about RPC2, which is a messaging protocol between clients
> > and servers that relies on shared secrets to get a common session key.
> >
> > If client A wants to connect to server B and somehow finds out that it
> > is supposed to connect to address X port Y, and whoever 'picks up the
> > phone' at that address uses the correct shared secret to complete the
> > handshake, then there is no use for the instance id.
>
> Exactly, I would like a small change in rpc2 to be able to make use of it.

I think you didn't actually read what I said there. There is no
'exactly' there that we agree on. For reference I'll copy the above
text again down here.

If client A wants to connect to server B and somehow finds out that it
is supposed to connect to address X port Y, and whoever 'picks up the
phone' at that address uses the correct shared secret to complete the
handshake, then there is no use for the instance id.

Where I clearly state "there is _no_ use for the instance id" and in
response you say (paraphrased), "Exactly, that is why I want to add an
instance id to the rpc2 handshake".

> > Because if A somehow got connected to something that isn't B, then there
> > is no reliable way to resolve the issue of 'how do I connect to B'.
>
> This is not the issue I am worried about, but rather "prevent
> A from talking to B's brother C when A intended to talk to B".
>
> (The issue of an unavailable service is the one Coda is meant to deal
> with gracefully)

Oh now we're suddenly back at talking about Coda servers then? Because
last time I checked only Coda clients handle unavailable servers
gracefully with write-disconnected operation. Why did you bring in the
auth2 and update and other servers?

Coda servers either have the volume the client wants to connect to, or
they do not. If the volume is not available the server returns an error
and the client is expected to go away and try again later. If the server
has the volume the client's request can be handled. It doesn't matter if
we've got B or C, as long as they have the needed volume.

> For me, if the instance is not the one the rpc2 client meant to contact,
> it is wrong. There is nothing in the code which prevents this.

Yes there is. The server has to run at the right ip address and the
right port, and it has to use the correct shared secret when given a
particular client identifier, and the RPC2 protocols have unique
values (subsystems) so you cannot send volutil commands to an auth2
daemon. So finally you are connected to a server on the expected
address/port, and it knows the shared secret and it uses the same
'subsystem', and then finally it happens to actually have (for instance)
a copy of the volumeid we are trying to connect to.

I would say that is a whole lot of levels of preventing to talk to the
wrong guy.
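
The layers listed above can be pictured as a chain of checks, each of
which must pass before the next one is tried. The following is only an
illustration in Python, not RPC2 or Coda code: the `Endpoint` record, its
fields and `first_failed_check` are hypothetical names, and the real
handshake proves knowledge of the shared secret rather than comparing it
directly.

```python
from dataclasses import dataclass, field
import hmac


@dataclass
class Endpoint:
    """Hypothetical description of whatever answers at an address."""
    address: str
    port: int
    secret: bytes
    subsystem: str
    volumes: set = field(default_factory=set)


def first_failed_check(expected: Endpoint, actual: Endpoint, volume_id: int):
    """Return the name of the first layer that rejects `actual`, or None."""
    if (actual.address, actual.port) != (expected.address, expected.port):
        return "address/port"
    # Stand-in for the handshake: the real protocol never sends the secret.
    if not hmac.compare_digest(actual.secret, expected.secret):
        return "shared secret"
    if actual.subsystem != expected.subsystem:
        return "subsystem"
    if volume_id not in actual.volumes:
        return "volume"
    return None
```

In this toy model, two servers that share address, secret and subsystem
are told apart only by the final volume check.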

> You insist that when this happens, it is guaranteed to be harmless.
>
> I do not see any guarantee for being harmless and I doubt you
> have analyzed the system exhaustively, to be able to say _never_.

You aren't even clear about which 'protocol' you are talking about.
vice.rpc2? auth.rpc2? volutil? update? resolution? repair?

You are just handwaving about a 'server id' or 'instance id'. What is
this 'ID', a single 8 bit number that the Coda servers currently prepend
to volumeids? a 32-bit number? 64-bit? UUID? Maybe an X509 identifier?
Signed? How about a 4096-bit PGP public key.

I have given multiple concrete examples. I have exhaustively analyzed
'the system' to the point that I am convinced there is first of all no
need for a server identifier. Second of all it would be a gross layering
violation to stuff it into the rpc2 handshake. And finally...

> Do you feel this would be expensive or risky? What would be the downsides,
> besides the corresponding API extension (adding a "server instance id"
> argument)?

Are you seriously still saying that after all that I wrote in my
previous email?

Jan

u-m...@aetey.se

May 6, 2016, 3:00:24 PM
On Fri, May 06, 2016 at 11:48:57AM -0400, Jan Harkes wrote:
> On Fri, May 06, 2016 at 04:31:25PM +0200, u-m...@aetey.se wrote:
> > On Fri, May 06, 2016 at 09:18:55AM -0400, Jan Harkes wrote:
> > > If client A wants to connect to server B and somehow finds out that it
> > > is supposed to connect to address X port Y, and whoever 'picks up the
> > > phone' at that address uses the correct shared secret to complete the
> > > handshake, then there is no use for the instance id.
> >
> > Exactly, I would like a small change in rpc2 to be able to make use of it.
>
> I think you didn't actually read what I said there. There is no
> 'exactly' there that we agree on. For reference I'll copy the above
> text again down here.
>
> If client A wants to connect to server B and somehow finds out that it
> is supposed to connect to address X port Y, and whoever 'picks up the
> phone' at that address uses the correct shared secret to complete the
> handshake, then there is no use for the instance id.

Trust me I did read. :)

> Where I clearly state "there is _no_ use for the instance id" and in
> response you say (paraphrased), "Exactly, that is why I want to add an
> instance id to the rpc2 handshake".

This is not what I wrote.

Trying to summarize:

I say:
======
- It is DESIRABLE to verify the service instance at current rpc2 handshake,
BECAUSE there is no guarantee that talking to a wrong instance
is harmless.

This is BASED ON my perception of secure design.

- (Unfortunately) there is no place for the service instance id in the
current rpc2 handshake.

- This can be improved and I want to create such a place and add the
missing piece of data and the verification.

I perceive that you say: (among others in this letter below)
========================
- It is NOT NECESSARY to verify server instance at rpc2 handshake,
BECAUSE in all current uses of rpc2 the upper layers of the protocols
ensure that there will be no harm ever caused by talking to a wrong
instance.

This is BASED ON your exhaustive analysis of the interaction
between the different parts of the code.

I am now taking your word
=========================
for the analysis of the implications, exhaustive or not.

It looks like the matter does not deserve any more discussion,
because I am convinced that it is of no immediate importance.

There is no point in fixing potential problems when we have more
tangible stuff to take care of.

I am still commenting on your letter below, not for any continuation
of the discussion but for a casual reader.

> > For me, if the instance is not the one the rpc2 client meant to contact,
> > it is wrong. There is nothing in the code which prevents this.
>
> Yes there is. The server has to run at the right ip address and the
> right port, and it has to use the correct shared secret when given a
> particular client identifier, and the RPC2 protocols have unique
> values (subsystems) so you cannot send volutil commands to an auth2
> daemon. So finally you are connected to a server on the expected
> address/port, and it knows the shared secret and it uses the same
> 'subsystem', and then finally it happens to actually have (for instance)
> a copy of the volumeid we are trying to connect to.
>
> I would say that is a whole lot of levels of preventing to talk to the
> wrong guy.

Nice that this covers a whole lot of potential situations.
Nevertheless, one example to the contrary:

There is nothing preventing an update client from happily talking
to a wrong update server instance if the ip is spoofed and "scm"
data on some server hosts already are inconsistent
(iow there is a possibility of error escalation or recovery prevention).

I agree though that this is unlikely to pose any immediate problem
in practice.

> You aren't even clear about which 'protocol' you are talking about.
> vice.rpc2? auth.rpc2? volutil? update? resolution? repair?

The use of an extra "server id" argument is applicable to all services,
with whatever form the "id" can have, if any (auth2 is a nice example
of a service where talking to a "wrong" guy does not do any harm,
the client can even connect to all instances at once without
caring which one replies first).

> You are just handwaving about a 'server id' or 'instance id'. What is
> this 'ID', a single 8 bit number that the Coda servers currently prepend
> to volumeids? a 32-bit number? 64-bit? UUID? Maybe an X509 identifier?
> Signed? How about a 4096-bit PGP public key.

Ok, let's go into the details.

Of course this varies depending on the service and on the choice
of the developer, which data is to be checked.

In current upstream Coda vice rpc2 the reference to a particular
server is the corresponding IP number; for update clients
the reference is also an IP number; for other services it can be any
information which the client uses for the purpose of distinguishing a
service instance.

E.g. a server f.q.d.n would do, as long as the server knows that it is
to be contacted as this certain f.q.d.n.

IOW depending on the service the id can be represented by any byte
sequence, of "any" length, as appropriate for the service.

This does not matter for the RPC2 layer. Given a pointer and the data length
it can always hash the "id" data, which will be sufficient for comparison
of the client's and the server's idea of the server identity.

> I have given multiple concrete examples. I have exhaustively analyzed
> 'the system' to the point that I am convinced there is first of all no
> need for a server identifier. Second of all it would be a gross layering
> violation to stuff it into the rpc2 handshake. And finally...

I take your word about the protection existing in the upper layers.

The layering violation is a serious argument, but as long as the
"service instance id" data remains opaque to rpc2, does this constitute
any violation?

The specification of the rpc2 connection handshake would thus change
from "checking the shared secret of the peer" to "checking the shared
secret of the peer and of the server identification blob".

From my perspective this is not any more of a layering violation than
CN checking in TLS.

> > Do you feel this would be expensive or risky? What would be the downsides,
> > besides the corresponding API extension (adding a "server instance id"
> > argument)?

> Are you seriously still saying that after all that I wrote in my
> previous email?

Of course I was seriously looking for the answers.

Thanks for your comments Jan.

Regards,
Rune

Jan Harkes

May 6, 2016, 8:39:34 PM
On Fri, May 06, 2016 at 08:59:15PM +0200, u-m...@aetey.se wrote:
> On Fri, May 06, 2016 at 11:48:57AM -0400, Jan Harkes wrote:
> > I would say that is a whole lot of levels of preventing to talk to the
> > wrong guy.
>
> Nice that this covers a whole lot of potential situations.
> Nevertheless, one example to the contrary:
>
> There is nothing preventing an update client from happily talking
> to a wrong update server instance if the ip is spoofed and "scm"
> data on some server hosts already are inconsistent
> (iow there is a possibility of error escalation or recovery prevention).

In that case I can put your mind at ease very easily.

- There is only one update server in a realm, so a client cannot
accidentally talk to the wrong server within the same realm.
- The update clients and update server use the 'update token' as a shared
secret to set up their connections.

So any accidental or malicious other update server that happens to take
over the same IP as the official update server will not possess the
shared secret to successfully fool the update client. And if it does have
the right shared secret, a serverid isn't going to save anything here.

> > You are just handwaving about a 'server id' or 'instance id'. What is
> > this 'ID', a single 8 bit number that the Coda servers currently prepend
> > to volumeids? a 32-bit number? 64-bit? UUID? Maybe an X509 identifier?
> > Signed? How about a 4096-bit PGP public key.
...
> Of course this varies depending on the service and on the choice
> of the developer, which data is to be checked.
>
> In current upstream Coda vice rpc2 the reference to a particular
> server is the corresponding IP number; for update clients
> the reference is also an IP number; for other services it can be any
> information which the client uses for the purpose of distinguishing a
> service instance.
>
> E.g. a server f.q.d.n would do, as long as the server knows that it is
> to be contacted as this certain f.q.d.n.
>
> IOW depending on the service the id can be represented by any byte
> sequence, of "any" length, as appropriate for the service.
>
> This does not matter for the RPC2 layer. Given a pointer and the data length
> it can always hash the "id" data, which will be sufficient for comparison
> of the client's and the server's idea of the server identity.
...
...
> From my perspective this is not any more of a layering violation than
> CN checking in TLS.

TLS actually checks the common name in TLS; your server ids are some
opaque blob passed up to the application and are not checked by RPC2. So
they can just as easily be implemented by adding a single RPC call right
after connection setup doing something like a 'GetServerId', and then
taking action in the client based on the result.
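
A rough sketch of that idea in Python, with a fake in-memory connection
standing in for an established RPC2 connection. `GetServerId`,
`FakeConnection` and `connect_checked` are hypothetical names used only
for illustration; no such call is known to exist in RPC2 or Coda.

```python
class FakeConnection:
    """Stands in for an already authenticated connection."""

    def __init__(self, server_id: bytes):
        self._server_id = server_id
        self.closed = False

    def call(self, procedure: str) -> bytes:
        if procedure == "GetServerId":
            return self._server_id
        raise ValueError(f"unknown procedure: {procedure}")

    def close(self) -> None:
        self.closed = True


def connect_checked(conn: FakeConnection, expected_id: bytes) -> bool:
    """First call on a new connection: ask who answered, hang up on mismatch."""
    if conn.call("GetServerId") != expected_id:
        conn.close()
        return False
    return True
```

The check lives entirely in the application layer, so nothing in the
handshake itself has to change.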

This is actually quite close to what Coda clients do when they connect
to Coda servers, they send an RPC request to 'bind' the connection to a
specific volume, if the server claims it does not have the volume the
connection is closed.

> > > Do you feel this would be expensive or risky? What would be the downsides,
> > > besides the corresponding API extension (adding a "server instance id"
> > > argument)?
>
> > Are you seriously still saying that after all that I wrote in my
> > previous email?
>
> Of course I was seriously looking for the answers.

Ok here is a simple answer. The server-id would probably end up as part
of the INIT2 packet, which then turns any Coda server into a source for
an amplification attack. Someone can send INIT1 packets from a fake
origin, which then result in some very large INIT2 sent to the victim.
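
To make the amplification concern concrete, here is a small
back-of-the-envelope calculation in Python. The packet sizes are made-up
placeholders, not measured RPC2 values; the point is only that the ratio
grows with whatever is added to the reply to an unauthenticated INIT1.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes sent to the spoofed address per byte the attacker sends."""
    return response_bytes / request_bytes


# Hypothetical sizes, chosen only for illustration.
INIT1_BYTES = 100
INIT2_BYTES = 100
SERVER_ID_BYTES = 512  # e.g. a 4096-bit key carried as the "id"

without_id = amplification_factor(INIT1_BYTES, INIT2_BYTES)
with_id = amplification_factor(INIT1_BYTES, INIT2_BYTES + SERVER_ID_BYTES)
print(without_id, with_id)
```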

If you move it to the INIT4, by then the connection is already set up
and it can just as well get sent as the first RPC on the new connection.

Jan

u-m...@aetey.se

May 7, 2016, 11:37:36 AM
On Fri, May 06, 2016 at 08:38:49PM -0400, Jan Harkes wrote:
> On Fri, May 06, 2016 at 08:59:15PM +0200, u-m...@aetey.se wrote:
> > Nevertheless, one example to the contrary:
> >
> > There is nothing preventing an update client from happily talking
> > to a wrong update server instance if the ip is spoofed and "scm"
> > data on some server hosts already are inconsistent
> > (iow there is a possibility of error escalation or recovery prevention).
>
> In that case I can put your mind at ease very easily.
>
> - There is only one update server in a realm, so a client cannot
> accidentally talk to the wrong server within the same realm.
> - The update clients and update server use the 'update token' as a shared
> secret to set up their connections.

> So any accidental or malicious other update server that happens to take
> over the same IP as the official update server will not possess the
> shared secret to successfully fool the update client. And if it does have
> the right shared secret, a serverid isn't going to save anything here.

I am not worried about a man-in-the-middle who redirects traffic to a
server under his control, but about one who redirects traffic from one
service instance to another one.

In the example I am discussing a combination of several circumstances:
- corrupted consistency between different server instances so that
they do not agree on who is scm
- a malicious party who can modify ip-numbers in packets in transit

Then an update client can be fooled to connect to a wrong/stale
update server which for any reason wrongly believes that it is scm.
As a result updates will be picked from a wrong source and they
can happen to be stale or corrupted, and the situation will be
undetectable for this update client.

A shared secret does not protect against this kind of influence.

The scenario above is artificially constructed but this does not
mean it is impossible.

> > From my perspective this is not any more of a layering violation than
> > CN checking in TLS.
>
> TLS actually checks the common name in TLS; your server ids are some
> opaque blob passed up to the application and are not checked by RPC2. So
> they can just as easily be implemented by adding a single RPC call right
> after connection setup doing something like a 'GetServerId', and then
> taking action in the client based on the result.

This looks like a clean and hardly disputable approach, applying the check
in the layer just above the rpc2 connection.

Indeed such a check is not automatically useful/necessary for every
protocol and thus rpc2 code might not be the most natural place for it.

(
If there are multiple protocols making use of such a check, it probably
should be "CheckServerId" (instead of Get...), to avoid having to agree
on a certain form and size of "serverid" and also to avoid the need to
pass it over the wire - it can be long. A hash comparison with a generous
hash size would do.

The only downside I see is that this implies an additional rtt
at connection establishment, when the protocol applies the check.

OTOH connection establishment is a heavy operation by itself and does
not happen too often. If the check is done as an RPC, the extra rtt for
the protocols which need this would be hardly noticeable.
)
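
A minimal sketch of such a check in Python, assuming SHA-256 as the
"generous" hash. `CheckServerId` is the name floated above; `id_digest`
and `check_server_id` are hypothetical helpers, not existing RPC2 or Coda
functions. Only a fixed-size digest would cross the wire, whatever the
form or length of the id.

```python
import hashlib
import hmac


def id_digest(server_id: bytes) -> bytes:
    """Fixed 32-byte digest of an opaque id of any length."""
    return hashlib.sha256(server_id).digest()


def check_server_id(own_id: bytes, claimed_digest: bytes) -> bool:
    """Server side: does the client's digest match our own id?"""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(id_digest(own_id), claimed_digest)


# Client side: send only the digest of the id it believes it is talking to.
request = id_digest(b"server-b.example.org")
print(check_server_id(b"server-b.example.org", request))
print(check_server_id(b"server-c.example.org", request))
```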

Such an RPC would have a tiny impact on the code size and would not
require any modification of the existing code, but would allow for checking.
What could be better!

I agree otherwise that there is for the moment no practical impact
of _not_ doing such a check.

But when we make changes this can be useful for avoiding
the need of related exhaustive analysis of every case.

> This is actually quite close to what Coda clients do when they connect
> to Coda servers, they send an RPC request to 'bind' the connection to a
> specific volume, if the server claims it does not have the volume the
> connection is closed.

Given that this is a replica id (which of course can not be present on
another server), we are certainly in the clear.

> > > > Do you feel this would be expensive or risky? What would be the downsides,
> > > > besides the corresponding API extension (adding a "server instance id"

> Ok here is a simple answer. The server-id would probably end up as part
> of the INIT2 packet, which then turns any Coda server into a source for
> an amplification attack. Someone can send INIT1 packets from a fake
> origin, which then result in some very large INIT2 sent to the victim.

This would be bad indeed, but a change of the on-wire formats would
probably not be necessary.

> If you move it to the INIT4, by then the connection is already set up
> and it can just as well get sent as the first RPC on the new connection.

What I had in mind was mixing the data into the secret. This would
not change the size of any packet but ensure that the secret verification
fails as soon as there is a mismatch between what the client and the server
supply as the "id". Supplying an empty "id" would be even fully
compatible with peers running the current "instance-unaware" code.

The strengthening step would look to me like a suitable place for mixing
in the extra data.
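
One way to picture this, as a Python sketch only: derive the secret
actually used in the handshake from the shared secret and the id with
HMAC-SHA-256, and leave the secret untouched when the id is empty.
`effective_secret` is a hypothetical helper and HMAC is an illustrative
choice; how RPC2's own strengthening step would absorb the extra data is
not specified here.

```python
import hashlib
import hmac


def effective_secret(shared_secret: bytes, instance_id: bytes = b"") -> bytes:
    """Bind the handshake secret to an instance id; nothing extra is sent."""
    if not instance_id:
        # Empty id: behave exactly like instance-unaware peers.
        return shared_secret
    return hmac.new(shared_secret, instance_id, hashlib.sha256).digest()
```

If client and server disagree on the id they end up with different
secrets, so the existing secret verification fails on its own.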

This would generalize rpc2 to naturally support services
with multiple instances who share the authentication secret,
despite being distinct, from the upper layer protocol's viewpoint.

Coda servers are an example of such setup (they serve data which
differs between the servers, but share the authentication secret).
There are probably examples outside of Coda, too.

Given that Coda is the only known rpc2 user and that we do not see
apparent related vulnerabilities in its protocols, such a change is of
course less relevant.

Thanks Jan!

Rune
