
Coda development


u-m...@aetey.se

May 4, 2016, 5:47:07 AM
Dear Satya and Jan,

Coda plays an important role in the computing infrastructure at Chalmers
University of Technology.

For this reason, Chalmers is allocating six months of developer time this
year specifically for Coda maintenance and development.

The goal is to ensure that Coda keeps its virtues in the long run
and remains useful and usable at Chalmers. We also hope to be able to
lift some limitations.

I would appreciate hearing your opinion on the most valuable changes
that ought to be made. It is important and beneficial to everyone
if CMU's and Chalmers' efforts are coordinated and as efficient as possible.

For about ten years now, Chalmers has run on both clients and servers a Coda
version differing from upstream, with locally maintained patches. The
main reason is that we depend on transparent support for Kerberos
authentication. The upstream version contains Kerberos-related hooks
but they are unfortunately insufficient for actual deployment.

Until now we have strictly kept our changes as small as possible and
fully compatible with upstream servers and clients, except for the
additional features.

On the other hand, over time we have identified certain issues which
do not seem to be solvable by compatible changes. Unfortunately some of
these issues, if left unfixed, will hit us hard some day. Some of them
must be addressed now, before it is too late.

Probably the most apparent one is the limit on the key length in the
security layer. It is a hard one too, because the limitation is hardwired
in the current protocol.

Would you agree that we face a need for incompatible changes
in the protocol?

If so, then we probably have a suitable occasion for lifting several or
many limitations at once, and possibly for replacing some components.

For better or worse, despite Coda's global nature, its use
is still "concentrated": the majority of the client computers access a
few Coda realms and most often happen to be under the same management
(or strong influence) as the servers of the corresponding realms.

This may allow for a coordinated switch between incompatible protocol
versions even if the clients or the servers cannot be made to support
both protocol generations at the same time.

Satya and Jan, you have the best knowledge of the code and of the hard
spots, both in the functionality and in the implementation.

Would you outline your idea of a "Coda ten years later"?
How much of compatibility could the future version preserve?
Given half a man-year of development, how much of that vision would be
achievable and should be aimed for?

What do you think of the possible changes? Which changes are
- inevitable, to be able to keep Coda's virtues (like security)
- highly desirable
- desirable
and what would be the expected man-year cost for taking each step down
this stack?

Of course, I would appreciate it if everyone on the list shared his or
her opinion as well.

Such feedback will help me to properly arrange and use the development
resources in the planned effort at Chalmers.

Thanks for creating Coda, a wonderful and unique tool.

Regards,
Rune

Greg Troxel

May 4, 2016, 7:52:02 PM

My quick not very well filtered reactions:

I am uncomfortable about the coda security scheme being a roll-your-own
thing. I would want to look at Kerberos or DTLS. I realize this is
harder than it sounds.

Whatever the next steps, IPv6 support is a big deal.

I think it's critical to have a FUSE client so that coda can run
anywhere that supports FUSE (the high-level interface, preferably). I
think it's perfectly ok if performance suffers a bit for this; working
at 25% speed is 90% of the utility function for me, if not 95%. And
with modern CPUs the performance issues are not going to be a big deal;
glusterfs on NetBSD (userspace FS, FUSE) is doing most of 1 Gb/s. I
think it's fine to have in-kernel implementations for those that
really care about speed, but it seems the number of supported
platforms has dwindled to Linux and NetBSD.

The last big thing is to make conflict repair more robust, so that
normal people rarely lose. It's quite good already, but the last
little bit is likely real work.

Coda's behavior of doing a store when one does
open-for-write/read/close is really awkward. Arguably programs should
not do that, but they do. So I think it's necessary to not store
then, even if that does result in calling in the read locks.
Alternatively, open-for-write can be open-for-read, and upgraded on
the first write, but I think just not storing is 90% of the win.


Note that I'm not using coda any more, despite having started in the 90s
and continued until some time in 2015. The two reasons are having to run
IPsec to be comfortable with security, and lack of portability.

Jan Harkes

May 4, 2016, 8:41:14 PM
On Wed, May 04, 2016 at 11:44:35AM +0200, u-m...@aetey.se wrote:
> Probably the most apparent one is the limit on the key length in the
> security layer. It is a hard one too, because the limitation is hardwired
> in the current protocol.

Can you point out where the key length is hardwired to an undesirable
length? Because I am not aware that I did so, and the only limitation at the
rpc2/secure layer is that AES does not go beyond 256-bit keys, which is
a very good keysize for a symmetric key encryption cipher. It is, as far
as I know, considered secure even at its minimal 128-bit key size.

https://github.com/cmusatyalab/coda/blob/master/lib-src/rpc2/secure/README.secure#L123

In fact it will even use separate encryption and authentication keys if
enough key material is provided.

https://github.com/cmusatyalab/coda/blob/master/lib-src/rpc2/secure/secure_setup.c#L47

So at the security layer there is no forced limit; it just depends on
the amount of key material provided during connection setup. To see where
that limit comes from, we go one layer up and look at the new connection
handshake, which results in a session key used for all the following
packets in that communication. The actual key size here is chosen by the
server based on the list of supported algorithms it just got from the
client, and it is sent back in the second packet of the handshake.

https://github.com/cmusatyalab/coda/blob/master/lib-src/rpc2/rpc2-src/rpc2a.c#L1498

So it picks the largest encryption keysize supported by both the client
and the server and then adds the size needed for the authentication key.
Because the info we got from the client was in the first packet of the
handshake, it is not encrypted at that point, so an active attacker could
potentially try to force a downgrade to min_keysize. That is still
sufficient for now, but just in case it isn't, there is the
'RPC2_Preferred_Keysize' configuration override, settable through the
RPC2SEC_KEYSIZE environment variable, which can be used to prevent such
downgrading; we will still pick a larger keysize when it is possible.

https://github.com/cmusatyalab/coda/blob/master/lib-src/rpc2/rpc2-src/rpc2b.c#L109
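
To make the negotiation concrete, here is a minimal sketch in C of the
selection logic described above. The names and the preferred-keysize
plumbing are hypothetical, not the actual identifiers from rpc2b.c; it
only illustrates picking the largest mutually supported keysize with a
locally configured floor against downgrades.

    /* Hypothetical sketch, not the actual rpc2/secure code. */
    #include <stddef.h>
    #include <stdio.h>

    #define MIN_KEYSIZE 16   /* 128-bit floor assumed by this sketch */
    #define MAX_KEYSIZE 32   /* AES-256 */

    /* 'preferred' stands in for the RPC2_Preferred_Keysize /
     * RPC2SEC_KEYSIZE override; 0 means "no override configured" */
    static size_t select_keysize(const size_t *client, size_t n,
                                 size_t preferred)
    {
        size_t best = 0, i;

        /* largest encryption keysize both sides support */
        for (i = 0; i < n; i++)
            if (client[i] > best && client[i] <= MAX_KEYSIZE)
                best = client[i];

        if (best < MIN_KEYSIZE)
            return 0;                 /* cannot set up encryption */

        /* the client's list arrived in the unencrypted first packet,
         * so enforce a local floor to defeat an active downgrade */
        if (preferred && best < preferred)
            return 0;

        return best;  /* the authentication key size gets added on top */
    }

    int main(void)
    {
        size_t offered[] = { 16, 24, 32 };
        printf("chosen keysize: %zu\n", select_keysize(offered, 3, 32));
        return 0;
    }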

When we send the init2 response we clearly do not yet have a session key,
because we just started negotiating its parameters, so the init2 is
sent encrypted using a shared secret derived from the client identity
sent in the init1 packet. This is either a username (clog/auth2) and the
shared secret is looked up in the password database, you aren't even
using this bit because you are using kerberos. The other client identity
is the encrypted Coda token, which the client has a plaintext copy of
and the server is able to decrypt because it (or one of its peers)
generated the original. For either of these two we have to go yet
another layer higher and end up at auth2/codatoken.c.

Now at this point the key exchange that Coda uses is using an old
RPC2_EncryptionKey to store the random bytes of the secret, and this one
is only 8 lousy bytes: 64 bits, which is clearly sub-par and so bad it
doesn't even qualify for the min_keylen we need to get an encrypted
connection going to begin with. So the actual encryption key is derived
using a PBKDF, which runs a non-parallelizable operator 10000 times to
slow down the speed at which someone can iterate through possible keys.
(Password Based Key Derivation Functions are normally used for
passwords, which quite often have less than 64 bits of entropy.)
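
As an illustration of that strengthening step (not Coda's actual code,
which follows its own IPsec-style derivation), here is a sketch using
OpenSSL's PBKDF2; the salt and output sizes are invented for the
example:

    /* Sketch only: derive separate encryption and authentication keys
     * from an 8-byte shared secret with 10000 PBKDF iterations.
     * Build with: cc pbkdf.c -lcrypto */
    #include <stdio.h>
    #include <openssl/evp.h>

    int main(void)
    {
        const unsigned char secret[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        const unsigned char salt[] = "handshake-nonce"; /* hypothetical */
        unsigned char keys[32 + 32];  /* encryption key + auth key */

        if (!PKCS5_PBKDF2_HMAC((const char *)secret, sizeof(secret),
                               salt, sizeof(salt) - 1, 10000,
                               EVP_sha256(), sizeof(keys), keys))
            return 1;

        /* keys[0..31] -> encryption, keys[32..63] -> authentication */
        printf("derived %zu bytes of key material\n", sizeof(keys));
        return 0;
    }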

Now, you are concerned about the one 64-bit key that is used, after
strengthening with a PBKDF, on a single packet that is normally sent
once at the beginning of the handshake. I am actually more in line with
Greg Troxel's thinking, because all of this is 'homegrown crypto'. I
have tried my very best to avoid being smart and creative: I very
closely followed the IPsec RFCs and aggressively limited the
encryption and authentication algorithms used, so basically we can't
ever hit an RC4 issue or get downgraded to export ciphers and such.

I've also closely looked at CVEs for existing IPsec implementations and
checked whether my implementation could be affected. The latest is that I
have started to introduce constant-time comparisons in places; there
actually was only one needed in rpc2/secure, but there are probably some
more where we are checking passwords and Coda tokens. Clearly, though,
nobody else is looking for vulnerabilities in such a little-used
implementation.

In the long run using TLS over TCP is a better solution, but a whole lot
more needs to change than the size of a variable before that is even
close to a workable solution.

Jan

Jan Harkes

May 4, 2016, 9:02:02 PM
On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
> I think it's critical to have a FUSE client so that coda can run
> anywhere that supports FUSE (the high-level interface, preferably). I
> think it's perfectly ok if performance suffers a bit for this; working
> at 25% speed is 90% of the utility function for me, if not 95%. And
> with modern CPUs the performance issues are not going to be a big deal;
> glusterfs on NetBSD (userspace FS, FUSE) is doing most of 1 Gb/s. I
> think it's fine to have in-kernel implementations for those that
> really care about speed, but it seems the number of supported
> platforms has dwindled to Linux and NetBSD.

Fuse would be nice, but its support is uneven across platforms, and
it will never be possible to extend the fuse api with a cross-platform
pioctl-style interface. So pioctls would have to get implemented as some
sort of virtual filesystem (similar to /proc or /sys), probably somewhere
under the same /coda root. We already use a hidden /coda/.CONTROL for
global pioctls, so maybe something like that could become a root
directory for a pioctl-fs, where writing/reading virtual files would
replace the current set of ioctl operations and still maintain the
proper kernel-level tagging of user identity. That is more important
than ever: with namespaces you cannot just connect over a tcp or unix
domain socket and prove you are in the same filesystem namespace as your
/coda mountpoint. Again, this is a big project.
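
A rough sketch of what a client-side operation could look like under
such a pioctl-fs; the path and wire format below are invented for
illustration:

    /* Hypothetical: instead of pioctl(path, _VICEIOCTL(n), ...), write
     * a request to a per-operation virtual file and read the reply. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        const char *ctl = "/coda/.CONTROL/checkservers"; /* invented */
        char reply[256];
        ssize_t n;
        int fd = open(ctl, O_RDWR);

        if (fd < 0) { perror("open"); return 1; }

        /* venus can tag this request with the caller's kernel-level
         * identity, just as it does for ioctl requests today */
        if (write(fd, "all", 3) != 3) {
            perror("write"); close(fd); return 1;
        }

        n = read(fd, reply, sizeof(reply) - 1);
        if (n >= 0) { reply[n] = '\0'; printf("%s\n", reply); }

        close(fd);
        return 0;
    }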

> Coda's behavior of doing a store when one does
> open-for-write/read/close is really awkward. Arguably programs should
> not do that, but they do. So I think it's necessary to not store
> then, even if that does result in calling in the read locks.
> Alternatively, open-for-write can be open-for-read, and upgraded on
> the first write, but I think just not storing is 90% of the win.

This is both simple and expensive. We already are partly there because
of lookaside caching. We just need to make sure we keep around a valid
checksum of the last known data version for every cache file. So when a
file is closed after the open-for-write/read/close cycle and we have to
recompute the checksum to update it we can first check against the old
value and if it wasn't changed not send the store.
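
A sketch of the close-time decision this would enable; none of these
names come from the venus source, and compute_sha256() stands in for
whatever checksum the lookaside code keeps:

    /* Hypothetical sketch of skipping a Store on an unmodified close. */
    #include <string.h>

    struct cachefile {
        unsigned char known_sum[32]; /* checksum of last known version */
        int opened_for_write;
    };

    /* assumed helper: fills out[32] with the container file's checksum */
    void compute_sha256(const char *container, unsigned char out[32]);

    int needs_store(struct cachefile *cf, const char *container)
    {
        unsigned char now[32];

        if (!cf->opened_for_write)
            return 0;

        compute_sha256(container, now);
        if (memcmp(now, cf->known_sum, sizeof(now)) == 0)
            return 0;  /* open-for-write/read/close: data unchanged */

        memcpy(cf->known_sum, now, sizeof(now));
        return 1;      /* data really changed, send the Store */
    }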

The only problem is that write-optimizations on the CML occur when a
file is opened for writing so that we do not send back data that will be
replaced soon anyway. So that fact needs to be tracked so we can still
force writeout in case a Store CML was cancelled during the open. Minor
detail probably not too hard, just need to make sure it isn't forgotten.

Jan

Jan Harkes

May 4, 2016, 9:08:11 PM
On Wed, May 04, 2016 at 09:01:00PM -0400, Jan Harkes wrote:
> On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
> > Coda's behavior of doing a store when one does
> > open-for-write/read/close is really awkward. Arguably programs should
> > not do that, but they do. So I think it's necessary to not store
> > then, even if that does result in calling in the read locks.
> > Alternatively, open-for-write can be open-for-read, and upgraded on
> > the first write, but I think just not storing is 90% of the win.
>
> This is both simple and expensive. We already are partly there because

Oh, I forgot to mention why this is expensive.

The Coda servers need an RVM layout change to persistently store these
checksums; right now they are computed on demand during getattr
operations, and some subset is cached in memory to handle frequently
accessed or popular files. With large files, or a volume with many files,
this actually sometimes causes timeouts when validateattrs runs.

Jan

Jan Harkes

May 4, 2016, 11:03:42 PM
On Wed, May 04, 2016 at 11:44:35AM +0200, u-m...@aetey.se wrote:
> version differing from upstream, with locally maintained patches. The
> main reason is that we depend on transparent support for Kerberos
> authentication. The upstream version contains Kerberos-related hooks
> but they are unfortunately insufficient for actual deployment.

The upstream version was, as far as I know, used to integrate with the
kerberos realms of ANDREW.CMU.EDU and CS.CMU.EDU. But I have never seen
it in use that way, so I can't even tell you if it was using
kerberos4, kerberos5 or a mix of the two.

Either way, if the upstream hooks are insufficient for actual deployment,
I guess they should be removed so that nobody has to bother trying to set
up something that won't work anyway. Or is your code dependent on these
insufficient hooks?

Jan

Jan Harkes

May 5, 2016, 12:23:25 AM
On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
> The last big thing is to make conflict repair more robust, so that
> normal people rarely lose. It's quite good already, but the last
> little bit is likely real work.

I've been thinking long and hard about this. I've pretty much been
trying to get repair working more reliably ever since I joined CMU.
We've had several undergraduate and master's students do project work on
trying to make repair and application-specific resolvers work reliably.

During the recent break from working on Coda, I've played around with
building applications using what you could call 'web technology': simple
RESTful web apps, sharing state through sql/redis/nosql databases, even
handling things like push updates to client 'programs' consisting of
javascript running in browsers. And there are a lot of ways to scale,
with load balancing across multiple webapps behind nginx or haproxy,
sharding data across databases, etc.

Anyhow, it allowed me to take a step back and look at Coda from a
different perspective, and the one thing that adds a lot of complexity
and ultimately causes every single server-server conflict is
replication. Now, optimistic replication is totally awesome, and Coda has
proven that it can be made to work well enough to have a usable system
for users who are not afraid to repair the occasional conflict.

But my wordprocessor doesn't want to deal with that conflict, and neither
does my webserver, or many times me when I'm working on a deadline. So
let's look at the pros and cons of (optimistic) replication for a bit.

pro:
- Awesome concept to prove it can work; you can write papers about it.
- When every server but one crashes you can still continue working.
- When network routing messes up and different clients see a different
subset, you can continue working on both sides of the split.
- When something gets corrupted in a client's volume replica data
structure, you can remove the replica and rebuild by walking the tree.

con:
- Somewhat tricky fallbacks when operations cross directories (rename),
or when deciding how to deal with a resolution log when a directory is
removed, especially when it contains a mix of sources and targets of
rename operations. These are still the most common reasons for manual
repair failures; it is a real pain when you need to repair
mkdir foo ; touch foo/bar ; mv foo/bar . ; rmdir foo
- Extra metadata to track what we know of other replicas, version
vectors, store identifiers.
- Extra protocol messages (COP2) to distribute said knowledge.
- Special set of heuristics for most common replica differences, missed
COP2 by looking for identical storeids, missed update on one or more
replicas by looking at version vectors, runt resolution to rebuild a
replica, etc.
- Keep track of reintegration (operation) logs so we can merge directory
operations from diverged replicas.
- A protocol that goes through 6 phases for each conflicting object to
handle server resolution. This actually makes placing servers in
different locations not work very well.
- As a result, the need to have all servers basically co-located, so we
still can't handle datacenter outages or multiple site replication.
- Manual repair fallback when the heuristics and logs fail, which
requires very different steps for directories (fixing log operations)
and files (overwriting with a new/preferred version), and which isn't
obvious to the user when repair starts.
- Need to have the concept of a high level 'replicated volume' and a low
level 'volume replica' on the client.

I probably should stop here.

Now when there is no optimistic replication:
- We just need read-write and read-only volumes on clients, which
probably could even be represented by a single 'readonly' flag on a
single volume data structure. Turning a readonly backup snapshot back
into a read-write volume may become just toggling that flag.
- If a server crashes or otherwise becomes unreachable we still have
disconnected operation on a (hoarded) cache of files.
- We only have to deal with reintegration conflicts; however, unlike
server-server conflicts they only block the local user who performed
the operations, and the server-side implementation is pretty much (or
should be close to) the normal reintegration path. There is also a very
cheap fix if the user doesn't care about the local changes, in the form
of a 'cfs purgeml'. A headless, automated server conflict resolution
could be a cfs checkpointml / cfs purgeml / send the reintegration log
with the conflict to the appropriate user.
- Reduce the size of a version vector to 2 values, a single data version
and the store identifier; on the wire it can be even more compact,
because the client identifier part of the storeid does not change for
the lifetime of a client-server connection, so it could be passed on
connection setup. This in turn makes operations like ValidateAttrs
more efficient because we can pack a lot more results in the response
(see the struct sketch after this list).
- The server can lose functionality related to directory operation
logging, resolution, (server-server) repair.
- On the server side things like RAID can help deal with drive failures,
and everyone is making backups, right?
- More exciting, back the server data with something like Ceph's RADOS
object store which gives replication, self-healing and a lot of
goodies, and have Coda provide disconnected operation/hoarding/etc.
- As servers become simpler, store data in a replicated backend store
(rados/s3), and mostly deal with applying reintegration logs
and breaking callbacks, they become much easier to scale out: clients
can be scattered across multiple frontends grabbing a volume
lock/lease, and callback breaks can be distributed between frontends
with publish/subscribe queues.
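
As a sketch of the reduced version data mentioned above (the existing
version vector keeps a count per replica plus a store identifier; the
names here are illustrative, not the actual Coda structures):

    /* Hypothetical layout; not the actual Coda data structures. */
    #include <stdint.h>

    struct storeid {
        uint32_t client;      /* constant for the lifetime of a
                               * client-server connection, so it could
                               * be exchanged once at connection setup */
        uint32_t uniquifier;
    };

    /* with a single (non-replicated) server, a full version vector
     * collapses to one data version plus the last store identifier */
    struct version_info {
        uint64_t data_version;
        struct storeid last_store;
    };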

I probably should stop here, because it is getting quite a bit pie in
the sky and with the available manpower might just hit that magic 10
year window that Rune was talking about.

Jan

u-m...@aetey.se

May 5, 2016, 7:15:21 AM
Hello Greg and Jan and thanks for your insights!

(answering to several messages at once)

-------------------------------------------------------
On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
> I am uncomfortable about the coda security scheme being a roll-your-own
> thing. I would want to look at Kerberos or DTLS. I realize this is
> harder than it sounds.

Besides the token size, the security does not look bad.

It is also an advantage to be able to use Coda without a Kerberos
infrastructure.

> Whatever the next steps, IPv6 support is a big deal.

I agree. OTOH this is not a showstopper yet.

> I think it's critical to have a FUSE client so that coda can run
> anywhere that supports FUSE (the high-level interface, preferably). I

As Jan commented and I agree, FUSE is unfortunately hardly viable.

A more general alternative would be not going through the kernel
at all (like Christer Bernérus did in ULOCoda) - which unfortunately
has its own limitations.
There is another fully user-space technology which could work well (use
of Proot) but it exists only for Linux. Given the presence of the Linux
Coda kernel module we may as well continue to use the module.

> think it's perfectly ok if performance suffers a bit for this; working
> at 25% speed is 90% of the utility function for me, if not 95%.

Sure, for portability I would certainly accept this.

> The last big thing is to make conflict repair more robust, so that
> normal people rarely lose. It's quite good already, but the last
> little bit is likely real work.

+1

> Coda's behavior of doing a store when one does
> open-for-write/read/close is really awkward. Arguably programs should
> not do that, but they do.

The cost of detecting writes is the need to intercept them. Maybe
the cost is not too high, if the interception can be shortcut at the first
write and if we manage to put such a change into the upstream kernel(s).
Kernel changes are hard though.

> Note that I'm not using coda any more,

That's a loss for Coda, really.

> despite having started in the 90s
> and continued until some time in 2015. The two reasons are having to run
> IPsec to be comfortable with security, and lack of portability.

I hope you could find Coda attractive again when we improve security
and possibly portability too. Which level of portability is crucial for
you? To open platforms or to closed ones?

-------------------------------------------------------
On Wed, May 04, 2016 at 08:40:32PM -0400, Jan Harkes wrote:
> On Wed, May 04, 2016 at 11:44:35AM +0200, u-m...@aetey.se wrote:
> > Probably the most apparent one is the limit on the key length in the
> > security layer. It is a hard one too, because the limitation is hardwired
> > in the current protocol.

> Can you point out where the key length is hardwired to an undesirable
> length?

[BTW thanks for the illustrated analysis of the handshake]

> This is either a username (clog/auth2) and the
> shared secret is looked up in the password database, you aren't even
> using this bit because you are using kerberos.

We are using multiple authentication authorities at the same time,
including Kerberos realms _and_ the Coda authentication database.
The security of Coda password authentication is relevant.

> The other client identity
> is the encrypted Coda token, which the client has a plaintext copy of
> and the server is able to decrypt because it (or one of its peers)
> generated the original.

There is the secret length limitation in the token, which has the same
implications as the key size (I think it was you who pointed out this
matter once upon a time). The key in the database is also short
and is being used without strengthening, for backward compatibility.

> Now at this point the key exchange that Coda uses is using an old
> RPC2_EncryptionKey to store the random bytes of the secret, and this one
> is only 8 lousy bytes

These limitations aside (they are not in your code), you indeed did a
great job when you constructed a respectable security layer, compatible
with the old protocol.

But those 8-byte limitations are becoming untenable now.

There is another potential weakness in the way Coda authentication is
being used. When clients talk to servers or servers connect to each
other, they verify that the other party belongs to the correct realm,
but this might happen to be a different server in the same realm. I guess
mixing the server id into the handshake would eliminate this uncertainty.

If we are going to touch the crypto-related stuff, there is a library
I have special respect for, [Tweet]NaCl.

Do you think we could rely on it in Coda?

-------------------------------------------------------
On Thu, May 05, 2016 at 12:22:09AM -0400, Jan Harkes wrote:
> On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
> > The last big thing is to make conflict repair more robust, so that
> > normal people rarely lose. It's quite good already, but the last
> > little bit is likely real work.
>
> I've been thinking long and hard about this. I've pretty much been
> trying to get repair working more reliably ever since I joined CMU.
...
> let's look at the pros and cons of (optimistic) replication for a bit.

I see that you are sceptical about optimistic replication (no
pun intended); you mentioned this also when we talked long ago.

From my perspective it unfortunately feels different. I see optimistic
replication as one of the crucially useful features in Coda. It allows
a different and convenient system architecture (based on Coda)
and a corresponding style of system administration.

> Now when there is no optimistic replication:
...

This looks like the way AFS took. Surely this does bring certain
advantages but buys them by losing other ones. Optimistic replication
was one of the reasons we chose Coda before OpenAFS. This was not the
only reason but an important one.

> with the available manpower might just hit that magic 10
> year window that Rune was talking about.

This means that the repair improvements may have to wait; we
can live with the status quo.

Some of the other issues mentioned will soon become hard to live with;
they have to get the manpower first. Sigh.

Thanks again for the comments!

Regards,
Rune

Jan Harkes

May 5, 2016, 9:18:59 AM
On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
> On Wed, May 04, 2016 at 08:40:32PM -0400, Jan Harkes wrote:
> > On Wed, May 04, 2016 at 11:44:35AM +0200, u-m...@aetey.se wrote:
> > > Probably the most apparent one is the limit on the key length in the
> > > security layer. It is a hard one too, because the limitation is hardwired
> > > in the current protocol.
>
> > Can you point out where the key length is hardwired to an undesirable
> > length?
>
> [BTW thanks for the illustrated analysis of the handshake]
>
> > This is either a username (clog/auth2) and the
> > shared secret is looked up in the password database, you aren't even
> > using this bit because you are using kerberos.
>
> We are using multiple authentication authorities at the same time,
> including Kerberos realms _and_ the Coda authentication database.
> The security of Coda password authentication is relevant.
...
> matter once upon a time). The key in the database is also short
> and is being used without strengthening, for backward compatibility.

No, it is using the exact same strengthening as the Coda token, because
it happens under the covers during the RPC2 new connection handshake.

https://github.com/cmusatyalab/coda/blob/master/lib-src/rpc2/rpc2-src/rpc2a.c#L233

Jan

Jan Harkes

May 5, 2016, 9:25:49 AM
On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
> On Thu, May 05, 2016 at 12:22:09AM -0400, Jan Harkes wrote:
> > Now when there is no optimistic replication:
> ...
>
> This looks like the way AFS took. Surely this does bring certain
> advantages but buys them by losing other ones. Optimistic replication
> was one of the reasons we chose Coda before OpenAFS. This was not the
> only reason but an important one.

You are being quite unfair to Coda; it brings a whole lot more than just
optimistic replication. We still have:

- Disconnected operation with log optimizations.
- Whole file caching and hoarding!
- Atomic (server) operations without requiring additional file locking.
- A simpler, more maintainable in-kernel module that is actually part of
Linux and various other OS kernels.
- 99% of the complexity is in a userspace binary that is using a pretty
much identical codebase across all supported platforms.

Jan

Greg Troxel

May 5, 2016, 10:21:39 AM

Jan Harkes <jaha...@cs.cmu.edu> writes:

> On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
>
>> Coda's behavior of doing a store when one does
>> open-for-write/read/close is really awkward. Arguably programs should
>> not do that, but they do. So I think it's necessary to not store
>> then, even if that does result in calling in the read locks.
>> Alternatively, open-for-write can be open-for-read, and upgraded on
>> the first write, but I think just not storing is 90% of the win.
>
> This is both simple and expensive. We already are partly there because
> of lookaside caching. We just need to make sure we keep around a valid
> checksum of the last known data version for every cache file. So when a
> file is closed after the open-for-write/read/close cycle and we have to
> recompute the checksum to update it we can first check against the old
> value and if it wasn't changed not send the store.

I think it would be good to step back and think about the real
requirements. Coda seems to have adopted the notion that all file
accesses must be short-circuited in the kernel to container files as a
hard requirement. Probably that made sense in 1997, but I don't think
it does now.

It seems that such checksums could all be done locally. And, it seems
possible to just know if a write happens, either because vnops are
funneled through venus, or because the kernel interface is extended to
note that somehow.

> The only problem is that write-optimizations on the CML occur when a
> file is opened for writing so that we do not send back data that will be
> replaced soon anyway. So that fact needs to be tracked so we can still
> force writeout in case a Store CML was cancelled during the open. Minor
> detail probably not too hard, just need to make sure it isn't forgotten.

I suspect this is tricky. But I was getting lots of conflicts because
of this, when read operations were turned into writes. I worked around
it by adjusting software to be more careful about not opening for write
unless it intended to write.

Greg Troxel

May 5, 2016, 10:37:17 AM

Jan Harkes <jaha...@cs.cmu.edu> writes:

> On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
>> I think it's critical to have a FUSE client so that coda can run
>> anywhere that supports FUSE (the high-level interface, preferably). I
>> think it's perfectly ok if performance suffers a bit for this; working
>> at 25% speed is 90% of the utility function for me, if not 95%. And
>> with modern CPUs the performance issues are not going to be a big deal;
>> glusterfs on NetBSD (userspace FS, FUSE) is doing most of 1 Gb/s. I
>> think it's fine to have in-kernel implementations for those that
>> really care about speed, but it seems the number of supported
>> platforms has dwindled to Linux and NetBSD.
>
> Fuse would be nice, but its support is uneven across platforms, and
> it will never be possible to extend the fuse api with a cross-platform
> pioctl-style interface. So pioctls would have to get implemented as some
> sort of virtual filesystem (similar to /proc or /sys), probably somewhere
> under the same /coda root. We already use a hidden /coda/.CONTROL for
> global pioctls, so maybe something like that could become a root
> directory for a pioctl-fs, where writing/reading virtual files would
> replace the current set of ioctl operations and still maintain the
> proper kernel-level tagging of user identity. That is more important
> than ever: with namespaces you cannot just connect over a tcp or unix
> domain socket and prove you are in the same filesystem namespace as your
> /coda mountpoint. Again, this is a big project.

My understanding is that FUSE works fine on Linux, FreeBSD, NetBSD,
OpenBSD, OS X, OpenSolaris, and Android. I realize that leaves out
Windows. And it leaves out iOS, but Apple's policies will keep Coda out
of the iOS App Store anyway. What platforms do you think it doesn't
work on (besides Windows and iOS)? And for any of those, why is making
FUSE work harder than making a coda kernel module work?

Most filesystems have a fuse implementation, and people have figured out
how to make those work. Yes, pioctl could be some sort of magic path.
Or perhaps a plain ioctl with private codepoints can be passed through
to venus. I think this is just something to be figured out, not a
reason why it can't be done. I see what you mean about providing
identity, but one could always have the user program obtain a key or
auth token via a magic path and use that to authenticate a user/venus
channel. But magic paths seem like an ok solution.

Agreed that it's big to do this. But the other side of the coin is
implementing/porting and maintaining kernel modules for a very large
number of systems.

For me, if I can't run coda on all the systems I use, then it just
doesn't work. So I tried out unison, and I am now using Syncthing
instead. My take on requirements for coda is that being able to run it
pretty much everywhere (except Windows) is the biggest requirement, and
that security and reliability come next, and performance (assuming it's
>= 25% of native for things in the cache) is last.





Greg Troxel

May 5, 2016, 10:40:43 AM

Jan Harkes <jaha...@cs.cmu.edu> writes:

> Anyhow, it allowed me to take a step back and look at Coda from a
> different perspective, and the one thing that adds a lot of complexity
> and ultimately causes every single server-server conflict is
> replication. Now, optimistic replication is totally awesome, and Coda has
> proven that it can be made to work well enough to have a usable system
> for users who are not afraid to repair the occasional conflict.

The notion that server replication is causing more problems than it
solves sounds sensible. Especially if there is some way to have the
server storage replicated and a way to turn on a failover server using
the backend data.

I have only ever had one server. So my repair issues have not even been
this; I think they are just the many little bugs that have mostly gotten
fixed.

Greg Troxel

May 5, 2016, 10:50:07 AM

u-m...@aetey.se writes:

> On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
>> I am uncomfortable about the coda security scheme being a roll-your-own
>> thing. I would want to look at Kerberos or DTLS. I realize this is
>> harder than it sounds.
>
> Besides the token size, the security does not look bad.

Last I looked, there was the possibility of some fs data traveling
unencrypted if it was not associated with a logged-in user. This is in
my view totally not ok.

> It is also an advantage to be able to use Coda without a Kerberos
> infrastructure.

Sort of - I see it as Coda reimplementing something like Kerberos, so
you have to set that up instead.

>> Whatever the next steps, IPv6 support is a big deal.
>
> I agree. OTOH this is not a showstopper yet.

It was a bigger deal for me, because I was using IPsec to avoid what I
perceived as security issues, and then I had NAT traversal issues. But
I think we all agree it will become a bigger deal.

> As Jan commented and I agree, FUSE is unfortunately hardly viable.

I really do not understand how you can dismiss it as unviable. It seems
like a Small Matter of Programming (not saying it's actually small) to
make venus talk to FUSE instead of the kernel module, to have venus
implement all the container file read/write, and to use magic paths for
control. Do you really think that wouldn't work?

> A more general alternative would be not going through the kernel
> at all (like Christer Bernérus did in ULOCoda) - which unfortunately
> has its own limitations.

That just seems sort of like doing FUSE in libc instead of via FUSE.
It's a cool hack, but it expects too much of users (to do different
things, rather than running any program that works, even if it is an old
statically linked Linux binary with an old libc running on NetBSD under
emulation).

> There is another fully user-space technology which could work well (use
> of Proot) but it exists only for Linux. Given the presence of the Linux
> Coda kernel module we may as well continue to use the module.

Sure - the point of FUSE is that it's quite portable. A Linux-only
solution doesn't really help.

>> think it's perfectly ok if performance suffers a bit for this; working
>> at 25% speed is 90% of the utility function for me, if not 95%.
>
> Sure, for portability I would certainly accept this.

Glad to hear it; there seems to be dogma about native-speed container
ops. As I mentioned, glusterfs is running at nearly full GbE speed, so
it's unlikely to be a real problem.

>> despite having started in the 90s
>> and continued until 2015 some time. The two reasons are having to run
>> IPsec to be comfortable with security and lack of portability.
>
> I hope you could find Coda attractive again when we improve security
> and possibly portability too. Which level of portability is crucial for
> you? To open platforms or to closed ones?

For me, what matters today is *BSD and OS X (several open and one
closed), and Android (which is sort of open and sort of closed,
depending on which version you run). I haven't wanted to run it on
Linux or Illumos, but I think it's important that it work there.


Does Coda work on anything other than Linux and NetBSD today?

Jan Harkes

May 5, 2016, 11:09:26 AM
On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
>
> As Jan commented and I agree, FUSE is unfortunately hardly viable.

I never said it was hardly viable, just that the pioctls would have to
be handled like a a virtual FS instead of using an ioctl interface.

Jan

(starting to use list-reply instead of reply-all because all cc'd
parties are as far as I know subscribed to codalist already)

Jan Harkes

May 5, 2016, 11:21:58 AM
On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
>
> But those 8-byte limitations are becoming untenable now.

~/coda$ git grep RPC2_EncryptionKey | wc -l
144

That is just the places where there is a reference to the rpc2
encryption key in the source, typically either as a variable definition
or as a function argument. There are more places, because they are also
part of other structures, such as the secret and clear parts of a Coda
token (and maybe even more).

# filtering out EncryptedSecretToken references
~/coda$ git grep [^d]SecretToken | wc -l
32
~/coda$ git grep ClearToken | wc -l
58

Now these are just the places where variables or function arguments are
defined; these then lead to the places where they are used, and each place
needs to be checked to make sure it can safely adapt to a different
size. And you are unlikely to use a variable length, because this ends up
in persistent RVM memory structures and in on-the-wire rpc messages, which
results in incompatibility between clients and servers, as well as
reinitializing clients (and hopefully no server-side rvm reinitialization).

That is a lot of changes needed, IMHO not worth immediate action at the
moment, when there are clearly questions about the home-grown-edness of
the crypto implementation and whether it adequately covers all places where
file data is exposed.

Jan

Jan Harkes

May 5, 2016, 11:26:20 AM
On Thu, May 05, 2016 at 01:13:53PM +0200, u-m...@aetey.se wrote:
> There is another potential weakness in the way Coda authentication is
> being used. When clients talk to servers or servers connect to each
> other, they verify that the other party belongs to the correct realm,
> but this might happen to be a different server in the same realm. I guess
> mixing the server id into the handshake would eliminate this uncertainty.

Eh? Server ids should not be exposed like that to begin with.

Aside from that, a client isn't trying to connect to a server, it is
trying to bind to a volume. If you get connected to the wrong server
(how in the world is that even a thing that would 'happen'?) it wouldn't
be able to bind to the volume anyway, and so the end result is the same
without needing to put serverids in the handshake.

A client should have no need to know a server id, ever.

Jan

Jan Harkes

May 5, 2016, 11:42:30 AM
On Thu, May 05, 2016 at 10:20:30AM -0400, Greg Troxel wrote:
> Jan Harkes <jaha...@cs.cmu.edu> writes:
> > On Wed, May 04, 2016 at 07:43:46PM -0400, Greg Troxel wrote:
> >
> >> Coda's behavior of doing a store when one does
> >> open-for-write/read/close is really awkward. Arguably programs should
> >> not do that, but they do. So I think it's necessary to not store
> >> then, even if that does result in calling in the read locks.
> >> Alternatively, open-for-write can be open-for-read, and upgraded on
> >> the first write, but I think just not storing is 90% of the win.
> >
> > This is both simple and expensive. We already are partly there because
> > of lookaside caching. We just need to make sure we keep around a valid
> > checksum of the last known data version for every cache file. So when a
> > file is closed after the open-for-write/read/close cycle and we have to
> > recompute the checksum to update it we can first check against the old
> > value and if it wasn't changed not send the store.
>
> I think it would be good to step back and think about the real
> requirements. Coda seems to have adopted the notion that all file
> accesses must be short-circuited in the kernel to container files as a
> hard requirement. Probably that made sense in 1997, but I don't think
> it does now.

Kernel changes are hard. Even if a patch for this gets into Linux, it
will be months to years before it propagates all the way down to stable
and LTS distributions, because they only backport bugfixes to old kernels.

So we would have to keep our out-of-kernel buildable Linux
kernel module updated, which is actually much harder than it seems,
because it needs to account for possible kernel API changes across
different kernel versions so that it can build against any arbitrary old
kernel a distribution might carry.

I'm sure it can be done, but adding the realm part to the file
identifiers showed me how hard it was, and that was before secure boot,
where some distro kernels only load properly signed modules...

> It seems that such checksums could all be done locally.

Yes, if the server returns a NULL checksum, the client could do the
checksum right after it fetches the file data from the server.

Jan

Jan Harkes

May 5, 2016, 12:11:08 PM
On Thu, May 05, 2016 at 10:49:19AM -0400, Greg Troxel wrote:
> Last I looked, there was the possibility of some fs data traveling
> unencrypted if it was not associated with a logged-in user. This is in
> my view totally not ok.

It is encrypted but there is no shared secret between the client and the
server during the connection setup handshake, so the session key is
encrypted with a commonly known 'null key'. If you capture the INIT2
packet from the server to the client you can trivially decrypt it and
get the session key.

But... why would anybody go through that amount of trouble if he can
connect to the server without authentication himself and get those same
files anyway? Clearly their ACL must allow System:AnyUser access,
otherwise the user would have had to be logged-in.

Jan

Greg Troxel

May 5, 2016, 1:00:49 PM
Perhaps. But my security model involves the notion of limiting access
entirely to an authorized set, and I'd like that to be super clear.
Perhaps that could be a coda config setting that denies all
unauthenticated access.

Jan Harkes

May 5, 2016, 1:43:24 PM
On Thu, May 05, 2016 at 10:36:30AM -0400, Greg Troxel wrote:
> Jan Harkes <jaha...@cs.cmu.edu> writes:
> > Fuse would be nice, but its support is uneven across platforms, and
>
> My understanding is that FUSE works fine on Linux, FreeBSD, NetBSD,
> OpenBSD, OS X, OpenSolaris, and Android. I realize that leaves out
> Windows. And it leaves out iOS, but Apple's policies will keep Coda out

When I last looked there were several Windows fuse implementations, and
in the back of my mind I recall that the OS X variant had broken because
of some kernel change.

But there were two ways to use fuse. The high-level api using
libfuse, which was the most portable across platforms, used a separate
thread to handle each request, which doesn't mesh well with Coda's LWP
threading. The low-level api was either not available for every
platform or needed platform-specific tweaks; the details are unclear.

I think fuse could be a cool alternative to the Coda kernel module and
probably is seeing more use and maintenance so over time it could become
the main api as long as pioctls and other coda-kernel-module specific
things (if there are any others) are dealt with.

The individual read/write accesses used to be an issue when systems were
single core and context switches were expensive. Each system call would
require saving the page table state for the user's process, then context
switching to the venus process, handling the IO, and context switching
back. And something like a write would involve the original data copy in
the app (1), copied to the kernel (2), copied in-kernel passing on the
upcall message queue (3), copied to venus (4), copied back to the kernel
for write out to the container file (5), actual copy to disk (6?).

Things have improved in modern kernels, cpu caches are larger, copies
are more efficient, context switch overhead is much improved, there is
zero-copy IO, we have multiple cores so both the app and venus can keep
running at the same time and available memory is measured in the
gigabytes instead of megabytes. We can push gigabytes per second as
individual reads or writes through a fuse filesystem, although having a
well behaved application using page-sized/page-aligned IO probably helps.

> reason why it can't be done. I see what you mean about providing
> identity, but one could always have the user program obtain a key or
> auth token via a magic path and use that to authenticate a user/venus
> channel. But magic paths seem like an ok solution.

That is basically how clog passes the obtained Coda token to venus,
using a pioctl. Or did you mean the other way around, where we could pull
the Coda (or some special one-time use) token back from venus and then
use that to authenticate that user over a unix domain or tcp (https?)
connection?

> Agreed that it's big to do this. But the other side of the coin is
> implementing/porting and maintaining kernel modules for a very large
> number of systems.

I agree.

> For me, if I can't run coda on all the systems I use, then it just
> doesn't work. So I tried out unison, and I am now using Syncthing
> instead. My take on requirements for coda is that being able to run it

Both unison and syncthing try to get all clients to store a complete
copy of all the data. I guess it is like Coda without the System:AnyUser
acl and with an aggressive hoard setup that always tries to cache
everything; I never actually tried to use Coda that way. Of course
syncthing chunks up a file in 128KB blocks and only sends modified ones,
so it will be more efficient at propagating updates if only parts of a
file change.
Jan

Jan Harkes

May 5, 2016, 2:10:29 PM
On Thu, May 05, 2016 at 12:59:37PM -0400, Greg Troxel wrote:
> Jan Harkes <jaha...@cs.cmu.edu> writes:
Well, right now we set 2 default ACLs when the root directory of a new
volume is created: "System:Administrators all System:AnyUser rl". These
are hardcoded and normally changed right after the volume is created to
allow the designated user of the volume access, at which point I normally
set "System:AnyUser none". Leaving admin access is useful for helping
with the inevitable server-server conflict repair :)

For what you propose, Coda would need to introduce something like a new
System:AuthenticatedUser group that can be used instead of AnyUser. Or
maybe an even more flexible 'this is the default acl that should be set
when creating a new volume' setting. It is actually hard to do this
right at the createvol_rep scripting level because setting acls requires
access to the volume through /coda, but right after creation the volume
isn't mounted anywhere, and the VRDB/VLDB databases may not even be
synced to all servers yet so even if we force a temporary mount the
mountpoint may not resolve right away.

Jan

Jan Harkes

May 5, 2016, 2:17:04 PM
On Thu, May 05, 2016 at 02:09:36PM -0400, Jan Harkes wrote:
> when creating a new volume' setting. It is actually hard to do this
> right at the createvol_rep scripting level because setting acls requires
> access to the volume through /coda, but right after creation the volume
> isn't mounted anywhere, and the VRDB/VLDB databases may not even be
> synced to all servers yet so even if we force a temporary mount the
> mountpoint may not resolve right away.

Aw, now I remember why we used to need the System:AnyUser ACL on the
root of a new volume. Before realms, the /coda mountpoint would be the
root directory of the first created volume. But to authenticate with
clog we needed access to /coda/.CONTROL, which was not possible without
AnyUser access for unauthenticated users. So there was a bootstrapping
chicken-and-egg issue when we didn't set that ACL by default.

But because of realms, we don't have to care anymore because /coda is a
directory invented by venus to show realm mountpoints that will allow
access even to unauthenticated users.

So we can safely remove the System:AnyUser default acl when creating a
new volume root because the admin can always set it when he creates the
new mount point.

Jan

u-m...@aetey.se

May 6, 2016, 1:50:35 AM
On Thu, May 05, 2016 at 09:17:50AM -0400, Jan Harkes wrote:
> > The key in the database is also short
> > and is being used without strengthening, for backward compatibility.
>
> No, it is using the exact same strenghtening as the Coda token because
> it happens under the covers during the RPC2 new connection handshake.
>
> https://github.com/cmusatyalab/coda/blob/master/lib-src/rpc2/rpc2-src/rpc2a.c#L233

Indeed it is rehashed, nice. Thanks for pointing this out.
Still, this is 8 bytes.

Rune
