PouchDB over Bluetooth? Or local peer to peer replication with PouchDB.

Yaron Goland

Sep 9, 2015, 7:50:21 PM
to pou...@googlegroups.com

The Thali project is an open source effort to create a peer-to-peer web [1]. Right now we are focused on ‘local’ peer-to-peer, meaning two or more people are in the same room, within radio range of each other’s devices (Android to Android or iOS to iOS), but with no formal Internet. We just hit our first pre-alpha milestone, which we call story 0 (see [2]), and it does something I thought this group would at least find amusing.

 

Our architecture uses Cordova to wrap a Thali plugin, which in turn uses a JXcore Cordova plugin that runs full Node.js in the background on both Android and iOS. On top of that Node.js instance we run PouchDB. But because we are doing “local” P2P with no formal Internet, we actually move data using Bluetooth on Android and a combination of Bluetooth and Wi-Fi (via the Multipeer Connectivity framework) on iOS. Neither of those stacks supports TCP/IP.
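
To make that concrete, here is a minimal sketch of the kind of code that ends up running inside that Node.js process, assuming express-pouchdb is used to put an HTTP front end on PouchDB (the port and mount path are illustrative, not our actual values):

    var express = require('express');
    var PouchDB = require('pouchdb');
    var expressPouchDB = require('express-pouchdb');

    // Expose a CouchDB-compatible REST API for PouchDB over plain HTTP.
    var app = express();
    app.use('/db', expressPouchDB(PouchDB)); // '/db' mount path is illustrative
    app.listen(5984, '127.0.0.1'); // localhost only; the radios carry it further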

 

To work around this, we have our native code expose a TCP/IP front end over the native radio stacks and then have the Node.js code talk to that TCP/IP front end. But for various reasons it turns out that we can really only support a single TCP/IP connection at a time at the native layer, and, let’s face it, PouchDB really wants to open more than one HTTP connection, which requires more than one TCP/IP connection.

 

To fix that we built a TCP/IP multiplexer in Node (using https://www.npmjs.com/package/multiplex). It allows PouchDB to open as many HTTP connections as it wants. We mux all of those TCP/IP connections onto a single TCP/IP connection, which talks to the native layer, which streams it over Bluetooth/Multipeer Connectivity. On the other side the stream is received, fed back into a single TCP/IP connection, de-muxed into multiple TCP/IP connections, and those connect to the remote PouchDB server.
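
For the curious, here is a rough sketch of the shape of that mux/demux layer, assuming the npm multiplex package; the ports and names are made up for illustration and this is not our actual code:

    var net = require('net');
    var multiplex = require('multiplex');

    // --- Local side: mux every TCP connection PouchDB opens onto the one
    // TCP connection the native radio layer accepts.
    var plex = multiplex();
    var native = net.connect(4242, '127.0.0.1'); // the native TCP/IP front end
    plex.pipe(native).pipe(plex);

    net.createServer(function (pouchSocket) {
      var channel = plex.createStream(); // one logical channel per connection
      pouchSocket.pipe(channel).pipe(pouchSocket);
    }).listen(5000, '127.0.0.1'); // PouchDB replicates against this port

    // --- Remote side: de-mux each logical channel back into a fresh TCP
    // connection to the local PouchDB HTTP server.
    var demux = multiplex(function (channel, id) {
      var pouchSocket = net.connect(5984, '127.0.0.1');
      channel.pipe(pouchSocket).pipe(channel);
    });
    var nativeIn = net.connect(4242, '127.0.0.1'); // fed by the remote radio stack
    nativeIn.pipe(demux).pipe(nativeIn);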

 

All of this just so we can run PouchDB over Bluetooth. :)

 

If you want to play with the code, I would suggest reading [2]; or, if you are brave (and have either two iOS devices or two Android devices), you can head over to [3] and play with the sample app.

 

               Hope you at least found this amusing,

 

                              Thanks,

 

                                             Yaron

 

[1] http://thaliproject.org/

[2] http://www.goland.org/story0/

[3] https://github.com/thaliproject/postcardapp

Nolan Lawson

Sep 11, 2015, 11:47:12 AM
to PouchDB, yar...@microsoft.com
Yaron,

I think your work is amazing and I am beyond ecstatic that y'all are using PouchDB to move in this direction.

Pouch/Couch were designed from the ground up to work well as a decentralized database. It's been an aspiration of CouchDB's since the early days - you can see hints of it in books like O'Reilly's "CouchDB: The Definitive Guide," published way back in 2010. CouchDB was never meant to live only as an Erlang database.

Your team is the first to take CouchDB's p2p dreams and make them a reality. Congratulations. :)

- Nolan

Yaron Goland

Sep 11, 2015, 2:10:48 PM
to Nolan Lawson, PouchDB

Our goal is to make the dream real! But there are serious additions we have to make to PouchDB in order to realize it. One of the big ones is ACL support, which is actually really, really tough. It’s not just a question of having ACLs (that isn’t too hard); it’s how the ACLs get defined and updated. This leads to all sorts of really strange scenarios.

 

I’ll take a really simple one. Imagine we have a discussion thread: it consists of a record that declares the thread’s existence and defines who should be able to see it, plus a bunch of posts in the thread.
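
In PouchDB terms the records might look something like this (purely hypothetical field names and IDs, not our actual schema):

    // The thread-declaration record carries the ACL; posts point back at it.
    var threadDoc = {
      _id: 'thread:42',
      type: 'thread',
      title: 'Lunch plans',
      allowedUsers: ['userB', 'userC'] // who may see the thread and its posts
    };

    var postDoc = {
      _id: 'post:42:0001',
      type: 'post',
      threadId: 'thread:42',
      body: 'Anyone up for tacos?'
    };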

 

Now imagine that user A syncs with user B, and it turns out that user B has a thread that user A doesn’t have permission to see. During the sync, user B’s PouchDB instance uses the ACL functionality to figure out that user A shouldn’t see those entries and doesn’t list them in the changes feed. User A successfully syncs what it is allowed to sync and records, in user B’s _local database, the sequence ID that user A synced up to.
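
A sketch of what that filtering could look like on user B’s side, using PouchDB’s support for local filter functions on the changes feed (lookUpThreadAcl is a hypothetical helper, and the database name is illustrative):

    var PouchDB = require('pouchdb');
    var db = new PouchDB('thali'); // user B's local database (name illustrative)
    var lastCheckpoint = 0; // in reality, read from the _local checkpoint doc

    // Hypothetical ACL check: a doc is visible if the requesting user is in
    // the allowedUsers list of the thread the doc belongs to.
    function canSee(doc, userId) {
      var acl = lookUpThreadAcl(doc); // hypothetical doc -> thread-ACL lookup
      return acl && acl.allowedUsers.indexOf(userId) !== -1;
    }

    db.changes({
      since: lastCheckpoint,
      include_docs: true,
      filter: function (doc) { return canSee(doc, 'userA'); }
    });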

 

Now imagine that at some point later the permissions on the thread are changed (e.g. there is a new revision of the record that declares the existence of the thread and defines who can see it) and user A is added.
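
Concretely, the grant is just a new revision of the thread record, something like this (building on the hypothetical docs above):

    // Granting user A access is a new revision of the thread doc. The key
    // point: the change lands at a NEW sequence number, while the old posts
    // keep their old ones.
    db.get('thread:42').then(function (doc) {
      doc.allowedUsers.push('userA');
      return db.put(doc); // bumps _rev
    });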

 

So now, in theory, if user A were to ask user B for the entries related to the thread, the ACLs would allow it. But user A wouldn’t even know to ask. All future syncs between A and B would start from the sequence ID of A’s last successful sync, which is past those older discussion entries that A now has retroactive permission to see.

 

So we have to have logic that recognizes this situation and essentially nukes user A’s _local entry with the sequence ID, so that the next time user A syncs they will do a full re-sync from scratch and thus see those entries.
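
Mechanically, that amounts to deleting the right _local checkpoint document; checkpointId below is a placeholder, since the real ID is derived from the replication endpoints:

    // Force a full re-sync by deleting the replication checkpoint.
    var checkpointId = '...'; // placeholder: real ID is a hash of the endpoints
    db.get('_local/' + checkpointId).then(function (doc) {
      return db.remove(doc);
    }).then(function () {
      // the next replication finds no checkpoint and starts from sequence 0
    });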

 

Then there are about 10,000 different kinds of race conditions. For example, imagine user A is syncing with user B, and as part of the sync user A does a pull replication and receives records that are part of a thread user A doesn’t recognize, because it hasn’t yet received the record that declares the thread’s existence. In that case the ACLs will reject the thread entries, because nothing grants user B permission to create them. The real problem is that we pull down multiple records in parallel, and depending on the ordering we could get entries in a thread before we get the record that declares the thread.

To deal with this scenario we have to make sure that we only process ACLs on records in strict linear sequence ID order as seen from the remote party (on the assumption that whoever created the thread guarantees they will always write the record that declares the thread before creating records for entries in the thread). But obviously we absolutely must allow for parallel record download or our perf will be horrible.

So this raises questions like: are records in a bucket guaranteed to be entered into the local DB’s sequence log in the same order they had on the remote database? Or can records downloaded in the same bucket be written in any order? If the former, then we can probably just use the changes feed to know when to process a record. If the latter, then we’ll need a hook in the replication engine to let us know when a bucket has been downloaded, and then figure out what the original order of the records in the bucket was so we can process them for ACL purposes in the right order.
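
To illustrate the ordering constraint, here is one way it could be sketched (hypothetical names throughout; this is not a real PouchDB hook):

    // Buffer docs as they arrive (in parallel, in any order), keyed by their
    // remote sequence number, and run ACL checks strictly in sequence order.
    var lastAclCheckedSeq = 0; // wherever ACL processing last left off
    var pending = {};
    var nextSeq = lastAclCheckedSeq + 1;

    function onDocDownloaded(remoteSeq, doc) {
      pending[remoteSeq] = doc;
      while (pending[nextSeq]) { // drain whatever is now contiguous
        applyAclCheck(pending[nextSeq]); // hypothetical ACL processor
        delete pending[nextSeq];
        nextSeq++;
      }
    }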

 

And so on and so forth. We have similar issues with quota management as well as local notifications (e.g., user A gets a record from B that user C is allowed to see, so now user A has to advertise that it has data for user C).

 

So making the P2P vision come true is going to require a lot of very hard work, and we’ll be coming to this mailing list pretty frequently to ask for advice. We obviously don’t intend to put any of this functionality into PouchDB’s core. Rather, we will submit PRs for extension points (such as the bucket issue I described above) that will allow us to write PouchDB modules that layer on this functionality.

 

But we’ll need help to make sure we don’t make a mess of it.

 

               Thanks,

 

                              Yaron
