We've been talking to users and one thing that comes up is "how do I delete data from the device that I don't need anymore?" The short answer is purge, but this is an area where we could use some more sugar. Here are my scattered thoughts on the general requirements around purge. I'm posting them here because I know there are some folks on this list who have more experience using purge in application code than me, so please do reply with anything you'd like to see enhanced.
There are a bunch of use cases where users could end up with more data on the device than they should. Think about a Yelp-style app that syncs documents for businesses near your current location, but you travel a lot. Without purge you'd still have documents in there from your last trip to London. Currently app developers can manually clean things up, but that's not fun. So there's a lot we can do to help the developer deal with device storage limits. Here are a few ideas, just to sketch a direction. There are open GitHub issues for some of these, and others may be bad ideas... the goal here isn't to share implementation plans, it's to get feedback so we can see what would help you the most in real-world apps.
We could keep an LRU and occasionally sweep documents we haven't read in forever.
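To make the LRU idea a bit more concrete, here is a minimal Python sketch. Nothing here is CBL API: ReadTracker, touch, sweep and the purge callback are all made-up names, and it assumes the app (or the library) can record a timestamp every time a document is read.

```python
import time
from collections import OrderedDict


class ReadTracker:
    """Hypothetical helper that remembers when each document was last read."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # doc_id -> last-read timestamp, least recently read first.
        self._last_read = OrderedDict()

    def touch(self, doc_id):
        """Record a read; call this whenever the app loads a document."""
        self._last_read[doc_id] = self._clock()
        self._last_read.move_to_end(doc_id)

    def sweep(self, max_age_seconds, purge):
        """Purge every document not read within max_age_seconds.

        `purge` is a callback supplied by the caller (an assumption here);
        it receives the doc_id and does the actual local delete.
        """
        cutoff = self._clock() - max_age_seconds
        purged = []
        while self._last_read:
            doc_id, last_read = next(iter(self._last_read.items()))
            if last_read > cutoff:
                break  # everything after this was read more recently
            del self._last_read[doc_id]
            purge(doc_id)
            purged.append(doc_id)
        return purged
```

The occasional sweep could then run from a timer or at app launch. Keeping the entries ordered by last read means the sweep stops at the first document that is still fresh.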
We also need a mechanism for transparently loading documents (or attachments) from the cloud on request. For example, an app tries to load a doc from the local CBL, but it is not there. If the developer has configured a cloud fallback option and we're online, CBL attempts to fetch the data from the cloud before returning anything to the client.
Once we have the ability to fall back to a Sync Gateway, we get more flexibility to purge data from the device. So we can have an autopurge feature that removes large attachments that haven't been viewed in a long time. If the user requests one again, it just reloads from the cloud. This kind of reload is what I mean by a weak reference.
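Here is roughly how I picture the lazy load / weak reference behavior, as a small Python sketch. LazyStore, fetch_remote and is_online are hypothetical names for illustration, not anything CBL exposes today. It assumes the remote fetch is a plain function that returns the document, or None if it doesn't exist.

```python
class LazyStore:
    """Hypothetical local store that falls back to the cloud on a miss."""

    def __init__(self, fetch_remote, is_online):
        self._local = {}                   # stands in for the on-device database
        self._fetch_remote = fetch_remote  # doc_id -> document or None
        self._is_online = is_online        # () -> bool

    def put(self, doc_id, doc):
        self._local[doc_id] = doc

    def purge(self, doc_id):
        """Drop the local copy only; the cloud copy is untouched."""
        self._local.pop(doc_id, None)

    def is_local(self, doc_id):
        return doc_id in self._local

    def get(self, doc_id):
        """Return the local copy, or try the cloud before giving up."""
        if doc_id in self._local:
            return self._local[doc_id]
        if not self._is_online():
            return None  # offline and not on the device: nothing to return
        doc = self._fetch_remote(doc_id)
        if doc is not None:
            self._local[doc_id] = doc  # keep it until the next purge
        return doc
```

With something like this in place, purge stops being destructive from the app's point of view: a purged document is just one that costs a round trip the next time it is read.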
We also may want some kind of per-document metadata to prioritize sync and storage. E.g. if you set doc._sync_priority = 0, it always gets synced before documents that have the default sync_priority. If you want a document to only sync over wifi by default, you could tag it with a higher number: maybe sync_priority = 10 would be the cell-data cut-off, and sync_priority > 100 would mean don't sync, just lazy load / weak reference.
I use 0 as the sync level with the most priority (and 1 as the default priority) because otherwise you get an arms race with everyone trying to be more important. Sync Priority 0 would be useful for foundational elements that are required to draw the initial UI. It would frequently be coupled with Purge Priority 0 (never purge this document) especially in cases where parts of an application's UI chrome are distributed via Sync.
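A quick Python sketch of how those sync priority numbers could drive the replicator's decisions. The function name, the constants and the plain-dict documents are all hypothetical. I'm also assuming that the cut-off is inclusive (a document at 10 or above waits for wifi) and that a missing field means the default of 1.

```python
DEFAULT_SYNC_PRIORITY = 1  # default priority, per the numbering above
CELL_DATA_CUTOFF = 10      # assumption: at or above this, wait for wifi
LAZY_LOAD_ABOVE = 100      # above this: never sync, only lazy load


def sync_plan(docs, on_wifi):
    """Return the ids to sync now, most important (lowest number) first."""
    eligible = []
    for doc in docs:
        priority = doc.get("sync_priority", DEFAULT_SYNC_PRIORITY)
        if priority > LAZY_LOAD_ABOVE:
            continue  # weak reference only, fetched on demand
        if priority >= CELL_DATA_CUTOFF and not on_wifi:
            continue  # too heavy for cell data
        eligible.append((priority, doc["_id"]))
    # Sort by priority; ties fall back to id order so the plan is stable.
    return [doc_id for _, doc_id in sorted(eligible)]


docs = [
    {"_id": "photo", "sync_priority": 10},
    {"_id": "biz"},                          # default priority
    {"_id": "ui_chrome", "sync_priority": 0},
    {"_id": "archive", "sync_priority": 101},
]
print(sync_plan(docs, on_wifi=False))  # ['ui_chrome', 'biz']
print(sync_plan(docs, on_wifi=True))   # ['ui_chrome', 'biz', 'photo']
```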
Which brings me to Purge Priority, another kind of metadata we could track per document. A document with purge_priority = 0 will never be autopurged, so you'd tag foundational data with that. There could be ephemeral documents () where you can purge almost immediately, and a range of documents in the middle. For instance you'd rather purge comments on a business in the Yelp-style app, than purge the business address and phone number. You'd rather purge chat logs than contact lists, etc.
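As a sketch of how an autopurge pass might use purge priority, here is some Python. Everything in it is an assumption on my part: the function name, the "size" field, a default purge priority of 1 for untagged documents, and the rule that higher numbers get purged first. The only part taken from the description above is that 0 means never autopurge.

```python
DEFAULT_PURGE_PRIORITY = 1  # assumption: what an untagged document gets


def pick_purge_victims(docs, bytes_to_free):
    """Choose documents to autopurge until bytes_to_free is covered.

    Documents with purge_priority 0 are never chosen. Among the rest,
    the highest purge_priority (most disposable) goes first.
    """
    def priority(doc):
        return doc.get("purge_priority", DEFAULT_PURGE_PRIORITY)

    candidates = sorted(
        (d for d in docs if priority(d) != 0),
        key=priority,
        reverse=True,
    )
    victims, freed = [], 0
    for doc in candidates:
        if freed >= bytes_to_free:
            break
        victims.append(doc["_id"])
        freed += doc["size"]
    return victims


docs = [
    {"_id": "address", "purge_priority": 0, "size": 100},
    {"_id": "comments", "purge_priority": 50, "size": 300},
    {"_id": "chat_log", "purge_priority": 90, "size": 200},
]
print(pick_purge_victims(docs, bytes_to_free=400))  # ['chat_log', 'comments']
```

Note that even if the budget can't be met, the address document survives, which is the point of priority 0.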
For a sensor app, you'd be able to purge documents as soon as you replicated them upstream. So we'd want a way to configure that.
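For the sensor case, the shape I have in mind is something like the Python below. SensorBuffer and on_push_acknowledged are invented names, and the sketch assumes the replicator can tell us which document ids the server has confirmed receiving. That hook is the part we'd actually need to design.

```python
class SensorBuffer:
    """Hypothetical store for readings that are purged once pushed upstream."""

    def __init__(self):
        self._pending = {}  # doc_id -> reading, still only on the device

    def record(self, doc_id, reading):
        self._pending[doc_id] = reading

    def on_push_acknowledged(self, doc_ids):
        """Purge local copies of documents the server confirmed.

        Ids we don't hold locally are ignored. Returns what was purged.
        """
        purged = []
        for doc_id in doc_ids:
            if doc_id in self._pending:
                del self._pending[doc_id]
                purged.append(doc_id)
        return purged

    def local_ids(self):
        return sorted(self._pending)


buffer = SensorBuffer()
buffer.record("reading-1", 20.5)
buffer.record("reading-2", 21.0)
print(buffer.on_push_acknowledged(["reading-1"]))  # ['reading-1']
print(buffer.local_ids())                          # ['reading-2']
```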
So far I have been talking about special per-document fields that can configure sync behavior. But maybe it'd be better to do it per-channel... I won't sketch that here, but
Once we get autopurge/lazy load nailed down maybe the next horizon is using queries to do targeted sync. So maybe you fire a query to Couchbase Server and get a set of documents that match, along with a summary. And you can use the summary to display UI while you wait for the documents to sync.
As we head down this path we end up emphasizing the programming model and the low-latency advantage, over the offline capability...
I'm also curious how this sort of lazy load / weak reference stuff can play in a p2p scenario. Even further out there is using query-based sync in a p2p app.