PouchDB Direction

Dale Harvey

Oct 18, 2015, 9:14:54 AM
to pou...@googlegroups.com
I came to realise the other day that PouchDB in its (my) original vision is pretty much complete. It aimed to be a virtual replica of the CouchDB interface that worked in the browser, and for a while now that has been (almost) true.

So after the 5.0.0 release, I figured now was a good time to have a discussion about where PouchDB is and where it is heading. We all have our own ideas about what should be done, and I think it's a good idea to write them down and share them so that hopefully they can be done in a compatible way.

So here are some of my goals / ideas.

1. Implement purge - I kept saying "almost" finished; I think this is the last feature needed to "finish" PouchDB. It's tricky to get right, but important for mobile users. I would like to get some working version of purge done without a ground-up rewrite.

2. Close all bugs - I have been categorising things as 'bugs' and 'enhancements', and I would really like to get to the point where we are releasing PouchDB with no known bugs. (Defining exactly what counts as a PouchDB bug is not going to be a perfect process, as far as I can tell.)

3. Cleanup - With PouchDB being 'almost finished', I think now is a great time to comb through the code base and refactor any not-so-great code. If there is magical code where it isn't clear how it works (changes code! most of the constructor, etc.), it should be made simpler, and APIs should be made more consistent (adapter.js etc.).

4. Promote - I, at least, have always been cagey about promoting PouchDB and making it totally inviting for users. Stuff like try-pouchdb, pouch.host / pouchbase and paste.pouchdb.com can become better-integrated parts of the project, along with more tutorials and better visibility for the ones that exist.

5. pouchdb-find - Map reduce queries have long been a source of suffering for CouchDB users. Let's see what capabilities are missing from pouchdb-find (skip / limit?) and get this merged into core.

6. Fully streaming replication - _bulk_get is a vast improvement; however, if we follow on from https://github.com/nolanlawson/pouchdb-replication-stream and see if we can get our default replication to be a straight stream, it's going to perform much better and open up more features for users.

7. Along with pouchdb-find, one of the major sources of frustration is per-document validation. Per-user databases are nice for a very specific set of use cases, but any other use case is currently prohibitively complicated. This is going to need a lot of experimentation, mostly done outside of PouchDB core, but I think as we promote PouchDB to more users, this is a story that will and needs to be fleshed out.

8. idb-next - Our underlying storage model is vastly unsuited to how we handle data; it makes all reads / writes much slower and secondary indexes an order of magnitude slower. I am experimenting with a new ground-up IndexedDB adapter that will hopefully work across Safari, be less code and make secondary indexes super fast.

9. Conflict-free replication - This is how most other sync solutions work: the replicator is required to resolve the conflict before writing. It would vastly simplify storage requirements (we would only ever need to store the current rev) and potentially be far easier for users to comprehend (nobody handles conflicts). We can write a PouchDB replicator that works like that.

They are listed roughly in order of size + priority, but not strictly; some things may end up being dependent on others.

What are your goals and plans for PouchDB?

Cheers
Dale

step...@shimaore.net

Oct 18, 2015, 3:06:29 PM
to pou...@googlegroups.com
Hi Dale,

I mostly use PouchDB within Node.js, so not sure how well this folds
into the remainder of PouchDB's goals, but here goes:

- I use PouchDB as my only API for CouchDB nowadays. CouchDB forward-
compatibility ("2.0... and beyond!") is important. (Well, assuming
this remains in "core" PouchDB, obviously.)

- Building LevelDown every time one installs PouchDB takes time while
the application might actually not need it. So more generally speaking,
one extra goal could be trimming dependencies and allowing the developer
to provide dependencies when the application needs them.

Obviously the fact that one can code an application with PouchDB's API
and decide _later_ whether to use CouchDB or a local (file-based)
database, is the nicest part -- allowing for Hoodie-like/offline-first
development without realizing it.


Off your list I'd say the issues that affect newcomers (overcoming the
"one database per user" pattern; new query languages) should be priorities.
Enhancements to the internals of PouchDB are definitely worthwhile, but
architectural patterns are the ones that will carry on (and might make
it back into CouchDB and the rest of the family, and therefore provide
answers for an even larger community).
S.

--
tel:+33643482771
http://stephane.shimaore.net/

Alexander Gabriel

Oct 18, 2015, 4:12:02 PM
to pou...@googlegroups.com
Hi Dale

First of all: Thanks a LOT for this outstanding tool(s), and for your dedication to helping everyone who asks for help.

Your list looks great.

The two points I would prioritize because they bite my applications most:
8. idb-next: faster secondary indexes would be GREAT
6. Fully streaming replication

Cheers
Alex




--
You received this message because you are subscribed to the Google Groups "PouchDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pouchdb+u...@googlegroups.com.
To post to this group, send email to pou...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pouchdb/20151018190627.GA28504%40shimaore.net.
For more options, visit https://groups.google.com/d/optout.

Alexander Gabriel

Oct 18, 2015, 4:13:07 PM
to pou...@googlegroups.com
... I had better include Nolan in my thanks too

Nolan Lawson

Oct 20, 2015, 2:18:05 PM
to PouchDB
Big round of applause for everything you've done, Dale. I've rarely seen an open-source project that's managed as well as PouchDB, and I definitely wouldn't have gotten so involved in open-source if you hadn't been so supportive and helpful right off the bat.

I think after 5 years of work, we have one of the best-tested, best-documented, and friendliest JS projects out there. The code is so solid that new bugs are almost always user errors, weird environments, plugins, integration (e.g. SQLite Plugin), etc.

I wholeheartedly agree with your post but have a few things to add; see responses below.


    1. Implement purge - I kept saying "almost" finished; I think this is the last feature needed to "finish" PouchDB. It's tricky to get right, but important for mobile users. I would like to get some working version of purge done without a ground-up rewrite.

Yep, this is pretty much the last big feature before we're "feature complete."


    2. Close all bugs - I have been categorising things as 'bugs' and 'enhancements', and I would really like to get to the point where we are releasing PouchDB with no known bugs. (Defining exactly what counts as a PouchDB bug is not going to be a perfect process, as far as I can tell.)

Yep.


    3. Cleanup - With PouchDB being 'almost finished', I think now is a great time to comb through the code base and refactor any not-so-great code. If there is magical code where it isn't clear how it works (changes code! most of the constructor, etc.), it should be made simpler, and APIs should be made more consistent (adapter.js etc.).

The replication code is also stateful and hairy as all getout. :P


    4. Promote - I, at least, have always been cagey about promoting PouchDB and making it totally inviting for users. Stuff like try-pouchdb, pouch.host / pouchbase and paste.pouchdb.com can become better-integrated parts of the project, along with more tutorials and better visibility for the ones that exist.

I'm going to be on the JavaScript Jabber podcast next week to talk about PouchDB. I also definitely think there need to be more blog posts and tutorials out there. Patricia Garcia's talk at JSConf EU (https://www.youtube.com/watch?v=1sLjWlWvCsc) highlighted this; she said something like "this is cutting-edge tech, so there aren't 50 tutorials out there explaining what to do." PouchDB *needs* those 50 tutorials, especially if it's going to attract the attention of junior devs and early coders.


    5. pouchdb-find - Map reduce queries have long been a source of suffering for CouchDB users. Let's see what capabilities are missing from pouchdb-find (skip / limit?) and get this merged into core.

pouchdb-find will probably hit 1.0 within a month, thanks mostly to Garren's work.
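
For readers who have not seen this style of query, here is a toy in-memory sketch of what a selector with skip / limit does. It is not pouchdb-find's implementation: `FindRequest` and `findDocs` are hypothetical names, and only exact-equality matching is modelled.

```typescript
// Toy, equality-only selector matching. Hypothetical names; real selector
// languages support many more operators than this sketch does.
type FindRequest = {
  selector: Record<string, unknown>;
  skip?: number;
  limit?: number;
};

function findDocs<D extends Record<string, unknown>>(docs: D[], req: FindRequest): D[] {
  // Keep documents whose fields equal every field named in the selector.
  const matches = docs.filter((doc) =>
    Object.keys(req.selector).every((field) => doc[field] === req.selector[field])
  );
  // skip drops leading matches; limit caps how many are returned after that.
  const start = req.skip || 0;
  const end = req.limit === undefined ? matches.length : start + req.limit;
  return matches.slice(start, end);
}

const characters = [
  { _id: "mario", series: "Mario", debut: 1981 },
  { _id: "luigi", series: "Mario", debut: 1983 },
  { _id: "link", series: "Zelda", debut: 1986 },
  { _id: "peach", series: "Mario", debut: 1985 },
];

// Second page of one result each: skips "mario", returns "luigi".
const page = findDocs(characters, { selector: { series: "Mario" }, skip: 1, limit: 1 });
```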


    6. Fully streaming replication - _bulk_get is a vast improvement; however, if we follow on from https://github.com/nolanlawson/pouchdb-replication-stream and see if we can get our default replication to be a straight stream, it's going to perform much better and open up more features for users.

Yup, I think this is part of a larger discussion that needs to happen with the CouchDB folks. Jan said he liked my pouchdb-replication-stream, but didn't like how it handles attachments (base64, yeah, binary is better). You pointed out that it could be even more efficient by diffing. Both of those are great ideas; I'd love to see an implementation :).


    7. Along with pouchdb-find, one of the major sources of frustration is per-document validation. Per-user databases are nice for a very specific set of use cases, but any other use case is currently prohibitively complicated. This is going to need a lot of experimentation, mostly done outside of PouchDB core, but I think as we promote PouchDB to more users, this is a story that will and needs to be fleshed out.

To me, this is pretty much the biggest failing of CouchDB's authentication system. I think it's silly that you can *almost* implement a full authentication system with pure CouchDB, but then you're stuck if you want per-user read/write permissions, which is like 95% of everybody's use case.

But IMO it's mostly fixed once couch-per-user is in core. The only improvement would be couch-per-role (also much needed for many use cases). It also could be solved by external projects like Hoodie, pouch.host, superlogin, etc.


    8. idb-next - Our underlying storage model is vastly unsuited to how we handle data; it makes all reads / writes much slower and secondary indexes an order of magnitude slower. I am experimenting with a new ground-up IndexedDB adapter that will hopefully work across Safari, be less code and make secondary indexes super fast.

Much needed, would love an overhaul of IndexedDB that *actually* works cross-browser. Keep an eye on the WebKit commit log; Brady Eidson has actually been doing a lot of work on IDB in the last few weeks, e.g. http://trac.webkit.org/changeset/191210. So if we're lucky, Apple will fix most of their glaring issues and we don't have to write a ton of workarounds.


    9. Conflict-free replication - This is how most other sync solutions work: the replicator is required to resolve the conflict before writing. It would vastly simplify storage requirements (we would only ever need to store the current rev) and potentially be far easier for users to comprehend (nobody handles conflicts). We can write a PouchDB replicator that works like that.

This can be solved in plugin land IMO. We need better automated conflict resolution; pouch-resolve-conflicts is a good start, but we need something *way* simpler: https://github.com/jo/pouch-resolve-conflicts. And yes, we can write a custom replicator and distribute that as a plugin; see pouchdb-full-sync for an example of one: https://github.com/nolanlawson/pouchdb-full-sync

I agree that most users don't want to think about conflicts, but to me it is a virtue of PouchDB that we strive to educate users about the reality of distributed systems, and at least give them the *option* to handle conflicts.
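
To make the "resolve before writing" idea concrete, here is a minimal in-memory sketch. It is not PouchDB's replicator and not how pouch-resolve-conflicts works: `SketchDoc`, `Resolver`, `highestRevWins` and `replicateWithoutConflicts` are hypothetical names, the target is a plain object holding one current revision per document, and the sketch treats any existing document with the same _id as a clash rather than checking revision ancestry.

```typescript
// Hypothetical types for illustration only; not part of the PouchDB API.
interface SketchDoc {
  _id: string;
  _rev: string; // e.g. "2-abc"; the number before the dash is the generation
  [field: string]: unknown;
}

// Decides which version survives when both sides have the same document.
type Resolver = (local: SketchDoc, incoming: SketchDoc) => SketchDoc;

// Example policy: the higher generation wins; ties break on the rev string
// so that every replica makes the same choice.
const highestRevWins: Resolver = (local, incoming) => {
  const generation = (d: SketchDoc) => parseInt(d._rev.split("-")[0], 10);
  if (generation(local) !== generation(incoming)) {
    return generation(local) > generation(incoming) ? local : incoming;
  }
  return local._rev > incoming._rev ? local : incoming;
};

// Writes incoming docs into the target, resolving each clash before the
// write, so the target only ever stores one revision per _id.
function replicateWithoutConflicts(
  incoming: SketchDoc[],
  target: Record<string, SketchDoc>,
  resolve: Resolver
): void {
  for (const doc of incoming) {
    const existing = target[doc._id];
    target[doc._id] = existing ? resolve(existing, doc) : doc;
  }
}
```

The trade-off is the one described above: the store stays simple, but whichever version loses is gone, so the user never gets the option to handle the conflict.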


    What are your goals and plans for PouchDB?

My list:

1) Performance

I'm really unsatisfied with secondary indexes right now. I still think that ideally this needs to be solved before purge() is solved; otherwise we're going to have to write something very complicated that deals with dependentDbs and weird message passing between DBs. I would prefer it if adapters just had some kind of low-level _createIndex() API or something that let you store basic key/values and then iterate over them, or if the doc-store were modified to have secondary keys other than the _id. This could be implemented in abstract-mapreduce, and then shared between pouchdb-find and mapreduce.
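
As a rough sketch of what such a low-level key/value index could look like (purely illustrative: `SimpleIndex` and its methods are hypothetical, kept in memory, and compare keys as plain strings rather than with CouchDB's collation rules):

```typescript
// Hypothetical low-level index: (key, docId) pairs kept sorted by key.
// Not an existing PouchDB adapter API.
type IndexEntry = { key: string; docId: string };

class SimpleIndex {
  private entries: IndexEntry[] = [];

  // Binary-search the insertion point so entries stay ordered by key,
  // then by docId.
  put(key: string, docId: string): void {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      const e = this.entries[mid];
      if (e.key < key || (e.key === key && e.docId < docId)) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    this.entries.splice(lo, 0, { key: key, docId: docId });
  }

  // Drop every entry for a document, e.g. when it changes or is purged.
  removeDoc(docId: string): void {
    this.entries = this.entries.filter((e) => e.docId !== docId);
  }

  // Return entries with startkey <= key <= endkey, in key order.
  range(startkey: string, endkey: string): IndexEntry[] {
    return this.entries.filter((e) => e.key >= startkey && e.key <= endkey);
  }
}
```

Because the index lives next to the documents instead of in a separate dependent database, removing a document's entries is a single local call, which is the part that matters for purge.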

2) Plugins, plugins, plugins

As usual I have plenty of ideas for cool plugins. Some random ones off the top of my head:

pouchdb-webrtc (successor to PeerPouch, but based on worker-pouch/socket-pouch)
pouchdb-graphql (GraphQL/Relay/Flux is all the hype right now; I think we could do something ala ember-pouch to integrate with this)
pouchdb-progress-indicator (sync progress is hard, 'nuff said)

3) More user-friendly sugar APIs

I've always felt that both Pouch's and Couch's APIs are way too complicated for beginners (they have a *really* hard time grokking 409s and _revs). They're also not well-suited to be the whole kit and caboodle for a BaaS, which makes it hard for PouchDB to compete with the likes of closed-source centralized systems like Firebase and Parse (which I see as our primary competition, frankly).

I think Hoodie has the best handle on how to solve this problem, so if I have time and motivation I would probably start contributing to Hoodie more, and help build up a nice suite of user-friendly tools around that.

4) More outreach

As I mentioned above, the biggest thing the PouchDB community is missing right now is tutorials, blog posts, videos, "try PouchDB," etc.

One of the reasons CouchDB "lost" to Mongo is that it was just way easier to get started with Mongo. Developers like to believe that they base their opinions on carefully studied research, but in fact mostly they base it off of 1) hype, 2) name recognition, and 3) how easily they can get started and feel empowered. #3 tends to feed into #1 and #2, because developers don't have time to try everything, so they briefly test-drive some tool and then sing its praises to their friends and coworkers.

PouchDB needs help in all three, and we're at a deficit because we are a community-driven, unfinanced open-source project. Meteor, Firebase, and Parse can all afford to write a blog post per week and promote themselves on social media; we have to go the community route and rely on word-of-mouth and groundswell. No biggie; that's how jQuery, lodash, and other heavy-hitters got started.

Anyway, congrats again to everyone on an awesome project. Dale, Calvin, Nick, Marten, Tomasz, and everybody else who contributed: amazing work, you are all some of my favorite people to work with, and let's keep it up. :)

- Nolan

Nate Dudenhoeffer

Nov 12, 2015, 9:29:20 AM
to PouchDB
First off, I'll say thanks for all the work you have put in. I am a new user of PouchDB, and it seems really solid, and the docs are generally very good. I also think it's great you are having this discussion about project direction. I wanted to offer a few thoughts on outreach as a new user. I suspect a large portion of new users will come without any experience with CouchDB (as I did), and there is a pretty steep learning curve there.

I think there is room for improvement in the docs' discussion of map/reduce and secondary indexes. Two of the stumbling blocks I hit were not understanding what the "value" property means in the result of a query, and figuring out how to properly define a reduce function (or use an internal one). My main problem with the reduce function was that I didn't realize I needed to both define it as a "reduceFunction" in the view and set "reduce: true" in the query. This SO answer finally straightened it out for me: http://stackoverflow.com/a/27576198/480303
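
For anyone hitting the same two stumbling blocks, a toy model of a view query may help. This is only an in-memory sketch, not PouchDB's query code: `queryView`, `ViewDef`, `ViewRow` and `EmitFn` are hypothetical names, and the `reduce` option here simply mirrors the behaviour described above, so check the PouchDB docs for the real defaults.

```typescript
// Toy model of a map/reduce view query; illustrative names only.
type ViewRow = { id?: string; key: unknown; value: unknown };
type EmitFn = (key: unknown, value?: unknown) => void;
type ViewDef<D> = {
  map: (doc: D, emit: EmitFn) => void;
  reduce?: (keys: unknown[], values: unknown[]) => unknown;
};

function queryView<D extends { _id: string }>(
  docs: D[],
  view: ViewDef<D>,
  opts: { reduce?: boolean } = {}
): ViewRow[] {
  const rows: ViewRow[] = [];
  for (const doc of docs) {
    // Each emit(key, value) call becomes one row. "value" is whatever the
    // map function passed as the second argument (null if it passed none).
    view.map(doc, (key, value) => {
      rows.push({ id: doc._id, key: key, value: value === undefined ? null : value });
    });
  }
  if (opts.reduce && view.reduce) {
    // With reduce requested, the rows collapse into a single row whose
    // "value" is the reduce function's result.
    const keys = rows.map((r) => r.key);
    const values = rows.map((r) => r.value);
    return [{ key: null, value: view.reduce(keys, values) }];
  }
  return rows;
}

type Sale = { _id: string; amount: number };
const sales: Sale[] = [
  { _id: "a", amount: 5 },
  { _id: "b", amount: 7 },
];
const totalView: ViewDef<Sale> = {
  map: (doc, emit) => emit("sale", doc.amount),
  reduce: (_keys, values) => (values as number[]).reduce((sum, v) => sum + v, 0),
};

const plainRows = queryView(sales, totalView); // two rows, values 5 and 7
const reducedRows = queryView(sales, totalView, { reduce: true }); // one row, value 12
```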

I hope these pointers are helpful. I don't intend to complain, I just thought I would point these out, because they cost me a few hours and if these had been the first things I tried with Pouch, I probably would have looked for another database solution.

Regards,
Nate

Nolan Lawson

Nov 21, 2015, 1:04:21 PM
to PouchDB
Hi Nate,

Thanks for your feedback! :) I think in general our approach moving forward is to replace map/reduce entirely with pouchdb-find, because map/reduce is simply too complicated for beginners. (Heck - it's complicated for experts too; I certainly struggled with it when I first started with CouchDB.)

Thanks for the link, though! We could definitely work to improve the current map/reduce docs.

Cheers,
Nolan

Yaron Goland

Dec 1, 2015, 12:12:22 PM
to PouchDB

As a side note, Thali has been super quiet on this list recently because we have been heads down getting our first POC out the door with our partner, Rockwell Automation. But PouchDB is the core of our entire P2P system. If you look carefully at our new logo you will notice that PouchDB is included, see https://github.com/thaliproject/thali/blob/gh-pages/assets/logos/thali-icon-logos.svg. I'm actually really happy to see many of our concerns about PouchDB addressed in the list below.


Perf is our number one issue. We often have to push over limited data channels (Bluetooth tops out at 1 Mbps) so perf matters. All of our scenarios (and I do mean *all*) end up needing a ton of attachments. So attachment perf is a really big deal for us.


Right now we are trying to work with per-DB ACLs rather than per-record ACLs but we already know that won't work long run.


We don't have a lot of use of secondary indexes right now but we know that is a point in time issue, eventually we will need it.


Did I mention perf?


Another issue just waiting to blow up is (shudder) per record signing. We are heading toward mesh style scenarios where we need to be able to validate who added a record to a particular DB so we need to sign that record. This is super screaming ugly tricky. It either requires canonicalization of JSON and then storing the signature (probably as an attachment just to simplify things) or it requires some way to bring in a JSON structure "as is" and keep it "as is" (perhaps as an attachment to the parsed JSON?) so we can validate it later if necessary (possibly just during synch). There are other possibilities like doing the signing in the context of synch using some kind of stream format that we then make people persist but this has a lot of fairly ugly storage implications and while phones do have quite a bit of storage it isn't infinite.
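
To illustrate the canonicalization option, here is a minimal sketch of one possible canonical form: object keys sorted, no insignificant whitespace. It assumes the input is already plain JSON data (no undefined, functions, or dates), and `canonicalize` is a hypothetical helper, not part of PouchDB or Thali. Standardised schemes such as RFC 8785 additionally pin down number and string encoding, and the actual signing step is left out.

```typescript
// Produces the same string for the same JSON data regardless of the order in
// which object keys were inserted. Sketch only; assumes plain JSON input.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // Strings, numbers, booleans and null: defer to JSON.stringify.
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is significant, so it is preserved.
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const members = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
  return "{" + members.join(",") + "}";
}

// Two records with the same content but different key order.
const recordA = { b: 1, a: [2, { d: null, c: "x" }] };
const recordB = { a: [2, { c: "x", d: null }], b: 1 };
const sameBytes = canonicalize(recordA) === canonicalize(recordB); // true
```

A signature computed over the canonical string can then be stored alongside the record (for example as an attachment, as suggested above) and re-checked after the JSON has been parsed and re-serialised.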


Oh and perf.


Anyway, just a thought. 😊


        Yaron




Nolan Lawson

Dec 6, 2015, 4:09:17 PM
to PouchDB
Hey Yaron,

Yes, PouchDB ❤️ Thali, and we're glad y'all love us too. :)

Sounds like someone from your team would be a good candidate to implement atts_since for better attachment replication performance? :D

Another current issue with attachments as written is that the replication protocol is not terribly efficient at sending them. I meant to add some kind of binary support to pouchdb-replication-stream, but never got around to it. Probably the best solution at this point would be a fork of pouchdb-replication-stream that uses an efficient binary format rather than a string/base64 format. (It would probably have to be a fork, unless you can think of a backwards-compatible way to support the existing pouchdb-replication-stream ecosystem.)
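
As a back-of-the-envelope illustration of why the base64 format hurts on a slow link, the arithmetic can be sketched as below. The 3 MB attachment size is a made-up example, the 1 Mbps figure is the Bluetooth ceiling mentioned earlier in the thread, and the estimate ignores protocol overhead, latency, and compression.

```typescript
// Base64 encodes every 3 bytes as 4 characters (including padding).
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

// Ideal transfer time in seconds for a payload over a link of the given speed.
function transferSeconds(byteLength: number, bitsPerSecond: number): number {
  return (byteLength * 8) / bitsPerSecond;
}

const attachmentBytes = 3000000; // hypothetical 3 MB attachment
const linkBitsPerSecond = 1000000; // ~1 Mbps

const binarySeconds = transferSeconds(attachmentBytes, linkBitsPerSecond); // 24
const base64Seconds = transferSeconds(base64Length(attachmentBytes), linkBitsPerSecond); // 32
```

Under these assumptions the base64 encoding alone adds a third to the transfer time, before any JSON framing is counted.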

Cheers,
Nolan


Yaron Goland

Dec 7, 2015, 2:27:27 PM
to pou...@googlegroups.com

We really want to start making contributions to PouchDB. At this point we are still fighting with the native libraries to get P2P to work properly. Once that is under control then we have to do a pretty serious update on our core Node.js libraries based on everything we have learned from the POC. [1] Once that is under control then we can look at PouchDB.

 

Atts_since is probably the top of the list. After that we need to do some benchmarking. Right now our plans are to only support pull replication, at least for a start. It makes security easier to deal with. And I believe the existing PouchDB replication code already supports pulling attachments down as standalone GETs so we won’t have encoding issues. We also support multiple simultaneous HTTP connections even over the non-TCP transports. So I would expect the perf to be pretty good. I’m not sure how much better it would get with pouchdb-replication-stream?

 

               Thanks,

                              Yaron

 

[1] http://thaliproject.org/DeveloperHandbook/#read has the current stack of specs and there are several more to come.
