Does CouchDB support HTTP pipelining?

Jens Alfke

Oct 30, 2012, 10:47:55 AM

to us...@couchdb.apache.org

Does CouchDB 1.2 (or rather, MochiWeb) support HTTP pipelining?

I’m trying to improve replication throughput when TouchDB pulls from CouchDB; performance is often bottlenecked by GETting every revision with an individual request. I turned on pipelining support in the iOS HTTP framework, but didn’t see any improvement, so I’m wondering if it’s a CouchDB limitation.

—Jens

Paul Davis

Oct 30, 2012, 2:20:23 PM

to us...@couchdb.apache.org

It's supported, but only in the sense that it won't break things.
Mochiweb uses a single Erlang process per socket which means that it
will handle each request serially. Technically it'd be possible to
notice latency differences for cheap requests, but I bet the rev
lookup requests are too dominated by the actual disk lookups for that
to be noticeable.
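
For readers unfamiliar with the term: pipelining means writing several requests to one socket before reading any response. The Python sketch below is only an illustration under assumptions. The host, port, and database name are placeholders, and the helper names `build_pipelined_gets` and `send_pipelined` are made up for this example. Since the server handles each socket's requests one after another, the responses arrive in request order.

```python
import socket
from urllib.parse import quote


def build_pipelined_gets(host, port, db, doc_ids):
    """Return raw bytes holding one HTTP/1.1 GET per doc ID, back to back."""
    doc_ids = list(doc_ids)
    parts = []
    for i, doc_id in enumerate(doc_ids):
        path = "/%s/%s" % (quote(db, safe=""), quote(doc_id, safe=""))
        lines = ["GET %s HTTP/1.1" % path, "Host: %s:%d" % (host, port)]
        if i == len(doc_ids) - 1:
            # Ask the server to close the socket after the final response,
            # so the reader below can simply read until EOF.
            lines.append("Connection: close")
        parts.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(parts).encode("ascii")


def send_pipelined(host, port, payload):
    """Send all requests at once, then read every response until EOF."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

A call such as `send_pipelined("localhost", 5984, build_pipelined_gets("localhost", 5984, "mydb", ids))` would return the concatenated raw responses, which still have to be split and parsed by the caller.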

Jens Alfke

Oct 30, 2012, 3:15:11 PM

to us...@couchdb.apache.org

On Oct 30, 2012, at 11:20 AM, Paul Davis <paul.jos...@gmail.com> wrote:

> Technically it'd be possible to
> notice latency differences for cheap requests, but I bet the rev
> lookup requests are too dominated by the actual disk lookups for that
> to be noticeable.

Well, I’ve found that getting docs in bulk (POSTing doc IDs to _all_docs?include_docs=true) is _much_ faster than making individual GET requests, so I don’t think the disk lookup is the limiting factor. But unfortunately I can’t do that for all the documents because _all_docs doesn’t allow me to get the revision histories (it ignores ?revisions=true).

—Jens
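
The bulk fetch described above, POSTing doc IDs to _all_docs?include_docs=true, can be sketched in Python roughly as follows. This is a minimal sketch, not the code used in the thread. The base URL and database name are placeholders, and the function names `build_bulk_request` and `fetch_bulk` are hypothetical. The IDs go in the request body as a JSON object with a `keys` array.

```python
import json
import urllib.request
from urllib.parse import quote


def build_bulk_request(base_url, db, doc_ids):
    """Build a POST to _all_docs?include_docs=true carrying the doc IDs."""
    url = "%s/%s/_all_docs?include_docs=true" % (
        base_url.rstrip("/"),
        quote(db, safe=""),
    )
    body = json.dumps({"keys": list(doc_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_bulk(base_url, db, doc_ids):
    """Fetch all requested docs in one round trip; skip missing/deleted rows."""
    request = build_bulk_request(base_url, db, doc_ids)
    with urllib.request.urlopen(request) as resp:
        rows = json.load(resp)["rows"]
    # Rows for unknown IDs carry an error instead of a doc, and deleted
    # docs come back as null, so keep only rows with an actual document.
    return [row["doc"] for row in rows if row.get("doc")]
```

One request replaces N round trips here, which is the property the comparison in this thread is about.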

Paul Davis

Oct 30, 2012, 4:15:17 PM

to us...@couchdb.apache.org

That's an intriguing datapoint because _all_docs?include_docs=true is
the same algorithmic complexity as issuing a larger number of GET
requests. That would suggest that something in the HTTP layer is
adding significant overhead to individual requests, which is definitely
possible and something I've been meaning to investigate whenever I can
find the time.

Jens Alfke

Oct 30, 2012, 4:54:30 PM

to us...@couchdb.apache.org

On Oct 30, 2012, at 1:15 PM, Paul Davis <paul.jos...@gmail.com> wrote:

> That's an intriguing datapoint because _all_docs?include_docs=true is
> the same algorithmic complexity as issuing a larger number of GET
> requests. That would suggest that something in the HTTP layer is
> adding significant overhead to individual requests

Interesting. I could try to write a test case in Ruby or Python — something that would first fetch a large number of docs as individual GETs, then fetch the same docs in a single _all_docs.

I’m assuming that the ?revisions=true option doesn’t add a huge amount of overhead, since the revision tree is already contained in the document’s b-tree node, right? So it would just require converting the revision's history into JSON and transmitting that JSON.

—Jens

Adam Kocoloski

Oct 30, 2012, 4:56:36 PM

to us...@couchdb.apache.org

On Oct 30, 2012, at 4:54 PM, Jens Alfke <je...@couchbase.com> wrote:

> I’m assuming that the ?revisions=true option doesn’t add a huge amount of overhead, since the revision tree is already contained in the document’s b-tree node, right?

That's correct, no need for an extra disk lookup or anything like that.

Paul Davis

Oct 30, 2012, 5:53:37 PM

to us...@couchdb.apache.org

On Tue, Oct 30, 2012 at 4:54 PM, Jens Alfke <je...@couchbase.com> wrote:
>
> On Oct 30, 2012, at 1:15 PM, Paul Davis <paul.jos...@gmail.com> wrote:
>
> That's an intriguing datapoint because _all_docs?include_docs=true is
> the same algorithmic complexity as issuing a larger number of GET
> requests. That would suggest that something in the HTTP layer is
> adding significant overhead to individual requests
>
> Interesting. I could try to write a test case in Ruby or Python — something that would first fetch a large number of docs as individual GETs, then fetch the same docs in a single _all_docs.
>

That could be useful, but the bigger chunk of work here will be in
setting up and running the profiling for short-lived processes.

> I’m assuming that the ?revisions=true option doesn’t add a huge amount of overhead, since the revision tree is already contained in the document’s b-tree node, right? So it would just require converting the revision's history into JSON and transmitting that JSON.
>
> —Jens

Yeah, the hardest bit of all this would just be adding the plumbing
to get that option down to the appropriate open_doc calls.

Jens Alfke

Oct 30, 2012, 6:51:46 PM

to us...@couchdb.apache.org

On Oct 30, 2012, at 1:54 PM, Jens Alfke <je...@couchbase.com> wrote:

> Interesting. I could try to write a test case in Ruby or Python — something that would first fetch a large number of docs as individual GETs, then fetch the same docs in a single _all_docs.

I have a quick Ruby script now that first calls _all_docs to get all the doc IDs in a specific database, then gets all the docs one at a time with individual GETs, then gets them all in bulk by posting their IDs to _all_docs?include_docs=true.

The difference in performance is pretty huge, with the bulk mode being about 30x faster, both for a database on localhost and for a remote one (I used my Cloudant instance.)

However, this isn’t a fair test because it’s only sending one GET at a time, so the latency is killing performance. I’ll fix up the script to run four threads in parallel and see how much that helps.

—Jens
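
The original Ruby script is not included in the thread. As a rough Python sketch of the same comparison (serial GETs versus four parallel workers), something like the following could be used. Every name here is hypothetical, and the base URL and database name are placeholders. The fetch function is passed in as a parameter so the timing harness can be exercised without a running server.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote


def make_fetch_one(base_url, db):
    """Return a function that GETs a single document by ID."""
    def fetch_one(doc_id):
        url = "%s/%s/%s" % (
            base_url.rstrip("/"),
            quote(db, safe=""),
            quote(doc_id, safe=""),
        )
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch_one


def fetch_serial(fetch_one, doc_ids):
    """One request at a time: total time is roughly N times the latency."""
    return [fetch_one(doc_id) for doc_id in doc_ids]


def fetch_parallel(fetch_one, doc_ids, workers=4):
    """Keep up to `workers` requests in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, doc_ids))


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `timed(fetch_parallel, make_fetch_one("http://localhost:5984", "mydb"), ids)` returns the documents together with the wall-clock time, which can then be compared against the serial run and against a single bulk request.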

Jeremy Taylor

Oct 30, 2012, 7:43:59 PM

to us...@couchdb.apache.org

+1 for _all_docs with ?revisions=true

For the types of information I'm interested in, being able to decode the
revision history is invaluable. I would even go as far as saying that
compaction is a tragedy.

Jeremy

Jan Lehnardt

Oct 31, 2012, 12:43:39 PM

to us...@couchdb.apache.org

+1

https://issues.apache.org/jira/browse/COUCHDB-1584
