Improving (pull) replication performance / throughput


Johan Pelgrim

Jun 5, 2014, 5:09:43 PM
to mobile-c...@googlegroups.com
Hi there,

I have a simple test setup with a single Sync Gateway and a single Couchbase Server on my local machine, from which I pull-replicate around 20,000 documents to my device. I'm seeing about 90-100 documents synced per second (I read somewhere that these go in batches of 100?), so a full replication of the 20,000 documents takes about 3.5 minutes. Are there ways to improve replication performance or increase throughput? Or is roughly 100 documents per second simply something we have to live with?

Kind regards,

Johan

Andrew Reslan

Jun 5, 2014, 5:13:48 PM
to mobile-c...@googlegroups.com
Is your client running on iOS or Android?

Can you give more details about the size of the documents? Are they just JSON, or do they have attachments?

Andy


Johan Pelgrim

Jun 5, 2014, 5:26:12 PM
to mobile-c...@googlegroups.com
Hi Andrew,

My client is running on Android. The documents are just JSON. I've imported 20K rows from a Dutch postal code dataset (the full dataset is 471K rows). Here's one of the documents:

{
  "id": "395614",
  "postcode": "7940XX",
  "postcode_id": "79408888",
  "pnum": "7940",
  "pchar": "XX",
  "minnumber": "4",
  "maxnumber": "12",
  "numbertype": "mixed",
  "street": "Troelstraplein",
  "city": "Meppel",
  "city_id": "1082",
  "municipality": "Meppel",
  "municipality_id": "119",
  "province": "Drenthe",
  "province_code": "DR",
  "lat": "52.7047653217626",
  "lon": "6.1977201775604",
  "rd_x": "209781.52077777777777777778",
  "rd_y": "524458.25733333333333333333",
  "location_detail": "postcode",
  "changed_date": "2014-04-10 13:20:28",
  "doctype": "postcode"
}

P.S. While we're talking performance: the Sync Gateway import itself, via PUT requests to the Sync Gateway Admin REST API on port 4985, took me more than two hours, at 3 imports per second. Of course, this is all on a single server with a single thread etc., but still. So I was quite happy that the replication took 'only' 3.5 minutes. Although I feel (and hope) the latter could be quicker, hence this post ;-)

Cheers,

Johan

Andrew Reslan

Jun 6, 2014, 6:43:06 AM
to mobile-c...@googlegroups.com
For bulk loading into sync_gateway you might see some performance improvement using the _bulk_docs REST API call. Here is a simplified curl example for three documents (note that I used the sample data's id as the doc _id; if you don't supply one, sync_gateway will generate a doc _id):

curl -X POST -H "Content-Type: application/json"  http://localhost:4984/db/_bulk_docs -d '{"docs":[{"_id": "395614","postcode": "7940XX","postcode_id": "79408888"},{"_id": "395615","postcode": "7940XX","postcode_id": "79408889"},{"_id": "395616","postcode": "7940XX","postcode_id": "79408890"}]}'



With the sync_gateway response being:


[{"id":"395614","rev":"1-05f0f1bc377c413076b6b687a7fc77a8"},{"id":"395615","rev":"1-ef00f67e65fa0667f833d3fcb9b6bb9f"},{"id":"395616","rev":"1-379f9bdd754ccb0c2d768bf0709b7c0d"}]


The CouchDB Wiki has a page giving more examples for _bulk_docs.
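
For a full import, the _bulk_docs call above could be driven from a small script that sends the rows in batches. What follows is only a rough sketch in Python, not a tested import tool: the helper names (chunked, bulk_docs_body, post_bulk_docs) and the batch size of 500 in the usage note are made up for illustration, and it assumes each row is a dict shaped like the sample document, with the row's "id" reused as the doc _id as in the curl example.

```python
import json
import urllib.request
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_docs_body(rows):
    """Build a _bulk_docs request body, reusing each row's "id" as the doc _id."""
    docs = []
    for row in rows:
        doc = dict(row)              # copy so the caller's row is not modified
        doc["_id"] = doc.pop("id")   # assumption: every row carries an "id" field
        docs.append(doc)
    return json.dumps({"docs": docs}).encode("utf-8")


def post_bulk_docs(url, body):
    """POST one batch and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical usage (URL taken from the curl example, batch size is a guess):
# for batch in chunked(rows, 500):
#     post_bulk_docs("http://localhost:4984/db/_bulk_docs", bulk_docs_body(batch))
```

Each request then carries many documents instead of one, which is where any speed-up over the one-PUT-per-document import would come from. The best batch size is something to find by experiment.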
