Very poor validation performance


aclelland

Nov 4, 2013, 10:00:24 AM
to bigcou...@googlegroups.com
Hi,

I'm currently looking into BigCouch as a way to manage a relatively small cluster of servers. 

My current setup is:
Zone #1
   4 Nodes
Zone #2
   2 Nodes
Each node is an 8 core VM with 8GB of RAM.

My database was created with
N=3 (2 copies in Z1 and a single copy on Z2)
Q=16
Z=2

Initially things were looking promising. Using a simple load test consisting of a PUT, a GET and then a DELETE, I was able to get around 1300 requests per second. The problem happens when I add my validation function to the database. The function is simple and looks like this:

----------------------------
designDoc.validate_doc_update = function ( newDoc, oldDoc, userCtx ) {

        if ( newDoc._deleted ) {
                return;
        }

        var docIsNew = ( oldDoc == null );

        // check that data type is the same - only for updates, not new or deleted docs
        if ( !docIsNew && oldDoc.dataType != newDoc.dataType ) {
                throw( { error: 'document type cannot be changed' } );
        }

        if ( !isNumberPositive( newDoc.book_version ) ) {
                throw( { error: 'wrong book version' } );
        }

        // new docs need no further checking
        if ( docIsNew ) {
                return;
        }

        if ( !isNumberPositive( newDoc.article_version ) ) {
                throw( { error: 'wrong article version' } );
        }

        function isNumberPositive( value ) {
                value = parseInt( value, 10 );
                return !isNaN( value ) && value >= 0;
        }
};

----------------------

As soon as I add it I see my requests per second drop to 300-400. As far as I can see this really shouldn't have much of an effect, since the JavaScript isn't doing much other than checking whether two numbers are positive. I add the validation through couchapp:
couchapp push validator.js 127.0.0.1:5984/example_project

If anyone can help by pointing out something I'm doing wrong or a more efficient way of doing these types of checks that would really help me.

Thanks,
Alan

Mike Miller

Nov 4, 2013, 3:57:21 PM
to bigcou...@googlegroups.com
Hi Alan,

First, glad that you're trying out different options.  For document read/write benchmarks, I'll note that some basic tuning should get you up to about 5k writes/sec in a naive benchmark.  You can get there by:

1) Looking into the bulk_docs and all_docs APIs for bulk writes/reads.  This will generally get you about a 5x improvement.
2) Concurrency is king.  3 nodes generally give the best throughput around 20-50 concurrent requests, depending upon workload.
3) Persistent HTTP connections (e.g. python requests) minimize overhead
4) Async -- make sure you only block at the app level when you have to.

Now, for your validate doc update (VDU) question.  Indeed, VDUs will slow things down.  How much they should slow things down, I'm not sure.  For regular document writes everything happens in-process in native Erlang (once the JSON is decoded into an Erlang term).  VDUs require you to reach out to JavaScript, which is out-of-process and also requires extra JSON marshalling.  Now, you may be able to tune the JS runtime a bit, but it's always going to be slower.  If you require VDUs, I wonder if you could reach out to Erlang for the speedup you require?

-M


--
You received this message because you are subscribed to the Google Groups "BigCouch Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bigcouch-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

aclelland

Nov 5, 2013, 3:39:09 AM
to bigcou...@googlegroups.com
Hi Mike,

Thanks for the reply. The web service in front of the couch cluster receives lots of small requests very frequently from browsers and sends them to the database. I believe this means I'm not really able to make much use of the bulk API: since I cannot guarantee that a client will always reach the same web server, I couldn't guarantee that the requests would be added to the database in the correct order. I do believe that the "batch=ok" option might help here, since the PUTs get queued at the database level.
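As a sketch of that idea (not code from the thread): batch=ok is just a query-string flag on an ordinary document PUT, so a hypothetical helper for building the request path might look like this, with the database and document names as placeholders.

```javascript
// Sketch: batch=ok tells the server to queue the write in memory and
// answer 202 Accepted before it reaches disk, trading some durability
// for lower per-request latency.
function batchedPutPath( dbName, docId ) {
        return '/' + encodeURIComponent( dbName ) +
               '/' + encodeURIComponent( docId ) + '?batch=ok';
}
```

For example, `batchedPutPath('example_project', 'doc-42')` yields `/example_project/doc-42?batch=ok`. Because the server acknowledges before the write is durable, a crashed node can lose queued writes, which matters for the heavy-load testing described below.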

I'm currently using persistent connections from the browser to the web server and from the web server to couch. The performance benefits really shine through at higher connection rates.

That's what I was thinking regarding the JS performance; I think it's also hinted at in the performance documentation. We're going to investigate doing the validation at the web server or cluster load balancer level, which should speed things up considerably.
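Moving the checks to the web tier can be sketched as a plain function that mirrors the VDU earlier in the thread. The field names (dataType, book_version, article_version) are taken from Alan's validator; the function itself is illustrative, returning an error string instead of throwing so the web server can reject the request before it ever reaches couch.

```javascript
// Sketch: the same checks as the VDU, run in-process in the web tier.
function isNumberPositive( value ) {
        var n = parseInt( value, 10 );
        return !isNaN( n ) && n >= 0;
}

// Returns null when the update is acceptable, or an error string otherwise.
function validateUpdate( newDoc, oldDoc ) {
        if ( newDoc._deleted ) {
                return null;
        }

        var docIsNew = ( oldDoc == null );

        // type changes are only possible on updates, not new docs
        if ( !docIsNew && oldDoc.dataType !== newDoc.dataType ) {
                return 'document type cannot be changed';
        }

        if ( !isNumberPositive( newDoc.book_version ) ) {
                return 'wrong book version';
        }

        // new docs need no further checking
        if ( docIsNew ) {
                return null;
        }

        if ( !isNumberPositive( newDoc.article_version ) ) {
                return 'wrong article version';
        }

        return null;
}
```

One caveat with this approach: unlike a VDU, an app-tier check does not protect against writes that bypass the web server (e.g. direct curl against a node), so it trades a guarantee for speed.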

One other question: are there any recommended guidelines for the number of shards? With 4 shards I was seeing much lower performance than I am with 32 shards, but I sometimes see the couchdb service crash if I do heavy testing with 64 shards.

Thanks,
Alan