I'm experimenting with CouchDB 1.2.0 (default settings) on FreeBSD 9 and
Erlang r14b on an Intel Xeon X5670 @ 2.93GHz.
I've inserted about 50,000 simple entries (one key, one value each) into a
test database with increasing IDs (1, 2, 3, ...) and tried to fetch them
with a plain GET on localhost.
What I see in Wireshark (measuring on the wire, to exclude all client-side
parsing overhead):
07:54:21.410845 HTTP GET /test/1
07:54:21.411955 HTTP/1.1 200 OK
07:54:21.509766 the JSON data
So fetching a single document by ID took 0.098921 seconds (about 98.9
milliseconds) on a completely idle machine.
Subsequent queries take roughly the same time, which is just slow.
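The round trip above can also be measured programmatically. Below is a minimal sketch; it uses a tiny stdlib HTTP server as a stand-in for CouchDB (an assumption, so the script runs anywhere) — point host/port at a real CouchDB (e.g. localhost:5984) to reproduce the measurement:

```python
import http.client
import http.server
import json
import threading
import time

# Stand-in for CouchDB: a tiny local server returning a JSON document.
class DocHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = json.dumps({"_id": self.path.rsplit("/", 1)[-1],
                           "key": "value"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the timing output clean

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), DocHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Time one GET, request sent to body fully parsed.
conn = http.client.HTTPConnection(host, port)
t0 = time.perf_counter()
conn.request("GET", "/test/1")
resp = conn.getresponse()
doc = json.loads(resp.read())
latency_ms = (time.perf_counter() - t0) * 1000
print("status=%d id=%s latency=%.2f ms" % (resp.status, doc["_id"], latency_ms))
conn.close()
server.shutdown()
```

Against the local stand-in the latency is sub-millisecond; against a CouchDB exhibiting the problem described above, the same harness would show the ~99 ms figure.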
Is this all CouchDB and Erlang are capable of, or is something wrong in my
setup? I haven't turned compression off, by the way, but I will measure its
effect.
Thanks,
In the meantime, I've found that at some point CouchDB's HTTP daemon
started shipping with TCP_NODELAY turned off, so
socket_options = [{nodelay, true}]
gives a 2.47 ms response time, which is a major improvement.
I could lower that to 2.1 ms by switching to the
null_authentication_handler, which is not good security-wise, but faster.
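For reference, a sketch of where the two tweaks above land in CouchDB's local.ini (section and option names as I understand them for CouchDB 1.2 — verify against your version's default.ini):

```ini
[httpd]
; re-enable TCP_NODELAY so small responses aren't held back by Nagle's algorithm
socket_options = [{nodelay, true}]

; WARNING: disables authentication entirely -- for isolated benchmarking only
authentication_handlers = {couch_httpd_auth, null_authentication_handler}
```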
On query performance: when I fetch the same documents (one by one, from ID
one to the last) from three different machines with four threads each (so
12 concurrent HTTP GETs are on the wire), I can saturate one CPU core
(Xeon X5670 @ 2.93GHz; I've limited CouchDB to one core) at 100% and get
about 1700 queries/sec.
These are just plain HTTP GETs, so no JSON parsing is involved.
Switching to persistent connections gives 2200 query/sec (again, CouchDB
maxes the CPU out).
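The persistent-vs-fresh-connection comparison above can be sketched like this. A local stdlib server stands in for CouchDB (an assumption, so the script is self-contained); absolute numbers will differ, but the keep-alive advantage shows up the same way:

```python
import http.client
import http.server
import json
import threading
import time

class DocHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive

    def do_GET(self):
        body = json.dumps({"key": "value"}).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), DocHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
N = 200

def get(conn, path):
    conn.request("GET", path)
    r = conn.getresponse()
    r.read()
    return r.status

# Variant 1: a fresh TCP connection per request.
t0 = time.perf_counter()
for i in range(N):
    c = http.client.HTTPConnection(host, port)
    get(c, "/test/%d" % i)
    c.close()
qps_fresh = N / (time.perf_counter() - t0)

# Variant 2: one persistent (keep-alive) connection for all requests.
c = http.client.HTTPConnection(host, port)
t0 = time.perf_counter()
for i in range(N):
    get(c, "/test/%d" % i)
qps_keepalive = N / (time.perf_counter() - t0)
c.close()
server.shutdown()

print("fresh: %.0f q/s  keep-alive: %.0f q/s" % (qps_fresh, qps_keepalive))
```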
I hope some day CouchDB will be able to deliver performance too.
You may want to describe your use case, i.e. what you are trying to
accomplish, so the community can give informed comments on your observations.
Thanks
Mike
What I need is a multi-site replicated document DB (well, most of the time
a key-value DB would also be fine, but CouchDB views are very handy for the
rest and spare me from building my own indexes) where I can read and write
any instance at any time and the last modification wins (for the whole
document).
Also, I don't want read repair; the DB should log changes and replicate
them to the other instances when they can be reached (last-update-wins
conflict resolution is fine, as I said).
For this specific application the read/write ratio is very high, like
5M:1 or higher.
So CouchDB is a nearly perfect fit; my only problem is that it (for this
particular case, the read path) should be somewhat faster. Also, a
different API would be good, with the attributes of, say, LDAP:
- binary for quick processing
- multiple outstanding queries on the wire
HTTP is easy to use, but I guess it adds somewhere around 30-50% to the
current processing time (are there any exact measurements, perhaps?).
I really think that April 1st post about switching to Java would bring a
bigger boost (at least raw-performance-wise; from other perspectives,
Erlang may well be a good fit).
Waiting some hundred milliseconds for the GCs is fine with me if they
don't happen too often. ;)
There's a thread from last month talking about slow reads, but I
cannot find it in the archives
(http://mail-archives.apache.org/mod_mbox/couchdb-user/201204.mbox/browser)
Basically, one of the bottlenecks is the Erlang-term <-> JSON conversion.
You could try to bypass it for a read and send the HTTP response with the
raw Erlang document, to see if there's a real improvement in speed.
--
Matthieu RAKOTOJAONA
Also, HTTP pipelining allows you to submit multiple outstanding queries on
the wire and is fully supported in CouchDB. Admittedly, finding clients
that do it well is not always an easy task.
Regards,
Adam
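The pipelining described above can be demonstrated with a raw socket: several GETs are written back to back before any response is read. The local stdlib server below stands in for CouchDB (an assumption, to keep the sketch runnable anywhere):

```python
import http.server
import json
import socket
import threading

class DocHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, required for pipelining

    def do_GET(self):
        body = json.dumps({"key": "value"}).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), DocHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

sock = socket.create_connection((host, port), timeout=5)
requests = b"".join(
    ("GET /test/%d HTTP/1.1\r\nHost: %s\r\n\r\n" % (i, host)).encode()
    for i in range(3)
)
sock.sendall(requests)  # all three requests leave before any response is read

buf = b""
while buf.count(b"HTTP/1.1 200") < 3:
    chunk = sock.recv(4096)
    if not chunk:
        break
    buf += chunk
responses = buf.count(b"HTTP/1.1 200")
sock.close()
server.shutdown()
print("pipelined responses received:", responses)
```

The responses come back in request order on the single connection, which is what saves the per-request round trip that separate connections pay.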
Yes; the final setup will run CouchDB on hardware of about that power, so I
would like to benchmark what kind of performance it can deliver.
Am I right to assume that HTTP pipelining won't give anything significant
over multiple persistent connections (without pipelining on them)?
(I've measured non persistent and persistent: 1700 QPS vs 2200, each of
them gave 100% CPU usage from CouchDB)
Adam