couchdb returning empty response

Tim Tisdall

Aug 14, 2012, 10:38:08 PM
to us...@couchdb.apache.org
I'm still having problems with couchdb, but I'm trying out different
things to see if I can narrow down what the problem is...

I stopped using fsockopen() in PHP and am using curl now, hoping to
get more debugging info.
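
Something along these lines (a simplified sketch rather than my exact
code; $docs stands in for the array of documents I'm posting):

<?php
// Simplified sketch of the request (not the exact production code).
// $docs would normally hold the real documents; one dummy doc here.
$docs = array(array('_id' => '1', 'value' => 'example'));
$json = json_encode(array('docs' => $docs));

$ch = curl_init('http://localhost:5984/app_stats_test/_bulk_docs');
curl_setopt_array($ch, array(
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $json,
    CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
    CURLOPT_RETURNTRANSFER => true,  // return the response body as a string
    CURLOPT_VERBOSE        => true,  // verbose debug output goes to STDERR
    CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_0,  // the request below went out as HTTP/1.0
));

$body = curl_exec($ch);
if ($body === false) {
    echo 'curl error: ' . curl_errno($ch) . ' : ' . curl_error($ch) . "\n";
}
curl_close($ch);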

I get an empty response when sending a POST to _bulk_docs. From the
couch logs it seems like the server restarts in the middle of
processing the request. Here's what I have in my logs (I have no
idea what the _replicator portion is about; I'm not currently using
it):


[Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] 'POST'
/app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
{'Content-Length',"2802300"},
{'Content-Type',"application/json"},
{'Host',"localhost:5984"}]
[Wed, 15 Aug 2012 02:27:30 GMT] [debug] [<0.1255.0>] OAuth Params: []
[Wed, 15 Aug 2012 02:27:45 GMT] [debug] [<0.115.0>] Include Doc:
<<"_design/_replicator">> {1,
<<91,250,44,153,
238,254,43,46,
180,150,45,181,
10,163,207,212>>}
[Wed, 15 Aug 2012 02:27:45 GMT] [info] [<0.32.0>] Apache CouchDB has
started on http://127.0.0.1:5984/


In my code logs I have the following by running curl in verbose mode:

* About to connect() to localhost port 5984 (#0)
* Trying 127.0.0.1... * connected
* Connected to localhost (127.0.0.1) port 5984 (#0)
> POST /app_stats_test/_bulk_docs HTTP/1.0
Host: localhost:5984
Accept: */*
Content-Type: application/json
Content-Length: 2802300

* Empty reply from server
* Connection #0 to host localhost left intact
curl error: 52 : Empty reply from server



I also tried using HTTP/1.1 and I get an empty response after
receiving only a "100 Continue", but the end result appears the same.
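
(For what it's worth, I understand the 100-continue handshake can be
taken out of the picture by sending an empty Expect header; a rough
sketch of the option I mean, not something I've verified changes the
outcome:)

<?php
// Sketch only: an empty Expect header stops curl from waiting for a
// "100 Continue" before sending a large POST body over HTTP/1.1.
$ch = curl_init('http://localhost:5984/app_stats_test/_bulk_docs');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Expect:',  // empty value disables the 100-continue handshake
));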

-Tim

Paul Davis

Aug 14, 2012, 11:41:07 PM
to us...@couchdb.apache.org
If you have a request that triggers this, a good way to catch it is this:

$ /usr/local/bin/couchdb # or however you start it
$ ps ax | grep beam.smp # Get the pid of couchdb
$ gdb
(gdb) attach $pid # Where $pid was just found with ps. Might
throw up an access prompt
(gdb) continue
# At this point, run the command that makes couchdb reboot in a
# different console. If it happens you should see Gdb notice the
# error. Then the following:
(gdb) t a a bt

And that should spew out a bunch of stack traces. If you can get
those, we should be able to narrow down the issue fairly specifically.

Tim Tisdall

Aug 16, 2012, 10:27:25 AM
to us...@couchdb.apache.org
Okay, I'm completely unfamiliar with gdb, but I tried and failed to get
the stack traces. Here's what happened...

I'm able to do everything up to attaching to the process and then
things go wrong...

(gdb) attach 29967
Attaching to process 29967
Reading symbols from /usr/lib/erlang/erts-5.8/bin/beam.smp...(no
debugging symbols found)...done.
Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[ *SNIP* ]
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from
/usr/local/lib/couchdb/erlang/lib/snappy-1.0.3/priv/snappy_nif.so...done.
Loaded symbols for
/usr/local/lib/couchdb/erlang/lib/snappy-1.0.3/priv/snappy_nif.so
0x00007f01e153c3e3 in select () from /lib/libc.so.6


If I try to see the process with ps in another terminal it now says:
29967 pts/3 Zl 0:00 [beam.smp] <defunct>

At this point couchdb is no longer responding so I'm not able to run
my script to try to get that stack trace. I also tried typing
"continue" into gdb and the process stays "defunct".

Do I need to install the debugging versions of all those libraries in
order for this to work?

-Tim

Tim Tisdall

Aug 16, 2012, 11:16:19 AM
to us...@couchdb.apache.org
I tried running my code again and got a bunch of these entries in my logs:

[Thu, 16 Aug 2012 14:53:03 GMT] [error] [<0.20.0>] {error_report,<0.9.0>,
{<0.20.0>,std_error,
"File operation error: eacces.
Target: ./mochiweb_response.beam. Function: get_file. Process:
code_server."}}

here's the file listing for the file (as you can see it's readable by all):

-rw-r--r-- 1 root staff 1420 Aug 7 21:11
/usr/local/lib/couchdb/erlang/lib/mochiweb-1.4.1/ebin/mochiweb_response.beam

Tim Tisdall

Aug 16, 2012, 11:23:14 AM
to us...@couchdb.apache.org
I don't know if this would help at all, but here are the steps I did
to install CouchDB:

- downloaded and uncompressed source in /usr/src/
- apt-get install erlang-dev libmozjs-dev libicu-dev erlang-eunit
erlang-inets erlang-os-mon
- ./configure --localstatedir=/var --sysconfdir=/etc
- make
- make install
- useradd -d /var/lib/couchdb couchdb
- chown -R couchdb: /var/{lib,log,run}/couchdb /etc/couchdb
- chmod 0770 /var/{lib,log,run}/couchdb /etc/couchdb
- update-rc.d couchdb defaults # start couchdb on system start

Tim Tisdall

Aug 16, 2012, 3:19:01 PM
to us...@couchdb.apache.org
Paul, did you ever solve the eacces problem you described here:
http://mail-archives.apache.org/mod_mbox/couchdb-user/201106.mbox/%3C4E0B304...@lymegreen.co.uk%3E
I found that post from doing Google searches for my issue.

Dave Cottlehuber

Aug 16, 2012, 3:44:45 PM
to us...@couchdb.apache.org
On 16 August 2012 17:16, Tim Tisdall <tis...@gmail.com> wrote:
> I tried running my code again and got a bunch of these entries in my logs:
>
> [Thu, 16 Aug 2012 14:53:03 GMT] [error] [<0.20.0>] {error_report,<0.9.0>,
> {<0.20.0>,std_error,
> "File operation error: eacces.
>
> Target: ./mochiweb_response.beam. Function: get_file. Process:
> code_server."}}
>
> here's the file listing for the file (as you can see it's readable by all):
>
> -rw-r--r-- 1 root staff 1420 Aug 7 21:11
> /usr/local/lib/couchdb/erlang/lib/mochiweb-1.4.1/ebin/mochiweb_response.beam
>

As you noticed, this is file-permissions related. So check all of
those, especially the permissions on the couchdb .beam files.

If you're logged in as that user, does this work?

pushd /usr/local/lib/couchdb/erlang/lib/mochiweb-1.4.1/ebin && cat mochiweb_response.beam > /dev/null

I usually do an

$ sudo -u couchdb couchdb -i

to start couchdb in interactive mode before switching to daemon usage.

Maybe a recursive chmod / chown on /usr/local/lib/couchdb is in order?

If that doesn't fix it, let me know what arch/platform this is on and
I'll check tomorrow on a fresh install, using the notes you provided.

A+
Dave

Paul Davis

Aug 16, 2012, 4:06:49 PM
to us...@couchdb.apache.org
Never figured that issue out, other than that every single time it's
mentioned it's a red herring. I think it's just Erlang doing "WARNING:
Everything is fine." type logging.

Odd that beam dies when gdb attaches to it. Not sure if that's
important or not. I've definitely never seen such a thing.

Dave Cottlehuber

Aug 16, 2012, 4:07:53 PM
to us...@couchdb.apache.org
Having said all that, I don't see how couch could start with these
permissions, certainly not to the point of doing a bulk upload. So we
must be running out of some resource. Are you able to provide the json
privately to us for a look?

A+
Dave

Tim Tisdall

Aug 16, 2012, 4:08:57 PM
to us...@couchdb.apache.org
Okay, I double checked every file under /usr/local/lib/couchdb and
every one has read access to all. None of the .beam files need
execute permission, right?

Besides, the server starts up and runs fine with no eacces errors
until several thousand queries are made. Also, after it crashes and
gives those errors, it automatically restarts and continues running
with no problem. I'm thinking it's something deeper in the kernel
that temporarily stops access to those files and then goes back to
normal... I'm running everything on a virtualized server on VPS.net.
However, I've also had the couchdb server restart without those
eacces errors.

Will running couchdb in interactive mode give me any more logging
information than what goes into couch.log?

Tim Tisdall

Aug 16, 2012, 4:10:28 PM
to us...@couchdb.apache.org
Things seemed to run despite those errors, but with logging set to
"debug" I can see that the server does restart after outputting those
eacces errors. In my application this shows up as a blank response
from the server.

Tim Tisdall

Aug 16, 2012, 4:18:37 PM
to us...@couchdb.apache.org
Sorry, forgot to mention earlier that I'm running Debian 6.0. I'm not
sure of the arch as it's a virtualized server... /proc/cpuinfo says
"Intel(R) Xeon(R) CPU E5520"

Yes, I've had issues before with running out of resources because the
virtualized instance is rather small. Usually that results in the
whole machine crashing, though. When couchdb goes down, there is
still available swap space and lots of HD space (I need lots of empty
space to do compaction on the db). Also, couchdb is able to restart
with no issues. ... I just tried looking through system logs for any
kind of resource issue and couldn't find anything.

-Tim

Robert Newson

Aug 17, 2012, 4:33:31 AM
to us...@couchdb.apache.org

I've seen couchdb start despite the eacces errors before and tracked it down to the current working directory setting. It seems that the cwd is searched first, and then erlang looks elsewhere. So, if our startup script doesn't change it to somewhere that the couchdb user can read, you get spurious eacces errors.

Don't ask me how I know this.

B.

Tim Tisdall

Aug 17, 2012, 10:09:39 AM
to us...@couchdb.apache.org
I thought I added that to the init script before when you mentioned
it, but I checked and it was gone. I added a "cd ~couchdb" in there
and now I no longer get eacces errors, but the process still crashes
with very little information:

[Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] 'POST'
/app_stats_test/_bulk_docs {1,0} from "127.0.0.1"
Headers: [{'Accept',"*/*"},
{'Content-Length',"3902444"},
{'Content-Type',"application/json"},
{'Host',"localhost:5984"}]
[Fri, 17 Aug 2012 14:01:44 GMT] [debug] [<0.1372.0>] OAuth Params: []
[Fri, 17 Aug 2012 14:02:16 GMT] [debug] [<0.115.0>] Include Doc:
<<"_design/_replicator">> {1,
<<91,250,44,153,
238,254,43,46,
180,150,45,181,
10,163,207,212>>}
[Fri, 17 Aug 2012 14:02:17 GMT] [info] [<0.32.0>] Apache CouchDB has
started on http://127.0.0.1:5984/

Someone mentioned seeing the JSON that I'm submitting... Wouldn't
malformed JSON throw an error?

-Tim

Robert Newson

Aug 17, 2012, 10:34:30 AM
to us...@couchdb.apache.org

Replicating the _replicator db is problematic (i.e., not possible) due to the restrictions introduced by the 'systems db' feature combined with the validation functions inside _design/_replicator.

B.

CGS

Aug 17, 2012, 10:35:30 AM
to us...@couchdb.apache.org
Hi,

Do you by any chance have special characters (non-Latin-1 ones) in your
JSON? That error looks suspiciously like an attempt to transform a list
of Unicode characters into a binary. I might be wrong, though.

CGS

Zera Holladay

Aug 17, 2012, 11:41:42 AM
to us...@couchdb.apache.org
Try running couch with strace. You might get lucky and find the
error (or a hint at the cause) if the problem is system related. If
the output is too messy, filter on non-zero return codes, excluding
reads and writes, like:

$ strace -f ./bin/couchdb 2>&1 | egrep -v '0$'

-zh

Tim Tisdall

Aug 17, 2012, 12:43:53 PM
to us...@couchdb.apache.org
I have no idea what that line about _replicator is about... I'm not
using it and it's empty except for _design/_replicator . Why would
there be an entry about that database in there and what does it mean?

Tim Tisdall

Aug 17, 2012, 12:52:51 PM
to us...@couchdb.apache.org
I do have UTF-8 characters in the JSON, but isn't that acceptable? I
have no problem retrieving UTF-8-encoded content from the server, and
I have a bunch of it saved in there already too.

Robert Newson

Aug 17, 2012, 1:03:03 PM
to us...@couchdb.apache.org

Does app_stats_test contain a document called _design/_replicator, or is there a document with that id in the body of your bulk post?

B.

Tim Tisdall

Aug 17, 2012, 1:13:48 PM
to us...@couchdb.apache.org
No. All my ids (except for design documents) are strings containing
integers. Also, none of my design documents are called anything like
"_replicator". The only thing with that name is in the _replicator
database which I'm not doing anything with.

Why does it say "Include Doc"? And what's that series of numbers
afterwards? That log message seems to consistently occur just before
the log message about the server starting. Is that just a normal
message you get when the server restarts and you have logging set to
"debug"?

Tim Tisdall

Aug 17, 2012, 1:30:49 PM
to us...@couchdb.apache.org
Okay, so it logs that _replicator line any time I manually restart
the server. I think it's just a standard logging message when the
level is set to "debug".

CGS

Aug 17, 2012, 9:33:33 PM
to us...@couchdb.apache.org
I managed to reproduce the error:

[Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] OAuth Params: []
[Sat, 18 Aug 2012 00:58:37 GMT] [debug] [<0.114.0>] Include Doc:
<<"_design/_replicator">> {1,
<<91,250,44,153,
238,254,43,46,
180,150,45,181,
10,163,207,212>>}
[Sat, 18 Aug 2012 00:58:37 GMT] [info] [<0.32.0>] Apache CouchDB has
started on http://0.0.0.0:5984/

...and I think I also identified the problem: a too-large JSON payload.

Here is how to reproduce the error:

1. CouchDB log level: debug
2. an extra-huge JSON file:

echo -n "{\"docs\":[{\"key\":\"1\"}" > my_json.json && for var in $(seq 2 2000000) ; do echo -n ",{\"key\":\"${var}\"}" >> my_json.json ; done && echo -n "]}" >> my_json.json

3. attempt to send it with curl (requires that the database "test"
already exists, preferably empty):

curl -X POST http://127.0.0.7:5984/test/_bulk_docs -H 'Content-Type: application/json' -d @my_json.json > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33.2M    0     0  100 33.2M      0   856k  0:00:39  0:00:39 --:--:--     0
curl: (52) Empty reply from server

Erlang shell report for the same problem:

=INFO REPORT==== 18-Aug-2012::03:12:57 ===
alarm_handler: {set,{system_memory_high_watermark,[]}}

=INFO REPORT==== 18-Aug-2012::03:12:57 ===
alarm_handler: {set,{process_memory_high_watermark,<0.149.0>}}
/usr/local/lib/erlang/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has
closed.Erlang has closed

Tim, try to split your JSON into smaller pieces. Bulk operations tend to
use a lot of memory.
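
A rough sketch of what I mean, in PHP (the batch size of 500, the URL
and the "test" database are placeholders, not tuned values):

<?php
// Rough sketch: send the documents to _bulk_docs in smaller batches
// instead of one huge request.
$docs = array();  // the full list of document arrays to store

foreach (array_chunk($docs, 500) as $batch) {
    $ch = curl_init('http://127.0.0.1:5984/test/_bulk_docs');
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode(array('docs' => $batch)),
        CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
        CURLOPT_RETURNTRANSFER => true,
    ));
    if (curl_exec($ch) === false) {
        echo 'curl error: ' . curl_error($ch) . "\n";
    }
    curl_close($ch);
}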

The _design/_replicator message shows up with the multipart file
transfer cURL uses by default in such cases. Once the second piece is
sent toward the server, the crash is registered. The report for the
first piece looks like:

[Sat, 18 Aug 2012 00:57:38 GMT] [debug] [<0.170.0>] 'POST' /test/_bulk_docs
{1,1} from "127.0.0.1"

I hope this info helps.

Tim Tisdall

Aug 18, 2012, 3:15:14 PM
to us...@couchdb.apache.org
So, it's possible that couchdb is running out of memory when
processing a large JSON file? From the last example I gave, the JSON
file is 3.9MB, which I didn't think was too big, but I do only have
~380MB of RAM. However, I am able to do several thousand similar
_bulk_docs updates of around the same size before I see the error...
Are memory leaks possible with Erlang? Also, why is there nothing in
the logs about running out of memory? (shouldn't that be something
the program is able to detect?)

I switched over to using _bulk_docs because the database grew way too
fast if I did only 1 update at a time. I'm doing about 5000 - 200000
document updates each time I run my script so I've been doing the
updates in batches of 150.

-Tim

CGS

Aug 19, 2012, 6:30:42 AM
to us...@couchdb.apache.org
On Sat, Aug 18, 2012 at 9:15 PM, Tim Tisdall <tis...@gmail.com> wrote:

> So, it's possible that couchdb is running out of memory when
> processing a large JSON file?


Definitely.


> From the last example I gave, the JSON
> file is 3.9MB, which I didn't think was too big, but I do only have
> ~380MB of RAM. However, I am able to do several thousand similar
> _bulk_docs updates of around the same size before I see the error...
> Are memory leaks possible with Erlang?


It looks more like a RAM limitation per process. There may be a memory
leak, but I am not sure.


> Also, why is there nothing in
> the logs about running out of memory? (shouldn't that be something
> the program is able to detect?)
>

It seems CouchDB doesn't catch this type of warning.


>
> I switched over to using _bulk_docs because the database grew way too
> fast if I did only 1 update at a time. I'm doing about 5000 - 200000
> document updates each time I run my script so I've been doing the
> updates in batches of 150.
>

I don't know about your requirements, but I remember a project in which
I created a round-robin buffer to feed the docs to CouchDB. In that
project I had to find a balance between the number of slices and the
number of docs per slice in order to minimize the insertion time.
Maybe this idea will help you in your project as well.
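
Very roughly, the kind of tuning I mean (a PHP sketch; the candidate
batch sizes, URL and bulk_insert() helper are illustrative only):

<?php
// Illustrative sketch: insert the same workload with different batch
// sizes and keep whichever size turns out fastest on your hardware.
function bulk_insert(array $docs, $batchSize, $url) {
    foreach (array_chunk($docs, $batchSize) as $batch) {
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => json_encode(array('docs' => $batch)),
            CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
            CURLOPT_RETURNTRANSFER => true,
        ));
        curl_exec($ch);
        curl_close($ch);
    }
}

$docs = array();  // the documents to store (no _id, so re-runs don't conflict)
$url  = 'http://127.0.0.1:5984/test/_bulk_docs';

foreach (array(50, 150, 500, 1000) as $size) {
    $start = microtime(true);
    bulk_insert($docs, $size, $url);
    printf("batch size %d: %.1f s\n", $size, microtime(true) - $start);
}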

CGS

Robert Newson

Aug 19, 2012, 8:15:40 AM
to us...@couchdb.apache.org
3.9MB isn't large enough to trigger memory issues on its own on a node with 380MB of RAM. Can you use 'top' or 'atop' to see what memory consumption was like before the crash? Erlang/OTP does usually report out-of-memory errors when it crashes (to stderr, which doesn't hit the .log file, IIRC).

B.

Tim Tisdall

Aug 19, 2012, 4:00:04 PM
to us...@couchdb.apache.org
stderr shows this when I hit an empty response:

heart_beat_kill_pid = 17700
heart_beat_timeout = 11
Killed
heart: Sun Aug 19 18:23:54 2012: Erlang has closed.
heart: Sun Aug 19 18:23:55 2012: Executed "/usr/local/bin/couchdb -k".
Terminating.
heart_beat_kill_pid = 18390
heart_beat_timeout = 11
Killed
heart: Sun Aug 19 18:35:18 2012: Erlang has closed.
heart: Sun Aug 19 18:35:18 2012: Executed "/usr/local/bin/couchdb -k".
Terminating.
heart_beat_kill_pid = 18438
heart_beat_timeout = 11


So, it looks like the OS is killing the process because it's running
out of memory. I can see in syslog that the oom-killer is killing
processes at exactly the same time. What's strange, though, is
there's no mention of oom-killer killing couchdb. There's only
mentions of other processes being killed.

CGS

Aug 19, 2012, 4:48:27 PM
to us...@couchdb.apache.org
couchdb -k = kill couch and restart it

Robert Newson

Aug 19, 2012, 5:17:16 PM
to us...@couchdb.apache.org

Is one of those "other processes" called "heart", by any chance?

B.

Tim Tisdall

Aug 20, 2012, 11:47:21 AM
to us...@couchdb.apache.org
No, heart remains running. I suppose that's why couchdb nearly
immediately comes back up after being killed. Looking at the logs
again, I am seeing the oom-killer killing couchdb, so I guess there
are a few things killing it due to lack of memory.