php-ext-couchbase-1.1.5 sporadically coredumping on getMulti


Taylor Fort

Aug 7, 2013, 9:50:32 PM
to couc...@googlegroups.com
We're testing the Couchbase PHP SDK against Couchbase 2.0.1 Community Edition (6 boxes). On our Apache web servers, a child process will occasionally segfault (signal 11) and drop a core. Most of the cores look almost identical in gdb, and r->the_request doesn't point to any single piece of code. On one box, I've had 3 core dumps over ~20k requests in the last half hour. I don't see the same behavior on another box that handled ~7k requests over the last half hour (possibly hitting limitations?); our legacy boxes are handling ~42k requests over the same time frame. All of our boxes are RHEL6.


#0  0x00007f779e682cfc in php_couchbase_get_callback (instance=<value optimized out>, cookie=0x7f77b387c968, error=LCB_KEY_ENOENT, resp=0x7fff608742b0)
    at php-ext-couchbase-1.1.5/get.c:20
#1  0x00007f779e4619bf in ?? () from /usr/lib64/libcouchbase.so.2
#2  0x00007f779e458fce in ?? () from /usr/lib64/libcouchbase.so.2
#3  0x00007f778694eb44 in event_base_loop () from /usr/lib64/libevent-1.4.so.2
#4  0x00007f779e464998 in lcb_wait () from /usr/lib64/libcouchbase.so.2
#5  0x00007f779e684c14 in php_couchbase_get_impl (ht=<value optimized out>, return_value=<value optimized out>, return_value_ptr=0x0, this_ptr=<value optimized out>, 
    return_value_used=1, multi=1, oo=1, lock=0, touch=0, replica=0) at php-ext-couchbase-1.1.5/get.c:351
#6  0x00007f779e67d3ee in zim_couchbase_getMulti (ht=<value optimized out>, return_value=<value optimized out>, return_value_ptr=<value optimized out>, this_ptr=<value optimized out>, 
    return_value_used=<value optimized out>) at php-ext-couchbase-1.1.5/apidecl.c:653
#7  0x00007f77a35fa0c8 in zend_do_fcall_common_helper_SPEC (execute_data=<value optimized out>) at /usr/src/debug/php-5.3.3/Zend/zend_vm_execute.h:316
#8  0x00007f77a35d1400 in execute (op_array=0x7f77b32dd508) at /usr/src/debug/php-5.3.3/Zend/zend_vm_execute.h:107
#9  0x00007f77a35abb3d in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /usr/src/debug/php-5.3.3/Zend/zend.c:1194
#10 0x00007f77a3559da8 in php_execute_script (primary_file=0x7fff60876d10) at /usr/src/debug/php-5.3.3/main/main.c:2261
#11 0x00007f77a3634a85 in php_handler (r=0x7f77b344ac28) at /usr/src/debug/php-5.3.3/sapi/apache2handler/sapi_apache2.c:669
#12 0x00007f77adea9b00 in ap_run_handler ()
#13 0x00007f77adead3be in ap_invoke_handler ()
#14 0x00007f77adeb8a30 in ap_process_request ()
#15 0x00007f77adeb58f8 in ?? ()
#16 0x00007f77adeb1608 in ap_run_process_connection ()
#17 0x00007f77adebd807 in ?? ()
#18 0x00007f77adebdb1a in ?? ()
#19 0x00007f77adebe79c in ap_mpm_run ()
#20 0x00007f77ade95900 in main ()


Has anyone seen anything like this, or does anyone have a glaringly obvious insight? I've been googling through forums, bug reports, and releases and haven't found anything concrete.

Taylor Fort

Aug 8, 2013, 12:38:06 PM
to couc...@googlegroups.com
Also... here's my relevant php -i info

couchbase

couchbase support => enabled
version => 1.1.4dp1
libcouchbase version => 2.0.7
json support => yes
fastlz support => yes
zlib support => yes
igbinary support => no

Directive => Local Value => Master Value
couchbase.compression_factor => 1.3 => 1.3
couchbase.compression_threshold => 2000 => 2000
couchbase.compressor => none => none
couchbase.config_cache => no value => no value
couchbase.durability_default_poll_interval => 100000 => 100000
couchbase.durability_default_timeout => 40000000 => 40000000
couchbase.instance.persistent => On => On
couchbase.restflush => On => On
couchbase.serializer => php => php
couchbase.view_timeout => 75 => 75

Matt Ingenthron

Aug 8, 2013, 1:19:05 PM
to couc...@googlegroups.com
Hi Taylor,


Things like this have happened sporadically, from what I know, but it sounds like you're hitting them more regularly than we have. They are likely problems with the callback in the underlying C client library (CCBC). According to one of the other guys, we've seen some situations where callbacks seem to be skipped, but we haven't had good data on what's happening yet.

Can you clarify what is running on the "legacy boxes" and why the load is different?

I filed https://www.couchbase.com/issues/browse/PCBC-244 to track this.  Is there any chance you can attach one of these cores and give us specifics about the OS version?  This may help.

Also, if it is a callback problem, it may be possible to work around it before a fix is available by not using getMulti directly and instead doing a series of gets. I don't know that for certain, but thanks for bringing this up, since we'll probably be able to do more to find it.


--
You received this message because you are subscribed to the Google Groups "Couchbase" group.
To unsubscribe from this group and stop receiving emails from it, send an email to couchbase+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Taylor Fort

Aug 8, 2013, 2:46:38 PM
to couc...@googlegroups.com

Hey Matt,

We're in a transitional period of testing Couchbase against live traffic. We've just given the new servers lower weights. We see the coredumps as we increase the amount of traffic to those newer servers. It's simply a load-balancing config.

I do notice that all our cores have the callback in frame 0. Most of the time the error is LCB_KEY_ENOENT. Could improper exception handling (potentially on our side) cause core dumps under high load?

Looping through gets instead of using a multi-get would potentially hurt our response time, correct? Or is the internal C lib doing this on its own anyway? I'm worried about sequential gets, each with its own round trip, adding substantially to our response time.

Also, after looking at the ini config for couchbase, I've added a config_cache value for the topology caching, but I don't think that will impact this issue.
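As a rough back-of-the-envelope model of that concern (an assumption for illustration, not a measurement from this setup): if every blocking get waits for its own round trip, N sequential gets cost about N round trips, while a batched multi-get that sends all requests before waiting costs roughly one. The function names and the idea of a single fixed round-trip time are hypothetical simplifications that ignore server time, payload size, and how keys are spread across nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each blocking get waits one full round trip,
 * so the total wait grows linearly with the number of keys. */
static double sequential_gets_ms(size_t nkeys, double rtt_ms)
{
    assert(rtt_ms >= 0.0);
    return (double)nkeys * rtt_ms;
}

/* Hypothetical model: a batched multi-get sends every request before
 * waiting, so the total wait is about one round trip regardless of
 * the number of keys (zero keys means nothing to wait for). */
static double multi_get_ms(size_t nkeys, double rtt_ms)
{
    assert(rtt_ms >= 0.0);
    return nkeys == 0 ? 0.0 : rtt_ms;
}
```

With 50 keys and an assumed 0.5 ms round trip, the sequential model gives 25 ms against 0.5 ms for the batched one, which is why replacing getMulti with a loop of gets is only attractive as a temporary workaround.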

--Taylor

You received this message because you are subscribed to a topic in the Google Groups "Couchbase" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/couchbase/t69Qqdl2ClQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to couchbase+...@googlegroups.com.

Taylor Fort

Aug 8, 2013, 10:47:31 PM
to couc...@googlegroups.com
So... I've been trying everything imaginable to solve or at least ease this issue. I stumbled across a blog post on the Couchbase site that talked about the config_cache option in the .ini, which I found digging around the git repo. I figured this was going to be best practice anyway, so I might as well implement it in the meantime regardless. I set config_cache to /tmp and restarted httpd. I have not seen a coredump since, and I have tested the newly configured boxes at 4x our normal traffic (qps), where they happily chug along. *knock on wood* I can't really explain why this works or fixed my issue, but... it did?

--Taylor 



M. Nunberg

Aug 8, 2013, 10:57:36 PM
to couc...@googlegroups.com
The case seems to be that you are receiving spurious callbacks from
libcouchbase. This is a known issue that has come up from people who are
using specifically the *Multi operations on various SDKs (this has been
seen in Python as well).

The issue seems to be here:

diff --git a/src/handler.c b/src/handler.c
index 8ef7e81..1041594 100644
--- a/src/handler.c
+++ b/src/handler.c
@@ -279,11 +279,11 @@ int lcb_lookup_server_with_command(lcb_t instance,
     lcb_server_t *server;
     lcb_size_t nr;
     lcb_size_t ii;
-    lcb_size_t offset = 0;
 
     for (ii = 0; ii < instance->nservers; ++ii) {
         server = instance->servers + ii;
         if (server != exc) {
+            lcb_size_t offset = 0;
             while ((nr = ringbuffer_peek_at(&server->cmd_log,
                                             offset,
                                             cmd.bytes,


If you're feeling adventurous, you can apply this patch to libcouchbase
and try it out. The fix will be included in an upcoming release.
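To illustrate why moving the declaration matters, here is a simplified, self-contained sketch. It is not the libcouchbase code: toy_server, find_carried_offset, and find_reset_offset are hypothetical names, and the real function walks a ring buffer of packets by byte offset rather than an array of ids. The sketch only covers the scoping difference; how a missed lookup then leads to a spurious callback is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LOG_MAX 8

/* Hypothetical stand-in for one server's log of pending commands. */
typedef struct {
    unsigned int opaque[TOY_LOG_MAX]; /* ids of commands awaiting a reply */
    size_t nlog;                      /* number of valid entries */
} toy_server;

/* Pre-patch shape: offset is declared once outside the loop, so the
 * position reached in one server's log carries over to the next. */
static int find_carried_offset(const toy_server *servers, size_t nservers,
                               unsigned int wanted)
{
    size_t ii;
    size_t offset = 0;

    assert(servers != NULL);
    for (ii = 0; ii < nservers; ++ii) {
        while (offset < servers[ii].nlog) {
            if (servers[ii].opaque[offset] == wanted) {
                return (int)ii;
            }
            ++offset;
        }
    }
    return -1; /* not found */
}

/* Post-patch shape: offset is reset for every server, so each log is
 * scanned from its beginning. */
static int find_reset_offset(const toy_server *servers, size_t nservers,
                             unsigned int wanted)
{
    size_t ii;

    assert(servers != NULL);
    for (ii = 0; ii < nservers; ++ii) {
        size_t offset = 0;
        while (offset < servers[ii].nlog) {
            if (servers[ii].opaque[offset] == wanted) {
                return (int)ii;
            }
            ++offset;
        }
    }
    return -1; /* not found */
}
```

With two servers whose logs hold ids {1, 2, 3} and {4, 5, 6}, looking up id 5 fails in the first version because offset is already 3 by the time the second server is scanned, while the second version finds it on server index 1.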

While config_cache isn't directly related to this issue, it does
reduce overall load on the system, and the issue we're seeing here
seems to happen more frequently under load.

Regards, Mark

Taylor Fort

Aug 9, 2013, 2:03:16 AM
to couc...@googlegroups.com
I'll wait for an official release. What's your typical release cycle? Is it as needed, or is there some sort of timetable for the fix?

Thanks for looking into it!

--Taylor




Sergey Avseyev

Aug 19, 2013, 1:39:35 PM
to Couchbase Google Group
We've recently released libcouchbase 2.1.0, so you can go ahead and update
the libcouchbase package on your box. Please reply here with the results.

--
Sergey Avseyev