SegFault in Crawler Part

Qingchen Dang

May 31, 2021, 1:02:38 AM

to memcached

Hi,

I am implementing a framework based on Memcached, and one problem has me confused. The framework changes the eviction policy: when it is asked to evict an item, it may not evict the tail item of the COLD LRU. Instead, it looks for a "more suitable" item to evict and reinserts the tail items at the head of the COLD queue.

It mostly works fine, but it sometimes causes a segfault when reinsertion happens very frequently (for example, on almost every eviction). The segfault is triggered in the crawler code. As the gdb output below shows, it seems that when the crawler loops through the item queue, it reaches an invalid memory address. The bug appears after around 50000000~10000000 GET/SET (9:1) operations. I used Memaslap for testing.

Could anyone suggest what might cause this error?

Here is the gdb output:

Thread 8 "memcached" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4d6c700 (LWP 36414)]
do_item_crawl_q (it=it@entry=0x55555579e7e0 <crawlers+12320>)
    at items.c:2015
2015        it->prev->next = it->next;
(gdb) print it->prev
$5 = (struct _stritem *) 0x4f4d6355616d5471
(gdb) print it->prev->next
Cannot access memory at address 0x4f4d6355616d5479
(gdb) print it->next
$6 = (struct _stritem *) 0x7a59324376753351
(gdb) print it->next->prev
Cannot access memory at address 0x7a59324376753361
(gdb) print it->nkey
$7 = 0 '\000'
(gdb)

Here is the part that triggers the error:

2012    assert(it->next != it);
2013    if (it->next) {
2014        assert(it->prev->next == it);
2015        it->prev->next = it->next;
2016        it->next->prev = it->prev;
2017    } else {
2018        /* Tail. Move this above? */
2019        it->prev->next = 0;
2020    }

(I'm also confused about why the assert on line 2014 does not fail.)

Thank you very much for helping!

Best,

Qingchen

dormando

Jun 1, 2021, 2:36:09 AM

to memcached

Try '-o no_lru_crawler'? That definitely works.

I don't know what you're doing since no code has been provided. The locks
around managing LRU tails are pretty strict, so make sure you are actually
using them correctly.

The LRU crawler works by injecting a fake item into the LRU, then using
that to keep its position and walk. If I had to guess I bet you've
"evicted" the LRU crawler, which then immediately dies when it tries to
continue crawling.
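
To make the "don't evict the crawler" point concrete, here is a minimal sketch of a custom victim search that steps over crawler placeholders. It is not memcached's code: `sketch_item`, `sketch_lru`, `is_crawler`, `pick_victim` and the helper functions are hypothetical names, and the test "no key and no data means crawler" is an assumption suggested by the `nkey = 0` in the gdb output above. A real patch would also have to hold the LRU lock for the whole walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an LRU entry. Field names follow the gdb
 * session, but this is NOT memcached's real struct layout. */
typedef struct sketch_item {
    struct sketch_item *next;   /* toward the tail (NULL at the tail) */
    struct sketch_item *prev;   /* toward the head (NULL at the head) */
    int     nbytes;             /* size of the stored value */
    uint8_t nkey;               /* key length */
} sketch_item;

typedef struct {
    sketch_item *head;
    sketch_item *tail;
} sketch_lru;

/* Assumption: a crawler placeholder carries neither a key nor data. */
static int is_crawler(const sketch_item *it) {
    return it->nkey == 0 && it->nbytes == 0;
}

/* Insert an item at the head of the queue. */
static void lru_link_head(sketch_lru *q, sketch_item *it) {
    assert(it != q->head);
    it->prev = NULL;
    it->next = q->head;
    if (q->head) q->head->prev = it; else q->tail = it;
    q->head = it;
}

/* Remove an item from the queue, fixing up head/tail as needed. */
static void lru_unlink(sketch_lru *q, sketch_item *it) {
    if (it->prev) it->prev->next = it->next; else q->head = it->next;
    if (it->next) it->next->prev = it->prev; else q->tail = it->prev;
    it->next = it->prev = NULL;
}

/* Walk from the tail toward the head and return the first real item the
 * policy accepts. Crawler placeholders are stepped over and never
 * returned, so the caller can never evict or relink one. A NULL
 * `suitable` callback accepts any real item. Returns NULL if nothing
 * was found within max_tries real items. */
static sketch_item *pick_victim(sketch_lru *q,
                                int (*suitable)(const sketch_item *),
                                int max_tries) {
    sketch_item *it = q->tail;
    while (it != NULL && max_tries > 0) {
        sketch_item *toward_head = it->prev;
        if (!is_crawler(it)) {
            if (suitable == NULL || suitable(it)) return it;
            max_tries--;
        }
        it = toward_head;
    }
    return NULL;
}
```

With a queue of `a -> b -> crawler`, `pick_victim(&q, NULL, 5)` returns `b`, and after `lru_unlink(&q, b)` the crawler is still linked at the tail, so its `prev`/`next` pointers stay valid for the next crawl step.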

On Mon, 31 May 2021, Qingchen Dang wrote:

> Furthermore, I tried to disable the crawler with the '- no_lru_crawler' command parameter, and it gives the same error. I wonder why it does not disable
> the LRU crawler as it is supposed to.

Qingchen Dang

Jun 1, 2021, 11:08:19 PM

to memcached

Thank you very much! Yes, your guess is correct: I forgot about the possibility of evicting a crawler item :(

Furthermore, I have a problem similar to the one in this post: https://github.com/memcached/memcached/issues/467

I gave Memcached a very limited amount of memory to test eviction, and it produces a similar error.

When I use Memtier_Benchmark, the error looks like this:

[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
error: response parsing failed.
server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory storing object
error: response parsing failed.
server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory storing object
error: response parsing failed.
[RUN #1 17%, 0 secs] 1 threads: 87137 ops, 87213 (avg: 87213) ops/sec, 65.66MB/sec (avg: 65.66MB/sec
[RUN #1 36%, 1 secs] 1 threads: 179012 ops, 91864 (avg: 89540) ops/sec, 69.87MB/sec (avg: 67.76MB/sec
[RUN #1 56%, 2 secs] 1 threads: 279971 ops, 100947 (avg: 93343) ops/sec, 76.76MB/sec (avg: 70.76MB/sec
[RUN #1 75%, 3 secs] 1 threads: 375715 ops, 95732 (avg: 93941) ops/sec, 72.87MB/sec (avg: 71.29MB/sec
[RUN #1 92%, 4 secs] 1 threads: 462054 ops, 93910 (avg: 93935) ops/sec, 71.41MB/sec (avg: 71.31MB/sec
[RUN #1 92%, 4 secs] 1 threads: 462054 ops, 0 (avg: 92431) ops/sec, 0.00KB/sec (avg: 70.17MB/sec)
[RUN #1 92%, 5 secs] 1 threads: 462054 ops, 0 (avg: 90975) ops/sec, 0.00KB/sec (avg: 69.06MB/sec)
[RUN #1 92%, 5 secs] 1 threads: 462054 ops, 0 (avg: 89564) ops/sec, 0.00KB/sec (avg: 67.99MB/sec)

When I use Memaslap, it looks like this:

set proportion: set_prop=0.10
get proportion: get_prop=0.90
<12 SERVER_ERROR out of memory storing object
<10 SERVER_ERROR out of memory storing object
<12 SERVER_ERROR out of memory storing object
<7 SERVER_ERROR out of memory storing object

The unmodified Memcached gives these errors less frequently than Memcached with my eviction framework (especially with Memtier_Benchmark), and I wonder why. I read your message in the link above, but I am still confused about why limited memory affects Memcached in this way. Could you give a more detailed explanation? If I have to run with limited memory, is there a way to avoid this issue?

Thank you very much for helping!

Best,

Qingchen

dormando

Jun 1, 2021, 11:22:52 PM

to memcached

You can't evict memory that's being used to load data from the network.
So if you have a low amount of memory and run a benchmark doing a bunch of
parallel writes you're going to be sad.
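
As a toy illustration of that point (not memcached's allocator), the sketch below models a tiny cache in which a slot that is still receiving its value from a connection cannot be evicted. All names here (`toy_pool`, `toy_alloc`, `toy_finish_write`, `POOL_SLOTS`) are hypothetical, and the one-slot-per-item model is an assumption made purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 4   /* hypothetical: a very small cache */

typedef struct {
    bool used;        /* slot currently holds an item */
    bool in_flight;   /* a connection is still reading the value into it */
} toy_slot;

typedef struct {
    toy_slot slots[POOL_SLOTS];
} toy_pool;

/* Reserve a slot for an incoming write. Prefers a free slot, otherwise
 * evicts a slot whose write has completed. Returns the slot index, or -1
 * when every slot is pinned by an in-flight write, which is the toy
 * equivalent of "out of memory storing object". */
static int toy_alloc(toy_pool *p) {
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!p->slots[i].used) {
            p->slots[i].used = true;
            p->slots[i].in_flight = true;
            return (int)i;
        }
    }
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!p->slots[i].in_flight) {
            /* Evict the completed item and reuse its slot. */
            p->slots[i].in_flight = true;
            return (int)i;
        }
    }
    return -1;
}

/* Mark a write as complete; the slot becomes evictable again. */
static void toy_finish_write(toy_pool *p, int slot) {
    p->slots[slot].in_flight = false;
}
```

In this model, `POOL_SLOTS` parallel writes that have all started but not finished make the next `toy_alloc` fail even though nothing is wrong with the eviction policy; as soon as one write finishes, its slot can be evicted and reused. More memory or fewer concurrent writers both reduce the chance of hitting that state.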