My list is empty. But how?

375 views
Skip to first unread message

ge...@rxmg.com

unread,
Aug 1, 2018, 12:46:55 PM8/1/18
to Redis DB
I'm a lead PHP engineer overseeing some improvements to a system that formerly used an arcane concoction of randomize functions to pick from a set of things and distribute things to one of the randomly-picked items in an array. 

I re-architected the randomization functions to use a Redis list and an RPOPLPUSH operation with the same key twice (to make a cycling ring of items), because in effect, the distributions for our business model are not and should not be random at all, but highly controlled.

Anyway, so we're deployed on AWS using Redis in ElastiCache. We have a 50/50 split distribution in Redis list:

1 lpush('fruit', 'apple')
2 lpush('fruit', 'banana')

When we want the next fruit:

3 $nextfruit = $redis->rpoplpush('fruit', 'fruit');
// $nextfruit is 'banana'

3 $nextfruit = $redis->rpoplpush('fruit', 'fruit');
// $nextfruit is 'apple'

and so on.

Here's where it gets really bizarre.

Two days in a row, I've come into work to a "it doesn't work again" bug report and find out the list is empty. 

The key is still there, but there's no items in it anymore. 

I am 99.9% certain that no code in our entire system is actually supposed to be writing to this key - I manually lpush'd the values into it. The only operation is the RPOPLPUSH as illustrated above.

As soon as I run lines 1 and 2 again, everything is fine and dandy.

Any clue what could be causing this?

I tried a (blocking) BRPOPLPUSH but it took like 20 seconds to complete and we have not even placed the system under any kind of load so I switched it back to a non-blocking operation. Only a handful of users are actually performing the function that calls the code where the RPOPLPUSH takes place, so I find it really hard to believe it's even a concurrency issue, but still,  I was under the impression that "all operations are atomic."

It would seem to reason that (in other crappier data stores) that if the POP is occurring, but maybe the PUSH isn't, that could "empty the list"... but that's not possible in Redis lists, right?
 

>>> Redis::info()
=> [
     "Server" => [
       "redis_version" => "3.2.4",
       "redis_git_sha1" => "0",
       "redis_git_dirty" => "0",
       "redis_build_id" => "0",
       "redis_mode" => "standalone",
       "os" => "Amazon ElastiCache",
       "arch_bits" => "64",
       "multiplexing_api" => "epoll",
       "gcc_version" => "0.0.0",
       "process_id" => "1",
       "run_id" => "aa30fb0d9d22003abe84c88288281e4ae8bb466e",
       "tcp_port" => "6379",
       "uptime_in_seconds" => "40503574",
       "uptime_in_days" => "468",
       "hz" => "10",
       "lru_clock" => "6413282",
       "executable" => "-",
       "config_file" => "-",
     ],
     "Clients" => [
       "connected_clients" => "85",
       "client_longest_output_list" => "0",
       "client_biggest_input_buf" => "0",
       "blocked_clients" => "0",
     ],
     "Memory" => [
       "used_memory" => "6355376",
       "used_memory_human" => "6.06M",
       "used_memory_rss" => "22966272",
       "used_memory_rss_human" => "21.90M",
       "used_memory_peak" => "1090178424",
       "used_memory_peak_human" => "1.02G",
       "used_memory_lua" => "36864",
       "used_memory_lua_human" => "36.00K",
       "maxmemory" => "3461349376",
       "maxmemory_human" => "3.22G",
       "maxmemory_policy" => "volatile-lru",
       "mem_fragmentation_ratio" => "3.61",
       "mem_allocator" => "jemalloc-4.0.3",
     ],
     "Persistence" => [
       "loading" => "0",
       "rdb_changes_since_last_save" => "46398029",
       "rdb_bgsave_in_progress" => "0",
       "rdb_last_save_time" => "1492636364",
       "rdb_last_bgsave_status" => "ok",
       "rdb_last_bgsave_time_sec" => "-1",
       "rdb_current_bgsave_time_sec" => "-1",
       "aof_enabled" => "0",
       "aof_rewrite_in_progress" => "0",
       "aof_rewrite_scheduled" => "0",
       "aof_last_rewrite_time_sec" => "-1",
       "aof_current_rewrite_time_sec" => "-1",
       "aof_last_bgrewrite_status" => "ok",
       "aof_last_write_status" => "ok",
     ],
     "Stats" => [
       "total_connections_received" => "5993993",
       "total_commands_processed" => "381545021",
       "instantaneous_ops_per_sec" => "16",
       "total_net_input_bytes" => "21032777213",
       "total_net_output_bytes" => "61084348387",
       "instantaneous_input_kbps" => "0.83",
       "instantaneous_output_kbps" => "4.58",
       "rejected_connections" => "0",
       "sync_full" => "0",
       "sync_partial_ok" => "0",
       "sync_partial_err" => "0",
       "expired_keys" => "4062869",
       "evicted_keys" => "0",
       "keyspace_hits" => "15367057",
       "keyspace_misses" => "298696121",
       "pubsub_channels" => "0",
       "pubsub_patterns" => "0",
       "latest_fork_usec" => "0",
       "migrate_cached_sockets" => "0",
     ],
     "Replication" => [
       "role" => "master",
       "connected_slaves" => "0",
       "master_repl_offset" => "0",
       "repl_backlog_active" => "0",
       "repl_backlog_size" => "1048576",
       "repl_backlog_first_byte_offset" => "0",
       "repl_backlog_histlen" => "0",
     ],
     "CPU" => [
       "used_cpu_sys" => "28082.15",
       "used_cpu_user" => "33219.75",
       "used_cpu_sys_children" => "0.00",
       "used_cpu_user_children" => "0.00",
     ],
     "Cluster" => [
       "cluster_enabled" => "0",
     ],
     "Keyspace" => [
       "db0" => [
         "keys" => "3776",
         "expires" => "4",
         "avg_ttl" => "37148",
       ],
     ],
   ]

ge...@rxmg.com

unread,
Aug 1, 2018, 12:55:02 PM8/1/18
to Redis DB
Oh, I forgot to mention - we're using predis/predis ^1.1 and laravel/framework 5.5.*.

Itamar Haber

unread,
Aug 1, 2018, 1:22:23 PM8/1/18
to Redis DB
Hello Geoff,

Very weird indeed. Empty keys are not supposed to be possible in Redis, and `RPOPLPUSH` is indeed atomic. Perhaps this should be pursued with the kind folks at AWS, as this is too puzzling.

Cheers,

On Wed, Aug 1, 2018 at 7:55 PM, <ge...@rxmg.com> wrote:
Oh, I forgot to mention - we're using predis/predis ^1.1 and laravel/framework 5.5.*.

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.



--

Itamar Haber
Technicalist Evangenly

Phone: +972.54.567.9692

Redis Labs


Nick Farrell

unread,
Aug 1, 2018, 4:37:34 PM8/1/18
to Redis DB
Do you store any other data in this redis database?

I suspect your entire database is being wiped by something else you are running, or the AWS instance is being restarted and wasn't persisted properly.

I don't use AWS but you should be able to enable statement logging somehow and monitor the actual commands being sent. Worst case by running redis cli in slave mode.

geoff hoffman

unread,
Aug 2, 2018, 3:14:29 PM8/2/18
to redi...@googlegroups.com
On Wed, Aug 1, 2018 at 1:37 PM, Nick Farrell <nicholas...@gmail.com> wrote: 
Do you store any other data in this redis database?

I suspect your entire database is being wiped by something else you are running, or the AWS instance is being restarted and wasn't persisted properly.



Definitely other stuff in there. Maybe it's being rebooted.  I thought that persist was the default for Redis in ElastiCache and don't think we changed it.



On Wed, Aug 1, 2018 at 10:21 AM, Itamar Haber <ita...@redislabs.com> wrote:

Very weird indeed. Empty keys are not supposed to be possible in Redis, and `RPOPLPUSH` is indeed atomic. Perhaps this should be pursued with the kind folks at AWS, as this is too puzzling.


I'll investigate further and submit a ticket. 

Thanks for the confirmation guys.

 

hva...@gmail.com

unread,
Aug 4, 2018, 8:03:25 PM8/4/18
to Redis DB
The key is still there, but there's no items in it anymore.

I do believe that when you come back to your Redis instance and check for the key, you see a key by the same name, but I have a very hard time believing the key is a list with no elements in it.  I believe it's another type of key, like an ordinary string.

The next time this happens, would you connect redis-cli directly to the Redis instance and first type "TYPE keyname" and post the results?

Another thing I would suggest is that you try with Redis installed on an EC2 server.  See if the problem occurs there, and only move it to Elasticache afterward.  If the problem appears in one installation and not in the other, you've gained important knowledge about the source of the trouble and indications on how to fix it.

Geoffrey Hoffman

unread,
Aug 4, 2018, 9:35:54 PM8/4/18
to redi...@googlegroups.com
It’s a list, and it’s empty. I ran a get on it and it failed with wrong key type. I ran lrange keyname 0 -1 and it returned []

Has to be something in my code emptying it or refilling it with null or some such.

hva...@gmail.com

unread,
Aug 4, 2018, 10:07:36 PM8/4/18
to Redis DB
I don't understand.  Are you saying it's a list with no elements at all, or a list with an empty string in element 0?

Geoffrey Hoffman

unread,
Aug 5, 2018, 12:15:35 AM8/5/18
to redi...@googlegroups.com
The key is there (not expired), it is of type “list” and there is nothing in it.

As if both items have been popped and not replaced to the end of the list.

hva...@gmail.com

unread,
Aug 6, 2018, 1:01:02 PM8/6/18
to Redis DB
I'm just a Redis user, not a guru or a RedisLabs employee, so the best advice I can offer is to try this on a Redis installed on an EC2 server.  As a side note, AWS just announced ElastiCache is now available with Redis v4.0.  Perhaps this puzzling problem won't appear with a newer version (or a non-ElastiCache one).

ge...@rxmg.com

unread,
Aug 21, 2018, 3:16:08 PM8/21/18
to Redis DB
I am coming back to this original thread, because I am now noticing that on another ElastiCache Redis 3.2.4 in Amazon, I am seeing keys without expirations, get evicted, or disappear for no reason I can identify, as well. So on one server I have a LIST that is becomming empty without any identifiable cause, and on another server I have keys going missing for no apparent reason. Here is the example, I am doing a KEYS *pattern* on a specific tracking code, and we launched the code June 22, so I should have at least 1 week of JUNE, 4 weeks of July data/keys, and all of August keys for this code. However I only have about the last 7 days. 

I am creating these each day that our code runs, with a single INCR for the counts and a LPUSH for the lists of emails that come into our website. No expiration on the keys.


 1) "rxst:p:460024:good:20180814:emails"
 2) "rxst:p:460024:good:20180819"
 3) "rxst:p:460024:good:20180821:emails"
 4) "rxst:p:460024:good:20180821"
 5) "rxst:p:460024:good:20180815:emails"
 6) "rxst:p:460024:good:20180816"
 7) "rxst:p:460024:good:20180819:emails"
 8) "rxst:p:460024:good:20180818"
 9) "rxst:p:460024:good:20180820"
10) "rxst:p:460024:good:20180817"
11) "rxst:p:460024:good:20180816:emails"
12) "rxst:p:460024:good:20180815"
13) "rxst:p:460024:good:20180817:emails"
14) "rxst:p:460024:good:20180814"
15) "rxst:p:460024:good:20180818:emails"
16) "rxst:p:460024:good:20180820:emails"


That's all I get. 


# Server
redis_version:3.2.4
redis_git_sha1:0
redis_git_dirty:0
redis_build_id:0
redis_mode:standalone
os:Amazon ElastiCache
arch_bits:64
multiplexing_api:epoll
gcc_version:0.0.0
process_id:1
run_id:5fea7439e2ca207ca0630ed454a0995f7ec4a523
tcp_port:6379
uptime_in_seconds:901350
uptime_in_days:10
hz:10
lru_clock:8148180
executable:-
config_file:-

# Clients
connected_clients:87
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:143221536
used_memory_human:136.59M
used_memory_rss:149532672
used_memory_rss_human:142.61M
used_memory_peak:143867152
used_memory_peak_human:137.20M
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:3461349376
maxmemory_human:3.22G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:1.04
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:3848226
rdb_bgsave_in_progress:0
rdb_last_save_time:1533973486
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:68704
total_commands_processed:18720635
instantaneous_ops_per_sec:23
total_net_input_bytes:1023210230
total_net_output_bytes:1349178057
instantaneous_input_kbps:1.24
instantaneous_output_kbps:0.12
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:35542
evicted_keys:0
keyspace_hits:1460591
keyspace_misses:13077268
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:700277
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1852
repl_backlog_histlen:698426

# CPU
used_cpu_sys:955.06
used_cpu_user:711.02
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=658484,expires=8,avg_ttl=1060583


I asked amazon about it. Their reply was they can't see the data in our cluster for security, but:


Rebooting a redis cluster will remove all the keys present in the cache, it will never partially remove some keys. I checked the Elasticache and found that no reboot action has been performed on the node.


So the question is now even larger - where is my data going? I am really having a hard time understanding these bizarre inconsistencies.



Reply all
Reply to author
Forward
0 new messages