Redis RDB dump file is corrupted during restart of a Redis node


MANISH PATEL

Jul 15, 2020, 3:50:57 AM
to Redis DB

Hi Team,

The Redis RDB dump file is getting corrupted during restart of a node, and the node is not coming up.
I am seeing the following error:

1:M 06 Jul 2020 20:58:25.882 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 06 Jul 2020 20:58:25.882 # Server initialized
1:M 06 Jul 2020 20:58:25.882 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 06 Jul 2020 20:58:32.891 # Short read or OOM loading DB. Unrecoverable error, aborting now.
1:M 06 Jul 2020 20:58:32.891 # Internal error in RDB reading function at rdb.c:2124 -> Unexpected EOF reading RDB file

1:C 06 Jul 2020 20:59:01.761 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 06 Jul 2020 20:59:01.761 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1, just started
--------------------------------------------------------------------------------
--- RDB ERROR DETECTED ---
[offset 234881379] Invalid LZF compressed string
[additional info] While doing: read-object-value
[additional info] Reading key '<KEY_NAME>'
[additional info] Reading type 4 (hash-hashtable)
[info] 28 keys read
[info] 0 expires
[info] 0 already expired
--------------------------------------------------------------------------------

Any help will be appreciated.

Thank you and Regards,
Manish

Oran Agra

Jul 16, 2020, 12:55:41 AM
to Redis DB
Hi Manish,
A few random questions to try to understand your situation:
Does this happen consistently (every time you reboot), or did it happen just once?
Are you trying to figure out how to prevent it from happening, or trying to find a way to recover your data?
Can you check whether the file is just full of zeros from a certain point onward?

I can't think of any way to recover the data. Maybe if it's just one lost byte you could figure it out with a lot of work, but I have a feeling the file is just full of zeros or garbage (one way to tell is that the last byte of a valid RDB file needs to be 0xFF).
If it happens consistently then there's probably something wrong with your OS, maybe a bad mount option.

Sorry I can't really offer any help.
    Oran.
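Oran's two checks (the last byte of a valid RDB should be the 0xFF EOF opcode, and corruption often shows up as a run of trailing zeros) can be sketched roughly like this. This is a diagnostic probe under stated assumptions, not a repair tool, and the path is whatever your dump file is called:

```python
import sys

def inspect_rdb_tail(path, probe=4096):
    """Rough RDB sanity probe: a valid RDB starts with the REDIS magic
    and ends with an 0xFF EOF opcode; when the (default) rdbchecksum
    option is on, an 8-byte CRC64 follows the opcode. Truncation or
    zero-fill corruption tends to show as a long run of trailing zeros."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(b"REDIS"):
        return "not an RDB file (missing REDIS magic)"
    tail = data[-probe:]
    zeros = len(tail) - len(tail.rstrip(b"\x00"))
    # EOF opcode is the very last byte, or the 9th-from-last byte when
    # the trailing CRC64 checksum is present.
    has_eof = data[-1] == 0xFF or (len(data) > 9 and data[-9] == 0xFF)
    return (f"size={len(data)} trailing_zeros_in_last_{probe}B={zeros} "
            f"eof_opcode_present={has_eof}")

if __name__ == "__main__" and len(sys.argv) > 1:
    print(inspect_rdb_tail(sys.argv[1]))
```

A file reporting `eof_opcode_present=False` with thousands of trailing zeros would match Oran's "full of zeros from a certain point" theory.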

MANISH PATEL

Jul 16, 2020, 4:26:55 AM
to Redis DB
Hi Oran,

This issue does not occur every time; it is seen ~10% of the time during restart/upgrade of Redis nodes. For example, if I restart 6 nodes, the issue might occur on any one of them. Further, I can see that the node with this issue has a bigger RDB file than its master/slave nodes.
I want to prevent this situation. I have now also tried disabling RDB compression (rdbcompression no), but then I get the following kind of errors:

----------------------------------------------------------------------
--- RDB ERROR DETECTED ---
[offset 35095977] Internal error in RDB reading function at rdb.c:213 -> Unknown length encoding 2 in rdbLoadLen()

[additional info] While doing: read-object-value
[additional info] Reading key '<KEY_NAME>'
[additional info] Reading type 4 (hash-hashtable)
[info] 1 keys read

[info] 0 expires
[info] 0 already expired
---------------------------------------------------------------------
2nd case:
1:M 15 Jul 2020 10:36:35.353 # Internal error in RDB reading function at rdb.c:498 -> Unknown RDB string encoding type 60
--------------------------------------------------------------------

I don't see any error while shutting down the node.
-----------------------------------------------------------------
1:signal-handler (1594885458) Received SIGTERM scheduling shutdown...
1:S 16 Jul 2020 07:44:18.329 # User requested shutdown...
1:S 16 Jul 2020 07:44:18.329 * Saving the final RDB snapshot before exiting.
1:S 16 Jul 2020 07:44:20.990 * DB saved on disk

1:S 16 Jul 2020 07:44:20.991 # Redis is now ready to exit, bye bye...
---------------------------------------------------------------

How can we resolve this issue? Is there any possibility of a data encoding issue?

It would be nice if anyone could help me with this.


Thank you and Regards,
Manish


Benjamin Sergeant

Jul 16, 2020, 1:08:07 PM
to redi...@googlegroups.com
Can you upgrade to redis-6.0.5 (latest) and see if the problem goes away?

Could it be that you have a Redis version N that tries to read an .rdb created by a Redis version N+1? Still, I would expect that to work.



MANISH PATEL

Jul 16, 2020, 1:36:02 PM
to redi...@googlegroups.com
Hi,

We are using Redis version 5.0.7. We are just redeploying the same version on the Kubernetes (OpenShift) environment, so that all slave pods/nodes restart at almost the same time (to reproduce the issue).

As we are seeing the error below, is there any possibility of data encoding issues?
1:M 15 Jul 2020 10:36:35.353 # Internal error in RDB reading function at rdb.c:498 -> Unknown RDB string encoding type 60

Is there any RDB file corruption related fix in the 6.0.5 release?

Thanks & Best Regards,
MANISH PATEL


MANISH PATEL

Jul 24, 2020, 3:20:03 AM
to Redis DB
Hi Team,

I updated the Redis cluster to the 6.0.5 release, but I am still seeing the RDB file get corrupted during restart sometimes.

I have a 9-instance cluster setup on an OpenShift environment with 3 masters and 6 slaves (3 shards with 2 slaves per shard), and the RDB file is mounted on NFS.

Mostly, we observe that the RDB file gets corrupted on one of the slave instances during shutdown.

Are we missing any configuration, or is there some other issue?

It would be great if someone could help here.

Greg Andrews

Jul 24, 2020, 4:54:15 AM
to Redis DB
The advice here is to avoid NFS when benchmarking Redis, but I would extend that to normal operation as well. Redis saving an RDB is not merely a "stream" of data, it's a firehose. As this mailing list post shows, Redis works at RAM speeds, pushing a flood of bytes into a disk subsystem that's many, many times slower than memory, and NFS is slower still. This stresses the components providing the filesystem, consuming their available bandwidth. If you're mounting the NFS filesystem via UDP rather than TCP, the risk of data corruption climbs higher.

My advice is to not use NFS for RDB files. Write to local disk and, if you want, create a cron job that copies the file to an NFS filesystem for safekeeping.

Any time there's data corruption in a file that a long-running daemon opens and writes data into, I start looking for multiple daemons configured to write to the same file rather than each daemon writing to its own file. That would be my other suggestion about the cause.
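Greg's "write to local disk, copy to NFS afterwards" approach could be sketched like this (a rough illustration, assuming the cron-driven copy is implemented as a small script; the paths and function names are made up for the example). Copying to a temporary name and renaming only after a checksum match keeps a half-written file from ever replacing a good backup:

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def backup_rdb(local_rdb: Path, nfs_dir: Path) -> Path:
    """Copy a locally-written dump.rdb to slower NFS storage and verify
    the copy byte-for-byte, so Redis itself never writes through NFS."""
    nfs_dir.mkdir(parents=True, exist_ok=True)
    tmp = nfs_dir / (local_rdb.name + ".tmp")
    dest = nfs_dir / local_rdb.name
    shutil.copy2(local_rdb, tmp)          # copy to a temp name first
    if sha256(tmp) != sha256(local_rdb):  # verify before publishing
        tmp.unlink()
        raise IOError("checksum mismatch copying RDB to NFS")
    tmp.rename(dest)  # rename within the same filesystem is atomic
    return dest
```

Scheduled from cron (e.g. hourly), this gives an NFS copy for safekeeping without putting the save path itself on NFS.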

MANISH PATEL

Jul 24, 2020, 5:49:47 AM
to redi...@googlegroups.com
Hi Greg Andrew, All,

Thank you very much for your detailed response.

Yes, I can understand that there will be latency writing cache data to an RDB on an NFS mount. But one thing I cannot understand from the log: it says the DB (RDB) was saved to the provided path (NFS), yet the RDB file was still corrupted. What would be the root cause in this case?
-----------------------------------------------------------------
1:signal-handler (1594885458) Received SIGTERM scheduling shutdown...
1:S 16 Jul 2020 07:44:18.329 # User requested shutdown...
1:S 16 Jul 2020 07:44:18.329 * Saving the final RDB snapshot before exiting.
1:S 16 Jul 2020 07:44:20.990 * DB saved on disk

1:S 16 Jul 2020 07:44:20.991 # Redis is now ready to exit, bye bye...
---------------------------------------------------------------

Now we have tried mounting the RDB file on the host node itself on OpenShift, where the Redis cluster instances are deployed (9 instances in total), but the RDB file was still getting corrupted.

Is it fine to run a Redis cluster with RDB disabled (and no AOF either)? Is there any major disadvantage to disabling RDB?
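For reference, disabling persistence entirely is a supported configuration (with the obvious disadvantage that a restarted node comes back empty and must be refilled by its master or by the application). A minimal redis.conf fragment for that would look like:

```conf
# Disable RDB snapshots (no "save" points) and AOF.
save ""
appendonly no
```

One caveat worth hedging: even with snapshots disabled, a full resynchronization to a replica may still write an RDB file to the master's disk unless diskless replication (`repl-diskless-sync yes`) is enabled.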



Thanks & Best Regards,
MANISH PATEL


Xingbo Wang

Jul 24, 2020, 8:51:10 AM
to redi...@googlegroups.com
Hi Manish

What is the size of the RDB file, and the overall expected data size? Was the corruption at the beginning, middle, or end of the RDB file? Was it as if the file got truncated?

Given you mentioned that the node takes a snapshot and then reboots: does the OS get rebooted, or just the Redis process? If it is an OS reboot, is it possible that some of the data in the filesystem cache didn't get flushed before the reboot, hence the loss of data? Could you try flushing the filesystem cache before the reboot?

Is it possible to share the RDB file? If it contains sensitive data, I understand not sharing it, but maybe you could reproduce the issue in your test environment.
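The "flush the filesystem cache before reboot" check above amounts to running sync before the node goes down. A minimal sketch (where to hook it, e.g. a Kubernetes preStop hook, is an assumption about the OpenShift setup):

```python
import os

def flush_before_shutdown() -> None:
    """Force dirty page-cache data to disk before the pod or OS goes
    down, after Redis has finished saving its RDB. Equivalent to
    running the sync(1) command."""
    os.sync()
```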

Thanks 
Shawn

--
Best Regards,

Xingbo

a mrpre

Jul 24, 2020, 8:52:01 AM
to Redis DB
Please upload the RDB file if possible. You can try to DEL the sensitive/secret keys from the RDB first. I want to debug it in my environment.


On Wednesday, July 15, 2020 at 3:50:57 PM UTC+8, MANISH PATEL wrote:

MANISH PATEL

Jul 24, 2020, 12:05:42 PM
to redi...@googlegroups.com
Hi,

The RDB file size is ~1.5 GB, and the corruption happens only in the middle of the file. One observation: the RDB file on the failing node is bigger than on nodes that restart properly. For example, the actual RDB size is 1.5 GB, but on the node where the issue occurs the RDB is 2.5 GB before shutdown. Another thing: the key where the issue occurs has less data (content) when I search the RDB for it. The issue occurs randomly, for any hash (HSET) key.
The Redis RDB file is mounted on the host node of the OpenShift cloud setup. So the RDB file is stored on the host node before shutdown kills the pod; then a new pod comes up after restart and reuses the same RDB file mounted on the host node.

I will try to load test cache data into the RDB and share it later, once the issue is reproduced.



Thanks & Best Regards,
MANISH PATEL


Benjamin Sergeant

Jul 24, 2020, 1:15:49 PM
to redi...@googlegroups.com
https://github.com/rustudorcalin/deploying-redis-cluster

I am trying to set this up too, also on OpenShift (3.11). Have you seen this deployment style?
(But I don't know whether I will get an NFS drive or not.)

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

