Hi all, I'm encountering an RDB issue that I don't understand. I'd greatly appreciate any ideas.
My Redis instance started reporting an error last night: `MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk.`
Restarting the server and running `redis-check-rdb dump.rdb` both produce an error:
```
[offset 0] Checking RDB file dump.rdb
[offset 26] AUX FIELD redis-ver = '5.0.8'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1681088401'
[offset 67] AUX FIELD used-mem = '1359262528'
[offset 83] AUX FIELD aof-preamble = '0'
[offset 85] Selecting DB ID 0
--- RDB ERROR DETECTED ---
[offset 51432] Internal error in RDB reading offset 0, function at rdb.c:2080 -> Ziplist integrity check failed.
[additional info] While doing: read-object-value
[additional info] Reading key '<redacted>'
[additional info] Reading type 14 (quicklist)
[info] 87 keys read
[info] 1 expires
[info] 0 already expired
46161:C 10 Apr 2023 11:42:54.008 # Terminating server after rdb file reading failure.
```
I have automated daily backups, so I figured I'd just restore one, but the same issue seems to be present in all backups going back at least 3 months. I'm retrieving older backups from storage, but that will take some time.
Some quick info:
- Running Redis 7.0.5 (`v=7.0.5 sha=00000000:0 malloc=jemalloc-5.2.1 bits=64 build=d76e64d63dff22a5`)
- Running single Redis instance on one dedicated server. No Cluster, nor Sentinel.
- `uname -a`: `Linux ensso1 4.15.0-208-generic #220-Ubuntu SMP Mon Mar 20 14:27:01 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
- Dumps are created using `BGSAVE`, then waiting until `LASTSAVE` has updated.
- Some relevant `redis.conf` options:
  - `stop-writes-on-bgsave-error yes`
  - `rdbcompression yes`
  - `rdbchecksum yes`
  - `appendfsync everysec`
  - `rdb-save-incremental-fsync yes`
- Dump files are ~740MB large.
- System has plenty of free memory (52G free of 62G total)
- System has plenty of free disk space (123G free of 197G total for /var/lib/redis)
- System has had no recent hardware or software changes
- Hard disks in RAID1, `mdadm` reporting all disks healthy.
- All Redis `make test` tests pass
- `redis-server --test-memory 62000` passes; I let it run for several hours.
- https://github.com/xueqiu/rdr parses the dumps without any complaints.
- Backups were created as follows:
  - `BGSAVE` is run
  - `LASTSAVE` is checked periodically until it shows an updated value
  - `dump.rdb` is then compressed, encrypted, and sent to S3.
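
The first two backup steps can be sketched roughly as follows. This is an illustration, not the exact script in use: it assumes a redis-py style client exposing `bgsave()`, `lastsave()` (returning a `datetime`) and `info("persistence")`, and the names `RedisLike` and `bgsave_and_wait` are hypothetical. The extra check of `rdb_last_bgsave_status` is an addition on top of the plain `LASTSAVE` polling described above.

```python
import time
from datetime import datetime
from typing import Protocol


class RedisLike(Protocol):
    """Subset of a redis-py style client used below (an assumption)."""

    def bgsave(self) -> bool: ...
    def lastsave(self) -> datetime: ...
    def info(self, section: str) -> dict: ...


def bgsave_and_wait(client: RedisLike, poll_seconds: float = 1.0,
                    timeout_seconds: float = 600.0) -> datetime:
    """Trigger BGSAVE and block until LASTSAVE advances.

    Raises RuntimeError if the server reports a failed background save,
    and TimeoutError if LASTSAVE never changes within the timeout.
    """
    before = client.lastsave()          # timestamp of the previous good save
    client.bgsave()                     # ask the server to fork and dump
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        persistence = client.info("persistence")
        if not persistence.get("rdb_bgsave_in_progress", 0):
            # The child has exited; make sure it exited cleanly.
            if persistence.get("rdb_last_bgsave_status", "ok") != "ok":
                raise RuntimeError("background save failed")
            current = client.lastsave()
            if current > before:        # LASTSAVE only moves on success
                return current
        time.sleep(poll_seconds)
    raise TimeoutError("LASTSAVE did not advance before the timeout")
```

Note that `LASTSAVE` has one-second resolution, so two saves completing within the same second would be indistinguishable to this loop.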
Since the S3 dumps are decrypting successfully, I believe it's safe to conclude that the file integrity is good.
I've also tried copying the dump files to a different system, but loading also fails there with the same Ziplist integrity error.
My main objectives in order are:
1. Successfully restore a dump, even if some data is lost.
2. Understand what went wrong, and adjust my backup scheme accordingly.
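
For the second objective, one possible adjustment is to validate each dump before it is compressed and uploaded, so that an unreadable file is noticed on the day it is written rather than months later. The sketch below is only that: the function name `dump_passes_check` is hypothetical, and it assumes `redis-check-rdb` exits with a non-zero status when it detects an error, which should be confirmed against the installed version.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def dump_passes_check(dump_path: Path,
                      checker: Sequence[str] = ("redis-check-rdb",)) -> bool:
    """Return True if the checker command exits with status 0 for this dump.

    `checker` is the command prefix to run; the dump path is appended as
    the final argument. Assumption: a non-zero exit status means the
    checker found a problem with the file.
    """
    result = subprocess.run([*checker, str(dump_path)],
                            capture_output=True, text=True)
    return result.returncode == 0
```

A backup job could then skip the upload and raise an alert whenever `dump_passes_check(Path("dump.rdb"))` returns `False`.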
I've tried searching online for info on RDB errors/corruption, but couldn't find much relevant info. My impression from the "Redis persistence" docs, et al. is that the backup procedure above should be pretty solid. Am I missing something obvious here?
Will update if I reach any conclusions on that.