IngestExternalFile() breaks GetUpdatesSince() due to "gap in sequence numbers"

Jan Steemann

May 17, 2022, 11:29:36 AM
to rocksdb
Hi everyone,
sorry for the spam, I just opened an issue in the RocksDB GitHub issue tracker with the same title:

The full details about the problem can be found in that GitHub issue, but the bottom line is that when using a combination of snapshots and IngestExternalFile(), RocksDB seems to bump the sequence number in a way that leaves gaps between the sequence numbers recorded in the WAL files. This breaks GetUpdatesSince(), which checks that the sequence numbers in the WAL are contiguous. GetUpdatesSince() and the TransactionLogIterator then fail with "Corruption: gap in sequence numbers" errors.

It seems to me that this combination of features (snapshots, IngestExternalFile(), GetUpdatesSince()) is not often used, so maybe it is just unsupported.

We would actually like to move some parts of our code to IngestExternalFile(), given its better performance and other advantages over multiple WriteBatch::Put() calls. But currently it looks like this would be unsafe.
Are there ways to make RocksDB _not_ bump the sequence number when calling IngestExternalFile()? I found that there is a flag "snapshot_consistency" in the ingestion options, but it doesn't seem to remove the issue completely.

Any guidance on how to move on from here would be really helpful.
Thanks!
J

Yanqin Jin

May 17, 2022, 11:53:07 AM
to Jan Steemann, rocksdb
Hi Jan,
Thanks for reporting. I will need to look at the details, but reading through this email, I think one clarification may be needed.

> there can be gaps in the WAL file sequence numbers

It is totally fine for the WAL to have non-contiguous sequence numbers. Applications can set disableWAL on a per-write basis, so some writes are logged and others are not. Even transactions that commit without prepare can skip writing to the WAL; see WriteCommittedTxn::CommitWithoutPrepareInternal() for details, though I am not entirely sure whether this is a good idea for transactions.
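To make the point about per-write WAL skipping concrete, here is a minimal sketch in plain C++. It is a toy model, not RocksDB code: `ToyDb`, `LogRecord`, `Write()` and `IsContiguous()` are hypothetical names invented for illustration, and the only assumption carried over from the discussion is that every write advances the sequence number while only WAL-enabled writes leave a log record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model, NOT RocksDB code: all names here are hypothetical.
// One record in the toy write-ahead log.
struct LogRecord {
  uint64_t first_seq;  // sequence number of the first entry in the batch
  uint32_t count;      // number of entries in the batch
};

class ToyDb {
 public:
  // Applies a batch of `count` entries. `disable_wal` stands in for a
  // per-write "skip the WAL" switch.
  void Write(uint32_t count, bool disable_wal) {
    assert(count > 0);
    const uint64_t first_seq = last_seq_ + 1;
    last_seq_ += count;  // the sequence advances for every write
    if (!disable_wal) {
      wal_.push_back({first_seq, count});  // only logged writes are recorded
    }
  }

  uint64_t LatestSequenceNumber() const { return last_seq_; }
  const std::vector<LogRecord>& Wal() const { return wal_; }

 private:
  uint64_t last_seq_ = 0;
  std::vector<LogRecord> wal_;
};

// Returns true if every record starts right after the previous one ends.
bool IsContiguous(const std::vector<LogRecord>& wal) {
  for (std::size_t i = 1; i < wal.size(); ++i) {
    if (wal[i].first_seq != wal[i - 1].first_seq + wal[i - 1].count) {
      return false;
    }
  }
  return true;
}
```

In this model, calling `Write(2, false)`, then `Write(3, true)`, then `Write(1, false)` leaves two log records starting at sequence numbers 1 and 6, so the log alone no longer covers every sequence number.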

Currently, GetUpdatesSince() is a DB API, not a TransactionDB one. I personally find it a little misleading that its argument type is TransactionLogIterator, which seems to imply transactions...

Thanks
Yanqin


Jan Steemann

May 17, 2022, 11:59:10 AM
to rocksdb
Hi Yanqin,
thanks for checking!
The problem I have is that our application relies on GetUpdatesSince(), which returns a TransactionLogIterator. And the TransactionLogIterator will _internally_ verify that the sequence numbers are contiguous.
It has 3 places in which it can trigger an error on non-contiguous sequence numbers:

$ grep -n -A 1 "Gap in sequence n" db/transaction_log_impl.cc
134:            "Gap in sequence number. Could not "
135-            "seek to required sequence number");
--
156:        "Gap in sequence number. Could not "
157-        "seek to required sequence number");
--
266:    current_status_ = Status::NotFound("Gap in sequence numbers");
267-    // In seq_per_batch_ mode, gaps in the seq are possible so the strict mode
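For readers unfamiliar with this kind of check, the following is a rough sketch of what a strict continuity check over log batches can look like. It is a simplified stand-in written for illustration, not the code in db/transaction_log_impl.cc: `WalBatch` and `FindGap()` are hypothetical names, and the error text is made up.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy model, NOT the RocksDB implementation: all names are hypothetical.
struct WalBatch {
  uint64_t first_seq;  // sequence number of the first entry in the batch
  uint32_t count;      // number of entries in the batch
};

// Walks the batches in order and reports the first place where a batch
// does not start exactly where the previous one ended.
// Returns an empty string when the batches are contiguous.
std::string FindGap(const std::vector<WalBatch>& batches) {
  if (batches.empty()) {
    return "";
  }
  uint64_t expected = batches.front().first_seq;
  for (const WalBatch& batch : batches) {
    if (batch.first_seq != expected) {
      return "Gap in sequence numbers: expected " + std::to_string(expected) +
             ", found " + std::to_string(batch.first_seq);
    }
    expected = batch.first_seq + batch.count;  // next batch must start here
  }
  return "";
}
```

A reader built this way cannot tell whether a missing range was lost or was simply never logged, which is why any operation that consumes sequence numbers without writing a log record trips it.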

And the problem is that I don't want to modify the TransactionLogIterator implementation just to ignore any such gaps. I would assume these checks are there for a reason.
Thanks!
J