tombstone problem

Mathieu Dutour

<mathieu.dutour@gmail.com>
Feb 5, 2024, 11:07:46 AM
to ScyllaDB users
We have a problem with tombstones in the database.

We can reproduce it reliably with the following steps:
Step 1: Insert 200000 key/value pairs.
Step 2: Delete 100000 of those keys, selected at random.
Step 3: Read the remaining data.

The tombstone problem occurs in the third step. Note that
we are using the "query_paged" method to read the data.

If it helps, we can provide a short Rust program that triggers
the problem.

  Mathieu

Avi Kivity

<avi@scylladb.com>
Feb 5, 2024, 11:21:39 AM
to scylladb-users@googlegroups.com
What is the problem you see?

What ScyllaDB version are you using?
--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/9c04afa6-b46e-45b5-ad40-44680f5c6ccen%40googlegroups.com.

Mathieu Dutour

<mathieu.dutour@gmail.com>
Feb 7, 2024, 2:46:48 AM
to ScyllaDB users
The problem I see is the error
"Tombstones processed by unpaged query exceeds limit of 10000"
This directly contradicts my use of "query_paged" in
my code.

As for the version, I cannot tell, since I am using the Docker
image and it does not have a version attached to it.

But I can see the following from "docker image inspect scylladb/scylla":
        "Id": "sha256:d75cb647338013684c7c098742c07fdd7f9d473d5922c60f0e9745c27b242e0f",
        "Created": "2023-12-31T09:50:22.830874168Z",

Avi Kivity

<avi@scylladb.com>
Feb 7, 2024, 11:52:25 AM
to scylladb-users@googlegroups.com
You can run 'docker run --rm --entrypoint /usr/bin/scylla scylladb/scylla --version' to get the version.

It's recommended to pull an explicit tag, not whatever version happens to be current at the time you pull.

Mathieu Dutour

<mathieu.dutour@gmail.com>
Feb 11, 2024, 12:09:23 PM
to ScyllaDB users
The version is 5.4.1-0.20231231.3d22f42cf9c3

Avi Kivity

<avi@scylladb.com>
Feb 11, 2024, 12:25:28 PM
to scylladb-users@googlegroups.com
You can work around it by changing the config item query_tombstone_page_limit.

Mathieu Dutour

<mathieu.dutour@gmail.com>
Feb 11, 2024, 1:44:32 PM
to ScyllaDB users
Yes, I know that.
But that is not the point.

The error message says "Tombstones processed by unpaged query exceeds limit of 10000"
But I am using "query_paged".

Therefore the error should not happen.

So either:
* The error message is incorrect.
* There is another bug in the querying system.

  Mathieu

Avi Kivity

<avi@scylladb.com>
Feb 12, 2024, 5:12:59 AM
to scylladb-users@googlegroups.com
Well, that shouldn't happen.

It could be that the server converted a paged query to an unpaged query.


What was the query string?

Mathieu Dutour

<mathieu.dutour@gmail.com>
Feb 13, 2024, 11:27:35 AM
to ScyllaDB users
Thank you for confirming.

The query used is:
"SELECT k FROM kv.pairs WHERE dummy = 0 AND k >= ? AND k < ? ALLOW FILTERING"

Before that, the batch operations used to insert and delete the data are:
"INSERT INTO kv.pairs (dummy, k, v) VALUES (0, ?, ?)"
"DELETE FROM kv.pairs WHERE dummy = 0 AND k = ?"

An integrated test reproduces exactly this problem.
It works for fewer than 10000 deletions and it crashes for more than 10000
deletions because of the tombstone problem.

Avi Kivity

<avi@scylladb.com>
Feb 13, 2024, 12:04:34 PM
to scylladb-users@googlegroups.com, bdenes@scylladb.com
Botond, how does this fit with #17241? That addressed unpaged queries but here we see the problem for paged queries.

Did we lose the paged/unpaged flag somehow?
Avi Kivity

<avi@scylladb.com>
Feb 13, 2024, 12:18:27 PM
to scylladb-users@googlegroups.com, bdenes@scylladb.com
Well, Wireshark shows the problem: the query is actually unpaged (see
the Page Size flag).

Cassandra CQL Protocol
    Version: 0x04
        .... 0100 = Protocol version: 4
        0000 .... = Direction: Request (0x0)
    Flags: 0x00
        .... ...0 = Compression: False
        .... ..0. = Tracing: False
        .... .0.. = Custom Payload: False
        .... 0... = Warning: False
        0000 .... = Reserved: 0x0
    Stream Identifier: 0
    .000 0111 = Opcode: QUERY (7)
    Message Length: 96
    Query
        String Length: 75
        String: SELECT k FROM kv.pairs WHERE dummy = 0 AND k >= ? AND k < ? ALLOW FILTERING
        Consistency: LOCAL_QUORUM (0x0006)
        Flags: 0x11, Values, Serial Consistency
            .... ...1 = Values: True
            .... ..0. = Skip Metadata: False
            .... .0.. = Page Size: False
            .... 0... = Paging State: False
            ...1 .... = Serial Consistency: True
            ..0. .... = Default Timestamp: False
            .0.. .... = Names for Values: False
            0... .... = Reserved: 0x0
        Value count: 2
        Bytes length: 1
        Bytes: 00
        Bytes length: 1
        Bytes: 01
        Consistency: LOCAL_SERIAL (0x0009)

Command: tshark -V -i lo port 9042

So either query_paged doesn't do what one expects, or something else
went wrong.
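
For anyone reading along, the Flags byte of the Query body (0x11) can also be decoded by hand. The following is a minimal Rust sketch, assuming the flag bit values of the CQL binary protocol v4 QUERY message, which agree with the bit positions shown in the capture. The names `QUERY_FLAGS`, `set_flag_names` and `is_paged` are made up for illustration and are not part of any driver.

```rust
/// Bit masks of the query flags byte (assumed from CQL protocol v4;
/// they agree with the bit positions in the tshark output).
const PAGE_SIZE: u8 = 0x04;
const QUERY_FLAGS: [(u8, &str); 7] = [
    (0x01, "Values"),
    (0x02, "Skip Metadata"),
    (PAGE_SIZE, "Page Size"),
    (0x08, "Paging State"),
    (0x10, "Serial Consistency"),
    (0x20, "Default Timestamp"),
    (0x40, "Names for Values"),
];

/// Returns the names of all flags that are set in `flags`.
fn set_flag_names(flags: u8) -> Vec<&'static str> {
    let mut names = Vec::new();
    for (bit, name) in QUERY_FLAGS {
        if flags & bit != 0 {
            names.push(name);
        }
    }
    names
}

/// A request asks for paging only if it carries a page size.
fn is_paged(flags: u8) -> bool {
    flags & PAGE_SIZE != 0
}

fn main() {
    let flags = 0x11; // value seen in the capture
    println!("set flags: {:?}", set_flag_names(flags));
    println!("paged: {}", is_paged(flags));
}
```

For 0x11 only Values and Serial Consistency are set, so the request carries no page size and the server treats it as unpaged.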

Karol Baryla

<karol.baryla@scylladb.com>
Feb 23, 2024, 11:31:08 AM
to ScyllaDB users
`query_paged` in the Rust driver is actually a somewhat misleading name; I have just opened an issue about it: https://github.com/scylladb/scylla-rust-driver/issues/940
The only difference between `query()` and `query_paged()` is that the latter accepts a `paging_state` argument.
Whether the query is paged or not is actually controlled by the `page_size` attribute on the `Query` struct. Look into these methods:

```
Query::with_page_size()
Query::set_page_size()
Query::disable_paging()
Query::get_page_size()
```

Also, do you know about the `query_iter()` method on `Session`? It is an easier way to perform paged selects, as you won't need to fetch the next pages manually; the driver will take care of it.
It's also not possible to forget to enable paging with `query_iter`, because it sets the default page size (5000) if none is set.
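
To illustrate the distinction between page size and paging state, here is a toy in-memory model in Rust (standard library only). It is not the driver's API: `fetch` and `count_responses` are hypothetical names, and the paging state is modeled as a plain offset. It only shows that the page size is what splits a result into pages, while the paging state is merely a cursor for continuing.

```rust
/// Toy stand-in for a server-side read (hypothetical, not the driver API).
/// Returns the rows of one response and the cursor for the next page, if any.
fn fetch(
    rows: &[u32],
    page_size: Option<usize>,
    paging_state: Option<usize>,
) -> (Vec<u32>, Option<usize>) {
    let start = paging_state.unwrap_or(0).min(rows.len());
    match page_size {
        // No page size: everything remaining comes back in one response.
        None => (rows[start..].to_vec(), None),
        // Page size set: return at most `n` rows plus a cursor if more remain.
        Some(n) => {
            let end = (start + n.max(1)).min(rows.len());
            let next = if end < rows.len() { Some(end) } else { None };
            (rows[start..end].to_vec(), next)
        }
    }
}

/// Fetches page after page manually and counts the responses needed.
fn count_responses(rows: &[u32], page_size: Option<usize>) -> usize {
    let mut state = None;
    let mut responses = 0;
    loop {
        let (_page, next) = fetch(rows, page_size, state);
        responses += 1;
        match next {
            Some(cursor) => state = Some(cursor),
            None => break,
        }
    }
    responses
}

fn main() {
    let rows: Vec<u32> = (0..12000).collect();
    // Passing a paging state alone does not page anything.
    println!("no page size: {} response(s)", count_responses(&rows, None));
    // With a page size of 5000 the same read is split into pages.
    println!("page size 5000: {} response(s)", count_responses(&rows, Some(5000)));
}
```

In this model the loop in `count_responses` is the manual page fetching that `query_iter` would otherwise hide.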

Avi Kivity

<avi@scylladb.com>
Feb 23, 2024, 1:48:16 PM
to scylladb-users@googlegroups.com
It would be better to have two layers for this: a general-purpose one that hides paging, and a low-level one that exposes the paging mechanics.

Meanwhile, I recommend adding better names and deprecating the misleading ones.