CREATE TABLE IF NOT EXISTS t1 (
    i bigint,
    g text,
    h int,
    m text,
    PRIMARY KEY ((g, h), i)
) WITH compaction = {
    'tombstone_threshold': '0.2',
    'compaction_window_size': '1',
    'compaction_window_unit': 'DAYS',
    'class': 'TimeWindowCompactionStrategy'
}
AND read_repair_chance = 0
AND dclocal_read_repair_chance = 0;

CREATE INDEX ON t1 (m);
- `i` is, in combination with `g`, a unique number.
- `g`: there are around 200,000 unique values for `g`, and every `g` value is on average 24 bytes.
- `h`: we created this in the past for better data distribution across nodes. If `i` is 0-999 then `h` is 0, if `i` is 1000-1999 then `h` is 1, and so on. This way we have more unique partition keys.

Queries:

# Insert with TTL; the TTL differs based on the value of g (the software supplies the TTL value, TTL values are in the range of 300000-315360000)
INSERT INTO t1 (i, g, h, m) VALUES (1, 'gvalue', 0, 'mvalue') USING TTL 300000

# In this case we know g and the i range, and h we can calculate in the software
SELECT i, m FROM t1 WHERE g = ? AND h = ? AND i >= ? AND i <= ?

# Same as above: as we know g and i and can calculate h, we can select m (should return one result or 0)
SELECT m FROM t1 WHERE g = ? AND h = ? AND i = ?

# We are creating the index on m for this query; in this situation we only know `m` and need to know i and g
SELECT i, g FROM t1 WHERE m = ? LIMIT 1

# With this query we want to find the current lowest value of i (as rows expire via TTL, the lowest value changes over time)
SELECT i FROM t1 WHERE g = ? AND h = ? LIMIT 1
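A note on the last query: it relies on `i` being the clustering column with the default ascending order, so LIMIT 1 returns the current lowest `i` in the partition. An equivalent form with the ordering spelled out explicitly:

# Same query with the (default) ordering made explicit
SELECT i FROM t1 WHERE g = ? AND h = ? ORDER BY i ASC LIMIT 1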
What is very important is that we need to be as space efficient as possible. We are inserting more than 200,000,000 rows a day, so that's a lot of data.

My main questions are:

1. Is the `h` solution a smart thing to do? Or do you see a better way?
2. I read that mixing different TTL values in the same table is not good practice? (TTL values are in the range of 300,000-315,360,000.)
2.1 We have 200,000 unique `g` values, so creating a table for every `g` value isn't going to work, I think?
2.2 If we can't mix different TTLs in one table, we might be able to create a handful of tables like `lowttl`, `midttl`, `longttl`, or do you know a better way?
3. Is there a better / smarter way than what we do right now?
4. Is the index on `m` the best solution (also disk-space wise)? Or can we better create a separate table with `m` as the primary key and a `g`+`i` row so we can do lookups?
Hi Guys,
I started a thread on Slack but got the advice to start a conversation here.
Currently we have the following in our application that we are re-building:
[the CREATE TABLE with TimeWindowCompactionStrategy, the index on m, and the column notes, quoted in full above]

# Insert with TTL; the TTL differs based on the value of g (the software supplies the TTL value, TTL values are in the range of 300000-315360000)
INSERT INTO t1 (i, g, h, m) VALUES (1, 'gvalue', 0, 'mvalue') USING TTL 300000
Variable TTL is really a poor fit for Time Window compaction. Moreover, the upper bound of your TTL range is 10 years (315,360,000 seconds is about 3,650 days), so that gives 3,650 one-day windows, far too much.
# In this case we know g and the i range, and h we can calculate in the software
SELECT i, m FROM t1 WHERE g = ? AND h = ? AND i >= ? AND i <= ?

# Same as above: as we know g and i and can calculate h, we can select m (should return one result or 0)
SELECT m FROM t1 WHERE g = ? AND h = ? AND i = ?

# We are creating the index on m for this query; in this situation we only know `m` and need to know i and g
SELECT i, g FROM t1 WHERE m = ? LIMIT 1
Hopefully, there are many m values (few collisions), otherwise
this will not distribute well.
# With this query we want to find the current lowest value of i (as rows expire via TTL, the lowest value changes over time)
SELECT i FROM t1 WHERE g = ? AND h = ? LIMIT 1

What is very important is that we need to be as space efficient as possible. We are inserting more than 200,000,000 rows a day, so that's a lot of data. My main questions are:

1. Is the `h` solution a smart thing to do? Or do you see a better way?
If the number of g values won't grow beyond 200,000, keep it. If the number of unique g values grows, I recommend dropping it for simplicity.
200M rows/day is 2000 rows/sec on average. That's not a lot, and
even if the insert job takes just two hours out of the entire day,
that's just 20,000 rows/sec.
2. I read that mixing different TTL values in the same table is not good practice? (TTL values are in the range from 300,000-315,360,000)
It's fine for Size Tiered or Leveled, but not TWCS.
2.1 We have 200,000 unique `g` values, so creating a table for every `g` value isn't going to work, I think?
No, it will blow up very quickly.
2.2 If we can't mix different TTLs in one table, we might be able to create a handful of tables like `lowttl`, `midttl`, `longttl`, or do you know a better way?
I recommend not splitting tables artificially (it will mean you
need to query each table/index separately). Just avoid TWCS with
mixed TTLs.
3. Is there a better / smarter way than what we do right now?
4. Is the index on `m` the best solution (also disk-space wise)? Or can we better create a separate table with `m` as the primary key and a `g`+`i` row so we can do lookups?
Under the hood Scylla creates a table with m as the primary key and (g, h, i) as the clustering key, so you won't get much out of it.
Just make sure m has high cardinality.
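For comparison, a manual lookup table would look roughly like the sketch below (the table name `t1_by_m` is hypothetical); it is essentially what the index already maintains for you, except that the application would have to keep it in sync, including the TTLs, itself:

-- Hypothetical manual alternative to the secondary index on m
CREATE TABLE IF NOT EXISTS t1_by_m (
    m text,
    g text,
    h int,
    i bigint,
    PRIMARY KEY (m, g, h, i)
);

-- Would replace: SELECT i, g FROM t1 WHERE m = ? LIMIT 1
SELECT i, g FROM t1_by_m WHERE m = ? LIMIT 1;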
First of all thank you both for your detailed answers!
On 25 Feb 2021, at 15:23, Avi Kivity <a...@scylladb.com> wrote:

Variable TTL is really a poor fit for Time Window compaction. Moreover, the upper bound of your TTL range is 10 years, that gives 3650 1-day windows, far too much.
Can this be the reason why my current ScyllaDB cluster is overloaded with compactions? It has been running for a few years now with TWCS instead of Size Tiered or Leveled, and for the last 6-8 months it has been really overloaded.
The 3650 windows are bad, but don't explain the compaction load. Maybe variable TTL does. Raphael?
And is it possible to change it to Size Tiered or Leveled in an existing environment? And what will be the impact? (As the cluster is already overloaded.)
It's possible (with ALTER TABLE). The cluster will start compacting to reshape the data to fit size-tiered; while this is happening, reads can be slower. Scylla will also require more free space.
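A minimal sketch of that change (the exact compaction options are up to you):

-- Switch t1 from TWCS to Size Tiered compaction
ALTER TABLE t1 WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };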
Hopefully, there are many m values (few collisions), otherwise this will not distribute well.

Yes, all `m` values are unique!
If the number of g values won't grow beyond 200,000, keep it. If the number of unique g values grows, I recommend dropping it for simplicity.
All g values are unique and not really growing, so keeping `h` is a good thing. What do you think about the h value range? Is it better to have a larger or smaller range per h value?
Smaller ranges (more 'h') lead to better distribution, but if you
want to read all rows for a given 'g' (do you?) then it leads to
more queries. So it depends on the ratio of reads and writes. See
more below.
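To make the "more queries" point concrete, a sketch under the current bucket size of 1,000 (values are illustrative): reading everything for one `g` means issuing one SELECT per `h` bucket.

-- One query per h bucket when scanning a whole g (bucket size 1,000, illustrative values)
SELECT i, m FROM t1 WHERE g = 'Gvalue1' AND h = 0;
SELECT i, m FROM t1 WHERE g = 'Gvalue1' AND h = 1;
-- ...and so on, one SELECT per bucket.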
I can set the bucket size (for calculating h) in the software. In our current cluster we have this at 1,000, which creates:

(('Gvalue1', 0), 1-999)      (so we have the combination 'Gvalue1', 0 for 1,000 rows, then it moves on to)
(('Gvalue1', 1), 1000-1999)  (1,000 of these)
(('Gvalue1', 2), 2000-2999)  (1,000 of these)
and so on.

If we use 10,000 we get:

(('Gvalue1', 0), 1-9999)       (so we have the combination 'Gvalue1', 0 for 10,000 rows, then it moves on to)
(('Gvalue1', 1), 10000-19999)  (10,000 of these)
(('Gvalue1', 2), 20000-29999)  (10,000 of these)
and so on.
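For illustration, assuming `h` is derived in the application as integer division of `i` by the bucket size (my reading of the scheme above), an insert with bucket size 1,000 would look like:

-- i = 1500, bucket size 1000  ->  h = 1500 / 1000 = 1 (integer division, computed client-side)
INSERT INTO t1 (i, g, h, m) VALUES (1500, 'Gvalue1', 1, 'mvalue') USING TTL 300000;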
How many rows do you have overall?
Guessing from 200M rows/day and assuming an average TTL of 2000 days, that's 400G rows. That's more than enough for any sized cluster.
200,000 partitions (no 'h') is okay for a small-medium cluster (1,000 vcpus total) but will start showing bad distribution at larger cluster sizes. So if you plan to keep the cluster size below ~10 nodes, even removing h may work. In general I dislike introducing artificial bucketing, but in your case I think it's fine and you can keep it.