[PATCH] cql: make CREATE INDEX IF NOT EXISTS be silent if named index exists

47 views
Skip to first unread message

Nadav Har'El

<nyh@scylladb.com>
unread,
Aug 10, 2021, 2:53:01 PM8/10/21
to scylladb-dev@googlegroups.com, Nadav Har'El
What should the following pair of statements do?

CREATE INDEX xyz ON tbl(a)
CREATE INDEX IF NOT EXISTS xyz ON tbl(b)

There are two reasonable choices:
1. The index with the name xyz exists, so the second command should do
nothing, because of the "IF NOT EXISTS".
2. The index on table(b) does *not* exist, so the command should try to
create it and when it can't (because the name xyz is already taken),
it should produce an error message.

Currently, Scylla went with choice 2, but Cassandra went with choice 1,
so in this patch we switch to choice 1 and add a cql-pytest test to
confirm the behavior of Scylla and Cassandra in this respect is
identical.

Fixes #9182.

Signed-off-by: Nadav Har'El <n...@scylladb.com>
---
test/cql-pytest/test_secondary_index.py | 22 ++++++++++++++++++++++
cql3/statements/create_index_statement.cc | 12 ++++++++----
2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/test/cql-pytest/test_secondary_index.py b/test/cql-pytest/test_secondary_index.py
index 799dc80cf..ab2eab43d 100644
--- a/test/cql-pytest/test_secondary_index.py
+++ b/test/cql-pytest/test_secondary_index.py
@@ -171,6 +171,28 @@ def test_create_index_if_not_exists(cql, test_keyspace):
cql.execute(f"DROP INDEX {test_keyspace}.abc")
cql.execute(f"DROP INDEX {test_keyspace}.xyz")

+# Another test for CREATE INDEX IF NOT EXISTS. Checks what happens if an index
+# with the given *name* already exists, but it's a different index than the
+# one requested, i.e.,
+# CREATE INDEX xyz ON tbl(a)
+# CREATE INDEX IF NOT EXIST xyz ON tbl(b)
+# Should the second command silently do nothing (because xyz already exists),
+# or try to create a table and visibly fail (because and index on tbl(b)
+# doesn't yet exist)?
+# Cassandra chose the first behavior (silently do nothing), so we should
+# probably do so too.
+# Reproduces issue #9182
+def test_create_index_if_not_exists2(cql, test_keyspace):
+ with new_test_table(cql, test_keyspace, 'p int primary key, v1 int, v2 int') as table:
+ index_name = unique_name()
+ cql.execute(f"CREATE INDEX {index_name} ON {table}(v1)")
+ # Obviously can't create a different index with the same name:
+ with pytest.raises(InvalidRequest, match="already exists"):
+ cql.execute(f"CREATE INDEX {index_name} ON {table}(v2)")
+ # "IF NOT EXISTS" sees that {index_name} already exists and does
+ # nothing. Even though it's not an index of the same column.
+ cql.execute(f"CREATE INDEX IF NOT EXISTS {index_name} ON {table}(v2)")
+
# Test that the paging state works properly for indexes on tables
# with descending clustering order. There was a problem with indexes
# created on clustering keys with DESC clustering order - they are represented
diff --git a/cql3/statements/create_index_statement.cc b/cql3/statements/create_index_statement.cc
index a36b8e2b3..8eed3f5a8 100644
--- a/cql3/statements/create_index_statement.cc
+++ b/cql3/statements/create_index_statement.cc
@@ -311,10 +311,14 @@ create_index_statement::announce_migration(query_processor& qp) const {
}
auto index_table_name = secondary_index::index_table_name(accepted_name);
if (db.has_schema(keyspace(), index_table_name)) {
- return make_exception_future<::shared_ptr<cql_transport::event::schema_change>>(
- exceptions::invalid_request_exception(format("Index {} cannot be created, because table {} already exists",
- accepted_name, index_table_name))
- );
+ if (_if_not_exists) {
+ return make_ready_future<::shared_ptr<cql_transport::event::schema_change>>(nullptr);
+ } else {
+ return make_exception_future<::shared_ptr<cql_transport::event::schema_change>>(
+ exceptions::invalid_request_exception(format("Index {} cannot be created, because table {} already exists",
+ accepted_name, index_table_name))
+ );
+ }
}
++_cql_stats->secondary_index_creates;
schema_builder builder{schema};
--
2.31.1

Nadav Har'El

<nyh@scylladb.com>
unread,
Aug 10, 2021, 3:05:53 PM8/10/21
to scylladb-dev, Eliran Sinvani, Piotr Sarna
NOTE:

Although I already created this patch, I'm not 100% sure it's a good idea. I'm starting to think maybe it's a bad idea :-(
Eliran, you brought this potential issue up - what do you think?

Yes, my patch makes our CQL implementation a tiny more compatible with Cassandra in some esoteric case.
But on the other hand, the case it makes compatible is a case where:
1. The request doesn't make sense, and is probably (?) a mistake from the user.
2. Cassandr'a behavior in this case is to do absolutely nothing, silently.

So in some sense the behavior we had before this patch - giving a reasonably good error message - perhaps it is better than Cassandra's silent-forgiveness behavior in this case? The user will see the error message and understand the request probably has some typo.

What I wonder, though, is perhaps there is some case where some code will assume Cassanra's behavior, creating an index called "xyz" unless it's already an index for anything. Could that possibly be a useful feature?


--
Nadav Har'El
n...@scylladb.com

Piotr Sarna

<sarna@scylladb.com>
unread,
Aug 11, 2021, 3:57:14 AM8/11/21
to Nadav Har'El, scylladb-dev, Eliran Sinvani
On 8/10/21 9:05 PM, Nadav Har'El wrote:
NOTE:

Although I already created this patch, I'm not 100% sure it's a good idea. I'm starting to think maybe it's a bad idea :-(
Eliran, you brought this potential issue up - what do you think?

Yes, my patch makes our CQL implementation a tiny more compatible with Cassandra in some esoteric case.
But on the other hand, the case it makes compatible is a case where:
1. The request doesn't make sense, and is probably (?) a mistake from the user.
2. Cassandr'a behavior in this case is to do absolutely nothing, silently.

So in some sense the behavior we had before this patch - giving a reasonably good error message - perhaps it is better than Cassandra's silent-forgiveness behavior in this case? The user will see the error message and understand the request probably has some typo.

What I wonder, though, is perhaps there is some case where some code will assume Cassanra's behavior, creating an index called "xyz" unless it's already an index for anything. Could that possibly be a useful feature?
I slept on it and found your arguments against Cassandra's behavior convincing. It's not about idempotence anymore if we try to create a particular index on a specific table and column, but a different index on another table or column happens to have the same name - it's more likely, as you mentioned, a typo. So maybe it's better to just document our different decision and leave the code as is.

Benny Halevy

<bhalevy@scylladb.com>
unread,
Aug 11, 2021, 4:21:35 AM8/11/21
to Nadav Har'El, scylladb-dev, Eliran Sinvani, Piotr Sarna


On Tue, Aug 10, 2021, 22:05 Nadav Har'El <n...@scylladb.com> wrote:
NOTE:

Although I already created this patch, I'm not 100% sure it's a good idea. I'm starting to think maybe it's a bad idea :-(
Eliran, you brought this potential issue up - what do you think?

Yes, my patch makes our CQL implementation a tiny more compatible with Cassandra in some esoteric case.
But on the other hand, the case it makes compatible is a case where:
1. The request doesn't make sense, and is probably (?) a mistake from the user.
2. Cassandr'a behavior in this case is to do absolutely nothing, silently.

So in some sense the behavior we had before this patch - giving a reasonably good error message - perhaps it is better than Cassandra's silent-forgiveness behavior in this case? The user will see the error message and understand the request probably has some typo.

What I wonder, though, is perhaps there is some case where some code will assume Cassanra's behavior, creating an index called "xyz" unless it's already an index for anything. Could that possibly be a useful feature?

It seems like the index namespace is global.
It doesn't make sense to not return an error if the index name is already used for some other table.

--
You received this message because you are subscribed to the Google Groups "ScyllaDB development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-dev/CANEVyjtw3sBvfUvP-p001G7CxJu9rboe6_TCmETO6--sC6_6zw%40mail.gmail.com.

Nadav Har'El

<nyh@scylladb.com>
unread,
Aug 11, 2021, 4:40:52 AM8/11/21
to Benny Halevy, scylladb-dev, Eliran Sinvani, Piotr Sarna
On Wed, Aug 11, 2021 at 11:21 AM Benny Halevy <bha...@scylladb.com> wrote:


On Tue, Aug 10, 2021, 22:05 Nadav Har'El <n...@scylladb.com> wrote:
NOTE:

Although I already created this patch, I'm not 100% sure it's a good idea. I'm starting to think maybe it's a bad idea :-(
Eliran, you brought this potential issue up - what do you think?

Yes, my patch makes our CQL implementation a tiny more compatible with Cassandra in some esoteric case.
But on the other hand, the case it makes compatible is a case where:
1. The request doesn't make sense, and is probably (?) a mistake from the user.
2. Cassandr'a behavior in this case is to do absolutely nothing, silently.

So in some sense the behavior we had before this patch - giving a reasonably good error message - perhaps it is better than Cassandra's silent-forgiveness behavior in this case? The user will see the error message and understand the request probably has some typo.

What I wonder, though, is perhaps there is some case where some code will assume Cassanra's behavior, creating an index called "xyz" unless it's already an index for anything. Could that possibly be a useful feature?

It seems like the index namespace is global.
It doesn't make sense to not return an error if the index name is already used for some other table.

It took me a while to understand the double negative :-)

So if I understand you correctly, you want to return an error if the index name is used for a different table.

I'd argue that if we do this, we should also return an error if the index name is used for a different column. The point is that in Scylla's index you don't use its "name" to query it - you just do a query on the base table, and expect to be able to do WHERE restrictions on the indexed column - so if the indexed column is different, it's not the same index - even if the same was chosen.

This is why I tend to think that Scylla's current behavior - printing an error in this case - is the correct one. Piotr Sarna also seems to lean in this direction according to his last mail. I'm also waiting for Eliran's opinion - he was the first one who suggested that the current behavior might be incorrect.

I'll prepare a different patch that "enshrines" the current behavior in a test, marking it as a "cassandra bug" (the test will fail on Cassandra) and explaining why.

By the way, if you're curious why this is even important, it's not ;-) We had a much more serious bug (https://github.com/scylladb/scylla/issues/8620, which can cause crashes) which we fixed but didn't backport, so a customer recently encountered it in an older version, and looking at it led us also to this much-less-important inconsistency.

Benny Halevy

<bhalevy@scylladb.com>
unread,
Aug 11, 2021, 4:58:40 AM8/11/21
to Nadav Har'El, scylladb-dev, Eliran Sinvani, Piotr Sarna
On Wed, 2021-08-11 at 11:40 +0300, Nadav Har'El wrote:
On Wed, Aug 11, 2021 at 11:21 AM Benny Halevy <bha...@scylladb.com> wrote:


On Tue, Aug 10, 2021, 22:05 Nadav Har'El <n...@scylladb.com> wrote:
NOTE:

Although I already created this patch, I'm not 100% sure it's a good idea. I'm starting to think maybe it's a bad idea :-(
Eliran, you brought this potential issue up - what do you think?

Yes, my patch makes our CQL implementation a tiny more compatible with Cassandra in some esoteric case.
But on the other hand, the case it makes compatible is a case where:
1. The request doesn't make sense, and is probably (?) a mistake from the user.
2. Cassandr'a behavior in this case is to do absolutely nothing, silently.

So in some sense the behavior we had before this patch - giving a reasonably good error message - perhaps it is better than Cassandra's silent-forgiveness behavior in this case? The user will see the error message and understand the request probably has some typo.

What I wonder, though, is perhaps there is some case where some code will assume Cassanra's behavior, creating an index called "xyz" unless it's already an index for anything. Could that possibly be a useful feature?

It seems like the index namespace is global.
It doesn't make sense to not return an error if the index name is already used for some other table.


It took me a while to understand the double negative :-)

So if I understand you correctly, you want to return an error if the index name is used for a different table.

I'd argue that if we do this, we should also return an error if the index name is used for a different column. The point is that in Scylla's index you don't use its "name" to query it - you just do a query on the base table, and expect to be able to do WHERE restrictions on the indexed column - so if the indexed column is different, it's not the same index - even if the same was chosen.

That makes sense.
One should use a different name for the different indices so that they could co-exist on the same table,
so clashing on the name warrants an error.

The bottom line is that CREATE INDEX IF NOT EXIST should succeed only if
the same index already exists.


This is why I tend to think that Scylla's current behavior - printing an error in this case - is the correct one. Piotr Sarna also seems to lean in this direction according to his last mail. I'm also waiting for Eliran's opinion - he was the first one who suggested that the current behavior might be incorrect.

I agree.

Avi Kivity

<avi@scylladb.com>
unread,
Aug 11, 2021, 5:04:02 AM8/11/21
to Nadav Har'El, scylladb-dev, Eliran Sinvani, Piotr Sarna

That's a great existential observation. What does it mean to exist?


a. An index with the same name exists

b. An index with exactly the same specification - including the name - exists.


I agree we should choose b. If anything in the request doesn't match the existing index, we need to provide an error.

Reply all
Reply to author
Forward
0 new messages