Write performance impact in scylladb with a global secondary index

375 views
Skip to first unread message

Dhiraj Prajapati

<dhiraj.prajapati@games24x7.com>
unread,
May 16, 2021, 7:07:19 AM5/16/21
to ScyllaDB users
Hi,
I have a use case where a user can register for events. And there can be multiple attendees (to the tune of millions) in a single event. I am planning to model the database table like this:

CREATE TABLE user_events (
user_id BIGINT,
event_id BIGINT,
registration_date TIMESTAMP,
.. some other metadata ..,
PRIMARY KEY (user_id, event_id)
)

Most of my queries would be give the events that a user has registered for. But I will have one more query: to get random 1000 users which have registered for a given event.
So I am planning to add a global secondary index on event_id to serve the second kind of queries.

After adding the secondary index, I have noticed that it has impacted the insert performance a lot. The insert rate has reduced to 1/6th.

Any ideas if I am doing anything wrong here?

Regards,
Dhiraj Prajapati

Tzach Livyatan

<tzach@scylladb.com>
unread,
May 16, 2021, 7:23:04 AM5/16/21
to ScyllaDB users
Please provide more details, like the full schema (including sec index), cluster size, data set size etc

If the cardinality of event_id is low (only a few ids), many of the sec index rows are in the same partition, which might cause a hot partition, and overload a core.
I would start the analysis by looking at the Advisor in the monitoring dashboard, cluster view, looking for imbalance nodes / cores.





--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/a3451027-a208-497c-8f3a-49e25e700e90n%40googlegroups.com.

Ivan Prisyazhnyy

<ivan@scylladb.com>
unread,
May 16, 2021, 7:30:31 AM5/16/21
to ScyllaDB users
I don't think you are doing anything wrong. If your Replication Factor
== 3 adding a secondary index will already reduce your baseline
throughput by that factor (3). Because indexes are also needed to be
written in a redundant manner. Then their additional overhead we
usually account for 30% in theoretical calculations is 30%. The rest
may be related to the large partitions that you are having by storing
millions of users in a single event partition.

There are could be other additional issues like high partition
contention, commit log overload on inserts, or something else. Anyway
in the long run, having partitions with millions of rows is an
anti-pattern and potentially may cause problems in the future.
> To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/CAEB7%3D3b_0GongXomafrF64H4DY7h1A9qHQDKDd3az%2BX7Zbuv5g%40mail.gmail.com.

Dhiraj Prajapati

<dhiraj.prajapati@games24x7.com>
unread,
May 16, 2021, 7:48:22 AM5/16/21
to scylladb-users@googlegroups.com
Hi Livyatan,
It is a 3 node cluster, with replication factor as 2.
The cardinality of event id is high. There will be many events, but the number of users in a single event can go upto 30 million.

Data size would be around 3000 million user-event combinations.

Please let me know if you need more details.

Thanks,
Dhiraj

Avi Kivity

<avi@scylladb.com>
unread,
May 16, 2021, 7:55:16 AM5/16/21
to scylladb-users@googlegroups.com, Ivan Prisyazhnyy
It's not true that a secondary index reduces performance by replication
factor. Ordinary writes are replicated by that factor, and so are
secondary index writes, but the factors to not multiply.


Secondary indexes do have high costs - an extra read and up to two extra
writes. But the cost does not scale with the replication factor compared
to an ordinary write.

Dhiraj Prajapati

<dhiraj.prajapati@games24x7.com>
unread,
May 17, 2021, 9:59:45 AM5/17/21
to ScyllaDB users
Hi all,
I added 2 more nodes in the cluster. Now it is a 5 node cluster. But I did not see any improvement in the write performance.

Theoretically, there should have been some performance improvement, correct?  

Avi Kivity

<avi@scylladb.com>
unread,
May 19, 2021, 4:29:51 AM5/19/21
to scylladb-users@googlegroups.com, Dhiraj Prajapati

If you have a hot partition, then adding nodes will not help. Look at the monitoring in shard view to see if you have a hot shard.

Reply all
Reply to author
Forward
0 new messages