Hello,
We’re currently testing Scylla for use as a pure key-object store for data blobs around 10kB - 60kB each. Our use case is storing on the order of 10 billion objects with about 5-20 million new writes per day. A written object will never be updated or deleted. Objects will be read at least once, some time within 10 days of being written. This will generally happen as a batch; that is, all of the images written on a particular day will be read together at the same time. This batch read will only happen one time; future reads will happen on individual objects, with no grouping, and they will follow a long-tail distribution, with popular objects read thousands of times per year but most read never or virtually never.
I’ve set up a small four node test cluster and have written test scripts to benchmark writing and reading our data. The table I’ve set up is very simple: an ascii primary key column with the object ID and a blob column for the data. All other settings were left at their defaults.
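For reference, a schema along the lines described might look like this (table and column names are my own guesses, not taken from the actual test setup):

```cql
-- Hypothetical reconstruction of the simple schema described above.
CREATE TABLE objects (
    object_id ascii PRIMARY KEY,  -- the object ID
    data      blob                -- the 10kB-60kB data blob
);
```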
I’ve found write speeds to be very fast to begin with. When testing with Cassandra we found that periodically, writes would slow to a crawl for anywhere between half an hour and two hours, after which speeds recovered to their previous levels. Scylla does not experience these periodic slowdowns, but it does seem to slow down over time, and, unlike Cassandra, does not seem to recover to the previous speeds. Over the course of two days of writing, the write speeds slowed by a factor of four. If this trend continues, then Scylla won't work for us in production.
Read speeds have been more disappointing. Cached reads are very fast, but random read speed averages about 4 MB/sec, which is too slow when we need to read out a batch of several million objects. I don’t think it’s reasonable to assume that these rows will all still be cached by the time we need to read them for that first large batch read.
My general question is whether anyone has any suggestions for how to improve performance for our use case. More specifically:
- Is there a way to mitigate or eliminate the write speed slowing down over time that I observe?
- Are there settings I should be using in order to maximize read speeds for random reads?
- Is there a way to design our tables to improve the read speeds for the initial large batched reads? I was thinking of using a batch ID column that could be used to retrieve the data for the initial block. However, future reads would need to be done by the object ID, not the batch ID, so it seems to me I’d need to duplicate the data: once in an “objects by batch” table, and once in a simple “objects” table. Is there a better approach than this?
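The duplicated-data design described above could be sketched roughly like this (names are illustrative, not from the original post):

```cql
-- Serves the one-time bulk read: all objects of a batch live in one partition.
CREATE TABLE objects_by_batch (
    batch_id  ascii,
    object_id ascii,
    data      blob,
    PRIMARY KEY (batch_id, object_id)
);

-- Serves the long-tail single-object reads by object ID.
CREATE TABLE objects (
    object_id ascii PRIMARY KEY,
    data      blob
);
```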
Thank you!
Jonathan
--
You received this message because you are subscribed to the Google Groups "ScyllaDB users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scylladb-users+unsubscribe@googlegroups.com.
To post to this group, send email to scylladb-users@googlegroups.com.
Visit this group at https://groups.google.com/group/scylladb-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/scylladb-users/15b0145b-eb2a-4534-9dd2-cafe3e8caeb2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
When you say batch, do you mean like 'plenty at once' or a real CQL batch where it's all or nothing?
Try the parallel table scan technique: http://www.scylladb.com/2017/03/28/parallel-efficient-full-table-scan-scylla/
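The technique in that post splits the full Murmur3 token range (-2^63 to 2^63 - 1) into N subranges and runs one query per subrange in parallel. A sketch of one such query, assuming the single-table schema discussed above (bounds shown are the first of four equal subranges):

```cql
-- Illustrative subrange query; run many of these concurrently,
-- each covering a disjoint slice of the token ring.
SELECT object_id, data
FROM objects
WHERE token(object_id) >= -9223372036854775808
  AND token(object_id) <  -4611686018427387904;
```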
It is probably the compaction cost. We behave better than Cassandra's spikiness, but over time the database holds more data and needs to merge a growing number of files. Since there are no deletes or updates, the LCS (Leveled Compaction) strategy may be better for your case.
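Switching an existing table to LCS would look something like this ("objects" is an assumed table name for the test table described earlier):

```cql
-- Hedged example: change the compaction strategy on the test table.
ALTER TABLE objects
WITH compaction = {'class': 'LeveledCompactionStrategy'};
```

Note that LCS trades higher write amplification for fewer SSTables per read, so it is worth benchmarking against the default before committing to it.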
Most important is to provide statistics: first with the nodetool compactionhistory command, and later by deploying our monitoring stack (based on Prometheus), which will allow us to know what's going on.
Which AWS instances are you using?
If you are deploying our monitoring system as well, I would advise
sharing some metrics with us, so we can take a closer look.
Can you get us details about your hardware?
If you have deployed our docker images for prometheus + grafana
according to https://github.com/scylladb/scylla-grafana-monitoring,
you should be able to go to port 3000 (instead of prometheus' 9090),
and there you will find 3 dashboards per version (Cluster, Server,
I/O)
No - it's my bad. I should have been more specific.
To run outside of developer mode, the setup procedure (which you have
probably run) needs to create a file, /etc/scylla.d/io.conf.
That is what I am after.
Also, the fact that you have split data / commitlog is interesting: although
we support it, we have known issues extracting optimal performance out
of that setup in some circumstances. Your io.conf contents will shed
some light on this.
> Can you get us details about your hardware?
2 x E5-2660 8-core Xeons
64GB RAM DDR-3 PC1300
10Gb internal network (SFP+)
LSI 9210-8i controller (IT mode)
2TB HDD for data
200GB SSD for commitlogs
There may be a more "hacky" solution for your current needs. If I understand correctly, you mostly (?) care about the read performance during those read "batches", where you want to read a lot of small objects written in the same day. So one solution is to model your data differently: don't write every object as a separate 10K partition, but rather put all the objects of the same hour (or whatever other granularity) into separate clustering rows of the same partition.
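A sketch of that bucketed model (the table name and the hour granularity are illustrative assumptions):

```cql
-- One partition per hour bucket; each object becomes a clustering row
-- within it, so a whole hour can be read with a single partition query.
CREATE TABLE objects_by_hour (
    hour_bucket timestamp,  -- e.g. the write time truncated to the hour
    object_id   ascii,
    data        blob,
    PRIMARY KEY (hour_bucket, object_id)
);
```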
By the way, since you are comparing Scylla's performance to Cassandra, I wonder if you have the same slow read problem also in Cassandra. I assume you do, because Cassandra would also need to seek in the disk on every read. But if you don't, we need to figure out why.
> There may be a more "hacky" solution for your current needs. If I understand
> correctly, you mostly (?) care about the read performance during those read
> "batches", where you want to read a lot of small objects written in the same
> day. So one solution is to model your data differently: don't write every
> object as a separate 10K partition, but rather put all the objects of the
> same hour (or whatever other granularity) into separate clustering rows of
> the same partition.
There is a trade-off here: if that is done, every row in that hour will have the same partition key. That leads to bad sharding, with very real consequences: every request in that hour will be sent to the same node, and in Scylla's case, the same CPU.
We should look into your prometheus/grafana graphs, and see if you
have requests blocked (there is a graph for that in the per-server
dash).

1) Load in the system (there is a graph for that, usually at the top). It is also interesting to check the load across the CPUs in the node. For that, it is usually better to use prometheus directly (port 9090). If you tell us the name of one of your instances (hovering the mouse over the lines will tell you), I can get you a query for that.
2) whether or not the SSD is at its max throughput (there are prometheus plugins to export those metrics, or you can use any other linux tool)

On 9 May 2017 at 16:07, Glauber Costa <gla...@scylladb.com> wrote:
> 1) Load in the system (there is a graph for that, usually at the top). It is also interesting to check the load across the CPUs in the node. For that, it is usually better to use prometheus directly (port 9090). If you tell us the name of one of your instances (hovering the mouse over the lines will tell you), I can get you a query for that.
See attached screenshot. My instances are all named scylla01, scylla02, etc.
> 2) whether or not the SSD is at its max throughput (there are prometheus plugins to export those metrics, or you can use any other linux tool)
According to iostat the SSDs are each averaging around 10MB/s for writes and around 5MB/s for reads, which should be nowhere near their maximums.
Please go to port 9090, and try this query:
scylla_reactor_gauge_load{instance=~".*cephL01*"}
(or the actual name of the instance; =~ means regex match)
That will generate a graph, and it would be nice to look at it.
In the following page there are instructions (at the bottom) on how to upload your prometheus data to our s3 bucket. With that, we can look at all the metrics at once (including the ones that are not in the standard dashes, for more non-obvious things). If you can somehow add linux metrics to prometheus (with node_exporter, or something else), that helps as well.



On 9 May 2017 at 16:33, Glauber Costa <gla...@scylladb.com> wrote:
> Please go to port 9090, and try this query:
> scylla_reactor_gauge_load{instance=~".*cephL01*"}
> (or the actual name of the instance; =~ means regex match)
> That will generate a graph, and it would be nice to look at it.
Graphs attached below.
> In the following page there are instructions (at the bottom) on how to upload your prometheus data to our s3 bucket. With that, we can look at all the metrics at once (including the ones that are not in the standard dashes, for more non-obvious things). If you can somehow add linux metrics to prometheus (with node_exporter, or something else), that helps as well.
I have Node Exporter running in Prometheus, so that won't be a problem. I'll follow the instructions and send the data to you. Thank you!