RADOS + deep scrubbing performance issues in production environment


Filippos Giannakos

Jan 27, 2014, 10:13:21 AM
to ceph-...@vger.kernel.org, synnef...@googlegroups.com
Hello all,

We have been running RADOS in a large-scale, production, public cloud
environment for a few months now, and we are generally happy with it.

However, we experience performance problems when deep scrubbing is active.

We managed to reproduce them in our testing cluster running emperor, even while
it was idle.

We ran a simple rados bench test:

rados -p bench bench -b 524288 120 write

and could easily reach 230 MB/s consistently [1].

Then, we manually initiated a deep scrub and re-ran the test.

As you can see from the results [2], the performance dropped significantly and
even paused for a few seconds.
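For reference, the deep scrub can be initiated by hand for reproduction; a
sketch, where the OSD and PG ids are placeholders for your own:

    # start a deep scrub on one OSD, or on a single PG, then re-run the bench
    ceph osd deep-scrub 0
    ceph pg deep-scrub 2.1f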

Now imagine that behavior in a loaded cluster with thousands of VMs on top of
it. The performance drop is unacceptable for our service.

Are there any tools we are not aware of for controlling, possibly pausing,
deep-scrub and/or getting some progress information about the procedure?
Also, since I believe it would be bad practice to disable deep-scrubbing, do you
have any recommendations on how to work around (or even solve) this issue?

[1] https://pithos.okeanos.grnet.gr/public/yzq5fHNkl5OnjgLOPlRTA3
[2] https://pithos.okeanos.grnet.gr/public/OjIGAQFBGwcsBNMHtA8ir5

Kind Regards,
--
Filippos
<phili...@grnet.gr>

Sage Weil

Jan 27, 2014, 1:45:48 PM
to Kyle Bader, Filippos Giannakos, ceph-...@vger.kernel.org, synnef...@googlegroups.com
There is also

ceph osd set noscrub

and then later

ceph osd unset noscrub

I forget whether this pauses an in-progress PG scrub or just makes it stop
when it gets to the next PG boundary.

sage

On Mon, 27 Jan 2014, Kyle Bader wrote:

> > Are there any tools we are not aware of for controlling, possibly pausing,
> > deep-scrub and/or getting some progress information about the procedure?
> > Also, since I believe it would be bad practice to disable deep-scrubbing, do you
> > have any recommendations on how to work around (or even solve) this issue?
>
> The periodicity of scrubs is controllable with these tunables:
>
> osd scrub max interval
> osd deep scrub interval
>
> You may also be interested in adjusting:
>
> osd scrub load threshold
>
> More information on the docs page:
>
> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
>
> Hope that helps some!
>
> --
>
> Kyle
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
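The tunables Kyle lists go in the [osd] section of ceph.conf; a sketch with
purely illustrative values (all intervals are in seconds):

    [osd]
    osd scrub max interval = 604800       # illustrative: force a scrub at least weekly
    osd deep scrub interval = 2419200     # illustrative: deep scrub every four weeks
    osd scrub load threshold = 2.0        # illustrative: allow scrubs up to loadavg 2.0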

Kyle Bader

Jan 27, 2014, 1:10:23 PM
to Filippos Giannakos, ceph-...@vger.kernel.org, synnef...@googlegroups.com
> Are there any tools we are not aware of for controlling, possibly pausing,
> deep-scrub and/or getting some progress information about the procedure?
> Also, since I believe it would be bad practice to disable deep-scrubbing, do you
> have any recommendations on how to work around (or even solve) this issue?

Mike Dawson

Jan 28, 2014, 1:30:46 AM
to Sage Weil, Kyle Bader, Filippos Giannakos, ceph-...@vger.kernel.org, synnef...@googlegroups.com

On 1/27/2014 1:45 PM, Sage Weil wrote:
> There is also
>
> ceph osd set noscrub
>
> and then later
>
> ceph osd unset noscrub
>
In my experience scrub isn't nearly as much of a problem as deep-scrub.
On an IOPS-constrained cluster with writes approaching the available
aggregate spindle performance (minus the replication penalty and possibly
a co-located OSD journal penalty), scrub may run without any disruption.
But deep-scrub tends to make iowait on the spindles get ugly.

To disable/enable deep-scrub use:

ceph osd set nodeep-scrub
ceph osd unset nodeep-scrub
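To confirm the flag took effect, the cluster flags can be checked; a sketch:

    ceph osd dump | grep flags    # should list nodeep-scrub while set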


> I forget whether this pauses an in-progress PG scrub or just makes it stop
> when it gets to the next PG boundary.
>
> sage
>
> On Mon, 27 Jan 2014, Kyle Bader wrote:
>
>>> Are there any tools we are not aware of for controlling, possibly pausing,
>>> deep-scrub and/or getting some progress information about the procedure?
>>> Also, since I believe it would be bad practice to disable deep-scrubbing, do you
>>> have any recommendations on how to work around (or even solve) this issue?
>>
>> The periodicity of scrubs is controllable with these tunables:
>>
>> osd scrub max interval
>> osd deep scrub interval
>>
>> You may also be interested in adjusting:
>>
>> osd scrub load threshold
>>
>> More information on the docs page:
>>
>> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing

I rarely run into a situation where the 1-minute load average is < 0.5 on a
multi-core server running OSDs, so deep scrub for me is always triggered
by 'osd scrub max interval'. I have filed a bug asking for core count to be
taken into consideration:

http://tracker.ceph.com/issues/6296

The documentation used to say "Default is 50%", implying that this feature
should allow scrub to start at a much higher load than a threshold of 0.5
permits on multi-core systems. The documentation has changed, but the
default of 0.5 still artificially suppresses deep-scrub from
opportunistically starting on relatively idle multi-core systems.

That being said, deep-scrub may be better served with an
osd_scrub_iops_threshold mechanism instead of (or in addition to) the
osd_scrub_load_threshold.
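To illustrate the normalization point, here is a toy model of the decision
(the function name and the normalize flag are made up for illustration, not
Ceph code):

```python
import os

def scrub_load_ok(loadavg, threshold=0.5, normalize=False, ncpus=None):
    """Decide whether a scheduled scrub may start under the load rule.

    normalize=False mirrors the stock behaviour: compare the raw 1-minute
    load average against osd_scrub_load_threshold (default 0.5).
    normalize=True divides by the core count first, which is the change
    proposed in issue 6296.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    effective = loadavg / ncpus if normalize else loadavg
    return effective < threshold

# A 16-core OSD host idling at loadavg 4.0:
print(scrub_load_ok(4.0, ncpus=16))                  # stock rule: blocked (False)
print(scrub_load_ok(4.0, ncpus=16, normalize=True))  # normalized: allowed (True)
```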

- Mike

Filippos Giannakos

Jan 28, 2014, 1:12:46 PM
to Kyle Bader, ceph-...@vger.kernel.org, synnef...@googlegroups.com
Thanks, Kyle, but this does not address the performance degradation while deep
scrubbing is actually running. Plus, it can take several days to complete.

Filippos Giannakos

Jan 28, 2014, 1:13:06 PM
to Sage Weil, Kyle Bader, ceph-...@vger.kernel.org, synnef...@googlegroups.com
On Mon, Jan 27, 2014 at 10:45:48AM -0800, Sage Weil wrote:
> There is also
>
> ceph osd set noscrub
>
> and then later
>
> ceph osd unset noscrub
>
> I forget whether this pauses an in-progress PG scrub or just makes it stop
> when it gets to the next PG boundary.
>
> sage

I had bumped into those settings but couldn't find any documentation about them.
When I first tried them, they didn't do anything immediately, so I thought they
weren't the answer. After your mention I tried them again, and after a while
the deep scrubbing stopped. So I'm guessing they stop scrubbing at the next PG
boundary.

I see from this thread, and others before it, that some people think it is a
spindle issue. I'm not sure it is just that. Reproducing it on an idle cluster
that can do more than 250 MiB/s, with a single request pausing for 4-5 seconds,
sounds like an issue in itself. Maybe there is too much locking, or not enough
priority given to the actual I/O? Plus, the idea of throttling deep scrubbing
based on IOPS sounds appealing.
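On the progress question: there is no single progress counter, but the set of
PGs currently scrubbing can be watched from the pg dump; a sketch:

    # PGs whose state includes scrubbing (deep scrubs show as scrubbing+deep)
    ceph pg dump pgs_brief | grep scrubbing

    # rough progress: count them periodically
    watch -n 10 "ceph pg dump pgs_brief | grep -c scrubbing"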

Filippos Giannakos

Jan 28, 2014, 1:13:25 PM
to Mike Dawson, Sage Weil, Kyle Bader, ceph-...@vger.kernel.org, synnef...@googlegroups.com
On Tue, Jan 28, 2014 at 01:30:46AM -0500, Mike Dawson wrote:
>
> On 1/27/2014 1:45 PM, Sage Weil wrote:
> >There is also
> >
> > ceph osd set noscrub
> >
> >and then later
> >
> > ceph osd unset noscrub
> >
> In my experience scrub isn't nearly as much of a problem as
> deep-scrub. On an IOPS-constrained cluster with writes approaching
> the available aggregate spindle performance (minus the replication
> penalty and possibly a co-located OSD journal penalty), scrub may run
> without any disruption. But deep-scrub tends to make iowait on the
> spindles get ugly.
>
> To disable/enable deep-scrub use:
>
> ceph osd set nodeep-scrub
> ceph osd unset nodeep-scrub
>

Yes, deep-scrubbing is much worse than scrubbing, but I think fully disabling it
is not a good option. Then again, having days of degraded performance isn't
either. That's why I am bringing up the problem and seeking a solid solution to
the matter.

Guang

Jan 28, 2014, 9:35:25 PM
to Filippos Giannakos, Sage Weil, Kyle Bader, ceph-...@vger.kernel.org, synnef...@googlegroups.com
Glad to see there are some discussion around scrubbing / deep-scrubbing.

We are experiencing the same thing: scrubbing can affect latency quite a bit.
So far I have found two slow patterns (via dump_historic_ops):

1) Ops waiting to be dispatched. It looks like there is a lock involved, as the
dispatcher stops working for 2 seconds and then resumes (same for the scrubber
thread); that needs further investigation.

2) Ops waiting in the op work queue to be fetched by an available op thread. As
scrubbing brings more ops (for the scrubbing checks), the op threads' workload
increases (client ops have a lower priority). I think that could be improved by
increasing the number of op threads; I will confirm this analysis by adding more
op threads and turning on scrubbing per OSD.

Does the above observation and analysis make sense?

Thanks,
Guang
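For anyone wanting to look for the same two patterns, the data Guang refers to
comes from the OSD admin socket; a sketch, where the osd id is a placeholder:

    # the slowest recent ops on one OSD, with per-stage timestamps
    ceph daemon osd.0 dump_historic_ops

    # ops currently queued or executing, useful while a deep scrub runs
    ceph daemon osd.0 dump_ops_in_flight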

Guang

Feb 3, 2014, 8:40:07 AM
to Filippos Giannakos, ceph-devel@vger.kernel.org Development, ceph-...@lists.ceph.com, Sage Weil, Kyle Bader, synnef...@googlegroups.com
+ceph-users.

Does anybody have similar experience with scrubbing / deep-scrubbing?

Thanks,
Guang