How long is repair supposed to take?

Michael Wohlwend

Aug 5, 2021, 5:03:22 AM
to scyllad...@googlegroups.com
Hi,

I'm wondering whether the repairs I'm running are taking too long, or whether
the time they take is about right...

It's an 8-node cluster with 7 TB HDDs, Intel Xeon E3-1240 (6 cores), 64 GB
RAM, a 1 Gbit network and the newest Scylla.

For example:

one table: 234 GB, 51 SSTables, [51/4], takes about 9 hours
another table: 500 GB, 52 SSTables, [52/4], takes about 12 hours

Repair is run with primaryrange true, parallelism 0, jobthreads 1.

Two nodes are repairing concurrently (on different tables).

Are these numbers OK? Or can I change something (apart from the hardware) to
make this faster?


Cheers,
Michael

Avi Kivity

Aug 5, 2021, 5:39:27 AM
to scyllad...@googlegroups.com, Michael Wohlwend, Asias He

On 05/08/2021 12.03, Michael Wohlwend wrote:
> Hi,
>
> I'm wondering if the repairs I'm running are taking too long or are quite
> right in the time taken...
>
> It's a 8 node cluster, with 7TB HDD, intel xeon E3-1240 (6 cores) and 64GB
> ram, 1gbit network, newest scylla
>
> for example:
>
> one table 234 GB , 51 SSTables , [51/4] takes about 9 hours
> another table 500GB, 52 SSTables, [52/4] takes about 12 hours


This translates to about 11 MB/s for the 500 GB table (about 7 MB/s for the
234 GB one), which is quite low, but you are using HDDs.
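The throughput figure is just table size divided by wall-clock repair time. Below is a minimal Python sketch of that arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and assumes the stated table sizes are roughly the amount of data the repair has to read. The function name is hypothetical.

```python
# Average repair throughput = data size / wall-clock time.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes).


def repair_throughput_mb_s(table_gb: float, hours: float) -> float:
    """Return the average throughput in MB/s for table_gb repaired in `hours`."""
    return table_gb * 1e9 / (hours * 3600) / 1e6


if __name__ == "__main__":
    # Table sizes and durations as reported in the original message.
    for name, size_gb, hours in [("234 GB table", 234, 9), ("500 GB table", 500, 12)]:
        print(f"{name}: {repair_throughput_mb_s(size_gb, hours):.1f} MB/s")
```

For the two tables above this prints roughly 7.2 MB/s and 11.6 MB/s.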


How many IOPS do your disks see when repair is running? They are
probably maxed out. Repair should be able to read sequentially, but
we've never really optimized the readers for HDD.

Michael Wohlwend

Aug 5, 2021, 6:22:09 AM
to scyllad...@googlegroups.com, Asias He, Avi Kivity
On Thursday, 5 August 2021, 11:39:23 CEST, Avi Kivity wrote:
> On 05/08/2021 12.03, Michael Wohlwend wrote:
> > Hi,
> >
> > for example:
> >
> > one table 234 GB , 51 SSTables , [51/4] takes about 9 hours
> > another table 500GB, 52 SSTables, [52/4] takes about 12 hours
>
> This translates to 11 MB/s. Which is quite low, but you are using HDD.
>
>
> How many IOPS do your disks see when repair is running? They are
> probably maxed out.

Hm, with dstat the read/s seldom goes above 2 MB/s, wait is 0, and idle is above 94%.

iotop -aoP occasionally shows read values above 10 MB.

The file /etc/scylla.d/io_properties.yaml shows:

read_iops: 700
read_bandwidth: 450645888
write_iops: 1034
write_bandwidth: 430859900
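read_iops and read_bandwidth limit the disk from two directions. When the disk is IOPS-bound, bandwidth is at most IOPS × average request size. The sketch below is plain arithmetic on the values above, and the function names are hypothetical. It shows how little bandwidth 700 read IOPS gives with small requests. It also shows how large each read would have to be, on average, to reach the measured read_bandwidth.

```python
# IOPS-bound bandwidth arithmetic for the io_properties.yaml values above.
# Units: request sizes in KiB (1024 bytes), bandwidth in MB/s (1e6 bytes/s).

READ_IOPS = 700               # read_iops
READ_BANDWIDTH = 450_645_888  # read_bandwidth, bytes per second


def iops_bound_mb_s(iops: int, request_kib: float) -> float:
    """Maximum bandwidth (MB/s) if every request is request_kib in size."""
    return iops * request_kib * 1024 / 1e6


def min_request_kib(iops: int, bandwidth_bytes_s: int) -> float:
    """Average request size (KiB) needed to reach bandwidth_bytes_s at `iops`."""
    return bandwidth_bytes_s / iops / 1024


if __name__ == "__main__":
    for kib in (4, 64, 128):
        print(f"{kib:>4} KiB requests: {iops_bound_mb_s(READ_IOPS, kib):6.1f} MB/s")
    print(f"needed for full read_bandwidth: "
          f"{min_request_kib(READ_IOPS, READ_BANDWIDTH):.1f} KiB per read")
```

With 4 KiB reads, 700 IOPS caps out below 3 MB/s. Reaching the full ~450 MB/s would need reads of roughly 629 KiB each. That gap is why IOPS, rather than bandwidth, is the number to watch on an HDD.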

Avi Kivity

Aug 7, 2021, 12:24:44 PM
to mic...@fantasymail.de, scyllad...@googlegroups.com, Asias He

On 05/08/2021 13.22, Michael Wohlwend wrote:
> On Thursday, 5 August 2021, 11:39:23 CEST, Avi Kivity wrote:
>> On 05/08/2021 12.03, Michael Wohlwend wrote:
>>> Hi,
>>>
>>> for example:
>>>
>>> one table 234 GB , 51 SSTables , [51/4] takes about 9 hours
>>> another table 500GB, 52 SSTables, [52/4] takes about 12 hours
>> This translates to 11 MB/s. Which is quite low, but you are using HDD.
>>
>>
>> How many IOPS do your disks see when repair is running? They are
>> probably maxed out.
> hm, with dstat the read/s seldom goes over 2MB/s , wait is 0, idle is > 94


What matters is IOPS, not bandwidth. The wait column is always 0 because we
use asynchronous I/O. What are r_await, w_await, and f_await in `iostat -x
1`? They're also visible in the Advanced dashboard.
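The await columns asked about above can be pulled straight out of `iostat -x` text output. Below is a rough Python parser. It assumes a sysstat version whose device header row starts with `Device` and contains `r_await`, `w_await` and `f_await`. Older releases lack `f_await` and lay columns out differently, so any missing column is simply skipped. The parser also assumes a locale that prints decimal points, not commas. The function and constant names are hypothetical. With repeated reports (`iostat -x 1`), the last sample for each device wins.

```python
from typing import Dict

# Columns of interest; ones absent from the header are skipped.
LATENCY_COLUMNS = ("r/s", "w/s", "r_await", "w_await", "f_await")


def parse_iostat_latency(text: str) -> Dict[str, Dict[str, float]]:
    """Map device name -> selected iostat -x columns (last report wins)."""
    result: Dict[str, Dict[str, float]] = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Device header row ("Device" in newer sysstat, "Device:" in older).
        if fields[0].startswith("Device"):
            header = fields
            continue
        # Skip banner and CPU lines: they don't match the device header width.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        result[fields[0]] = {c: float(row[c]) for c in LATENCY_COLUMNS if c in row}
    return result
```

One way to feed it is to capture `iostat -x 1 5` with `subprocess.run([...], capture_output=True, text=True)` and pass `.stdout`. Persistently high `r_await` while repair runs would be consistent with the disks being IOPS-bound.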