btrfs scrub causes kernel panic

erikf

Feb 17, 2015, 4:30:10 PM
to bfq-i...@googlegroups.com
Hi everyone!

I have been running Arch Linux with the ck-kernel which includes the bfq scheduler for quite some time.
Recently I noticed that running

btrfs scrub start -B /


immediately causes a kernel panic on my machines. While the ck-kernel contains a number of
patches, with help from the Arch forums I could trace this problem to the bfq
scheduler (https://bbs.archlinux.org/viewtopic.php?id=193654).

The scrub runs just fine using the normal Arch kernel or the ck-kernel with the cfq scheduler, so the
kernel panic should not be hardware related.
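
For anyone repeating the cfq/bfq comparison, here is a minimal sketch of how to check which I/O scheduler a disk is using before starting the scrub. It relies on the standard sysfs file /sys/block/<dev>/queue/scheduler, where the active scheduler is the bracketed entry. The device name sda and the helper name active_scheduler are assumptions made for illustration, not something taken from this report.

```shell
#!/bin/sh
# Print the active scheduler from a sysfs scheduler line such as
# "noop deadline [cfq] bfq"; the active entry is the bracketed one.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical device name; adjust to the disks backing the btrfs volume.
DEV=${DEV:-sda}
SCHED_FILE="/sys/block/$DEV/queue/scheduler"

# Only report if the device exists on this machine.
if [ -r "$SCHED_FILE" ]; then
    echo "active scheduler on $DEV: $(active_scheduler < "$SCHED_FILE")"
fi

# To switch schedulers (as root) before re-running the scrub:
#   echo cfq > /sys/block/sda/queue/scheduler
#   btrfs scrub start -B /
```

With a raid 1 setup, every member device has its own scheduler setting, so each one would need to be checked.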

The final test kernel I used was a vanilla 3.18.7 kernel with only these patches:

change-default-console-loglevel.patch
0001-block-cgroups-kconfig-build-bits-for-BFQ-v7r7-3.18.patch
0002-block-introduce-the-BFQ-v7r7-I-O-sched-for-3.18.patch
0003-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r7-for-3.18.0.patch

My test setup:

AMD A10 6700T with two SSDs as raid 1 (software raid with btrfs).

The attached kernel output was generated by this machine with the aforementioned kernel while running
btrfs scrub.

This problem seems to occur only with fast and/or multiple devices, as I am not able to reproduce it
on a T60 with a single SSD (which is much slower than the AMD system).

I hope this bug report is useful to you. Please let me know if you need any further information or want something
tested.

Best regards
Erik
pilatus-ck-test-bfq-3.18.7-2.log

Paolo Valente

Feb 19, 2015, 6:12:59 AM
to bfq-i...@googlegroups.com
Hi,
thanks for reporting this failure. We might have found both the bug and a possible fix.

If you are willing to help us in this respect, we would like to send you, as soon as we have it ready for shipping, a fixed version of bfq, and to ask you to check whether the failure shows up again (unfortunately we did not succeed in reproducing this failure, and Arianna found a ‘candidate’ bug only by static analysis of the code).

Thanks,
Paolo

erikf

Feb 19, 2015, 7:33:29 AM
to bfq-i...@googlegroups.com
Hi Paolo,

just let me know when the fix is ready (and where I can find it) and I will happily test it.

Best regards,
Erik.

Malte Schröder

Mar 24, 2015, 3:40:18 AM
to bfq-i...@googlegroups.com
Hi, here are dumps of another bug I can trigger with btrfs scrub on a raid5-volume. This is kernel 3.19.2 with bfq v7r7.
I can reproduce that issue easily, so I would be happy to test fixes.
bfq-bug.txt
bfq-oops.txt

Arianna Avanzini

Mar 27, 2015, 8:00:59 AM
to bfq-i...@googlegroups.com
On Tuesday, March 24, 2015 at 8:40:18 AM UTC+1, Malte Schröder wrote:
> Hi, here are dumps of another bug I can trigger with btrfs scrub on a raid5-volume. This is kernel 3.19.2 with bfq v7r7.
> I can reproduce that issue easily, so I would be happy to test fixes.

Hi,

sorry for the late reply. We'll shortly be sending you the patch privately and we'd be glad if you could test it.

Thank you for your interest in BFQ,
Arianna

Malte Schröder

Apr 8, 2015, 12:32:50 PM
to bfq-i...@googlegroups.com
Still on standby ;)

Arianna Avanzini

Apr 8, 2015, 12:35:48 PM
to bfq-i...@googlegroups.com
On Wednesday, April 8, 2015 at 6:32:50 PM UTC+2, Malte Schröder wrote:
> Still on standby ;)


Hi,

we sent you the e-mail the same day, I'll try to forward it to you since it didn't get to you apparently :-)

Thanks for letting us know and looking forward to your feedback,
Arianna

Malte Schröder

Apr 9, 2015, 12:36:16 AM
to bfq-i...@googlegroups.com
On Wednesday, April 8, 2015 at 6:35:48 PM UTC+2, Arianna Avanzini wrote:
> On Wednesday, April 8, 2015 at 6:32:50 PM UTC+2, Malte Schröder wrote:
>> Still on standby ;)
>
> Hi,
>
> we sent you the e-mail the same day, I'll try to forward it to you since it didn't get to you apparently :-)
>
> Thanks for letting us know and looking forward to your feedback,
> Arianna


Ok, after one night of testing I have mixed results.
One system survived which would fail after minutes normally.
The other system did apparently panic and did an automatic reboot. It did not leave a dump in pstore, though ... I will try again.

Arianna Avanzini

Apr 9, 2015, 5:20:43 AM
to bfq-i...@googlegroups.com
Hi,


On Thursday, April 9, 2015 at 6:36:16 AM UTC+2, Malte Schröder wrote:
> Ok, after one night of testing I have mixed results.
> One system survived which would fail after minutes normally.
> The other system did apparently panic and did an automatic reboot. It did not leave a dump in pstore, though ... I will try again.


Thank you for testing the patch. Please do let us know if the second system panics again after more testing, if you have the time to try it again.

Arianna 

Malte Schröder

Apr 13, 2015, 12:43:38 PM
to bfq-i...@googlegroups.com
Hi,

On Thursday, April 9, 2015 at 11:20:43 AM UTC+2, Arianna Avanzini wrote:
> Hi,
>
> Thank you for testing the patch. Please do let us know if the second system panics again after more testing, if you have the time to try it again.

It seems using BTRFS Raid5 makes it easier to hit this issue. See attached dmesg-dump.

> Arianna
dmesg.txt

Arianna Avanzini

Apr 27, 2015, 5:47:49 AM
to bfq-i...@googlegroups.com
First of all, thank you for reporting the issue and my apologies for the delay in coming back to you.
We have tried to reproduce the issue, but unfortunately without any result, so we are proceeding with static analysis of the code. As a further step to aid us in debugging we have also prepared a patch which adds on top of BFQ-v7r7 some debugging checks related to the panic you are experiencing; the patch is attached to this e-mail and should apply fine on top of BFQ-v7r7.
If you are still available to help us and have some time, could you please apply the patch to your failing system's kernel, try to reproduce the failure with the patch applied and paste here a new dmesg dump? It would be very helpful.

Thank you again,
Arianna

0001-block-bfq-add-debug-checks-on-fifo-coherency.patch

Malte Schröder

Apr 27, 2015, 12:43:53 PM
to bfq-i...@googlegroups.com

Hi, here's the dmesg dump :)
I have the gut feeling that competing IO parallel to the scrub makes it more likely to trigger this.

/Malte
dmesg.txt