CoreOS software RAID (mdadm) gives a kernel panic for a RAID-6 device

subhajit mukherjee

Jun 13, 2016, 10:39:11 PM
to CoreOS Dev
Once a RAID 6 device is configured on the host using the mdadm tool, I create an XFS filesystem on it and write I/O to it.

I mark a disk faulty, remove it from the array, and re-add it after some time, which triggers a resync. If, some time later, I mark another disk faulty and remove it, I get the following panic:

<1>[2237413.569101] BUG: unable to handle kernel NULL pointer dereference at 0000000000000140
<1>[2237413.578609] IP: [<ffffffffa0433a12>] raid5_set_cache_size+0x4c32/0xa280 [raid456]
<4>[2237413.587716] PGD 1e0443f067 PUD 1f72630067 PMD 0
<4>[2237413.593553] Oops: 0000 [#1] SMP
<4>[2237413.597765] Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod fuse xt_nat xfs libcrc32c veth xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter nf_nat nf_conntrack nls_ascii nls_cp437 vfat fat mousedev hid_generic usbhid hid sg br_netfilter bridge stp llc ext4 crc16 mbcache jbd2 coretemp x86_pkg_temp_thermal kvm crc32c_intel hmac drbg aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper sd_mod cryptd microcode ehci_pci igb mpt3sas ixgbe i2c_algo_bit ahci hwmon iTCO_wdt sb_edac ehci_hcd ipmi_ssif libahci firmware_class ptp ipmi_devintf iTCO_vendor_support raid_class i2c_i801 libata scsi_transport_sas usbcore mei_me edac_core lpc_ich pps_core ipmi_si i2c_core mfd_core mei usb_common scsi_mod mdio ipmi_msghandler evdev button sch_fq_codel ip_tables autofs4
<4>[2237413.696801] CPU: 0 PID: 118926 Comm: md0_raid6 Not tainted 4.2.2-coreos-r2 #2
<4>[2237413.717447] task: ffff881f7462d640 ti: ffff882016cf4000 task.ti: ffff882016cf4000
<4>[2237413.726467] RIP: 0010:[<ffffffffa0433a12>]  [<ffffffffa0433a12>] raid5_set_cache_size+0x4c32/0xa280 [raid456]
<4>[2237413.738258] RSP: 0018:ffff882016cf7b78  EFLAGS: 00010202
<4>[2237413.744638] RAX: 0000000000000003 RBX: 000000000000000e RCX: 0000000000000008
<4>[2237413.753251] RDX: 0000000000000002 RSI: 000000000000000e RDI: 0000000000000000
<4>[2237413.761887] RBP: ffff882016cf7c98 R08: 000000000000000e R09: 000000000000000a
<4>[2237413.770467] R10: 000000000000000e R11: 0000000000000000 R12: 000000000000000e
<4>[2237413.779078] R13: 000000000000000e R14: ffff8817c17f2a90 R15: ffff88102ae31800
<4>[2237413.787712] FS:  0000000000000000(0000) GS:ffff88103f600000(0000) knlGS:0000000000000000
<4>[2237413.797394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[2237413.804258] CR2: 0000000000000140 CR3: 0000001a3666c000 CR4: 00000000001406f0
<4>[2237413.812880] Stack:
<4>[2237413.815562]  ffffffff81a8a380 ffff88103f614b40 0000000000000005 0000000000011370
<4>[2237413.824798]  ffff88102ae31b18 ffffffff81090f17 0000000000000000 ffff8817c17f2af8
<4>[2237413.834069]  ffff8817c17f2c80 ffffffff00000000 ffff8820ffffffff ffff8817c17f2ad8
<4>[2237413.843219] Call Trace:
<4>[2237413.846398]  [<ffffffff81090f17>] ? wake_up_process+0x27/0x50
<4>[2237413.853269]  [<ffffffffa04349c1>] raid5_set_cache_size+0x5be1/0xa280 [raid456]
<4>[2237413.862023]  [<ffffffffa0438259>] raid5_set_cache_size+0x9479/0xa280 [raid456]
<4>[2237413.870765]  [<ffffffff810ccf80>] ? trace_event_raw_event_tick_stop+0xd0/0xd0
<4>[2237413.879373]  [<ffffffffa0319102>] md_set_array_sectors+0x202/0x210 [md_mod]
<4>[2237413.887601]  [<ffffffff810a8cc0>] ? wait_woken+0x80/0x80
<4>[2237413.894007]  [<ffffffffa0318fe0>] ? md_set_array_sectors+0xe0/0x210 [md_mod]
<4>[2237413.902531]  [<ffffffff81085fc9>] kthread+0xc9/0xe0
<4>[2237413.908445]  [<ffffffff81085f00>] ? kthread_create_on_node+0x180/0x180
<4>[2237413.916207]  [<ffffffff8152bf9f>] ret_from_fork+0x3f/0x70
<4>[2237413.922713]  [<ffffffff81085f00>] ? kthread_create_on_node+0x180/0x180
<4>[2237413.930440] Code: 00 48 8b 52 10 80 e2 10 74 0d 49 8b 56 48 80 e2 40 0f 84 b3 f3 ff ff 85 c0 0f 8e ab f3 ff ff 31 d2 eb 08 48 8b bc d5 50 ff ff ff <48> 83 bf 40 01 00 00 00 74 1c 48 8b 8f 58 01 00 00 83 e1 01 75
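
For anyone trying to reproduce this, the sequence described above can be sketched roughly as follows. This is only a dry-run helper that builds the command strings; it does not run anything. The device names, the four-disk layout, the mount point, and the function name `repro_commands` are all assumptions for illustration and are not taken from the report. The original array may also have been created with different options.

```python
def repro_commands(
    array="/dev/md0",                                          # assumed array name
    members=("/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"),  # assumed member disks
    first="/dev/sdb",                                          # disk failed and re-added
    second="/dev/sdc",                                         # disk failed afterwards
    mountpoint="/mnt/raid6",                                   # assumed mount point
):
    """Return, in order, the shell commands for the reported sequence."""
    if len(members) < 4:
        raise ValueError("RAID 6 needs at least four member devices")
    if first == second or first not in members or second not in members:
        raise ValueError("first and second must be distinct member devices")

    steps = [
        # Create the RAID-6 array and put an XFS filesystem on it.
        ["mdadm", "--create", array, "--level=6",
         "--raid-devices=%d" % len(members), *members],
        ["mkfs.xfs", array],
        ["mount", array, mountpoint],
        # (Write I/O to the mount point at this stage.)
        # Fail one disk, remove it, then re-add it; this triggers a resync.
        ["mdadm", array, "--fail", first],
        ["mdadm", array, "--remove", first],
        ["mdadm", array, "--re-add", first],
        # Some time after the resync has been triggered, fail and remove
        # a second disk. The panic was reported after this step.
        ["mdadm", array, "--fail", second],
        ["mdadm", array, "--remove", second],
    ]
    return [" ".join(step) for step in steps]
```

Printing the result of `repro_commands()` gives a list that can be reviewed and adapted before running anything by hand as root on a test machine. The commands destroy data on the listed devices.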


Any help is appreciated

Alex Crawford

Jun 14, 2016, 1:40:42 PM
to coreo...@googlegroups.com
On 06/13, subhajit mukherjee wrote:
> Any help is appreciated

We are tracking the issue in the bugs repo [1].

-Alex

[1]: https://github.com/coreos/bugs/issues/1400