I have a new node that has been stuck in the IPMI Discover state for hours. I can ssh to the node from my workstation; the messages and dmesg logs are below. What should I do next?
-Soheil
This is the repeating error in /var/log/messages:
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: INFO: task modprobe:1270 blocked for more than 120 seconds.
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel:     Not tainted 2.6.32-504.3.3.el6.x86_64 #1
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: modprobe     D 000000000000000d    0 1270     1 0x00000000
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: ffff880c2bb2fca8 0000000000000082 0000000000000000 ffff880c2bb2fd28
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: ffff880c2bb2fca8 ffffffffa0194a5e 00000003bef751ff ffffffff8106d175
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: ffff880c2bb2fc28 00000000fffba750 ffff880c2bf6dab8 ffff880c2bb2ffd8
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: Call Trace:
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa0194a5e>] ? i_ipmi_request+0x28e/0x680 [ipmi_msghandler]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff8106d175>] ? enqueue_entity+0x125/0x450
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa0194f65>] get_guid+0x115/0x140 [ipmi_msghandler]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff81064be5>] ? wake_up_process+0x15/0x20
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa0197dbe>] ipmi_register_smi+0x45e/0xdf0 [ipmi_msghandler]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff810874f0>] ? process_timeout+0x0/0x10
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa01a112b>] ? clear_obf+0x1b/0x20 [ipmi_si]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa01a15c1>] ? kcs_event+0x351/0x590 [ipmi_si]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa019fca3>] try_smi_init+0x523/0x890 [ipmi_si]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff8120a90b>] ? sysfs_create_file+0x2b/0x40
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa01a413e>] init_ipmi_si+0x82b/0x9fe [ipmi_si]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffffa01a3913>] ? init_ipmi_si+0x0/0x9fe [ipmi_si]
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff810bfff1>] sys_init_module+0xe1/0x250
Mar 3 02:50:47 d00-8c-fa-03-a5-9c kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Mar 3 02:51:28 d00-8c-fa-03-a5-9c ntpd[3287]: 0.0.0.0 0612 02 freq_set kernel -4.933 PPM
Mar 3 02:51:28 d00-8c-fa-03-a5-9c ntpd[3287]: 0.0.0.0 0615 05 clock_sync
This is the last thing in /var/log/dmesg:
scsi 6:0:2:0: qdepth(32), tagged(1), simple(1), ordered(0), scsi_level(6), cmd_que(1)
iTCO_vendor_support: vendor-support=0
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.07rh
iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS
STARTING CRC_T10DIF
sd 6:1:0:0: [sda] 2923825152 512-byte logical blocks: (1.49 TB/1.36 TiB)
sd 6:1:0:0: [sda] Write Protect is off
sd 6:1:0:0: [sda] Mode Sense: 03 00 00 08
sd 6:1:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda5 sda6 sda7 sda8
sd 6:1:0:0: [sda] Attached SCSI disk
ipmi message handler version 39.2
IPMI System Interface driver.
ipmi_si: probing via ACPI
ipmi_si 00:07: [io 0x0ca2] regsize 1 spacing 1 irq 0
ipmi_si: Adding ACPI-specified kcs state machine
ipmi_si: probing via SMBIOS
ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
ipmi_si: Adding SMBIOS-specified kcs state machine duplicate interface
ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0