Greetings -
I've been trying to get my SiFive Unmatched working with newer distributions recently, and I keep running into a few problems. Let's start with the basics:
Original Unmatched board
NVMe "Primary" storage (OS lives here), which is either a Samsung 970 Pro or a KIOXIA BG5
Booting off the SD card using the Freedom U SDK 2024.01
Cooler Master V550 power supply
Originally the system had a Samsung 970 Pro SSD and a Radeon RX460 GPU. It boots into the OpenEmbedded OS from the SD card without issue. Booting other OSes (Arch Linux, Ubuntu, etc.) fails with errors like these:
Timed out for waiting the udev queue being empty.
[a few seconds pass]
rcu: INFO: rcu_sched self-detected stall on CPU
[ 184.605933] rcu: 2-....: (14999 ticks this GP) idle=a40c/1/0x4000000000000000 softirq=1419/1419 fqs=6906
[ 184.615584] rcu: hardirqs softirqs csw/system
[ 184.626396] rcu: number: 7501 246 0
[ 184.636602] rcu: cputime: 0 0 29992 ==> 30012(ms)
[ 184.648265] rcu: (t=15010 jiffies g=-459 q=6301 ncpus=4)
[ 184.658392] CPU: 2 PID: 119 Comm: (udev-worker) Not tainted 6.5.0-26-generic #26.1-Ubuntu
[ 184.671517] Hardware name: SiFive HiFive Unmatched A00 (DT)
[ 184.682119] epc : apply_relocate_add+0x114/0x2c4
[ 184.691853] ra : apply_relocate_add+0x80/0x2c4
[ 184.701383] epc : ffffffff80008eb0 ra : ffffffff80008e1c sp : ffffffc8008c3bb0
[ 184.713676] gp : ffffffff82260888 tp : ffffffd889481b00 t0 : ffffffff02b8813c
[ 184.725994] t1 : ffffffff02d486e2 t2 : 0000000000011ac4 s0 : ffffffc8008c3c50
[ 184.738352] s1 : ffffffc803e6f2b0 a0 : 0000000000011ac4 a1 : ffffffff02a63000
[ 184.750706] a2 : ffffffff02d486de a3 : ffffffff02d486de a4 : 00000000000327d1
[ 184.763152] a5 : ffffffc803a10c00 a6 : 000000000002d233 a7 : 0000000000000000
[ 184.775552] s2 : ffffffc803e6e230 s3 : ffffffc803e6e330 s4 : 0000000000000017
[ 184.787853] s5 : fffffffffffff000 s6 : 0000000000000014 s7 : ffffffc8038689a0
[ 184.800208] s8 : ffffffff81201518 s9 : 0000000000000033 s10: 000000000002d233
[ 184.812573] s11: 0000000000011ac4 t3 : 0000000000000100 t4 : ffffffff80008974
[ 184.824898] t5 : 00000000004bbb98 t6 : ffffffff03427008
[ 184.835194] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
[ 184.848247] [<ffffffff80008eb0>] apply_relocate_add+0x114/0x2c4
[ 184.859371] [<ffffffff800c1042>] load_module+0x488/0x8de
[ 184.869903] [<ffffffff800c16c0>] init_module_from_file+0x82/0xd4
[ 184.881183] [<ffffffff800c18a6>] sys_finit_module+0x194/0x330
[ 184.892253] [<ffffffff80ce6e08>] do_trap_ecall_u+0xd6/0x154
[ 184.903108] [<ffffffff80cf1e3c>] ret_from_exception+0x0/0x64
These stalls frequently happened while the GPU was trying to load firmware, so the firmware load would fail and the card would never initialize. So I put in a fresh KIOXIA BG5 NVMe SSD and tried to install Ubuntu 23.10 from the Live Installer SD image. That produced the same errors and typically powered off the system before the installer got far enough to bring up subiquity. After copying the "installed" image to the NVMe drive from OpenEmbedded, I could get it to boot, but the same GPU errors occurred.
Then I tried booting without a GPU, per the recommendation from Bill Traynor on the RISC-V Community Slack. This worked without issue: no timeouts and no stall errors on Ubuntu 23.10 using kernel 6.5.0-26-generic. So I swapped the GPU for a Radeon 6450, which does boot. I still get the udev queue timeout followed by a CPU stall detection kernel OOPS, but after that the system seems to be stable and functional with a working GPU.
Neither of the Radeon GPUs I have takes external power, and I think that may be part of the problem, since the 6450 uses less power than the RX460. Unfortunately, I can't check the bus voltages: the RTC and the SMBus sensors are provided by the same part, and the SMBus device is locked by the RTC driver, so lm-sensors can't read those values. I am going to try another card, an RX 5600 XT, which does have an external GPU power connector, but I won't be able to try that until the card arrives next week.
In the meantime, I'm curious whether anyone has other suggestions. As I said, the system does at least boot consistently now; it just takes several minutes for all the timeouts and OOPS messages to appear, and then it's stable. I'm happy to try anything else.
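In case it helps anyone compare boots across different GPUs, here is a minimal Python sketch that pulls the key fields out of stall reports like the one pasted above. The regular expressions are assumptions based only on the message shapes in that log (the "self-detected stall" line, the "CPU: ... PID: ... Comm: ..." line, and the symbolic "epc :" line), and the function name summarize_stalls is just illustrative; other kernels may format these lines differently.

```python
import re
import sys

# Patterns modelled on the log lines quoted in this post (an assumption:
# other kernel versions may print these messages differently).
STALL_RE = re.compile(r"rcu: INFO: (\w+) self-detected stall on CPU")
CPU_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (.+?) (?:Not tainted|Tainted)")
# Only the symbolic form "epc : symbol+0xOFF/0xLEN" matches, not the raw address line.
EPC_RE = re.compile(r"epc : ([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_stalls(log_text):
    """Return one dict per RCU stall report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        match = STALL_RE.search(line)
        if match:
            # A new stall report starts here; later lines fill in the details.
            current = {"flavor": match.group(1), "cpu": None,
                       "pid": None, "comm": None, "epc": None}
            reports.append(current)
            continue
        if current is None:
            continue
        match = CPU_RE.search(line)
        if match and current["cpu"] is None:
            current["cpu"] = int(match.group(1))
            current["pid"] = int(match.group(2))
            current["comm"] = match.group(3)
        match = EPC_RE.search(line)
        if match and current["epc"] is None:
            current["epc"] = match.group(1)
    return reports


if __name__ == "__main__":
    # Reads a kernel log from standard input and prints one line per stall.
    for report in summarize_stalls(sys.stdin.read()):
        print(report)
```

Saved under a name such as stall_summary.py (hypothetical), it could be fed a saved log or the output of dmesg on standard input, which makes it easier to see whether the stalls always land in the same function and process when the GPU changes.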
-Carl