Problems with PCI GPUs on SiFive Unmatched

Carl Perry

Apr 7, 2024, 4:07:42 PM
to RISC-V SW Dev
Greetings -
I've been trying to get my SiFive Unmatched working with newer distributions recently, and I keep running into a few problems. Let's start with the basics:
Original Unmatched board
NVMe "Primary" storage (OS lives here), which is either a Samsung 970 Pro or a KIOXIA BG5
Booting off the SD card using the Freedom U SDK 2024.01
Coolermaster V550 Power Supply

Originally the system had a Samsung 970 Pro SSD and a Radeon RX460 GPU. It can boot into the OpenEmbedded OS from the SD card without issue. Booting other OSes (Arch Linux, Ubuntu, etc.) would fail with errors like these:

Timed out for waiting the udev queue being empty.
[a few seconds pass]
rcu: INFO: rcu_sched self-detected stall on CPU
[  184.605933] rcu: 2-....: (14999 ticks this GP) idle=a40c/1/0x4000000000000000 softirq=1419/1419 fqs=6906
[  184.615584] rcu:         hardirqs   softirqs   csw/system
[  184.626396] rcu: number:     7501        246            0
[  184.636602] rcu: cputime:        0          0        29992   ==> 30012(ms)
[  184.648265] rcu: (t=15010 jiffies g=-459 q=6301 ncpus=4)
[  184.658392] CPU: 2 PID: 119 Comm: (udev-worker) Not tainted 6.5.0-26-generic #26.1-Ubuntu
[  184.671517] Hardware name: SiFive HiFive Unmatched A00 (DT)
[  184.682119] epc : apply_relocate_add+0x114/0x2c4
[  184.691853]  ra : apply_relocate_add+0x80/0x2c4
[  184.701383] epc : ffffffff80008eb0 ra : ffffffff80008e1c sp : ffffffc8008c3bb0
[  184.713676]  gp : ffffffff82260888 tp : ffffffd889481b00 t0 : ffffffff02b8813c
[  184.725994]  t1 : ffffffff02d486e2 t2 : 0000000000011ac4 s0 : ffffffc8008c3c50
[  184.738352]  s1 : ffffffc803e6f2b0 a0 : 0000000000011ac4 a1 : ffffffff02a63000
[  184.750706]  a2 : ffffffff02d486de a3 : ffffffff02d486de a4 : 00000000000327d1
[  184.763152]  a5 : ffffffc803a10c00 a6 : 000000000002d233 a7 : 0000000000000000
[  184.775552]  s2 : ffffffc803e6e230 s3 : ffffffc803e6e330 s4 : 0000000000000017
[  184.787853]  s5 : fffffffffffff000 s6 : 0000000000000014 s7 : ffffffc8038689a0
[  184.800208]  s8 : ffffffff81201518 s9 : 0000000000000033 s10: 000000000002d233
[  184.812573]  s11: 0000000000011ac4 t3 : 0000000000000100 t4 : ffffffff80008974
[  184.824898]  t5 : 00000000004bbb98 t6 : ffffffff03427008
[  184.835194] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000005
[  184.848247] [<ffffffff80008eb0>] apply_relocate_add+0x114/0x2c4
[  184.859371] [<ffffffff800c1042>] load_module+0x488/0x8de
[  184.869903] [<ffffffff800c16c0>] init_module_from_file+0x82/0xd4
[  184.881183] [<ffffffff800c18a6>] sys_finit_module+0x194/0x330
[  184.892253] [<ffffffff80ce6e08>] do_trap_ecall_u+0xd6/0x154
[  184.903108] [<ffffffff80cf1e3c>] ret_from_exception+0x0/0x64
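
For anyone comparing traces like this across boots, here is a minimal Python sketch that pulls the function names out of the call trace. The helper name `call_trace` and the regular expression are illustrative assumptions, not part of any kernel tooling; the sample text is copied from the log above.

```python
import re

# Matches call-trace frames such as:
#   [<ffffffff80008eb0>] apply_relocate_add+0x114/0x2c4
# The pattern is an assumption based on the log format shown in this post.
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def call_trace(log_text):
    """Return the symbol names of the call-trace frames, innermost first."""
    return [match.group(2) for match in FRAME.finditer(log_text)]


SAMPLE = """\
[  184.848247] [<ffffffff80008eb0>] apply_relocate_add+0x114/0x2c4
[  184.859371] [<ffffffff800c1042>] load_module+0x488/0x8de
[  184.869903] [<ffffffff800c16c0>] init_module_from_file+0x82/0xd4
[  184.881183] [<ffffffff800c18a6>] sys_finit_module+0x194/0x330
[  184.892253] [<ffffffff80ce6e08>] do_trap_ecall_u+0xd6/0x154
[  184.903108] [<ffffffff80cf1e3c>] ret_from_exception+0x0/0x64
"""

if __name__ == "__main__":
    for symbol in call_trace(SAMPLE):
        print(symbol)
```

Read innermost first, the frames show the stalled CPU inside apply_relocate_add, reached from load_module through the finit_module system call. In other words, the stall was reported while a kernel module was being loaded by a udev worker.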

These errors frequently occurred while the GPU driver was trying to load firmware, so the firmware load would fail and the card would never initialize. So I put in a fresh KIOXIA BG5 NVMe SSD and tried to install Ubuntu 23.10 from their Live Installer SD image. That produced the same complaints and typically powered off the system before the installer could boot far enough to bring up subiquity. After copying the "installed" image to the NVMe drive from OpenEmbedded, I could get it to boot, but the same GPU errors occurred.

Then I tried booting without a GPU, per the recommendation from Bill Traynor on the RISC-V Community Slack. This worked without issue: no timeouts and no stall errors on Ubuntu 23.10 using kernel 6.5.0-26-generic. So I swapped the GPU for a Radeon 6450, which does boot and seems to be stable. I still get the udev queue error and then a CPU stall detection kernel OOPS, but the system seems to be stable and functional with a working GPU.

Neither of the Radeon GPUs I have takes external power, and I think that may be part of the problem. Unfortunately, since the RTC and the SMBus sensors are provided by the same part, I can't seem to get lm-sensors to read those values because the SMBus device is locked by the RTC driver. Therefore I can't see the bus voltages. The 6450 uses less power than the RX460. I am going to try another card, such as an RX5600 XT, which does have a GPU power connector, but I won't be able to try that until the card arrives next week.

In the meantime, I'm curious if there are any other suggestions on what to do. As I said, the system does seem to boot consistently now, at least; it just takes several minutes for all the timeouts and OOPS messages to show, and then it's stable. I'm happy to take any other suggestions to try.

  -Carl

Tommy Murphy

Apr 7, 2024, 4:33:48 PM
to Carl Perry, RISC-V SW Dev
Have you tried the SiFive forum for support on this?

Andreas Schwab

Apr 8, 2024, 3:49:35 AM
to Carl Perry, RISC-V SW Dev
On Apr 07 2024, Carl Perry wrote:

> [ 184.682119] epc : apply_relocate_add+0x114/0x2c4
> [ 184.691853] ra : apply_relocate_add+0x80/0x2c4

Try using a current kernel which has 080c4324fa5e ("riscv: optimize ELF
relocation function in riscv"). Also try using binutils master after
commit af514e5f6d1 ("RISC-V: Don't generate branch/jump relocation if
symbol is local when no-relax."). That should speed up relocation
processing considerably.
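
One way to check whether a given kernel source checkout already contains the first commit is to ask git whether it is an ancestor of the checked-out revision. The following is a minimal Python sketch, assuming a local clone of the kernel tree with full history; the function name `contains_commit` and its `run` parameter (there only so the function can be exercised without a real repository) are hypothetical.

```python
import subprocess


def contains_commit(repo_dir, commit, ref="HEAD", run=subprocess.run):
    """Return True if `commit` is an ancestor of `ref` in the git tree at repo_dir."""
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, ref],
        capture_output=True,
    )
    # `git merge-base --is-ancestor` exits with 0 when the commit is an
    # ancestor and 1 when it is not; any other status signals an error
    # (for example, an unknown commit or a directory that is not a repo).
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode(errors="replace").strip())
    return result.returncode == 0
```

For example, `contains_commit("/path/to/linux", "080c4324fa5e")` would report whether the checked-out revision includes the relocation optimization, where `/path/to/linux` stands in for wherever the kernel tree lives. Distribution kernels may carry backports under different commit IDs, so a False result is not conclusive for them.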

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Carl Perry

Apr 8, 2024, 12:24:35 PM
to RISC-V SW Dev, tommy_...@hotmail.com, Carl Perry
The SiFive forums were deprecated in favor of the RISC-V forums, which no longer exist. Based on the recent activity there, I do not believe anyone from SiFive is monitoring them. The RISC-V forums were in turn deprecated in favor of the RISC-V Community Slack, and I asked there. The suggestion was to ask here.

Carl Perry

Apr 8, 2024, 12:32:50 PM
to RISC-V SW Dev, Andreas Schwab, RISC-V SW Dev, Carl Perry
Looks like that is in RC3 for 6.9, so I will try again once that is released and I can build it, on my Arch install at least. Thanks for the pointer!

Carl Perry

Apr 18, 2024, 11:20:21 AM
to RISC-V SW Dev, Carl Perry, Andreas Schwab, RISC-V SW Dev
Well, I was somewhat incorrect. This is what I get for reading patches on GitHub without expanding tags. The referenced patch was included in 6.8:

So I was able to install 6.8.6 on my machine and all the errors have been resolved. It now takes only about 2 minutes to boot, with no timeouts, soft lock errors, etc. So thank you all again, I'm glad this was an easy fix!
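
For others hitting the same stall, a quick way to see whether the running kernel is at least 6.8 is to compare the release string. This is a minimal Python sketch; the helper name `kernel_at_least` is hypothetical, and the 6.8 threshold is taken from this thread. It is only a heuristic, since a distribution kernel with an older version number could carry the patch as a backport.

```python
import platform
import re


def kernel_at_least(release, minimum=(6, 8)):
    """Return True if a release string like '6.8.6' is at or above `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Compare (major, minor) as integers so that 6.10 sorts after 6.8.
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    print(platform.release(), kernel_at_least(platform.release()))
```

With the kernels mentioned in this thread, `kernel_at_least("6.5.0-26-generic")` is False and `kernel_at_least("6.8.6")` is True.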