Strange vector instruction when booting 1.6.2 FireMarshal linux on VCU118?


Benjamin Ou

Nov 15, 2023, 1:39:30 AM
to riscv-boom
We're experimenting with a vector unit in BOOM and ran into a funky issue while attempting to boot Linux on FPGA (with default settings for everything to do with building the Linux image): the moment we hook up our unit to the LSU, the Linux boot stalls. ILA debugging shows that a vector instruction (consistently VLXEI8, an ordered indexed vector load) somehow snuck its way into the boot instruction stream, and some poor handling on our end caused the stall. We verified that this VLXEI8 instruction also shows up when booting on a vanilla 1.6.2 BOOM build, where it is treated as an illegal instruction that somehow doesn't crash the boot process. We cannot find the instruction in the objdump of the Linux image, nor do there seem to be any other vector instructions in the boot instruction stream. Has anyone ever run into anything like this?

Jerry Zhao

Nov 15, 2023, 1:41:27 AM
to Benjamin Ou, riscv-boom
Strange. The PC of the instruction could help determine where it's from. It would be useful to know whether this is in OpenSBI, the kernel, or some early user process.

-Jerry

Benjamin Ou

Nov 15, 2023, 2:53:39 AM
to riscv-boom
Pretty sure it's during the kernel boot; on the runs that stall, we're definitely getting past the bootloader, but nowhere near the Buildroot login prompt.

We did yank out some PCs from our ILA probes, but we had some problems matching them up with the objdump addresses since we're pretty inexperienced at this. It looks like the objdump has each instruction addressed relative to zero (i.e. the first instruction under "start" is at address 0), with addresses ranging from 0 to 0xEXXXXX, whereas the PCs we see in the hardware around the VLXEI8 instruction are around 0x80009XXX. The ones we noted down, for example:

com_pc - 0080009e88
next_pc - 0080009fec 0080009f00 0080009f08
pc_0_pc - 0080009f50 0080009efa 0080009f00

Our best guess is that the bootloader loads the Linux image at some offset, so instead of instructions addressed 0 to 0xEXXXXX they might go from 0x80000000 to 0x80EXXXXX. We haven't gotten around to looking into what that offset might be though. I think my FPGA guy said the bootloader might print a message about where it's loading the Linux image?
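
To make that guess concrete, here is a minimal sketch of the address translation it implies. The load base of 0x80000000 is purely an assumption taken from the guess above (nothing in this thread confirms it), and `to_objdump_address` is a hypothetical helper name.

```python
# Hypothetical load base: an assumption for illustration, not a value confirmed in this thread.
ASSUMED_LOAD_BASE = 0x80000000

def to_objdump_address(runtime_pc, load_base=ASSUMED_LOAD_BASE):
    """Translate a PC seen in hardware to a zero-based objdump address."""
    offset = runtime_pc - load_base
    if offset < 0:
        raise ValueError("PC lies below the assumed load base")
    return offset

# PCs noted down from the ILA probes above.
ila_pcs = [0x80009E88, 0x80009FEC, 0x80009F00, 0x80009F08, 0x80009F50, 0x80009EFA]
for pc in ila_pcs:
    print(f"{pc:#012x} -> {to_objdump_address(pc):#x}")
```

If the guess about the offset were right, objdump's `--adjust-vma` option could apply the same shift to the listing itself, so the addresses could be compared directly.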

Jerry Zhao

Nov 15, 2023, 3:06:45 AM
to Benjamin Ou, riscv-boom
Those addresses look like they come from OpenSBI. Do they not correspond to instructions from an objdump of br-base-bin?

-Jerry



Benjamin Ou

Nov 16, 2023, 1:36:18 AM
to riscv-boom
So in the ILA, we did find that the instruction at 0080009f50 was a JALR, and that seems consistent with an objdump on br-base-bin; the instruction at that address is "ret". The PC immediately following that, however, is 0080009efa, which we found was misaligned in br-base-bin's objdump:

    80009ef0:   0ff77713    zext.b  a4,a4
    80009ef4:   0085961b    slliw   a2,a1,0x8
    80009ef8:   0107e7b3    or      a5,a5,a6
    80009efc:   8fd1        or      a5,a5,a2
    80009efe:   0087171b    slliw   a4,a4,0x8
    80009f02:   8f55        or      a4,a4,a3
    80009f04:   2b31        addiw   s6,s6,12
    80009f06:   2781        sext.w  a5,a5
    80009f08:   463d        li      a2,15

That PC lands between the "or" at 9ef8 and the "or" at 9efc, i.e. in the middle of the 4-byte instruction at 9ef8. The exact instruction pattern we find for VLXEI8 varies between boot attempts, but if I'm getting the endianness right, it looks like 9efa could plausibly correspond to a false instruction of 8fd10107 (the upper half of the "or" at 9ef8 followed by the compressed "or" at 9efc), which...actually does decode into VLXEI8. So it seems like somehow the return address is getting corrupted and we're executing junk code in this region of br-base-bin.
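
For anyone wanting to double-check the endianness reasoning, here is a minimal sketch that rebuilds the word a core would fetch at 9efa from the two objdump encodings in the listing above, then splits it into fields. The field positions follow the RVV 1.0 vector load layout as I understand it (an assumption; earlier draft versions of the spec laid out these bits differently), and the helper names are hypothetical.

```python
# Encodings copied from the br-base-bin objdump listing: (address, hex encoding).
LISTING = [
    (0x80009EF8, "0107e7b3"),  # 32-bit "or a5,a5,a6"
    (0x80009EFC, "8fd1"),      # 16-bit compressed "or a5,a5,a2"
]

def halfword_memory(listing):
    """Map each 2-byte-aligned address to the 16-bit parcel stored there (little-endian)."""
    mem = {}
    for addr, enc in listing:
        value = int(enc, 16)
        for i in range(len(enc) // 4):  # 4 hex digits per 16-bit parcel
            mem[addr + 2 * i] = (value >> (16 * i)) & 0xFFFF
    return mem

def fetch(mem, pc):
    """Return the instruction word fetched at pc (16 or 32 bits wide)."""
    low = mem[pc]
    if low & 0b11 != 0b11:  # low two bits != 11 means a compressed instruction
        return low
    return (mem[pc + 2] << 16) | low  # 32-bit instruction spans two parcels

def vector_load_fields(insn):
    """Split a 32-bit word into vector-load fields (assumed RVV 1.0 layout)."""
    return {
        "opcode": insn & 0x7F,
        "vd": (insn >> 7) & 0x1F,
        "width": (insn >> 12) & 0x7,
        "rs1": (insn >> 15) & 0x1F,
        "vs2": (insn >> 20) & 0x1F,
        "vm": (insn >> 25) & 0x1,
        "mop": (insn >> 26) & 0x3,
        "mew": (insn >> 28) & 0x1,
        "nf": (insn >> 29) & 0x7,
    }

insn = fetch(halfword_memory(LISTING), 0x80009EFA)
fields = vector_load_fields(insn)
print(hex(insn))
print(fields)
```

Under those assumptions the fetched word is 0x8fd10107, with opcode 0x07 (LOAD-FP, shared by vector loads), width 0 (8-bit elements) and mop 0b11 (indexed), which is consistent with an indexed 8-bit vector load. The nf field comes out nonzero, which in RVV 1.0 would normally indicate a segment variant, so the exact mnemonic may depend on which spec version the decoder follows.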

The above assembly block is from the function fdt_next_tag, by the way, in libfdt. It appears to be used in device tree traversal. Anyway, the real mystery now is how the heck the non-vector build we ran manages to hit a similar error but still fully boot Linux.

Md. Sadman Ferdous

Nov 16, 2023, 4:35:10 AM
to Benjamin Ou, riscv-boom
Hello, I am new in this group. I'm a front-end RTL design functional verification engineer with 2 years of experience. I have decent knowledge of RISC-V, vector instructions, AMBA (APB, AHB, AXI), I2C, System-Verilog, UVM, python, cocotb, c++. I would like to contribute to this group's work. As a complete beginner how should I start? Is there anyone who is willing to instruct & walk me through it? TIA


With regards,
Md. Sadman Ferdous


Benjamin Ou

Nov 30, 2023, 5:32:38 PM
to riscv-boom
Update: We've confirmed this issue exists when executing on both vanilla 1.6.2 BOOM and vanilla 1.6.2 Rocket, so it seems like it's an issue with the software.