So in the ILA, we did find that the instruction at 0080009f50 was a JALR, which is consistent with an objdump of br-base-bin: the instruction at that address is "ret" (the pseudo-instruction for jalr x0, 0(ra)). The PC immediately following it, however, is 0080009efa, which does not fall on an instruction boundary in br-base-bin's objdump:
80009ef0: 0ff77713 zext.b a4,a4
80009ef4: 0085961b slliw a2,a1,0x8
80009ef8: 0107e7b3 or a5,a5,a6
80009efc: 8fd1 or a5,a5,a2
80009efe: 0087171b slliw a4,a4,0x8
80009f02: 8f55 or a4,a4,a3
80009f04: 2b31 addiw s6,s6,12
80009f06: 2781 sext.w a5,a5
80009f08: 463d li a2,15
That address falls inside the 4-byte "or" at 9ef8, just before the compressed "or" at 9efc. The exact instruction pattern we find for VLXEI8 varies between boot attempts, but since RISC-V is little-endian, a fetch at 9efa picks up the upper halfword of the "or" at 9ef8 (0107) followed by the "or" at 9efc (8fd1). That gives a false instruction of 8fd10107, which... actually does decode into VLXEI8. So it seems like the return address is somehow getting corrupted and we're executing junk code in this region of br-base-bin.
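As a sanity check on the endianness, here is a small Python sketch (the helper names `memory_bytes`, `fetch32` and `decode_vector_load` are made up for this note) that lays the three instructions around 9efa out as little-endian bytes, reads the 32-bit word a core would fetch at 0x80009efa, and splits it into vector-load fields. The field layout assumes the RVV 1.0 vector load encoding. Under that assumption, mop=3 is the indexed-ordered form (vloxei8 in RVV 1.0 naming) with 8-bit indices, and nf=4 means it is strictly the five-field segment variant of that load.

```python
import struct

# (address, instruction word, size in bytes) for the instructions around
# the bad return target, copied from the objdump listing above.
LISTING = [
    (0x80009EF8, 0x0107E7B3, 4),  # or    a5,a5,a6
    (0x80009EFC, 0x8FD1, 2),      # c.or  a5,a5,a2 (objdump prints "or")
    (0x80009EFE, 0x0087171B, 4),  # slliw a4,a4,0x8
]

def memory_bytes(listing):
    """Lay the instructions out as little-endian bytes, as they sit in memory."""
    base = listing[0][0]
    mem = bytearray()
    for addr, word, size in listing:
        assert addr == base + len(mem), "listing must be contiguous"
        mem += word.to_bytes(size, "little")
    return base, bytes(mem)

def fetch32(listing, pc):
    """Read the 32-bit little-endian word a core would fetch at pc."""
    base, mem = memory_bytes(listing)
    return struct.unpack_from("<I", mem, pc - base)[0]

def decode_vector_load(insn):
    """Split a LOAD-FP (opcode 0b0000111) word into RVV load fields (RVV 1.0 layout assumed)."""
    return {
        "opcode": insn & 0x7F,
        "vd": (insn >> 7) & 0x1F,
        "width": (insn >> 12) & 0x7,  # 0b000 -> 8-bit index elements
        "rs1": (insn >> 15) & 0x1F,
        "vs2": (insn >> 20) & 0x1F,
        "vm": (insn >> 25) & 0x1,     # 1 -> unmasked
        "mop": (insn >> 26) & 0x3,    # 0b11 -> indexed-ordered
        "mew": (insn >> 28) & 0x1,
        "nf": (insn >> 29) & 0x7,     # NFIELDS - 1
    }

word = fetch32(LISTING, 0x80009EFA)
print(f"{word:08x}")      # 8fd10107
print(word & 0x3 == 0x3)  # True: low bits 0b11, so it is fetched as a 32-bit instruction
# opcode=7 (LOAD-FP), vd=2, width=0, rs1=2 (sp), vs2=29, vm=1, mop=3, mew=0, nf=4
print(decode_vector_load(word))
```

Note that the low halfword 0107 ends in binary 11, which is why the core treats the fetch at 9efa as a full 32-bit instruction rather than as a compressed one.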
The above assembly block is from the function fdt_next_tag in libfdt, by the way, which steps through the tokens of a flattened device tree's structure block during device tree traversal. Anyway, the real mystery now is how the heck the non-vector build we ran manages to make a similar error but still fully boot Linux.
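For context on what fdt_next_tag is iterating over: the FDT structure block is a stream of big-endian 32-bit tokens, and each call hands back one tag plus the offset of the next one. Below is a rough Python sketch of that walk using the token values from the devicetree specification. It is an illustration only, not libfdt's C code, and `walk_tags` is a hypothetical name.

```python
import struct

# Structure-block token values from the devicetree specification.
FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_NOP, FDT_END = 1, 2, 3, 4, 9

def align4(n):
    """Round an offset up to the next 4-byte boundary."""
    return (n + 3) & ~3

def walk_tags(struct_block):
    """Yield (offset, tag) for each token, roughly what repeated fdt_next_tag calls visit."""
    off = 0
    while True:
        (tag,) = struct.unpack_from(">I", struct_block, off)  # tokens are big-endian
        yield off, tag
        nxt = off + 4
        if tag == FDT_BEGIN_NODE:
            # Node name: NUL-terminated string, padded to a 4-byte boundary.
            end = struct_block.index(b"\0", nxt)
            nxt = align4(end + 1)
        elif tag == FDT_PROP:
            # Property header: value length, string-table offset, then the padded value.
            length, _nameoff = struct.unpack_from(">II", struct_block, nxt)
            nxt = align4(nxt + 8 + length)
        elif tag in (FDT_END_NODE, FDT_NOP):
            pass  # no payload
        elif tag == FDT_END:
            return
        else:
            raise ValueError(f"bad tag {tag:#x} at offset {off}")
        off = nxt

# Tiny structure block: root node with one 4-byte property.
blob = (struct.pack(">I", FDT_BEGIN_NODE) + b"\0\0\0\0"
        + struct.pack(">III", FDT_PROP, 4, 0) + struct.pack(">I", 1)
        + struct.pack(">I", FDT_END_NODE)
        + struct.pack(">I", FDT_END))
print([tag for _, tag in walk_tags(blob)])  # [1, 3, 2, 9]
```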