Booting without SD


Daniel Jiménez Mazure

Dec 15, 2020, 9:20:44 PM
to OpenPiton Discussion
Hi everyone,

I've been trying (and am still trying) to figure out how to boot Linux on my custom board, which has no SD slot.

My assumptions are the following:

1) My custom OpenPiton implementation will remove the SD `define so the bitstream builds correctly (that's OK, there are no traces of SD in my actual implementation)
2) I need a way to place the kernel, and probably a ramdisk, in the DRAM
3) The uart-dmw ddr parameter should help me do 2). Is pitonstream meant for that?
4) I need to edit the device tree somehow, because I have no SD card and will have a ramdisk instead of a storage device. The device tree is normally generated automatically and embedded in the bootrom. I don't know whether I need to make some edits here.
5) My board has a PCIe interface. I could add PCIe and add it to the chipset. Then I could use a similar flow as F1 uses (using dma_ctl api) to download the Kernel into the DRAM.
6) If I do 5), I would need to add the PCIe to the devicetree, which leads me again to think I should be able to create my own bootrom.

I think I need some guidance here. Pieces are there but I don't know how to put everything together.

P.S.: I'm using an Alveo U280 and I think the FPGA side is under control. PR promised :).


Jonathan Balkind

Dec 16, 2020, 2:54:05 PM
to OpenPiton Discussion
Hi Daniel,

I'd suggest taking a look at the xupp3r support. For PCIe you may find it useful to reference Grigory's more recent updates: https://github.com/grigoriy-chirkov/openpiton/commits/aws2.0

1/4) You can remove the SD from the devices_ariane.xml for the board you're using and it won't get wired up. Note that the linux driver we have assumes the device is there and doesn't look at the device tree. However, that driver is only used when you are mounting the SD, not when you're just copying a kernel into memory.
2) You can do this with PCIe or pitonstream. If you want to just dump the thing into memory and not have a "virtual" SD then you could modify Ariane's linux bootrom to remove the copy and just assume the kernel blob is there already. Otherwise you can allocate a region of memory as a "virtual" SD on the devices_ariane.xml using the <virtual/> tag (see the f1 devices_ariane.xml as an example).
3) Yes uart-dmw ddr will let you do that. It'll probably be less convenient than just using PCIe though given the size of the kernels.
5/6) The way that Grigory does this doesn't add the device to the chipset. It just puts a crossbar at the AXI memory interface and lets the DMA access the memory directly. This requires no device tree or other modifications.

Hopefully some of this helps - happy to give further pointers here!

Thanks,
Jon


Daniel Jiménez Mazure

Dec 16, 2020, 6:13:55 PM
to Jonathan Balkind, OpenPiton Discussion
Hi Jon,

Thanks again! This clarifies a lot. I get a sense now on how the booting is handled when there is no SD on the system. I will take a look at the Xupp3r. Thanks for the hints and where to look at. The F1 example looks promising. 

So, let me rephrase: first, I should change devices_ariane.xml so that the system considers there is a "virtual" SD card, which is just a way for the bootrom to have an address to get the kernel blob from (this would be your second option, which I find cleaner). For that to work, I must first have placed the kernel blob at that address via pitonstream/PCIe.

Have I reformulated this correctly? For the PCIe, I will check Grigory's work again. Not placing the PCIe in the chipset surprises me, "conceptually" speaking. Is it not in the chipset because that makes little sense from a top-level view, or because it is easier to implement?

Thanks again Jon, I really appreciate your time in this matter :D

Best,

Daniel


Jonathan Balkind

Dec 16, 2020, 6:18:26 PM
to OpenPiton Discussion
Yeah so if you go the virtual SD route, then your memory is going to be broken up somehow into the regular main memory part and the virtual SD part. The packet filter on the chipset will route both address ranges to the memory controller. You might need to tinker a little to get the sizing of this stuff right (the XML isn't coupled to the specifics of the controller itself) and that's where the F1 and xupp3r can be good examples.

Sorry, to be clear: when I say that the PCIe is not "in the chipset", I just meant that it's not connected directly to the chipset crossbars, i.e. within the P-Mesh domain. Rather, it's connected behind the memory controller in the AXI domain. In that spot you can instantiate other stuff (e.g. I've put accelerators there on a crossbar in the same way that the PCIe DMA goes in there). This organisation also maps better with the way that F1 handles things.

Thanks,
Jon

Daniel Jiménez Mazure

Jan 2, 2021, 7:15:55 PM
to OpenPiton Discussion
Hi again Jon, and happy new year to everyone :).

I'm making progress on this port. After reading your previous suggestion twice, I think it will be much easier to drop the blob into memory rather than go the virtual SD way. I haven't yet looked at Ariane's Linux bootrom to figure out which lines I have to remove to avoid the copy. That's homework.

I'm currently stuck here:

1) Pitonstream is receiving the "DONE" from the FPGA.
2) I'm running  pitonstream -b alveou280 -f test.list -p ttyUSB2
   2.1 test.list only has hello_world.c
3) I get a test compilation failure, and the content of uart_piton.log is as shown in the picture below:
       error: token "##" is not valid in preprocessor expressions



There are a bunch of errors in the preprocessing phase. I'm not much of a SW guy, and these messages look quite weird to me. Anyway, they all look related to the __GLIBC_USE thing. I suspect I may have some misconfiguration in my environment. I have solved other errors related to tools not being the proper version.

I'm in Ubuntu 20.04. The OpenPiton commit where I've started from is 9c2cf06f8535d20cc36082fc3bad6f7a7fa2eb0b, in case this is useful.

Again, thanks a lot for your time.

Daniel

Daniel Jiménez Mazure

Jan 2, 2021, 8:07:23 PM
to OpenPiton Discussion

Hi again :). Quick update:

1) I've re-checked https://github.com/Jbalkind/openpiton/blob/jbalkind-m4/README.md and no luck.

2) I've tried the precompiled tests:

pitonstream -b alveou280 -d system -f ./piton/design/chip/tile/ariane/ci/riscv-benchmarks.list --core=ariane --precompiled

... and I get a much more interesting TIMEOUT error. There are plenty of threads here about this timeout error, so I have to study those. At first sight, it looks like it could be related to the DDR4 configuration, which at the end of the day is the main component to be ported to the new board. I will place an ILA (or ChipScope) to check this. Any hint will be much appreciated anyway :D.

Best,

Daniel

Jonathan Balkind

Jan 2, 2021, 8:11:02 PM
to OpenPiton Discussion
Hey Daniel,

This may just be the fact that pitonstream wasn't properly ported forward to python3 - could you give cherrypicking this commit a try? https://github.com/sripathi-muralitharan/openpiton/commit/e0c5837c2fb29e1547f1ff6523f4446d928c3c6b

Thanks,
Jon

Daniel Jiménez Mazure

Jan 3, 2021, 8:40:55 AM
to Jonathan Balkind, OpenPiton Discussion
Hi Jon,

Thanks! I've added these changes, but errors remain the same... I'm attaching the picture, not sure if the previous one was visible.

By the way, where should I look to learn how to use pitonstream to load my own stuff (kernel blob) to DDR?

Thank you very much,

Daniel

midas_bw_cpp.PNG

Jonathan Balkind

Jan 3, 2021, 12:24:26 PM
to OpenPiton Discussion
Hey Daniel,

Looks like you forgot to specify that the core is ariane. It's trying to run SPARC test assembly instead.

Beyond what's in the help and readme, I think there's some documentation of pitonstream in the FPGA manual on openpiton.org but I don't think it covers the new riscv stuff. I think all you should need to do is use --precompiled to copy the bbl/opensbi blob though.

Thanks,
Jon

Daniel Jiménez Mazure

Jan 3, 2021, 12:43:10 PM
to Jonathan Balkind, OpenPiton Discussion
Hi Jon,

What can I say... Yes, this is something that can happen to me. I forgot the Ariane parameter; now it compiles and the hello_world example is "PASSED".

Thank you very much for your help, now you know what kind of guy you are talking to :D.

Daniel

Jonathan Balkind

Jan 4, 2021, 1:15:15 PM
to OpenPiton Discussion
Hey, we've all been there :) also note that I think the tests wouldn't have worked without the commit I sent so I think you had a combination of two issues.

Thanks,
Jon

Daniel Jiménez Mazure

Jan 8, 2021, 9:24:38 AM
to OpenPiton Discussion
Hi Jon. Thanks for your support ;). I've tried to do some research before coming back to you. I still have several unresolved gaps :/

First, a timeout problem when I use precompiled benchmarks (see figure attached). I can modify the hello_world example, compile and successfully run it though (I can see my custom message coming out of the UART, good enough). Any hint on where I should start looking? I guess this is going to be an FPGA debugging process to find where the programs are stalling. I guess the DDR4 integration needs to be checked, keeping in mind that hello_world runs fine.

Secondly, I would like your confirmation on this: I don't need an initramdisk because it is embedded in bbl_linux_r11, right? I'm coming from the Xilinx/ARM/yocto way of booting Linux, which mainly consists of FSBL+UBOOT+DeviceTree+Linux Kernel. If you don't have an SD partition, you may use a ramdisk as your storage device. You decide this in the UBoot configuration process.

I'm trying to understand what corresponds to what in RISC-V. As I understand it, instead of the FSBL we have the bootrom in Ariane, which somehow also includes the device tree so the kernel knows where the devices are and how to use them, including the CPU. We don't have UBoot, but we have something similar in the form of BBL (Berkeley Boot Loader), which handles this second step before the kernel comes into the picture. Then the kernel starts executing, and at some point it will look for a storage device. This can be a partition on an SD or an initramdisk that we have ... 1) embedded in the kernel, or 2) at a known address in the DDR (that is an assumption... well, all of these are assumptions). Lastly, you mentioned OpenSBI, but I guess this is not how OpenPiton proceeds yet.

For the last few days I was thinking about how and where to load the initramdisk, and how to modify whatever is responsible for pointing at it. Then I discovered the ramdisk can be embedded with the kernel, so I guess I don't need to do anything like that (thank god!).

Assuming I don't have any FPGA implementation bug (which I think I have, because of the timeout error), I would be two steps away from booting Linux:

1) Comment out the copy from SD to DDR, as I don't have any SD, as you mentioned some posts ago. If I don't do that, I imagine I could corrupt whatever I've placed in memory first. I need to keep browsing because it is not clear to me how to find those instructions. Ariane's bootrom files are not very verbose.

2) Being able to use pitonstream to copy bbl_linux_r11 into the DDR, and nothing else (no initramdisk). Pitonstream would handle the right place in memory for the bbl_linux_r11. (I can't choose the address where I load things with Pitonstream, right?) The command would be something like: 

pitonstream -b alveou280 -d system -p ttyUSB2 -f linux.list --core=ariane --precompiled

...where linux.list would contain "bbl_linux_r11".

When I release the reset button, the bootrom moves on, the PC reaches where the bbl is and I can play tetris :D.

Sorry for the length of this message. I don't want to abuse your kindness or your time by any means, so feel free to send me to the readme, because **maybe** all I need to know is there and I haven't read enough :D.

Thanks a lot Jon!

Daniel
timeout.PNG

Jonathan Balkind

Jan 8, 2021, 12:08:14 PM
to OpenPiton Discussion
Hey Daniel,

Could you try running one test per pitonstream call? Sometimes there's weird behaviour with ariane if you have multiple in the tests.txt file. Also, how did you precompile the tests?

Regarding linux, you have the right sense that the blob has an initramfs. It's a blob of bbl containing a payload where the payload is that initramfs with the kernel. Ariane's linux zsbl is just copying from the SD to DRAM, setting the right register (a1 I think? I forget) to point to the dtb (which is in the bootrom), then jumping to 0x80000000. That's where control gets handed to bbl (which acts like a v simple uboot + other m-mode firmware) which will take things further.

Your description sounds good overall so I think you got almost everything you need ready?

A few things:
- Try the r12.bin rather than the r11.bin that you mentioned
- The physical switch which flips between pitonstream and linux boot modes also changes the bootroms. For your configuration you're going to want it to switch the bootroms but not the boot mode. You'll have to trace the code for that switch a bit to make sure you do that right. Obviously once you have pcie or something you can fix that back the way it was
- The precompiled mode assumes that the file is an elf which is not the case for the bbl.bin file. Could you check whether we also give an elf in the released tarball? You'd need to use that or modify the --precompiled option to allow you to use a bare binary
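
One quick way to tell whether a file from the tarball is an elf or a bare binary is to look at its first four bytes, since every ELF file starts with the magic number 0x7f 'E' 'L' 'F'. The sketch below is only an illustration: the function names are made up here and are not part of pitonstream.

```python
ELF_MAGIC = b"\x7fELF"  # the four bytes every ELF file starts with


def looks_like_elf(header):
    """Return True if the given leading bytes carry the ELF magic number."""
    return header[:4] == ELF_MAGIC


def is_elf_file(path):
    """Read the first four bytes of the file at `path` and check them."""
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))
```

Calling is_elf_file() on each candidate in the release would show which one, if any, the precompiled mode can take as-is.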

Thanks,
Jon

Daniel Jiménez Mazure

Jan 9, 2021, 9:51:18 PM
to Jonathan Balkind, OpenPiton Discussion
Hi Jon,

Thanks a lot for your extremely clarifying explanation.

1) Pitonstream works great. As you may have imagined, I was not precompiling at all; I assumed it was done magically at some point. All the C programs I'm running with pitonstream work (so that's something...), regardless of how many of them I place in the txt file.
2) I finally understood how to modify the bootrom. I go to openpiton/bootrom/linux in Ariane's repo, remove the SD code and run make. Nice! Then, when comparing Ariane's bootrom and OpenPiton's, something weird (weird to me) shows up. OpenPiton's linux bootrom doesn't end the way Ariane's does:

__asm__ volatile(
            "li s0, 0x80000000;"
            "la a1, _dtb;"
            "jr s0");

This is precisely the process you described, and I can't figure out how OpenPiton handles this last step. Maybe ... the uart_boot_en is touching the proper bit? 

3) I found the right switch to select the linux bootrom on the FPGA. I've added Xilinx's VIO to act as an array of switches. I'm using uart_boot_en, uart_timeout_en (not using it for real), now the bootrom selector and finally the board reset. With this I've been able to play a lot :). 

4) The problem I'm facing now is the pitonstream loading. I don't have the elf version of the bbl_linux_r12.bin file (this answers your question, I feel useful now), and I have to dig a little into your suggestion about modifying the --precompiled option. As I'm not an expert in Python, would it be possible to compile the source code and produce an elf instead?

5) Finally, is pitonstream going to stay active after loading the bbl? So far I've been able to see the linux bootrom output on the shell when I set the proper switch and use pitonstream in the normal way; it seems I need the messages that the test_start signal triggers on the UART in order to see what the linux bootrom prints. I guess screen /dev/ttyUSBx should also work.

Thanks a lot!

Daniel


Jonathan Balkind

Jan 9, 2021, 10:04:41 PM
to OpenPiton Discussion
Hey Daniel,

Glad to hear you're making progress!


3) You might actually want uart_timeout_en wired up for using pitonstream for linux now that you mention it

4) Ok I'd suggest you grab the openpiton branch of ariane-sdk and build your own. Note that I think you don't want to set RISCV for this despite it being mentioned in the readme. It should produce you a bbl (elf) and a bbl.bin I think: https://github.com/pulp-platform/ariane-sdk/tree/openpiton

I think the commands to build should be:
- git submodule update --init --recursive
- make all
- make vmlinux
- make bbl.bin

5) Yeah pitonstream should remain active. I think if you kill it once the transfer completes and open screen on that tty then you should be able to interact there as normal? Honestly I don't have the highest expectations for using the UART to transfer the ~10MB for this and maintaining integrity, but I don't think this is much of a rabbit hole or anything.
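
For a rough feel of what moving ~10MB over the UART means, here is a back-of-the-envelope estimate. It assumes a 115200 baud link with 8N1 framing (10 bits on the wire per byte) and ignores any pitonstream protocol overhead; neither the baud rate nor the framing is stated in this thread, so treat both as assumptions.

```python
def uart_transfer_seconds(num_bytes, baud=115200, bits_per_byte=10):
    """Lower bound on the transfer time over a serial link.

    bits_per_byte=10 models 8N1 framing: 1 start + 8 data + 1 stop bit.
    The default baud rate is an assumption, not a value from the thread.
    """
    return num_bytes * bits_per_byte / baud


size = 10 * 1000 * 1000  # ~10 MB blob
print(f"{uart_transfer_seconds(size) / 60:.1f} minutes")  # 14.5 minutes
```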

Thanks,
Jon

Daniel Jiménez Mazure

Jan 11, 2021, 4:43:13 AM
to OpenPiton Discussion

Hi Jon,

A lot of progress, thanks to you, thank you very much!

2) At last, I understand. Things were very blurry for me at the beginning; now the loop is closing. Thanks for spending time clarifying!
3) Done.
4) It turned out that you provide both a *.bin and an "extensionless" version of it, which I think is the *.elf we need (see 5).
5) We demonstrated that the overall process is right! I got Linux *starting* to boot (see picture attached). Pitonstream did its job :).

The Linux boot process hangs (I'm happy anyway), and I would say (based on your suspicions) that this is because the kernel got corrupted somehow during the UART transfer (it takes around 15 minutes to load the whole file). The attached picture shows where it stopped:

[ ----] futex hash table entries: 256 (order:2, 16384 bytes)]

I would say this is a random point at which to hang, but maybe it rings a bell for you. Otherwise, I will start integrating the PCIe and stop using the UART; I think I've gained the knowledge of the overall workflow needed to do that. I think I will not bother you for a while :).

Thanks a lot

Daniel
bbl.png

Daniel Jiménez Mazure

Jan 11, 2021, 7:30:53 AM
to OpenPiton Discussion
I've checked against other kernel logs, and it looks like the next thing the kernel would do is check the network drivers. I guess I have to comment out the devices I'm not using in the devices.xml file... :D

Daniel Jiménez Mazure

Jan 11, 2021, 7:37:19 AM
to OpenPiton Discussion
devices_ariane.xml has the "net" device commented out, so I guess that's not the problem.

Jonathan Balkind

Jan 11, 2021, 12:30:44 PM
to OpenPiton Discussion
Hey Daniel,

This is great! I thought it might have taken a little more head bashing to get there.

Honestly with bbl things are a bit of a crapshoot. That could be due to corruption of uart transfer or it could be due to bbl corrupting kernel memory (a known issue which is why we switched to opensbi). There are of course other possibilities but I'd suggest moving to the pcie issue like you mentioned so that we don't need to start debugging a uart integrity issue. Also, you started from openpiton-dev in the first place, right? Rather than the openpiton branch.

Thanks,
Jon

Daniel Jiménez Mazure

Jan 11, 2021, 2:39:35 PM
to Jonathan Balkind, OpenPiton Discussion
Hi Jon,

Thanks for your thoughts. Well, thanks for everything. The entire environment is just amazing. The more I use it, the more I enjoy it . 

You're gonna kill me but ... I started on the OpenPiton branch... (update Ariane commit, 23/04/2020). Maybe just switching does some magic... :D

I'm definitely taking the PCIe path. I just have one question regarding the device tree. It shows the CPU clock-frequency attribute is 50MHz, and I've set 100MHz for the chipset_clk, which ends up being the one connected to the Tile. I followed the track of how this value gets set on the dts, but I lost it after tools/perlmod/perf,1.13. I would be surprised if this value on the dtb really matters because the UART is working for sure and it is also "wrong" (50MHz).

Thanks!

Daniel

Jonathan Balkind

Jan 11, 2021, 2:49:09 PM
to OpenPiton Discussion
Hey Daniel,

I'm really glad you're enjoying it! It's gratifying to know since most people find research environments bearable at best :)

Aha I would definitely suggest trying to pull openpiton-dev if it's not going to trample over everything. There may be some useful fixes in there (though it does also push some stuff to python3 which may induce some more tinkering).

Did you update block.list in piton/tools/src/proto? The system entry there should reflect the frequency you're using. Otherwise the device tree generation script won't get the right value (https://github.com/PrincetonUniversity/openpiton/blob/openpiton/piton/tools/bin/riscvlib.py#L43)

Actually the frequency you put in the device tree doesn't necessarily matter that much in practice. Linux isn't using its UART driver so BBL is handling that (and it might not be configuring the UART and just relying on the initial configuration from the zsbl). If the frequency is wrong for other stuff then timers will just be off so system timeouts can be wrong.
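
To make the "timers will just be off" point concrete, here is a small arithmetic sketch. It assumes (this is not confirmed in the thread) that software sizes a timeout using the frequency declared in the device tree while the underlying counter ticks at the real clock rate. The function name is made up for illustration, and which device tree property the kernel actually uses for timekeeping is not covered here.

```python
def real_seconds(requested_seconds, declared_hz, actual_hz):
    """Wall-clock duration of a timeout sized with the wrong frequency.

    Software converts the requested time into ticks using declared_hz,
    but the counter really advances at actual_hz.
    """
    ticks = requested_seconds * declared_hz  # what software programs
    return ticks / actual_hz                 # how long that really takes


# Device tree says 50 MHz, hardware runs at 100 MHz: a 10 s timeout
# expires after 5 s of real time.
print(real_seconds(10, declared_hz=50_000_000, actual_hz=100_000_000))  # 5.0
```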

Now that I think of it, there was an issue where the RTC was running at half frequency which got solved at some point. I'm not sure whether the code you have would have that fix or not... It could also interact in a funny way if you've got the system frequency wrong otherwise.

Oh and if you ever end up in tools/perlmod tracking down a problem it's probably not there

Thanks,
Jon

Daniel Jiménez Mazure

Jan 12, 2021, 5:30:24 PM
to OpenPiton Discussion
Hi Jon,

For me, using OpenPiton was (almost) plug and play!

Take protosyn, for example: just a couple of issues regarding tool versions, and those were covered in this group. That is impressive in my humble opinion. I've been in industry for several years now, and OpenPiton held up pretty well when used with a different (newer) Vivado version. Porting from another board was straightforward thanks to the different boards you support right now. In this case, I could start from a DDR4 design, which made my life a lot easier. Starting from a working design, I tried defining "`define AXI4_MEM" where it wasn't defined previously, and that worked the first time I generated the bitstream. On top of that, it has never been easier for me to configure a RISC-V environment. Normally I suffer a lot when setting up all the tools, but with OpenPiton it was just a matter of following the readme. I don't know... All this is to say that "bearable" is not the word I will use when I speak about this work.

... And now anyone on the team can just write a program in C, compile it and see it working on our FPGA. I'm sincerely impressed. And grateful for your support.

I'm curious about one thing. You say you plan to support UBoot in the future. I was wondering, why is that difficult? :D I mean, obviously it is tricky somehow, because if it were not, you would just put it there instead of BBL. After going through the whole boot process and understanding it (thanks for the insights on the bootrom and device tree), I think I'm prepared to digest a brief answer to that ... or maybe not :).

Best,

Daniel

Jonathan Balkind

Jan 18, 2021, 12:22:42 PM
to OpenPiton Discussion
Hey Daniel,

Productivity was a key goal from my perspective so I'm really glad to hear you were able to get things up and running quickly. The FPGA prototype side of things really comes from the hard work of Alexey Lavrov (in chief, though a number of the rest of us did make big contributions) so if you ever see him you'll want to offer him a pint ;)

There isn't any inherent difficulty to u-boot. The "upstream first" boot flow I'd really like to get to is:

0. (M mode) zero-stage bootloader reads FS from SD, copies opensbi+uboot blob from /boot
1. (M mode) opensbi does firmware things, moves to S mode, passes control to uboot blob
2. (S mode) uboot does actual bootloader things to get kernel from /boot
3. (S mode) Linux boot
4. (U mode) Linux userland

This would mean no silly partitioning of the SD and just using upstream opensbi, uboot, and kernel all packaged from the distro (meaning apt/yum could update those!). What I think is missing right now is a suitable driver for us to use our SD device. Our SD is mapped straight into memory from 0xf000000000 (up to ~32GB in size iirc) rather than having any software management component. This makes things easy (we have a really simple device driver) but doesn't really match OS/bootloader expectations. The driver is already written for Linux but not for uboot. It may also be possible to just use an existing ramdisk driver (like /dev/mem even) for this given its simplicity.

Further, the zsbl I mention above would need to be able to read the filesystem and so on. Someone has started working on using u-boot spl for this which I think could be a really good option. Additional upside here is that it will probably share the driver I just mentioned with regular uboot. We might even be able to get that upstreamed?

Hopefully we can move step by step towards this more standard approach before too long.

Thanks,
Jon

Daniel Jiménez Mazure

Jan 24, 2021, 7:36:14 PM
to OpenPiton Discussion
Hi Jon,

Thanks a lot for your great explanation! I would love to contribute at some point, let's see if I can find something where I'm useful.

We did it Jon, check the attached figure! Thanks a lot for your help.

OpenPitonAlveo.png

I definitely have to buy some beers for Alexey Lavrov too. Great stuff!

Honestly, I don't know exactly what made Linux boot this time. I pulled from openpiton-dev, and I was on my way to add QDMA when I tried again the pitonstream. I was expecting to check some AXI address access, but then the booting process succeeded. 

1) I did as you suggested (based on Grigory's commits): I placed a crossbar just in front of the DDR4 MIG, with one port for the PCIe and the other for the zeroer.
2) I found the same issue as Raghav here: https://groups.google.com/g/openpiton/c/ZAO_X0AZ_wY/m/1-CX2HbyAQAJ. Xilinx's AXI interconnect doesn't like not receiving the valid regardless of the wready/awready status, so it gets stuck. I use a "block diagram" to package the AXI interconnect, the QDMA (PCIe) and the DDR4 controller. I mention this because people don't like block diagrams and I think you would want me to change this before a PR. On the other hand, I think the AXI Interconnect in its RTL form has been discontinued by Xilinx, so some effort is needed to use a combination of RTL crossbars + clock converters to implement this in the mc_top.v RTL file. The IP Integrator / block design looked like an easy approach, but it's painful to combine it with git or a scripted regeneration process. It can be done, but you face nasty issues when upgrading to different Vivado versions.
3) I added a register stage for the init_calib_complete_out signal in the AXI zeroer; the critical path was there after the changes I made.
4) I used the prebuilt image you provide on your site (bbl_linux_r12), as you suggested.
5) Pitonstream does some funny stuff after pulling from openpiton-dev. It shows characters such as \x00 or \r\n. I'm not sure why. I have an older, somewhat outdated repo where pitonstream works fine.

Thanks!

Daniel

Jonathan Balkind

Jan 25, 2021, 5:20:58 PM
to OpenPiton Discussion
Hey Daniel,

This is great! I'm especially glad that you followed the mandate of using tetris to demonstrate a functioning linux userland environment :)

2a. Grigory may have fixed this issue in his commits. Did you see that by any chance? Otherwise it'd be good to have a patch for a fix to that.
2b. We could also make use of PULP AXI IP (xbar, cdc) rather than having to go for something as heavy weight as a Xilinx BD? I'm not sure what Grigory has been doing. https://github.com/pulp-platform/axi
3. Flopping that sounds totally reasonable. There are a few other places in our design where I expect we could at minimum get better Vivado runtimes if we just added a few flops.
5. Python3 changed many of the input/output functions to use "bytes" objects rather than "string". You can print a bytes object but it'll have things like that. If you can narrow down the part of the code, we can replace the bytes with a .decode('utf-8') or something like that to get a string and it should look the way it used to. I don't think this should affect functionality, just prettiness of output.
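
On point 5, a minimal sketch of the difference between printing a bytes object and printing its decoded text. The sample data and the helper name are invented for illustration; where exactly pitonstream prints the UART data has not been identified in this thread.

```python
raw = b"Hello from the FPGA\r\n\x00"  # stand-in for what a serial read returns

# Printing bytes shows the repr, escapes included:
print(raw)  # b'Hello from the FPGA\r\n\x00'


def to_text(data):
    """Decode UART bytes for display: drop NULs, tolerate invalid UTF-8."""
    return data.replace(b"\x00", b"").decode("utf-8", errors="replace")


# Printing the decoded string shows the text the way it used to look:
print(to_text(raw), end="")
```

Using errors="replace" is a design choice for a noisy serial line: a stray invalid byte becomes a replacement character instead of raising an exception mid-transfer.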

Did you get the DMA/AXI environment working too? Or just the pitonstream?

Thanks,
Jon

Daniel Jiménez Mazure

Jan 25, 2021, 11:18:18 PM
to Jonathan Balkind, OpenPiton Discussion
Hi Jon!

Never been so happy playing Tetris. Eager to check out the PvP mode...

Yeah, QDMA (PCIe) is working, but I'm still figuring out how the *** the driver is placing the data into the DRAM. Basically, it is reversing octets, as you point out in the F1 walkthrough. I can see through the ILA how the bbl gets placed into memory, but I have to get a better understanding of what has to go where. I've used objcopy to take a blind shot, no luck so far.
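
For experimenting with the octet reversal without going through objcopy each time, the same transformation can be sketched in a few lines of Python. This mirrors what objcopy's --reverse-bytes=N option does (reverse the byte order inside each N-byte group); the group width that the DMA path actually needs is not stated in this thread, so `width` is left as a parameter, and the function name is made up.

```python
def reverse_bytes(data, width):
    """Reverse the byte order inside each `width`-byte group of `data`.

    Comparable to `objcopy --reverse-bytes=width`. The right width for a
    given DMA path is an assumption the caller has to supply.
    """
    if len(data) % width:
        raise ValueError("length must be a multiple of width")
    return b"".join(data[i:i + width][::-1] for i in range(0, len(data), width))


print(reverse_bytes(bytes(range(8)), 4).hex())  # 0302010007060504
```

Applying the function twice with the same width gives back the original data, which is a handy sanity check when comparing memory dumps.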

I can see how pitonstream starts placing the bbl at 0x0, and I'm still working on it, but it looks like the first reading Ariane does is at 0x3fffff0? (something like that). Anyway, that's my homework. I've added another virtual button to hold Ariane in reset so I can release the system reset button, download the bbl via PCIe and then release the chip reset (uart_boot_en = '0', uart_timeout='0', linux_bootrom enabled). I'm so happy seeing how fast PCIe copies the bbl compared to pitonstream ...:D:D.

2a: Not sure... I can easily get lost with Git and branches, but the last thing I can check from axi4_zeroer.v looks like this....

assign zeroer_req_val = init_calib_complete_in
                      & (req_sent < REQUESTS_NEEDED)
                      & (outstanding < MAX_OUTSTANDING-1)
                      & m_axi_awready
                      & m_axi_wready
                      & rst_n;

... and as zeroer_req_val ends up being the awvalid, you can get a deadlock, since it depends on awready and wready being high.

2b: The bd is not so heavy. It can take two shapes. First, as a tcl script, which is the most elegant; later in the regeneration process it is transformed into an xml file that Vivado presents as a nice block diagram. Second, the "bd" file itself, which is the same xml mentioned earlier; you can use it again in the tcl regeneration process. It is maybe just a few KB. Anyway, both options get messy when it comes to Vivado upgrades. Maybe the bd file behaves better, as you can do the upgrade visually. The tcl can simply break the regeneration process if the included IPs are upgraded between Vivado versions: there is a version mismatch and you have problems :D.

3. Yes. I will be playing with that, let's see where I can find more suitable places for our FFs.
5. Yes, despite the visual thing, everything behaves perfectly.
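
Going back to 2a, the hang can be reproduced with a toy model of a single VALID/READY channel. The AXI handshake rules allow a destination to wait for VALID before raising READY, but do not allow a source to wait for READY before raising VALID; when both sides wait for each other, nothing ever moves. The model below is a simplified, registered abstraction written for illustration, not a description of the actual zeroer or interconnect RTL.

```python
def handshake_completes(master_waits_for_ready, slave_waits_for_valid, cycles=8):
    """Toy cycle-by-cycle model of one VALID/READY channel.

    Returns True if a transfer (valid and ready both high) happens
    within `cycles` clock cycles, False otherwise.
    """
    valid = ready = False
    for _ in range(cycles):
        # Each side computes its next output from what it saw last cycle.
        next_valid = ready if master_waits_for_ready else True
        next_ready = valid if slave_waits_for_valid else True
        valid, ready = next_valid, next_ready
        if valid and ready:
            return True
    return False


print(handshake_completes(True, True))   # False: each side waits for the other
print(handshake_completes(False, True))  # True: valid goes first, ready follows
```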

Many-thanks again!

Daniel

Daniel Jiménez Mazure

Jan 25, 2021, 11:23:36 PM
to Jonathan Balkind, OpenPiton Discussion
Oh! Yes, I will browse a little bit into pitonstream to figure out where the root of the problem is and let you know.

Daniel Jiménez Mazure

Feb 12, 2021, 5:37:04 AM
to OpenPiton Discussion
Hi Jon!

I finally got it working via PCIe. I also had to do the objcopy byte reversal you describe for F1. I had to cheat a little bit in how I use the uart_boot_en signal, though. I realized that the address translation between the CPU and the DDR changes depending on that signal. When I set uart_boot_en = '0', I have to load the bbl at address 0x8000_0000. If uart_boot_en = '1', I can load the bbl at 0x0. Well, it happens that when I load the bbl at 0x8000_0000, the boot hangs at some specific point. I was browsing the forum a little bit and I found this:


It seems to me I could be in the F1 scenario, where the DDR is larger and I'm not benefiting from the "wrapping".

Well, at the end of the day Linux is booting and this is not a blocker :). I left uart_boot_en = '1' everywhere except at the input of the uart_top module, and life is good.

Thanks again!

Daniel

P.S: We have a chat pending on the PR topic.

Jonathan Balkind

Feb 22, 2021, 5:47:06 PM
to OpenPiton Discussion
Hi Daniel,

Really glad to hear that you got this working! Great news.

As for the issue you mention, RISC-V Linux convention is for memory to start at 0x80000000. Keep an eye on things if you're using 0x0 instead because it might come back to bite you at some point. You could always just modify the storage_addr_trans modules to not change the mapping for your board or add another switch option or something.

Further, you'll probably want to try the opensbi version of things to avoid Linux crashes. We've seen a lot of them due to bbl in our time and usually it's not the hardware's fault but that old bbl's bugginess.

Did you need something from me re: the PR? Honestly there's no urgency there. I haven't had the time to review or deal with other stuff around PRs for a while. If you do open one it may be some time before I can figure out what's needed to merge it in. I would still like to receive it at some point though :)

Thanks,
Jon

Daniel Jiménez Mazure

Feb 24, 2021, 7:10:18 AM
to OpenPiton Discussion
Hi Jon,

Thanks for the useful insights. I will try the SBI path, no question.

Regarding the PR: sure, no hurry at all. We will keep trying to improve our own changes in the meantime.

Best,

Daniel

Daniel Jiménez Mazure

Apr 13, 2021, 12:21:30 PM
to OpenPiton Discussion
Hi again Jon,

I was wondering... this workaround you discovered for Amazon F1, where you use objcopy, in theory to undo the byte order in which the Xilinx PCIe driver transfers the binary to main memory:

I have a colleague playing with Rocket. He also uses PCIe to place the bbl in main memory and lets the core fetch instructions at some point. The thing is... he is not inverting the binary on the host side, and it's working :0. Is there any chance this is related to the scalar core and not to the PCIe driver?

I need to go back to debugging and check how PCIe is actually transferring things to main memory. I just wonder if you have any hunch on this.

Thanks!

Daniel

Jonathan Balkind

Apr 13, 2021, 12:24:16 PM
to OpenPiton Discussion
Hi Daniel,

I think the newer patches from Grigory for F1 include that objcopy internally so you don't have to do it manually. You probably have the link for those from some previous discussion.
The issue could be because P-Mesh is big endian internally. There's probably some weirdness because AXI is specified to be little endian.

Thanks,
Jon

Daniel Jiménez Mazure

Apr 13, 2021, 6:39:19 PM
to OpenPiton Discussion
Hi Jon,

Yes, you are right, I could see how Grigory modifies the PCIe user app to do that automatically. I came up with a script that runs the boot process automatically, changing the binary file as a first step. Something like this:

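cp "$1" "${1}_original"               # keep an untouched copy
truncate -s %8 "$1"                   # round the size up to a multiple of 8 bytes
objcopy -I binary -O binary --reverse-bytes=8 "$1"   # reverse the bytes in each 8-byte group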

Then you call the boot process, which is handled (basically by managing the CPU reset) using an AXI GPIO via the PCIe AXI-Lite interface (QDMA, BAR2):

#!/bin/bash

FILENAME="$1"
#FILESIZE=$(stat -c%s "$FILENAME")
FILESIZE=$(du -b "$FILENAME" | cut -f1)

echo -e "Booting using $FILENAME image file which is $FILESIZE bytes\r\n"

dma-ctl qdma08000 reg write bar 2 0x0 0x0 # Hold the CPU in reset
sleep 2 # Maybe not necessary as we removed the Zeroer FSM
dma-to-device -d /dev/qdma08000-MM-0 -s "$FILESIZE" -a 0x80000000 -f "$FILENAME" # Load the bbl into main memory. You have to create the queues first!
dma-ctl qdma08000 reg write bar 2 0x0 0x1 # Release CPU reset

In this way I can use the drivers as Xilinx provides them. Nothing fancy here anyway.
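
For anyone who wants to see what the first step does to the image, here is a small Python sketch of the same transformation: zero-pad the data to a multiple of 8 bytes, then reverse the byte order inside each 8-byte group. The function name `pad_and_reverse` is made up for this example, and it is only meant as an illustration of what `truncate -s %8` followed by `objcopy --reverse-bytes=8` produces, not a replacement for them.

```python
def pad_and_reverse(data: bytes, group: int = 8) -> bytes:
    """Zero-pad to a multiple of `group`, then reverse the bytes within each group."""
    padding = (-len(data)) % group          # bytes needed to reach the next multiple
    padded = data + b"\x00" * padding
    return b"".join(
        padded[i:i + group][::-1]           # reverse one group at a time
        for i in range(0, len(padded), group)
    )


if __name__ == "__main__":
    sample = bytes(range(10))               # 10 bytes, so 6 bytes of padding are added
    print(pad_and_reverse(sample).hex())
```

Applying the function twice to an already padded image gives back the original bytes, which is a handy sanity check when comparing what landed in memory against the file on the host.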

What you mention about the endianness makes sense to me. It should be something like that, as we have the counter-example using the same HW and the same PCIe configuration for the Rocket design.

Thanks a lot for the hints.

Best,

Daniel

Jim Andronikou

Jul 25, 2024, 8:58:55 AM
to OpenPiton Discussion
Hello Daniel

I am trying something similar to what you have done. I want to load the bbl.bin onto a VCU128, so I have to do this through pitonstream or OpenOCD. For some reason OpenOCD does not work correctly, so I am trying to use pitonstream.
How exactly did you compile the bbl.bin?
I am placing it in the diag folder and I use --precompile, but this does not work. It seems like it doesn't find the bbl even though it's listed in the test.list file, which I pass to the pitonstream command.

Thanks a lot
Dimitris

Jonathan Balkind

Jul 29, 2024, 10:19:45 AM
to OpenPiton Discussion
I wouldn't expect bbl.bin to be found because I believe the search mechanism is based on sims. If I remember correctly, sims will look for files which end in .riscv when -precompiled is set. You may also have to place the file under piton/design/chip/tile/ariane/tmp/riscv-tests/build/benchmarks/ (https://github.com/PrincetonUniversity/openpiton/blob/openpiton/piton/tools/src/sims/sims%2C2.0#L2170-L2173) or add another -asm_diag_root= to your sims arguments.

Thanks,
Jon

Jim Andronikou

Jul 30, 2024, 12:57:41 PM
to OpenPiton Discussion

Thank you very much for the reply Jon.

I have done this. I have copied the bbl.bin to piton/design/chip/tile/ariane/tmp/riscv-tests/build/benchmarks/ under the name bbl.riscv. pitonstream finds the bbl, but it doesn't load it into memory. It looks like it doesn't even detect its storage size, which should be > 1. Then I see the bootROM output, which is OK.
Why is this happening? Am I missing something?

Screenshot 2024-07-30 150217.png

Thanks.
Dimitris

Jonathan Balkind

Jul 30, 2024, 1:01:00 PM
to OpenPiton Discussion
My recollection is that the mapping for RISC-V programs is simple because we don't need to remap addresses (unlike with the SPARC core which required addresses to be remapped) and so that's why there would only be 1 section. I think that might also be why it's saying 1 block of storage.

As for why you're not seeing output, I'm not immediately sure. I'd suggest you load up some ILAs to dig in and see if the cores are starting up and getting out of the bootrom or not. Do other tests (e.g. hello_world_many.c) run successfully?

Thanks,
Jon

Jim Andronikou

Jul 31, 2024, 5:41:11 AM
to OpenPiton Discussion
Hello Jon

Yes, other tests like hello_world run successfully.
I did some research and found that the problem starts in ./piton/tools/bin/rv64_img. That file contains the line
"${RV64_TARGET_TRIPLE}-objcopy -I elf64-littleriscv -O binary diag.exe diag.o", which does not work for bbl.bin (I noticed that diag.o has zero size, and mem.image is nowhere near the 15MB of the bbl). I changed "-I elf64-littleriscv" to "-I binary", and then diag.o was created. It now also uses more blocks of storage, and mem.image contains the actual bbl hex data. The problem now is somewhere in the scripts, because I get this:

Screenshot 2024-07-30 200930.png

So, should I use the elf image instead of bin? Or is there any way to use the binary image?

Jonathan Balkind

Jul 31, 2024, 5:59:37 AM
to OpenPiton Discussion
Aha yes you need the elf. You could also just be cheeky and rewrite rv64_img temporarily to remove/edit the objcopy if you felt like it :)
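
If it helps to check which kind of file you are feeding into the flow, here is a minimal Python sketch that tests for the ELF magic number (the first four bytes of every ELF file are 0x7f 'E' 'L' 'F'). A raw image produced by objcopy -O binary has no such header. The helper names `is_elf` and `is_elf_file` are made up for this example and are not part of the OpenPiton tools.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of any ELF file


def is_elf(header: bytes) -> bool:
    """True if the given leading bytes start with the ELF magic number."""
    return header[:4] == ELF_MAGIC


def is_elf_file(path: str) -> bool:
    """Read the first four bytes of a file and check them against the ELF magic."""
    with open(path, "rb") as f:
        return is_elf(f.read(4))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "ELF" if is_elf_file(name) else "not ELF (raw binary?)")
```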

Jim Andronikou

Jul 31, 2024, 8:46:28 AM
to OpenPiton Discussion
Hello Jon

Thanks for the reply!
I finally managed to boot Linux on the VCU128 using pitonstream.
  1. I copied bbl.bin to piton/design/chip/tile/ariane/tmp/riscv-tests/build/benchmarks/ under the name bbl.riscv.
  2. Then I changed piton/tools/bin/rv64_img and piton/tools/src/proto/image2stream.py:
    1. In rv64_img, change the objcopy line to "${RV64_TARGET_TRIPLE}-objcopy -I binary -O binary diag.exe diag.o".
    2. In image2stream.py, add "addr = int(addr)" at the top of strFromAddr, like this:

        def strFromAddr(addr, width):
            addr = int(addr)
            s = str(hex(addr))
            h = s[2:]
            return (int(width)-len(h))*'0' + h

After this you will be able to boot bbl.bin
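
A small self-contained check of the patched helper is sketched below. The thread does not say why the cast is needed, so this assumes (as a guess only) that `addr` reaches the function as a float, for which Python 3's `hex()` raises a `TypeError`. The sample addresses are arbitrary values chosen for the example.

```python
def strFromAddr(addr, width):
    # Cast first: hex() only accepts integers in Python 3.
    addr = int(addr)
    s = str(hex(addr))
    h = s[2:]  # drop the "0x" prefix
    # Left-pad with zeros up to the requested number of hex digits.
    return (int(width) - len(h)) * '0' + h


if __name__ == "__main__":
    # A float address (assumed failure case) now formats cleanly.
    print(strFromAddr(2147483648.0, 16))
    # Without the cast, hex() on a float fails:
    try:
        hex(2147483648.0)
    except TypeError as err:
        print("hex() on a float:", err)
```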

Jonathan Balkind

Jul 31, 2024, 8:48:18 AM
to OpenPiton Discussion
Excellent! Just a warning that when we and other users did this in the past, it wasn't uncommon for there to be some corruption because the image is quite large. Usually JTAG ended up being much more reliable when it was available.
