Query on Shared memory access

uma k k

Nov 19, 2024, 1:50:32 AM
to OpenPiton Discussion
Hi,

We want to implement a shared memory region that can be accessed by multiple threads (multiple read and write transactions) on the RISC-V Ariane core. Are there any memory-related tests in the environment? If not, where should I create the test, and which dependency files are needed to run it?

Thanks,
Uma

Jonathan Balkind

Nov 19, 2024, 12:33:11 PM
to OpenPiton Discussion
Main memory is shared among all the cores, so you don't need to do anything in particular. If you check hello_world_many.c (https://github.com/PrincetonUniversity/openpiton/blob/openpiton/piton/verif/diag/c/riscv/ariane/hello_world_many.c), argv[0][0] contains the current core's ID and you can use that to set the behaviour for a specific thread. You can see amo_cnt being used as a shared variable which is global to all the threads via the use of the `static` keyword. You can similarly allocate an array and use that array across multiple threads.
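
For example, here's a rough, untested sketch of that idea (NUM_CORES and results[] are names I'm making up here, not part of the existing diag):

#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 4  /* should match -x_tiles * -y_tiles */

/* static => a single copy in memory, shared by every hart */
static volatile uint64_t results[NUM_CORES];

int main(int argc, char **argv) {
    int id = argv[0][0];      /* current core's ID, as in hello_world_many.c */
    results[id] = id * id;    /* each hart writes its own slot in parallel   */
    return 0;
}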

Thanks,
Jon

uma k k

Dec 9, 2024, 2:57:05 PM
to OpenPiton Discussion
Hi Jon,

The hello_world_many code is executing sequentially. Is there any way I can make it run in parallel, i.e. with all harts executing simultaneously without waiting for one hart to finish? (I need multi-hart functionality.) Is this possible in C, or do I have to use assembly?

Thanks
Uma 

Jonathan Balkind

Dec 9, 2024, 5:34:47 PM
to OpenPiton Discussion
The test is already parallel and all harts are executing in parallel. The atomics are used to sequentialise the specific I/O operations which cannot be done in parallel. You can write code directly in main() and it will be run by all of the cores, provided that you set -x_tiles=M -y_tiles=N for your MxN chip at build and run time. You'll also want to use -finish_mask=1111 (as many 1s as there are cores) and likely set a higher -rtl_timeout value.
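
As a rough illustration (untested, and it uses GCC's __atomic builtins for the counter update rather than whatever macro the existing diag uses):

#include <stdint.h>
#include <stdio.h>

static volatile uint64_t turn = 0;   /* shared turn counter, like amo_cnt */

int main(int argc, char **argv) {
    int id = argv[0][0];

    /* parallel phase: every hart runs this at the same time */
    volatile uint64_t acc = 0;
    for (int i = 0; i < 1000; i++)
        acc += (uint64_t)i * id;

    /* sequential phase: only the prints are serialised, one hart at a time */
    while (turn != (uint64_t)id);
    printf("hart %d finished, acc = %lu\n", id, (unsigned long)acc);
    __atomic_fetch_add(&turn, 1, __ATOMIC_SEQ_CST);

    return 0;
}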

Thanks,
Jon

uma k k

Dec 20, 2024, 2:47:49 PM
to OpenPiton Discussion
Hi Jon,

Thanks for replying. Is it possible to add random delays to each hart? When I used rand(), it threw an error; there are no supporting libraries for it in this environment. Are there any other ways to do it? Also, how can I check the simulation time for each hart?

Thanks,
Uma K K

uma k k

Dec 20, 2024, 2:47:55 PM
to OpenPiton Discussion
Each hart should generate a random number per run, and it should be different every single time I run. I have tried a pseudo-random number generator, but it returns the core ID every time, so it is not random. Alternatively, is it possible to generate random numbers externally (e.g., on the Linux command line) and pass them into the program as an array?

Jonathan Balkind

Dec 20, 2024, 2:51:29 PM
to OpenPiton Discussion
Hi Uma,

The RTL simulations are generally deterministic, but if what you want is for the cores to wake up at different times, you could potentially use a Verilog random mechanism to change the wakeup times in ariane_verilog_wrap.sv. At the moment it has a standard timer of a certain number of cycles, so you could add a random delay after that.

If you need randomness inside the program, you're going to be fairly limited by the bare metal software environment. You could find a random number generator from elsewhere, but to seed it you'll probably have to build the seed into your program when you compile it.
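
For example, a sketch of the kind of thing I mean (untested; the -DSEED idea and the constants are just illustrative):

#include <stdint.h>
#include <stdio.h>

/* bake the seed in at compile time, e.g. gcc ... -DSEED=$RANDOM */
#ifndef SEED
#define SEED 0x12345678u
#endif

/* xorshift32: tiny PRNG with no library dependencies */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

int main(int argc, char **argv) {
    int id = argv[0][0];

    /* perturb the compile-time seed by the core ID so each hart differs */
    uint32_t state = SEED ^ (0x9e3779b9u * (uint32_t)(id + 1));
    if (state == 0) state = 1;        /* xorshift must not start at zero */

    uint32_t r = xorshift32(&state);
    printf("hart %d drew %u\n", id, r % 1000);
    return 0;
}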

Thanks,
Jon

uma k k

Jan 3, 2025, 6:28:17 PM
to OpenPiton Discussion
Hi Jon, 

Thanks. I want each hart to print its ID into a separate file. Is that possible? When I use atomicity, the output is printed in order; however, I want to demonstrate parallelism by having each hart independently write to its respective file without relying on atomicity. (When harts execute in parallel, they need not start in order, right? I want to show that by printing to separate files along with the system time.) I have tried using fopen() and fclose(), but I am encountering errors. It seems I might be missing some libraries; I am not able to include string.h. Where should I add the header files for these operations if I am going to use them?

Thanks,
Uma

Jonathan Balkind

Jan 6, 2025, 1:19:25 PM
to OpenPiton Discussion
To be clear, the output you see in fake_uart.log is not the core writing to a file via a library function. It is the raw I/O stores done to the UART address, printed using Verilog $fwrite() in fake_uart.v: https://github.com/PrincetonUniversity/openpiton/blob/openpiton/piton/verif/env/manycore/fake_uart.v#L55

Simulation only provides you with a bare metal software environment. Functions like open/close/read/write are not implemented. You have to come up with your own solutions for this kind of thing that are specific to your environment and needs.

One option, if you have the skills to use it, would be to check out the newlib dramfs that's part of BlackParrot's SDK, but I don't know the details and would not be able to provide support: https://github.com/black-parrot-sdk/black-parrot-sdk

To be honest I'm not entirely clear why you need parallel file I/O. You should be able to just put bytes in independent buffers in parallel and then print those buffers sequentially after the test is complete.
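
Something along these lines, for instance (untested sketch; the buffer sizes and log format are arbitrary, and the final loop reuses the same turn-taking trick as the hello world diag):

#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 4
#define LOG_LEN   32

static volatile uint64_t turn = 0;
static uint64_t logbuf[NUM_CORES][LOG_LEN];   /* one private buffer per hart */
static int      logcnt[NUM_CORES];

int main(int argc, char **argv) {
    int id = argv[0][0];

    /* parallel phase: each hart appends to its own buffer, no atomics needed */
    for (int i = 0; i < LOG_LEN; i++) {
        logbuf[id][i] = ((uint64_t)id << 32) | (uint64_t)i;
        logcnt[id]++;
    }

    /* sequential phase: drain the buffers to the UART one hart at a time */
    while (turn != (uint64_t)id);
    for (int i = 0; i < logcnt[id]; i++)
        printf("hart %d entry %d: %lx\n", id, i, (unsigned long)logbuf[id][i]);
    __atomic_fetch_add(&turn, 1, __ATOMIC_SEQ_CST);

    return 0;
}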

Thanks,
Jon

uma k k

Jan 26, 2025, 11:26:33 PM
to OpenPiton Discussion
  I have written code that involves writing to and reading from cache memory. Which signals should I observe in the waveform to determine whether the cache is being updated? Additionally, how can I ensure that the values written to the cache are visible and correctly stored?  

Jonathan Balkind

Jan 27, 2025, 6:01:46 PM
to OpenPiton Discussion
Cache coherence is maintained by the hardware. If your code is written correctly there shouldn't be any question about whether the cache system is updated. What specifically are you trying to observe? There are three levels of cache so just saying "the cache" isn't specific enough for me to give advice.

Ky L

Feb 3, 2025, 12:08:52 PM
to OpenPiton Discussion
Hi Dr. Balkind,

I'm currently at UCSC and am very intrigued by OpenPiton+Ariane's potential for parallel computing in biomolecular engineering.
One aspect that I hope to one day tackle is visualization.

I'm thinking that it's related to main memory, and I'm wondering if you would give some insights on how to add a
Linux Framebuffer (and possibly at least 16-bit SVGA output) to OpenPiton+Ariane's FPGA architecture.

I believe that an open-source FPGA parallel architecture with Linux to calculate and generate frames would be a blessing for many researchers.

Thank you so much,
Kyra

Jonathan Balkind

Feb 3, 2025, 12:35:02 PM
to OpenPiton Discussion
Yes we have! Back when we were using SPARC as our primary ISA, we adapted an open-source framebuffer and were able to expose it as a fbdev in Linux. We could cat/echo bitmaps into /dev/fb0 and they'd show up on screen. Unfortunately SPARC had some alignment constraints that made it a little difficult to make work more broadly. At some point I tried bringing the code forward onto RISC-V but there were some other issues and I didn't have time to probe them more deeply. These days there's also the potential to use something like the Vortex/Skybox RISC-V GPU :)

Thanks,
Jon

Ky L

Feb 4, 2025, 1:14:47 PM
to OpenPiton Discussion

Thanks Dr. Balkind. That's very exciting news.

The Vortex RISC-V GPU is a very nice design, achieving that with 32-bit cores and more modern standards implemented.
This is very good, but at first glance it seems to be "in-chip" centric (presumably to be as fast as possible).

However, I still very much like the "network-on-chip" that OpenPiton has pioneered. To me, this concept can
be scaled across an endless matrix of chips (FPGA or ASIC or both) to make an incredibly massive parallel machine
albeit somewhat at the cost of speed.

But, like our slow massively parallel brains, I don't believe that speed is everything. And as the individual chips get better
designed, the parallel power of the whole network of chips is amplified, even in pieces or sections at a time.

If you could spare a little time, would you point me to the code sections and files that you had left for the Linux framebuffer,
how to re-enable them, and perhaps share some tips on debugging?

Thank you so much,
Kyra


Jonathan Balkind

Feb 4, 2025, 1:20:18 PM
to OpenPiton Discussion
Hi Ky,

I don't have the code to hand (it's somewhere, but I'm not exactly sure where off the top of my head) and this isn't near the top of my priority list at present. That said, given your interest, I will keep it in mind if I get some spare cycles at some point.

Also we have some folks looking at integrating Vortex into OpenPiton so people can potentially have the benefit of both :)

Thanks,
Jon

Ky L

Feb 6, 2025, 12:13:28 AM
to OpenPiton Discussion

Thanks Dr. Balkind.

I was able to find your MICRO 2024 workshop paper with Univ. Grenoble Alpes and UCSB,
  "Preliminary Integration of Vortex within the OpenPiton Platform".

I have to say that it's an amazing plan, and I will eagerly be looking out for your next iteration.
Would you happen to have a repo in mind? Thanks.

Also, would you happen to have any thoughts on an "off-chip" P-Mesh for multiple CVA6/Vortex FPGAs?
  In your 2019 paper, "OpenPiton: An Open Source Hardware Platform For Your Research", you
  mentioned that "OpenPiton’s cache coherence protocol extends off chip."

With more powerful FPGA chips like the Altera Arria 10, Altera Stratix 10, Xilinx Alveo U50/U250/U280,
  or the Xilinx Versal VCK5000, the dream of limitless configurable computational power is closer than ever.

However and unfortunately for myself at the moment, my poor hobby budget does not yet reach the
  $8000 to $15000 to get the awesome Altera Arria/Stratix or Xilinx Alveo/Versal Dev Kits along with
  the hefty annual Dev Software licensing fees.
  I'll just have to settle for some Linux Framebuffer code and an ebay-ish Genesys 2 (albeit with a
  permanent Vivado license ... yay).

I appreciate all clues that you've dropped and will start digging in my spare time too.

Thank you so much,
Kyra


Jonathan Balkind

Feb 6, 2025, 1:17:23 PM
to OpenPiton Discussion
Yep, off-chip works. We'll be looking more deeply at a chiplet-based integration for 3.5D interposer settings. If you check out SMAPPIC from Grigory Chirkov, it also has multi-chiplet capabilities as run on F1.

A G2 with the framebuffer code should be sufficient - that's how we ran it before (note it was VGA).

Thanks,
Jon

uma k k

Mar 4, 2025, 12:23:09 PM
to OpenPiton Discussion
Hi Jon,

I am analyzing load and store operations in OpenPiton + Ariane and want to monitor L1 and L2 cache hits, misses, and evictions. I understand that performance counters like mhpmcounter3, mhpmcounter4, and mhpmcounter5 can be used for L1 dcache by configuring their respective mhpmeventX registers. Similarly, I assume there are equivalent counters for L2 cache tracking.

  1. How can I correctly configure and read these registers to track both L1 and L2 cache behavior (hits, misses, and evictions)?
  2. What are the key signals to observe in the waveform to analyze L1 and L2 cache hits, misses, and evictions?
  3. Is there any recommended method to correlate register values with waveform events for better debugging of cache behavior?
Thanks
Uma

Jonathan Balkind

Mar 4, 2025, 12:24:32 PM
to OpenPiton Discussion
Hi Uma,

I think these questions have been asked and answered in some previous threads, if you could do some searching. There's also a more convenient library, contributed last year by Noelia from BSC, which you can find in the repo; the (now merged) PR is here: https://github.com/PrincetonUniversity/openpiton/pull/144
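
In case it helps in the meantime, the general shape of using the hart-local hpm CSRs is something like the below (a rough, untested sketch: L1D_MISS_EVT is a placeholder, so check Ariane's perf counter/CSR code or Noelia's library for the real event encodings; note this only covers the core-side counters rather than the L2, and depending on the core version you may also need to clear the relevant bit in mcountinhibit):

#include <stdint.h>
#include <stdio.h>

/* placeholder value -- look up the real event numbers in the RTL/library */
#define L1D_MISS_EVT 3

int main(int argc, char **argv) {
    uint64_t before, after;

    /* select the event for mhpmcounter3 and clear the counter (M-mode) */
    asm volatile ("csrw mhpmevent3, %0" :: "r"((uint64_t)L1D_MISS_EVT));
    asm volatile ("csrw mhpmcounter3, x0");

    asm volatile ("csrr %0, mhpmcounter3" : "=r"(before));
    /* ... code under measurement goes here ... */
    asm volatile ("csrr %0, mhpmcounter3" : "=r"(after));

    printf("events counted: %lu\n", (unsigned long)(after - before));
    return 0;
}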

Thanks,
Jon
