
Kevin:
*** Claude tells me we have a super fast Verilog simulator running on a RISC-V softcore array, but my Xilinx FPGA is on the small side. ***
Your statement does not make sense.
There is no way in hell a Verilog simulator running on an array of RISC-V soft cores synthesized on any Xilinx FPGA could beat any regular Verilog simulator (Synopsys VCS, Cadence Xcelium, Mentor Questa) running a typical Verilog simulation load on a regular x64 computer.
Soft cores on FPGAs generally run at much lower clock frequencies than cores in an ASIC (like 20 MHz versus 2 GHz). It is possible to have an FPGA-optimized core running at 200, maybe 500 MHz, but this would be a simple core with a static pipeline, not a high-performance superscalar. You cannot compensate for this low frequency with a large number of RISC-V cores in the case of Verilog simulation, because the bottleneck of these cores communicating through shared memory will kill the performance.
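As a rough illustration of the frequency argument above, here is a minimal Python sketch. It uses Amdahl's law as a stand-in for the shared-memory bottleneck, which is a simplification and not a model of any particular simulator. The function name `array_speedup`, the 10% serial fraction, and the core count are all assumptions chosen for illustration, not measurements.

```python
def array_speedup(n_cores, f_soft_hz, f_host_hz, serial_fraction, ipc_ratio=1.0):
    """Estimate the speedup of an n-core soft-core array over one host core.

    serial_fraction: assumed share of the work that cannot be parallelized
                     (e.g. time spent synchronizing through shared memory).
    ipc_ratio:       assumed instructions-per-cycle of a soft core relative to
                     the host core; 1.0 is generous to the soft core.
    """
    # How fast one soft core is relative to the host core.
    per_core = (f_soft_hz / f_host_hz) * ipc_ratio
    # Amdahl's law: parallel speedup is capped at 1 / serial_fraction.
    parallel_gain = 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)
    return per_core * parallel_gain


if __name__ == "__main__":
    # Hypothetical numbers: 64 soft cores at 200 MHz against one 2 GHz host core.
    print(f"{array_speedup(64, 200e6, 2e9, 0.1):.2f}")
```

Under these assumed numbers the array comes out at roughly 0.88 of a single host core, and adding more cores cannot push it past 1.0, because the 10x clock deficit exactly cancels the 10x ceiling imposed by the serial fraction.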
Xilinx FPGA-based emulators such as Synopsys ZeBu do _NOT_ achieve their high performance by running a software simulator on an array of RISC-V soft cores synthesized into a Xilinx FPGA. They work on a completely different principle: they do a specialized synthesis of the given Verilog design, without using soft RISC-V cores as an intermediate layer.
Your claim sounds like "Linux kernel can run super fast if we rewrite it in Python". This is just not something connected with reality.
If you need a fast software simulator (Verilog, RTL level) and don't have access to commercial tools (Synopsys VCS, Cadence Xcelium or the slower Mentor Questa), you can use Verilator (fewer features, but still good performance) or the slower Icarus Verilog (it runs ~50 times slower than Synopsys VCS but would still beat any "super fast Verilog simulator running on a RISC-V softcore array synthesized in FPGA").
If you want to use a Xilinx board for prototyping, just synthesize your design; there is no point in adding an extra layer (a Verilog simulator running on a RISC-V softcore array synthesized in an FPGA). It is like adding horses to move your car.
*** Does anyone want to help me out with programming it into a Stratix 10 (I don't have a license)? ***
Why do you need synthesis for Altera Stratix? You just said you have Xilinx ZCU104. Stratix is a different and incompatible FPGA family that requires a different toolchain (Quartus instead of Vivado).
Any contractor you can hire would have to buy a $4000-5000 commercial license for Quartus Pro to synthesize for Stratix 10. He can also use a temporary evaluation license or convince Intel that he is doing some important research for the good of Mankind that warrants a free license.
Thank you,
Yuri Panchul
--
You received this message because you are subscribed to the Google Groups "SystemVerilog Meetups in Silicon Valley" group.
To unsubscribe from this group and stop receiving emails from it, send an email to meetsv+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/meetsv/CAAuo2DxxWwfe3fA_dPDWc-TRf_oNTF%3DS5%2BJJQv_-pa0OH1MA2g%40mail.gmail.com.
Kevin:
I am not sure what you are trying to communicate. You put links to several projects, including (based on my quick look):
1. A proposal for an equivalent of the VHDL resolution function in SystemVerilog, which mentions analog Verilog-AMS.
2. A SystemVerilog to VHDL translator with the claim of improved X-propagation and user-defined nets.
Yep.
The "3D" logic approach removes the X-propagation problem (it's in the nvc libs). That's a UDN in SV and VHDL; I added the functionality required for describing bidirectional components using it (missing for 40/10+ years).
3. A linker that is capable of translating calls from C functions into a C-based design running on an FPGA board.
ldx is a smart linker that rewrites the code to take advantage of available accelerators, which can be on an FPGA.
I do DV for things like power electronics and RF, where it's useful. 0/1/X/Z logic doesn't translate properly on AMS boundaries, and the 3D logic is more efficient than 01XZ in simulation.
As far as (3) is concerned, I can see that your c2v.py (C function to synthesizable Verilog) translator can produce only combinational logic. I do not see any value in it, because the interface between a C program running under Linux on the ARM core in a Zynq and the FPGA logic is going to be the bottleneck, and there is no point in spending numerous cycles to run a small combinational function on the FPGA fabric, which runs at a much lower clock frequency than the program running on the CPU.
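The bottleneck argument above can be sketched as a simple break-even check. This is a minimal Python illustration under stated assumptions: the helper `offload_cost`, the 1.2 GHz CPU clock, the 200 ns interface round trip, and the 200 MHz fabric clock are hypothetical placeholders, not figures measured on any board.

```python
def offload_cost(cpu_cycles, cpu_hz, interface_latency_s, fabric_cycles, fabric_hz):
    """Compare running a small function on the CPU with offloading it to fabric.

    Returns (pays_off, cpu_time_s, offload_time_s).
    """
    # Time to just run the function in software on the CPU.
    cpu_time_s = cpu_cycles / cpu_hz
    # Time to cross the CPU-to-fabric interface plus the fabric computation.
    offload_time_s = interface_latency_s + fabric_cycles / fabric_hz
    return offload_time_s < cpu_time_s, cpu_time_s, offload_time_s


if __name__ == "__main__":
    # Hypothetical numbers: a 30-cycle software routine versus a 1-cycle
    # combinational block behind an assumed 200 ns bus round trip.
    pays_off, cpu_t, off_t = offload_cost(30, 1.2e9, 200e-9, 1, 200e6)
    print(pays_off, f"{cpu_t * 1e9:.0f} ns", f"{off_t * 1e9:.0f} ns")
```

With these assumed values the software routine takes about 25 ns while the offload takes about 205 ns, so the offload loses unless the interface latency shrinks well below the software run time.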
It's useful because you can define processor extensions as routines (for RISC-V), which makes code forward and backward compatible. You supply code with the routine in it and ldx will patch the binary to use the accelerated version if it is present. In one case that was a 30 to 1 cycle reduction.
It works better if the FPGA is bolted into a basic RISC-V CPU pipeline, but that hardware currently doesn't exist, so I'm using soft-cores.
There's no reason the translation can't be multi-cycle, we just haven't done that yet.
It works for any ISA.
If you have a demo of something useful running on a Zynq board, why don't you put some short readable slides outlining the practical usefulness and a video of the design running on a board?
In the repository, there is a mess of files, for example:
../../../arv/RISC-V.srcs/asynchronous/cpu/decoder.vhdl referenced from Quartus - where is this file? Why do you need it for your simulation accelerator?
https://github.com/kev-cam/ldx/blob/main/fpga/vexriscv/GenLdxCpu.scala - what is this file doing in the repository? Is VexRISCV a part of your acceleration?
VexRISCV is the core being used in the processor array. You'll have to ask Claude why files are there, it writes the code and the docs.
The demo code that got uploaded today is an RTL simulation accelerator, offering a 1000x speed-up over Verilator, I think folks will find that useful (if it works as Claude claims).
What other kind of demo would you like? Maybe a scalable GPU?
Regards,
Kevin:
*** The demo code that got uploaded today is an RTL simulation accelerator, offering a 1000x speed-up over Verilator, I think folks will find that useful (if it works as Claude claims). What other kind of demo would you like? Maybe a scalable GPU? ***
Let's keep the testing simple. You claim the sandwich of technologies produced by Claude (a SystemVerilog software simulator running on an array of VexRISCV cores synthesized and uploaded into a Xilinx UltraScale FPGA) runs a representative test 1000x faster than Verilator, right? Then it should surely run it even faster than Icarus Verilog, right?
Let's get a small RTL example, for instance 4_2_9_a_plus_b_using_wrapped_fifos from basics-graphics-music repository.
Let's modify the test a little bit: increase the number of transfers from 100 to 10000000 (ten million) and the timeout from 10 thousand to 1 billion. Let's also comment out logging and turn off VCD dump. It runs for 4 minutes 30 seconds (270 seconds) on my old Dell OptiPlex 3050 computer (i5-6500T 2.50GHz ~8GB memory) and for 1 minute 45 seconds (105 seconds) on my Mac Mini M4.
The modified testbench is here:
I propose that you run both the original test and the modified test on your Claude-generated thingy on your ZCU104 board. If it produces functionally correct results on the original test and runs the modified test in less than 10 seconds (a 27x speedup compared to Icarus on the Dell and 10x on the Mac), and there is no testbench cheating (i.e. displaying the expected output without actually running the test), and you do all of this by the 4th of July 2026, then I will pay you $1000. If not, you pay me $1000. Deal?
Note this is a very generous deal for you. Claude claims 1000x speedup, I want you to prove merely 10x speedup.
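For reference, the speedup thresholds in the proposal follow directly from the measured Icarus Verilog run times and the 10-second target. A minimal Python sketch of that arithmetic (the names `ICARUS_SECONDS` and `required_speedup` are just for illustration):

```python
# Icarus Verilog run times for the modified test, as quoted in this message.
ICARUS_SECONDS = {
    "Dell OptiPlex 3050 (i5-6500T)": 270,
    "Mac Mini M4": 105,
}
TARGET_SECONDS = 10  # the run time the accelerator has to beat


def required_speedup(baseline_s, target_s=TARGET_SECONDS):
    """Speedup over the baseline needed to finish within target_s seconds."""
    return baseline_s / target_s


if __name__ == "__main__":
    for host, seconds in ICARUS_SECONDS.items():
        print(f"{host}: {required_speedup(seconds):.1f}x")
    # Dell OptiPlex 3050 (i5-6500T): 27.0x
    # Mac Mini M4: 10.5x
```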
Thank you,
Yuri Panchul