The talk went very well. We had a lot of good feedback. A couple of good points:
1. They wanted to see video (i.e. streaming, compression etc.) be a part of the spec
2. They also wanted to describe how the frame buffer would be output to HDMI or DP etc. (not sure what they meant here)
3. They LOVED the idea of a fused CPU-GPU ISA
5. One guy, ex-NVidia and ex-Intel, who did a lot of work on GPUs, was extremely excited and praised this effort as being bold and what the industry needs.
5. There were people from ARM and Huawei there who were interested in compiler technology for the new ISA - i.e. how we would build that
6. They STRONGLY encouraged keeping the ISA description and a description of the reference implementation SEPARATE in the slides
At the Chennai Workshop I explained the commercial reasoning why the Libre RISCV SoC is a hybrid CPU, GPU *and* VPU. It's "boring" (i.e. not AI), however there simply is no desktop, TV, tablet, smartphone or netbook processor that does not have Video Encode / Decode capability.
These are enormous markets where a hundred million units is considered small.
> 2. They also wanted to describe how the frame buffer would be output to HDMI or DP etc. (not sure what they meant here)
The answer here: enjoy-digital's framebuffer RTL, or Richard Herveille's RGB/TTL HDL.
Whilst the GPU (and VPU) *write* to the framebuffer, the Video Output HDL reads from the same memory location(s) over the same shared AXI4 (or other) Bus architecture, to which the same DRAM (or SRAM in embedded systems) is attached.
1080p60 at 32bpp will take up something like 10% of the capacity of a 32-bit wide DDR-800 memory channel, which, for a low-cost tablet solution, is tolerable (power-consumption-wise).
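As a back-of-the-envelope check (my own arithmetic, not from any spec), the raw scanout numbers work out as follows. All figures are theoretical peaks; real effective DRAM bandwidth is lower, so treat the percentage as a ballpark only:

```python
# Rough framebuffer scanout bandwidth for 1080p60 at 32bpp,
# against a 32-bit wide DDR-800 channel. Theoretical-peak figures only.

WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4          # 32bpp
FPS = 60

scanout_bytes_per_sec = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS
# 1920 * 1080 * 4 * 60 = 497,664,000 bytes/s, i.e. roughly 0.5 GB/s

# 32-bit wide DDR-800: 800 million transfers/s * 4 bytes per transfer
ddr800_peak_bytes_per_sec = 800_000_000 * 4   # 3.2 GB/s theoretical peak

fraction = scanout_bytes_per_sec / ddr800_peak_bytes_per_sec
print(f"scanout: {scanout_bytes_per_sec / 1e6:.0f} MB/s, "
      f"{fraction * 100:.1f}% of theoretical peak")
```

The raw arithmetic lands at around 15% of the *theoretical* peak; the ~10% figure above is in the same ballpark, with the exact percentage depending on bit depth and on how much of peak bandwidth one assumes is actually usable.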
The nice thing about Richard Herveille's RGBTTL HDL is that external ICs such as the Solomon SSD2828, TI TFP410a, TI SN75LVDS83b or offerings from Chrontel can perform external conversion to LVDS, MIPI, DVI, VGA etc.
This reduces licensing NREs and also means that the clock rate of the external GPIO is kept below 125 MHz, even for 1080p60.
> 3. They LOVED the idea of a fused CPU-GPU ISA
cool. It just makes sense. The only reason why GPUs are separate ISAs is for convenience in 3rd party sales (think PCIe Cards).
The need for RPC marshalling of API functions massively complicates both hardware and software, and makes debugging and development extremely challenging.
> 4. One guy, ex-NVidia and ex-Intel, who did a lot of work on GPUs, was extremely excited and praised this effort as being bold and what the industry needs.
That's great to hear. It may be of interest that the idea is not new: ICubeCorp first proposed the idea of a hybrid GPGPU. They have both an ex-SGI compiler expert on the team, as well as a former ATI/AMD hardware expert.
The 55nm IC3128 was an extremely capable chip. Despite an internal clock rate of only 400 MHz, its actual performance, thanks to hybrid VLIW-style instructions managed by the SGI compiler port, was equivalent to a whopping 1.6 GHz.
> 5. There were people from ARM and Huawei there who were interested in compiler technology for the new ISA - i.e. how we would build that
The strategy that we laid out last year, before the 3D Alliance initiative, was as follows:
* Develop a Vulkan SPIRV to LLVM IR shader compiler *on x86*. The LLVM JIT compiler takes over and creates native assembler, suitable for parallel multicore execution.
* Use the *x86* LLVM IR JIT compiler (which is stable code) to run the Vulkan Khronos Conformance Tests.
* Repeat until passing all Conformance tests, 100%
* We therefore have at that point a "known good" position for the Vulkan source code.
* The source code is recompiled on RISC-V using the RISC-V LLVM port, in particular the LLVM IR JIT RISC-V compiler.
* Re-run the Conformance Tests. If they fail, investigate and debug the RISC-V JIT compiler (on the basis that they worked fine on x86).
* Once successful, another incremental known-good milestone is achieved. "actual" work (as far as hardware is concerned) only begins *after* this point.
* Benchmarking and analysis can be carried out to find the points in the Vulkan implementation where, on RISC-V, performance and power consumption suck (different 3D Alliance Members will have different definitions on what "sucks", for their customers).
* Hardware opcodes *for RISC-V* (not x86) can be designed, one at a time, which give a power-performance gain, incrementally.
* A simulator (spike-sv) can be used to assess whether those opcodes actually do the job, long before hardware is actually developed.
* Last in the chain is to develop the hardware opcode implementation, and confirm whether the power consumption and gate count are acceptable (which will vary for each 3D Alliance Member). If not, go back four steps and repeat.
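The steps above can be condensed into a sketch of the iteration loop. Everything here is illustrative: the function names (`run_conformance`, `benchmark_hotspots`, `acceptable_power_perf`) and the hotspot names are hypothetical stand-ins, not real tools or real profiling data.

```python
# Illustrative sketch of the known-good-to-known-good strategy above.
# All functions are hypothetical stand-ins; in reality each step is a
# large engineering effort (Khronos CTS runs, spike-sv simulation, HDL).

def run_conformance(platform):
    """Stand-in for running the Vulkan Khronos Conformance Tests."""
    return True  # assume a passing milestone, for the sketch

def benchmark_hotspots():
    """Stand-in for profiling the RISC-V Vulkan implementation."""
    return ["texture_interp", "pixel_blend"]  # hypothetical hotspots

def acceptable_power_perf(opcode):
    """Stand-in for each 3D Alliance Member's own power/gate criteria."""
    return True

milestones = []

# Milestone 1: known-good Vulkan SPIR-V -> LLVM IR stack on x86.
assert run_conformance("x86")
milestones.append("x86 known-good")

# Milestone 2: same source, recompiled with the RISC-V LLVM IR JIT.
# A failure here points at the RISC-V JIT, since x86 already passed.
assert run_conformance("riscv")
milestones.append("riscv known-good")

# Only now does "actual" hardware work begin: one opcode at a time,
# assessed in a simulator before HDL, re-running conformance each time.
for hotspot in benchmark_hotspots():
    opcode = f"op_{hotspot}"            # hypothetical custom 3D opcode
    if acceptable_power_perf(opcode) and run_conformance("riscv"):
        milestones.append(f"{opcode} known-good")

print(milestones)
```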
In this way we have a fast-iterative incremental strategy which goes from known-good to known-good milestones, where at the point that the performance and power consumption is "good enough", a point release may be declared.
Note: whilst the actual hardware needed by any given 3D Alliance Member will vary (clearly due to the different customer and market requirements served by each member), the use of Vulkan as the de-facto Industry Standard for 3D (with OpenCL, OpenGL and DirectX gateways) does not.
Note in particular that at no point in this picture is the planned RVV Vector Proposal, or in fact any Vectorisation at all, a hard requirement.
This is critical for embedded 3D, where the gate count of a Vector Unit may simply be too great, or not even necessary (640x480 or 800x600, 25fps, 15bpp).
Also note that the only point where native RISC-V opcodes actually get generated is in the *LLVM-IR JIT Compiler*. All and any work on RISC-V RVV front-ends, in both gcc and LLVM, as well as the work being done on binutils to support RVV is *completely redundant*.
Vulkan compliance *requires* a SPIR-V Compiler, and that SPIR-V shader compiler bears absolutely no relation to, nor has anything in common with, the work being done on binutils, gcc or LLVM.
We chose SPIR-V to LLVM IR translation because it at least allows us to short-cut the development time by reusing one tiny fraction of pre-existing work: the LLVM JIT assembler engine.
Even here it will be necessary to modify the JIT engine to understand 3D hardware opcodes (and, optionally, Vectorisation).
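To make that concrete, here is a toy illustration (my own, purely hypothetical: "dot4" is an invented opcode name, not real LLVM code) of what teaching the JIT about a fused 3D opcode buys. A 4-component dot product, common in shader inner loops, costs seven scalar ops when emitted generically, versus one fused opcode once the back-end knows about it:

```python
# Hypothetical illustration of why the JIT must understand 3D opcodes.
# A 4-component dot product: generic scalar lowering vs a fused opcode
# that only an opcode-aware back-end could select.

def emit_scalar_dot4():
    """Instruction stream a generic scalar backend would emit."""
    return ["mul", "mul", "mul", "mul",   # 4 component multiplies
            "add", "add", "add"]          # 3 adds to reduce the sum

def emit_fused_dot4():
    """Stream once the JIT knows a hypothetical fused 'dot4' opcode."""
    return ["dot4"]

scalar = emit_scalar_dot4()
fused = emit_fused_dot4()
print(f"scalar: {len(scalar)} ops, fused: {len(fused)} op -> "
      f"{len(scalar)}x fewer instructions issued per dot product")
```

Multiply that saving by the millions of dot products per frame in a shader workload and the power-performance argument for custom opcodes, and for a JIT that can select them, falls out directly.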
It is a lot of work and it is absolutely critical work, without which the hardware is completely useless.
Also of potential interest to ARM: all that ARM need do is replace the LLVM IR JIT RISC-V assembler with the ARM equivalent, which no doubt already exists, and has ARM Vector extensions already in development.
btw please do not be lulled into a false sense of security by assuming that Vectorisation alone will give "great performance" and make a fully commercially saleable GPU.
Even with the best high-performance general-purpose Vector engine in the world, it will *not* be commercially viable. Software-only performance will suck, and is only of value for reference purposes (like MesaGL is, right now), as well as for education and debugging.
Jeff Bush's Nyuzi work, which replicated Larrabee, showed that a software only GPU, even with a high performance Vector Engine, requires FOUR times the gates (and power consumption) to reach parity with modern GPU performance.
Bottom line: whilst the Vulkan Software is essential, so are some 3D custom opcodes.
Also note that certain opcodes can be predicted well in advance to be needed, based on knowledge of the 3D (and VPU) fields. An overview of these was presented by Atif.
> 6. They STRONGLY encouraged keeping the ISA description and a description of the reference implementation SEPARATE in the slides
Yes, I normally do that. I wanted to keep to a single slideshow / video for this meeting, as it was quite a short talk (evening only).