On Thursday, August 10, 2023 at 3:17:28 AM UTC-4, Christopher Lozinski wrote:
> Thanks to all of you, my understanding of the technology and the market continues to improve. 16 bit Forth machines are very space and memory efficient, but the problem is that real time control applications use more than 16 bits of accuracy, and you really do not want to be doing stack operations on doubles. Dup is 2 deep, over is 4 deep, but rot is 6 deep. Sure, at 60 MHz there is enough time, but it is just ugly. So I need an application which uses 16 bit data or less.
You've lost me. Rather than designing a CPU for an application, you want to design an application for a CPU??? What???
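For reference, the stack depths quoted above can be checked with a small Python model. This is only a sketch: it assumes a 32-bit double is held as two 16-bit cells with the high cell on top (the usual Forth convention), and the helper names are hypothetical, not taken from any particular Forth system.

```python
# Sketch: a list models the data stack, with the top of stack at the end.
# A 32-bit "double" occupies two 16-bit cells (assumed: high cell on top).

def push_double(s, value):
    """Push a 32-bit value as two 16-bit cells, low cell first."""
    s.append(value & 0xFFFF)
    s.append((value >> 16) & 0xFFFF)

def two_dup(s):
    """( d -- d d ) reads the top 2 cells."""
    s.extend(s[-2:])

def two_over(s):
    """( d1 d2 -- d1 d2 d1 ) reaches 4 cells deep."""
    s.extend(s[-4:-2])

def two_rot(s):
    """( d1 d2 d3 -- d2 d3 d1 ) reaches 6 cells deep."""
    d1 = s[-6:-4]
    del s[-6:-4]
    s.extend(d1)
```

On a hardware stack with only the top one or two cells in registers, the 4-deep and 6-deep accesses are the ones that cost extra cycles or extra multiplexing.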
> AFAIK, the only data which is 16 bits wide is CD-quality audio, so my focus is now on an audio processor for musicians, hearing aids, and echolocation. There are a lot of older Forth developers who need hearing aids, and in the US, they are no longer regulated.
> Here is the $35 board I am targeting.
>
> https://tinyvision.ai/products/pico-ice
>
> This board can happily fit 8 * 16 bit stack machines.
> Each will have a hard core multiplier and 128Kbits of memory. That is 8K 16 bit words. If needed, I can even go to 16 processors, 8 with hard core multipliers.
Why are you locked into 16 bits? Audio actually needs more than 16 bits, because the calculations need to use more bits, in order to not lose precision in the process. It is very common to use 24 bits in the data and ALU paths for audio work. Any musician is going to drop a 16 bit device in the trash, because the artifacts will be audible.
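A small Python sketch of the precision problem, under assumptions chosen purely for illustration: a one-pole low-pass filter with a coefficient of 1/256 (a right shift by 8), a constant input of 1000, and integer state. The function names are hypothetical. The first version keeps its state at the 16-bit sample resolution; the second carries 8 extra fractional bits, as a 24-bit data path would.

```python
def lowpass_16(x, steps, shift=8):
    """One-pole low-pass, state kept at 16-bit sample resolution."""
    y = 0
    for _ in range(steps):
        # The update truncates to zero once (x - y) drops below 2**shift.
        y += (x - y) >> shift
    return y

def lowpass_24(x, steps, shift=8, frac_bits=8):
    """Same filter, but the state carries 8 extra fractional bits."""
    acc = 0
    target = x << frac_bits
    for _ in range(steps):
        acc += (target - acc) >> shift
    return acc >> frac_bits

narrow = lowpass_16(1000, 10000)  # stalls at 745
wide = lowpass_24(1000, 10000)    # settles at 999
```

With the narrow state the output stalls about 25% short of the input, because every update smaller than one LSB is discarded. The extra bits in the accumulator are what remove that dead band, even though the input and output samples are still only 16 bits wide.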
> This processor will be very different from the J1 and Mecrisp processors. They are optimized to be as small as possible, with only 16 instructions. Each one only takes 160 LUTs, but I have 5260 LUTs on this board. With all of that logic fabric, I can happily have 32 instructions instead of the J1’s 16. That makes this closer to Ting’s eForth EP16/24/32. Of course Ting had 3 different code bases, but with modern tools like Python’s Amaranth, or Scala’s SpinalHDL, I should be able to generate 16, 24 and 32 bit CPUs all from the same code base.
Why do you want more instructions? Not only will more instructions use more resources, they will make the processor slower. You can counter this by using long instruction words to reduce or eliminate decoding, but there will be a limit. You will also improve speed by limiting all instructions to 1 clock cycle. When an instruction needs multiple clock cycles, you need a cycle counter, and that counter effectively becomes part of the instruction for decode purposes.
There is a reason why most people designing CPUs for Forth don't call them Forth processors. They call them MISC (Minimal Instruction Set Computer). The use of Forth as the language for writing code is irrelevant. Minimizing the CPU and instruction sizes gains speed.
This is why I tell people to work on paper first. Once you write some code for your processor, you will find your additional instructions are seldom used and actually slow the processor overall.
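One way to do the paper exercise is to write a routine out as a list of opcodes and count how often each one appears. The Python sketch below does only that. The program listing and opcode names are made up for illustration and are not taken from the J1, eForth or any other real instruction set.

```python
from collections import Counter

def opcode_histogram(program):
    """Return (opcode, count, share) tuples, most frequent first."""
    counts = Counter(program)
    total = sum(counts.values())
    return [(op, n, n / total) for op, n in counts.most_common()]

# Hypothetical hand-written listing, for illustration only.
program = ["LIT", "FETCH", "LIT", "FETCH", "ADD",
           "DUP", "LIT", "STORE", "DROP", "EXIT"]

for op, n, share in opcode_histogram(program):
    print(f"{op:6s} {n:3d} {share:6.1%}")
```

Run over real application code rather than a toy listing, a table like this shows which proposed instructions earn their place in the decoder and which almost never execute.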
> The J1 depends on dual port memory. It can read or write to memory at the same time that it is fetching the next instruction. And one needs that to be able to edit code. Sadly the inexpensive Lattice boards are mostly single port memory. So I have to use a Harvard architecture, where code and memory are separate. MicroCore does this. I suspect Ting also did this.
??? Are you talking about memory external to the CPU?
Harvard is good for stack machines. The instruction memory is seldom accessed, other than when loading. You can put constants in the data memory. The dual port memory is internal to an FPGA and is available in nearly every device that is still in production.
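To make the Harvard arrangement concrete, here is a minimal Python sketch of a stack machine with separate instruction and data memories, with the constants held in data memory. Everything in it is an assumption for illustration: the five opcodes, the tuple encoding, and the one-instruction-per-cycle timing are hypothetical and do not describe the J1, MicroCore or eForth.

```python
MASK = 0xFFFF  # 16-bit cells (assumed word size)

def run(code, data, max_cycles=1000):
    """Run `code` (instruction memory) against `data` (data memory).

    Returns (stack, cycles). One instruction per cycle is assumed.
    """
    stack, pc, cycles = [], 0, 0
    while cycles < max_cycles:
        op, *arg = code[pc]           # instruction fetch: code memory only
        pc += 1
        cycles += 1
        if op == "LIT":
            stack.append(arg[0] & MASK)
        elif op == "FETCH":           # ( addr -- value ), data memory read
            stack.append(data[stack.pop()])
        elif op == "STORE":           # ( value addr -- ), data memory write
            addr = stack.pop()
            data[addr] = stack.pop() & MASK
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack, cycles

# Constants live in data memory; the program never writes to code memory.
data = [40, 2, 0]
code = [("LIT", 0), ("FETCH",), ("LIT", 1), ("FETCH",),
        ("ADD",), ("LIT", 2), ("STORE",), ("HALT",)]
stack, cycles = run(code, data)
```

Because the fetch path touches only `code` and the load/store path touches only `data`, each memory needs just one port, and the cycle count gives a first rough timing figure without any hardware.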
> I will also need some logic dedicated to audio processing. I am not yet sure what it will be. There may need to be a shift register to store some history from two or three microphones for cross-correlation and edge detection. There may need to be some flags for when the next bit of audio becomes available.
You are trying to optimize your code, not only before you've written the code, but before you've designed the CPU. It will be so much better if you don't try to do everything at once. Just try designing a stack processor and see where the issues are. First, you will need to figure out how to find logic paths slowing your clock cycle. Then write some code to do something useful. Run it in the simulator and get some timing data. No need for hardware even. Get some experience and learn what areas you need to come up to speed in.
> So I see a family of many core eForth CPUs, all in Amaranth, taking the best ideas from many of the existing Forth CPUs. Initially my focus will be on 16 bits for audio, later on 24 bits for video. If someone actually needs a 32 bit processor, I will release that as well.
>
> Particular thanks to @Lorum Ipsum for pointing out that the low end Lattice boards are also low power. Thanks to Juergen Pintaske for connecting me to the Facebook Forth group, and to a real time control expert who pointed out that they need 32 bit data. Thanks to the Mecrisp-Ice author for pointing out that the Lattice iCE40 boards are single port. Thanks to the Core-1 lead for all of his great advice. Thanks to the 22 people on Facebook who like this idea, and to the one person who already wants to buy my many core Forth CPU!
Good luck.
--
Rick C.