Experiments with increasing the number of HL2 receivers

Steve Haynal

Jun 11, 2018, 12:39:00 AM
to Hermes-Lite
Hi Group,

This past week I made some experiments to increase the number of receivers in the HL2. Here is a summary:

The native DSP clock runs at 76.8 MHz to match the ADC/DAC clock, but the FPGA we are using is capable of higher clock speeds, for example 122.88 MHz as used by the original Hermes or 125 MHz as used by the QS1R. I doubled the rate at which the final FIR operates, to 153.6 MHz, to reduce the required resources. This allows us to either reduce the number of polyphase banks or send both I and Q samples through the same FIR. I reduced the number of polyphase banks and was almost able to fit 4 receivers. I think there are a few other places to optimize, and then we can fit four receivers this way. I also added an option to disable the TX logic, and with TX disabled I could easily fit 4 receivers. I ran Alan's SparkSDR software with this experimental firmware and 4 receivers decoding FT8 and WSPR on 80, 40, 30 and 20 m. Reducing the number of polyphase banks may not be the best final solution, but at least the experiment shows that we can run pieces of the DSP logic at 153.6 MHz.
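
One way to picture sending both I and Q through the same FIR: if the FIR runs at twice the sample rate, I and Q can be interleaved into one stream and filtered by a single engine whose delay line is twice as long, with a zero between each coefficient, so every tap only ever meets samples from its own channel. The sketch below checks that equivalence in plain Python; the function names and coefficients are hypothetical and this is not the HL2 RTL.

```python
def fir(x, h):
    """Direct-form FIR reference: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def fir_shared_iq(i_samples, q_samples, h):
    """Filter I and Q with ONE FIR running at twice the sample rate.

    I and Q are interleaved (I0, Q0, I1, Q1, ...). Spreading the taps out
    with a zero between them doubles every delay element, so I samples only
    meet I samples and Q samples only meet Q samples.
    """
    iq = [v for pair in zip(i_samples, q_samples) for v in pair]
    h2 = [c for hk in h for c in (hk, 0)]
    y = fir(iq, h2)
    return y[0::2], y[1::2]  # even outputs belong to I, odd outputs to Q


h = [3, -1, 4, 1, -5]  # hypothetical integer coefficients
i_in = [1, 0, 2, -3, 5, 7, -2, 0]
q_in = [0, 4, -1, 2, 2, -6, 3, 1]
i_out, q_out = fir_shared_iq(i_in, q_in, h)
assert i_out == fir(i_in, h) and q_out == fir(q_in, h)
```

In hardware the zero taps are never multiplied, so the cost of sharing is the doubled delay line (registers or memory), not extra multipliers.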

The RTL is now parameterized and can use the new openHPSDR receiver from protocol 2. Unfortunately, I was only ever able to fit one receiver in the firmware. Maybe with a reduction from 16 to 12 bits and the maximum rate lowered to 384 kHz we can fit two. The new FIR filter looks interesting, as it decimates by only 2 and does save some resources. There appears to be a trade-off here, as the preceding CIC filter must have more stages.
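
A rough way to see that trade-off: whatever decimation the final FIR does not do, the CIC must, so with a decimate-by-2 FIR the passband occupies a larger fraction of the CIC output rate and the first alias band sits closer to the CIC's first null. The sketch below uses the standard CIC magnitude response with differential delay 1 and counts the stages needed for a target alias rejection. The 48 kHz output, 20 kHz usable passband and 100 dB target are hypothetical numbers chosen only for illustration.

```python
import math

FS_IN = 76_800_000  # ADC/DSP clock from the thread (Hz)


def cic_gain(f, fs_in, r, n_stages):
    """Normalized CIC magnitude |H(f)|/|H(0)| for decimation r, differential delay 1."""
    x = math.pi * f / fs_in
    if x == 0:
        return 1.0
    return abs(math.sin(r * x) / (r * math.sin(x))) ** n_stages


def alias_rejection_db(f_out, fir_decim, passband, n_stages, fs_in=FS_IN):
    """Attenuation at the near edge of the first alias band that folds onto the passband."""
    fc = f_out * fir_decim            # CIC output rate
    assert fs_in % fc == 0, "CIC decimation must be an integer"
    r = fs_in // fc                   # CIC decimation ratio
    return -20 * math.log10(cic_gain(fc - passband, fs_in, r, n_stages))


def stages_needed(f_out, fir_decim, passband, target_db, max_stages=20):
    """Smallest number of CIC stages that reaches target_db of alias rejection."""
    for n in range(1, max_stages + 1):
        if alias_rejection_db(f_out, fir_decim, passband, n) >= target_db:
            return n
    return None


# Hypothetical: 48 kHz output, 20 kHz usable passband, 100 dB alias rejection
for fir_decim in (2, 8):
    print(fir_decim, stages_needed(48_000, fir_decim, 20_000, 100.0))
```

Under these assumptions a decimate-by-2 FIR calls for 9 CIC stages, while a FIR that decimates by 8 gets by with 4; real designs also weigh passband droop and register growth.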

The RTL is now parameterized and can also use the receiver from the QS1R. The QS1R is able to fit 7 receivers, so I was hopeful, but again I found the limit to be 3, maybe 4 once conversion from 16 to 12 bits is done. Although the QS1R uses an FPGA of similar capacity, it does not have TX and it does not use Ethernet. Both of these use significant additional resources. Also, given our lower sampling rate of 76.8 MHz, the FIR filter design in the QS1R only supports up to 96 kHz. I would have to double the rate of this FIR filter to achieve our original maximum of 192 kHz. I did like the memory-based CIC in the QS1R. If enough decimation has occurred that you can do the CIC filtering serially, it makes sense from a resource-savings perspective to keep the filter state in a table.
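
A minimal sketch of the memory-based idea, assuming a plain CIC with differential delay 1: all 2N state words (N integrators and N comb delays) live in one table, and a single adder walks through them one per clock. It only works once the sample rate has dropped far enough that N integrator updates per input sample, plus N comb updates every R-th sample, fit in the available clocks. This illustrates the technique in general and is not the QS1R code; the reference function is included only to check the result.

```python
def cic_serial(x, r, n_stages):
    """Memory-based CIC decimator (differential delay 1).

    ram[0:N] holds the integrator states and ram[N:2N] the comb delays.
    In hardware each inner-loop iteration would be one clock of a single
    shared adder reading and writing the table.
    """
    ram = [0] * (2 * n_stages)
    out = []
    for n, sample in enumerate(x):
        acc = sample
        for k in range(n_stages):            # integrators: every input sample
            ram[k] += acc
            acc = ram[k]
        if n % r == r - 1:                   # combs: every r-th input sample
            for k in range(n_stages, 2 * n_stages):
                prev = ram[k]
                ram[k] = acc
                acc -= prev
            out.append(acc)
    return out


def cic_reference(x, r, n_stages):
    """Same filter as N cascaded length-r moving sums, keeping every r-th output."""
    y = list(x)
    for _ in range(n_stages):
        y = [sum(y[max(0, n - r + 1):n + 1]) for n in range(len(y))]
    return y[r - 1::r]


x = [(i * 37) % 101 - 50 for i in range(200)]
assert cic_serial(x, 4, 3) == cic_reference(x, 4, 3)
```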

All in all, these experiments gave me many ideas for how to write lean and mean receivers so that we can pack more into the HL2 firmware. I would like to rewrite the entire receive chain, but have so little time and so many other things I want to work on. I still want to send full spectrum data as discussed before, and look at synchronized receivers and transmitters. Since the HL2 is mainly a hobby and leisure activity for me, I often end up just doing what interests me most at the time...

73,

Steve
KF7O


James Ahlstrom

Jun 11, 2018, 12:10:36 PM
to Hermes-Lite
Hi Steve and Group,


On Monday, June 11, 2018 at 12:39:00 AM UTC-4, Steve Haynal wrote:
> Hi Group,
>
> This past week I made some experiments to increase the number of receivers in the HL2. Here is a summary:

I have often wondered whether it would be useful to replace the CORDIC algorithm with a quadrature oscillator. The CORDIC is famous for not needing multipliers, but we have multipliers available. So we could use the oscillator in Richard G. Lyons' book on page 786. Or, if we have memory, we could use DDS techniques.
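
For comparison with the CORDIC, the basic DDS approach is a phase accumulator that addresses a sine/cosine table. The sketch below assumes a 32-bit accumulator and a 1024-entry complex table (both hypothetical sizes). Truncating the phase to the table address is the main source of DDS spurs, which is what larger or split tables try to reduce.

```python
import cmath
import math

FS = 76_800_000        # DSP clock from the thread (Hz)
ACC_BITS = 32          # hypothetical phase-accumulator width
LUT_BITS = 10          # hypothetical table address width
LUT = [cmath.exp(2j * math.pi * i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]


def tuning_word(f_hz):
    """Phase increment per clock: f/FS * 2^32, rounded to the nearest integer."""
    return (f_hz * (1 << ACC_BITS) + FS // 2) // FS


def dds(f_hz, n):
    """Complex LO samples: accumulate phase, use the top LUT_BITS bits as the table address."""
    step, phase, out = tuning_word(f_hz), 0, []
    for _ in range(n):
        out.append(LUT[phase >> (ACC_BITS - LUT_BITS)])
        phase = (phase + step) & ((1 << ACC_BITS) - 1)
    return out


lo = dds(19_200_000, 4)   # FS/4: a quarter turn per sample
```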

Jim
N2ADR

Alan Hopper

Jun 11, 2018, 1:18:05 PM
to Hermes-Lite
Hi Steve, Jim and Group

I've also wondered about the CORDIC. It is quite greedy of gates, and it also currently creates some very small spurs (probably irrelevant) at some frequencies; these can be removed by using more bits, but at the cost of more gates. I had wondered whether something simpler could be done if tuning was restricted to much bigger steps (say 10 kHz). This would obviously cause problems with existing software, but for an all-out skimming solution it might be worth considering.
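
One possible reading of the bigger-steps idea, offered only as a sketch: if every tuning frequency is a multiple of 10 kHz, then at 76.8 MHz every LO is periodic in 76.8 MHz / 10 kHz = 7680 samples, so a table of 7680 phases indexed modulo 7680 gives exact phases with no accumulator truncation. The catch is memory: 7680 complex entries is a lot, even before quarter-wave symmetry is used to cut it down. The names and step size are hypothetical.

```python
import cmath
import math

FS = 76_800_000
STEP_HZ = 10_000                      # hypothetical tuning granularity
PERIOD = FS // STEP_HZ                # 7680: every allowed LO repeats within this many samples
TABLE = [cmath.exp(2j * math.pi * m / PERIOD) for m in range(PERIOD)]


def lo_samples(channel, n):
    """LO at channel * STEP_HZ; the table index advances by `channel` modulo PERIOD."""
    idx, out = 0, []
    for _ in range(n):
        out.append(TABLE[idx])
        idx = (idx + channel) % PERIOD
    return out


lo = lo_samples(710, 20_000)          # 710 * 10 kHz = 7.1 MHz
```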

I did have an HL1 CV running with extra receivers by leaving out the FIR.

I made some progress with a hybrid of the memory CIC and the new receiver. The block size of the memory is a bit wasteful, but by processing I and Q in the same CIC the number of blocks can be halved. I think the number of banks in the new receiver's FIR could be halved anyway (assuming we stick to 384 kHz max), and by using Steve's faster clock it possibly becomes a simple FIR. Now that the HL2 firmware has settled down a bit, I'll dig out what I did.
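
Processing I and Q in the same CIC can be sketched by giving each stream its own slots in one state table and visiting the streams in turn, so one adder and one memory block serve both I and Q (or, with more slots, several receivers). This is a hypothetical Python illustration assuming a CIC with differential delay 1, not the code described above.

```python
def cic_shared(streams, r, n_stages):
    """Run several equal-rate streams (e.g. I and Q) through one CIC engine.

    Samples are visited interleaved, one time slot per stream, and stream c
    owns state words ram[2*N*c : 2*N*(c+1)] of a single shared table.
    """
    n_streams = len(streams)
    ram = [0] * (2 * n_stages * n_streams)
    outs = [[] for _ in range(n_streams)]
    for n in range(len(streams[0])):
        for c in range(n_streams):
            base = 2 * n_stages * c
            acc = streams[c][n]
            for k in range(n_stages):                    # integrator slots
                ram[base + k] += acc
                acc = ram[base + k]
            if n % r == r - 1:                           # comb slots at the low rate
                for k in range(n_stages, 2 * n_stages):
                    prev = ram[base + k]
                    ram[base + k] = acc
                    acc -= prev
                outs[c].append(acc)
    return outs


i_in = [(i * 13) % 29 - 14 for i in range(64)]
q_in = [(i * 7) % 31 - 15 for i in range(64)]
i_out, q_out = cic_shared([i_in, q_in], 4, 3)
```

Because each stream keeps its own state slots, sharing changes only the schedule, not the results: each output matches running that stream through the engine by itself.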

73 Alan M0NNB


Steve Haynal

Jun 16, 2018, 2:31:57 PM
to Hermes-Lite
Hi Jim,

Thanks for the link. I don't have the book but have ordered it. Is it:

e^(j(w+d)) = e^(jw)e^(jd)

So that you only have to do the complex multiplication by a constant e^(jd) to go around the unit circle at the correct frequency? If so, I wonder if it would drift off the unit circle due to accumulated errors. As you know, Cordics and most NCOs/DDSs have one wider (in terms of bits) phase accumulator from which both the real and imaginary parts are calculated for each point on the unit circle. I think this may help with quantization/bit-truncation errors. It would be interesting to run an experiment, as it would be quite simple to just provide e^(jd).

Memory-based oscillators do use up memory resources. I had the arbitrary memory-based NCO on the HL1 for distorted-oscillator experiments. If you want a pure sine, the table will typically be divided into a coarse and a fine table, which essentially makes use of the same equation above. This allows you to have 16 bits of resolution for a single quadrant (18 bits total) without requiring a memory with 16-bit addresses, just two memories with 8-bit addresses. There is a good example of this here: https://github.com/alexforencich/verilog-dsp/blob/master/rtl/sine_dds_lut.v
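
The coarse/fine split is the angle-addition identity: with the phase split into a coarse angle a (top 8 bits) and a fine angle b (bottom 8 bits), cos(a + b) = cos a cos b - sin a sin b and sin(a + b) = sin a cos b + cos a sin b. The sketch below covers a full circle with a 16-bit phase and 17-fractional-bit table entries (hypothetical widths). It leaves out the quadrant-symmetry trick and does not reproduce the linked Verilog module.

```python
import math

PHASE_BITS, COARSE_BITS = 16, 8          # hypothetical widths
FINE_BITS = PHASE_BITS - COARSE_BITS
FRAC_BITS = 17                           # hypothetical table precision


def q(v):
    """Round to FRAC_BITS fractional bits, as a stored table entry would be."""
    return round(v * (1 << FRAC_BITS)) / (1 << FRAC_BITS)


# Coarse table: angles a = 2*pi*i/256. Fine table: angles b = 2*pi*j/65536.
COARSE = [(q(math.cos(2 * math.pi * i / (1 << COARSE_BITS))),
           q(math.sin(2 * math.pi * i / (1 << COARSE_BITS)))) for i in range(1 << COARSE_BITS)]
FINE = [(q(math.cos(2 * math.pi * j / (1 << PHASE_BITS))),
         q(math.sin(2 * math.pi * j / (1 << PHASE_BITS)))) for j in range(1 << FINE_BITS)]


def cos_sin(phase):
    """cos and sin of 2*pi*phase/2^16 from two 256-entry tables via angle addition."""
    ca, sa = COARSE[phase >> FINE_BITS]
    cb, sb = FINE[phase & ((1 << FINE_BITS) - 1)]
    return ca * cb - sa * sb, sa * cb + ca * sb
```

With these widths the error stays at the level of a few table LSBs over the whole circle, while each memory needs only 8 address bits.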

I was thinking of maybe the 2-table approach as above, but pipelined at twice the ADC clock so that the real and imaginary parts are computed separately to reduce the required table width. Or maybe a hybrid approach, with the coarse table doing the work of the first 8 to 10 stages of the existing Cordic and the remaining 8 refinement stages done by a smaller Cordic.


73,

Steve
KF7O

Steve Haynal

Jun 16, 2018, 2:38:14 PM
to Hermes-Lite
Hi Alan,

When you left off the FIR, did you send at 384 kHz but then apply a decimating FIR down to 48 kHz in software? I'd like to have options to leave off the FIR, but then we'd also have to increase the current sampling rates. Maybe when FIRs are not used, the sampling rate will always be 8x what is selected?
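
If the FIR is left out of the gateware, the last decimation step moves to the host. A minimal sketch of that host-side step, assuming 384 kHz samples reduced by 8 to 48 kHz with a Hamming-windowed sinc lowpass; the tap count and cutoff are hypothetical. Only every 8th output is computed, which is the same saving a polyphase decimating FIR exploits.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc lowpass; cutoff is a fraction of the input rate (0 to 0.5).

    Taps are normalized to sum to 1 (unity gain at DC).
    """
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        t = n - m / 2
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def decimate(x, taps, factor):
    """FIR filter and keep every `factor`-th output, computing only the outputs that are kept."""
    return [sum(h * x[n - k] for k, h in enumerate(taps))
            for n in range(len(taps) - 1, len(x), factor)]


# Hypothetical host-side stage: 384 kHz in, 48 kHz out (factor 8), 127 taps,
# cutoff just under the 24 kHz output Nyquist frequency (0.055 * 384 kHz = 21.12 kHz)
taps = lowpass_taps(127, 0.055)
```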

I've been thinking about a memory-based CIC too. It seems that for the second stage of the CIC, after some decimation, one may even be able to use a memory-based CIC for several receivers at the same time. I'd like to optimize the memory usage. You have to commit a whole memory block, and that block will have more than enough slots for a single receiver, so it would be good to use the extra for other receivers.
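
Whether several receivers can share one memory-based CIC comes down to two budgets: clock cycles per input sample for the single adder, and state words in the committed block. A back-of-the-envelope sketch, assuming a 256-word-deep block configuration, I and Q as two channels per receiver, and one clock per integrator or comb update (all hypothetical):

```python
from fractions import Fraction


def serial_cic_capacity(f_clk, f_in, n_stages, r, words_per_block=256, channels_per_rx=2):
    """Receivers one serial CIC engine (one adder + one memory block) can handle.

    Compute limit: each channel needs n_stages integrator updates per input
    sample plus n_stages comb updates every r-th sample.
    Memory limit: each channel needs 2 * n_stages state words.
    """
    clocks_per_sample = Fraction(f_clk, f_in)
    clocks_per_rx = channels_per_rx * (n_stages + Fraction(n_stages, r))
    by_compute = int(clocks_per_sample // clocks_per_rx)
    by_memory = words_per_block // (channels_per_rx * 2 * n_stages)
    return by_compute, by_memory


# Hypothetical second-stage CIC: input already decimated to 1.92 MHz, 5 stages, R = 10
print(serial_cic_capacity(76_800_000, 1_920_000, 5, 10))
```

With these made-up numbers the adder, not the memory, is the limit (3 receivers by clock cycles against 12 by storage), which is the sense in which one receiver leaves most of a block unused. Running the engine at 153.6 MHz would raise the compute limit to 7.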

I've also been thinking about trade-offs in the FIR given the different sampling rates. At the lower sampling rates the FIR is often idle, so it may be possible to share a FIR at lower sampling rates. For example, one would have a maximum of 2 receivers at 384 kHz, 4 at 192 kHz and 8 at 192 and 48 kHz.
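
The idle-FIR argument is simple arithmetic: a decimating FIR needs one multiply-accumulate per tap per output sample per channel, so its load is proportional to the output rate. A sketch under hypothetical numbers (one MAC unit at 153.6 MHz, a 100-tap filter, I and Q per receiver), with an optional cap standing in for any other limit such as sample memory:

```python
def fir_capacity(f_mac, taps, f_out, channels_per_rx=2, max_rx=None):
    """Receivers one time-shared multiply-accumulate unit can serve.

    Each output sample of each channel costs `taps` MACs, so the load
    scales with the output rate. `max_rx` models any other limit.
    """
    n = f_mac // (taps * f_out * channels_per_rx)
    return n if max_rx is None else min(n, max_rx)


# Hypothetical: one MAC at 153.6 MHz, 100-tap filter, I and Q per receiver
for rate in (384_000, 192_000, 96_000, 48_000):
    print(rate, fir_capacity(153_600_000, 100, rate))
```

The count doubles each time the rate halves, until whatever limit `max_rx` represents takes over.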

73,

Steve
KF7O

Alan Hopper

Jun 16, 2018, 5:02:50 PM
to Hermes-Lite
Hi Steve,
My no-FIR experiment was done using a memory version of the CIC from the protocol 2 receiver, so it only expected a 2:1 FIR. I simply added another 2:1 to the CIC so it worked with existing software, as long as you only used the middle half of the bandwidth. This was just a first hack to see if the gains were worthwhile. It was done on the CV with a Cyclone V and worked well. My code did not transfer well to the HL2 due to the different memory block structure/size, and I ran out of steam tweaking it for the HL2. I did spend some time looking at doing multiple receivers in the same memory block and believe it is ultimately the way to go.

Your point about sample rate is interesting; I remember realizing there were various trade-offs here. There may be a case for a special build with just enough bandwidth for data-skimming purposes only.

Lots of opportunities here.
73 Alan M0NNB

James Ahlstrom

Jun 17, 2018, 9:05:17 AM
to Hermes-Lite

Hello Steve,

The NCO in Lyons' book uses four multipliers for the basic calculation, which is to multiply the previous output by e^(2 * pi * f * j). But as you say, the amplitude may drift off the unit circle, so an additional four multipliers are used for AGC. There are other quadrature oscillators in the literature. I don't know how the spectral purity of this oscillator compares with the cordic.
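
A floating-point sketch of this kind of oscillator, assuming the commonly used AGC gain g = (3 - |y|^2)/2, a first-order approximation of 1/|y| near 1 (the exact structure on the cited page may differ): the rotation costs four multiplies, and the gain correction two squares plus two scalings, which matches the four-plus-four count above. The 17-fractional-bit coefficient and 7.1 MHz tone are hypothetical.

```python
import math


def quantize(v, frac_bits):
    """Round v to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def quadrature_osc_agc(f_hz, fs_hz, n_steps, frac_bits=17):
    """Recursive quadrature oscillator with a simple AGC (sketch).

    Rotation: y <- y * c, four real multiplies.
    AGC: g = 1.5 - 0.5 * |y|^2, two squares and two scalings.
    """
    d = 2 * math.pi * f_hz / fs_hz
    c_re, c_im = quantize(math.cos(d), frac_bits), quantize(math.sin(d), frac_bits)
    y_re, y_im = 1.0, 0.0
    out = []
    for _ in range(n_steps):
        y_re, y_im = y_re * c_re - y_im * c_im, y_re * c_im + y_im * c_re
        g = 1.5 - 0.5 * (y_re * y_re + y_im * y_im)
        y_re, y_im = y_re * g, y_im * g
        out.append(complex(y_re, y_im))
    return out


osc = quadrature_osc_agc(7_100_000, 76_800_000, 100_000)
print(max(abs(abs(z) - 1) for z in osc))   # worst deviation from the unit circle
```

Unlike the bare recursion, the magnitude error no longer accumulates: each step squeezes it back towards 1, so it stays far below the coefficient's own quantization error.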

Jim
N2ADR

Steve Haynal

Jun 28, 2018, 2:20:52 AM
to Hermes-Lite
Hi All,

I've been experimenting with different FPGA oscillators to save resources over the current Cordic. My copy of Lyons' "Understanding Digital Signal Processing" arrived, and the algorithm Jim pointed to is very clever with the AGC. But I felt it required too many multipliers: 8 per two oscillator frequencies (two frequency outputs, since we can run pipelined at double speed). Also, to keep compatibility with the current protocol, one must still have some way to compute cos + i sin for the angle of rotation, at least once per frequency change.
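
What computing cos + i sin once per frequency change involves can be sketched in a few lines. This assumes the requested frequency is turned into a 32-bit phase word, as in a conventional NCO, and that the rotation constant is stored as 18-bit signed values with 17 fractional bits to suit the 18x18 multipliers; both formats are assumptions, not the HL2's actual ones. Because this runs only on a frequency change, a slow serial CORDIC or a table would do in hardware; `math.cos` and `math.sin` stand in for that here.

```python
import math

FS = 76_800_000
PHASE_BITS = 32   # assumed phase-word width
FRAC_BITS = 17    # assumed: 18-bit signed coefficients, 17 fractional bits


def phase_word(f_hz):
    """Round f/FS * 2^32 to the nearest integer (f_hz is an integer number of Hz)."""
    return (f_hz * (1 << PHASE_BITS) + FS // 2) // FS


def rotation_constant(word):
    """(re, im) of e^(j*2*pi*word/2^32) as 18-bit signed integers."""
    theta = 2 * math.pi * word / (1 << PHASE_BITS)
    full = 1 << FRAC_BITS
    hi, lo = full - 1, -full          # +1.0 itself does not fit in 18-bit signed
    re = min(hi, max(lo, round(math.cos(theta) * full)))
    im = min(hi, max(lo, round(math.sin(theta) * full)))
    return re, im
```

The clamp matters: with 17 fractional bits, exactly +1.0 would need a 19th bit, so it saturates to 131071.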

I am leaning towards a modified version of this NCO, which is a pretty standard implementation. Another description of it is in this paper. Once pipelined and making maximum use of any allocated memory blocks, the first version would require 1.5 memory blocks and 2 multipliers per two oscillator frequencies (double-speed pipelined), and the second version 1 memory block and 3 multipliers per two oscillator frequencies. The second version uses a multiplier to approximate sin, whereas the first version uses an additional lookup table, which is a nice trade-off knob between memory and multipliers.
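
The table-versus-multiplier knob can be illustrated with one common structure: a coarse cos/sin table plus a first-order Taylor correction for the leftover phase bits, which replaces a fine table with multiplies. This is an assumption about the general technique, not a reconstruction of either version from the paper; the 16-bit phase and 1024-entry table are hypothetical, and entries are left unquantized to isolate the approximation error.

```python
import math

PHASE_BITS = 16
COARSE_BITS = 10                      # hypothetical: 1024-entry table
FINE_BITS = PHASE_BITS - COARSE_BITS
STEP = 2 * math.pi / (1 << PHASE_BITS)
COARSE = [(math.cos(2 * math.pi * i / (1 << COARSE_BITS)),
           math.sin(2 * math.pi * i / (1 << COARSE_BITS)))
          for i in range(1 << COARSE_BITS)]


def cos_sin(phase):
    """Coarse table lookup plus a first-order correction for the residual angle b.

    cos(a + b) ~ cos a - b sin a and sin(a + b) ~ sin a + b cos a,
    i.e. multiplies instead of a second (fine) table.
    """
    i = phase >> FINE_BITS
    b = (phase & ((1 << FINE_BITS) - 1)) * STEP
    ca, sa = COARSE[i]
    return ca - b * sa, sa + b * ca
```

The error is roughly b^2/2 for the largest residual angle, about 1.8e-5 with these widths; a larger coarse table buys accuracy with memory instead of more multipliers.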

For reference, there are 66 8-kbit memory blocks and 66 18x18 multipliers in the FPGA we are using. Ten oscillator frequencies would then require between 5 and 8 memory blocks and 10 to 15 multipliers, which is very doable. All these methods would also require 2 multipliers per pipelined oscillator to do the mixing, or an additional 10 multipliers for 10 oscillator frequencies.
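
The resource figures above follow from the per-oscillator costs, since each double-speed pipelined oscillator serves two frequencies. A small calculator reproducing them; the dictionary keys ("table" for the first version, "multiplier" for the second) are hypothetical labels.

```python
import math

# Per pipelined oscillator (two frequencies): (memory blocks, multipliers), from the post
PER_PAIR = {"table": (1.5, 2), "multiplier": (1.0, 3)}
MIX_MULTS_PER_PAIR = 2          # mixing multipliers per pipelined oscillator
DEVICE = (66, 66)               # 8-kbit memory blocks, 18x18 multipliers


def budget(version, n_freqs):
    """(memory blocks, NCO multipliers, mixing multipliers) for n_freqs frequencies."""
    pairs = math.ceil(n_freqs / 2)
    blocks, mults = PER_PAIR[version]
    return blocks * pairs, mults * pairs, MIX_MULTS_PER_PAIR * pairs


print(budget("table", 10), budget("multiplier", 10), DEVICE)
```

For ten frequencies this gives 7.5 blocks and 10 + 10 multipliers for the first version, and 5 blocks and 15 + 10 multipliers for the second, out of 66 of each.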

Below are spectrum plots for this new NCO and for the current Cordic for comparison. The spurs will vary with frequency, but I am seeing spurs no worse than about 95 dBc, as indicated in the Harris paper. Given that we have a 12-bit signal, even spurs at 80 dBc would probably not be noticed. The spectrum plots are generated from the Verilog RTL, so they include all fixed-point bit-width limitations.
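
To get numbers like "no worse than about 95 dBc" from simulated oscillator output, one common approach is to capture a coherently sampled record of the complex I/Q output, take an FFT, and compare the largest non-carrier bin with the carrier; with complex samples, the image at the negative frequency shows up as its own bin. The pure-Python sketch below does just that. It is not the script behind these plots, and the record length and test signals are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def worst_spur_dbc(samples):
    """dB between the strongest bin (the carrier) and the next strongest bin.

    Assumes coherent sampling (an integer number of cycles in the record),
    so the carrier falls in a single bin and no window is needed.
    """
    mags = [abs(v) for v in fft(samples)]
    carrier = max(range(len(mags)), key=mags.__getitem__)
    spur = max(m for i, m in enumerate(mags) if i != carrier)
    return 20 * math.log10(mags[carrier] / max(spur, 1e-300))


N, K = 1024, 37   # hypothetical record length and carrier bin
tone = [cmath.exp(2j * math.pi * K * n / N) for n in range(N)]
with_image = [z + 1e-3 * cmath.exp(-2j * math.pi * K * n / N) for n, z in enumerate(tone)]
```

Adding an image 60 dB down at the negative frequency makes the measurement report 60 dBc, landing in bin N - K as expected for complex samples.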

73,

Steve
KF7O

Alan Hopper

Jun 28, 2018, 3:01:16 AM
to Hermes-Lite
Hi Steve,
That sounds very interesting and promising. Is there any obvious pattern in how the spurs vary with frequency? It would be easy to make software avoid them or use sweet spots if the pattern is describable.
73 Alan M0NNB

Steve Haynal

Jun 29, 2018, 1:26:59 AM
to Hermes-Lite
Hi Alan,

The image at the negative frequency is perhaps the strongest consistent spur, but still better than 95 dBc for both the Cordic and the NCO. The NCO appears to have a lower overall average noise floor which makes the spurs stand out a little more. The Cordic has more spurs that break my 100 dBc limit. I haven't found any spurs worse than 95 dBc for either. For the same input settings, sometimes the NCO has the worst spur and sometimes the Cordic. Are you noticing any artifacts due to spurs in the current Cordic? Any concerns?

73,

Steve
KF7O

Alan Hopper

Jun 29, 2018, 2:35:12 AM
to Hermes-Lite
Hi Steve,
The current firmware still has a small spur at 20.480 MHz; does this show up in your simulation? I don't think it matters, but I was able to reduce it at the expense of more gates: https://groups.google.com/d/msg/hermes-lite/_AIqU4jJqCg/S3WC6-CPAwAJ
73 Alan M0NNB

Steve Haynal

Jul 3, 2018, 1:18:30 AM
to Hermes-Lite
Hi Alan,

I simulated at 20.480 MHz. The existing Cordic did have more than the usual spurs at that frequency, 6 between 89 and 95 dBc. The NCO was better with just three spurs at 96 dBc. I will keep an eye on this. 

When I finally do release firmware using the new NCO, I will first only use it for receivers 2 and up so that we can do some extensive testing and comparison against the existing receiver.

73,

Steve
KF7O