V2 Clock Update

Steve Haynal

Feb 1, 2016, 2:03:56 AM

to Hermes-Lite

Hi List,

I am about to draw up the V2 clock schematic, but there has been some new information and new ideas since we last discussed this. Further discussion and comments are appreciated.

VersaClock 6 Measurements from IDT

About a month ago, when I asked KE5FX for phase noise measurements of the VersaClock 5, I also asked IDT for VersaClock 6 measurements at the specific frequencies of interest. We were fortunate that Eddy van Keulen, W6EDY, at IDT responded and provided helpful input. The summary is that generating 79.872 MHz from a 25 MHz reference is good, but generating it from a 10 MHz reference is bad. I've added all the details to the end of the clock phase noise wiki page. The original plan to feed an external 10 MHz reference directly to the VersaClock 6 won't work. Instead we will have to digitally lock the VersaClock to a 10 MHz input on the FPGA, in a similar fashion to a VCXO, or even use a VCXO.

Less Expensive VersaClock 5 Part

Eddy mentioned the following regarding differences between the VersaClock 5 and 6:

"The main feature in the VersaClock 6 that improves the phase noise is a frequency doubler behind the reference clock input. A higher frequency into the phase detector means a higher loop bandwidth for the PLL in the VC6 which causes lower phase noise levels at certain distances from the carrier. Spur levels will be similar between VC5 and VC6."

Since VC5 and VC6 differences are not too significant, and since the VC5 still looks a bit better than the Si510 in the plots from John Miles, I took a closer look at VC5 offerings. It turns out that for single-ended clocks, which is what we need, there is a $2.80 version of the VC5. See the 5P49V5923 at Digikey and Mouser. Since this part should also replace a $1-$2 oscillator for the gigabit PHY, it now only adds $1-$2 overhead. If we go with a VC5 or VC6, this is the part I'd like to use.

79.872 or 76.8 MHz HL Frequency

Eddy also mentioned that we should consider a 26 MHz oscillator, as this is a common frequency in the cellphone industry. As mentioned before, you want a high reference clock input frequency. You also want a simple fraction or "short" decimal portion. 26 MHz is friendlier, as 79.872/26 = 3.072, whereas 79.872/25 = 3.19488.

This led me to investigate the best existing inexpensive options for a reference clock. I like 38.4 MHz, as it is a common frequency and there are several inexpensive yet stable options. See this 500ppb part and this 1.5ppm VCTCXO. 79.872/38.4 = 2.08 is a relatively simple fraction, and 38.4 MHz is a higher frequency that is not in the HF spectrum.
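To compare reference choices, the multiplication ratios can be computed as exact fractions; the reduced denominator is a rough measure of how "short" a ratio is in the sense used above. This is an illustrative sketch only: it does not model how the VersaClock encodes its fractional feedback divider.

```python
from fractions import Fraction

# Target sample clocks and candidate reference oscillators (MHz) from this thread.
TARGETS = ["79.872", "76.8"]
REFERENCES = ["25", "26", "38.4"]


def pll_ratio(target_mhz: str, ref_mhz: str) -> Fraction:
    """Exact multiplication ratio target/ref, reduced to lowest terms."""
    return Fraction(target_mhz) / Fraction(ref_mhz)


if __name__ == "__main__":
    for target in TARGETS:
        for ref in REFERENCES:
            r = pll_ratio(target, ref)
            print(f"{target} / {ref} = {float(r):.6g} = {r.numerator}/{r.denominator}")
```

For example, 79.872/26 reduces to 384/125 while 79.872/25 needs 9984/3125, and 76.8/38.4 is exactly 2.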

The frequencies we use, such as 61.44, 73.728 and 79.872 MHz, are frequencies the existing architecture can support easily. As an example of how these are derived, take the highest bandwidth we currently support, 384 kHz, multiply by 8 for the polyphase FIR filter, and then multiply by an integer representing the decimation factor of the CIC filters. For example, 384kHz*8*24 = 73.728MHz. The 24 is a nice factor, as it can be divided further for future higher bandwidths: 768kHz*8*12 also equals 73.728MHz. If we look at 79.872MHz (the closest we can easily get to the 80MHz spec of the AD9866), we see 384kHz*8*26 = 79.872MHz. Dividing 26 in half leaves the prime number 13, which makes it not quite as nice as 24.
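The derivation above can be written as sample clock = 384 kHz x 8 x CIC decimation. This small Python sketch reproduces the clocks mentioned and factors each CIC decimation, which makes the 24-versus-26 comparison explicit.

```python
BANDWIDTH_HZ = 384_000   # highest bandwidth currently supported
FIR_DECIMATION = 8       # polyphase FIR decimation


def sample_clock_hz(cic_decimation: int, bandwidth_hz: int = BANDWIDTH_HZ) -> int:
    """Sample clock = bandwidth x 8 x CIC decimation."""
    return bandwidth_hz * FIR_DECIMATION * cic_decimation


def prime_factors(n: int) -> list[int]:
    """Trial-division factorization; plenty for small decimation factors."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


for cic in (20, 24, 25, 26):
    print(f"CIC {cic:2d} {prime_factors(cic)} -> {sample_clock_hz(cic) / 1e6} MHz")
```

24 = 2*2*2*3 can be halved repeatedly for wider bandwidths, while 26 = 2*13 can be halved only once.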

There is a choice in between 24 and 26: 384kHz*8*25 = 76.8 MHz. This choice has several nice qualities. First, it is exactly twice the frequency of the common 38.4 MHz parts. Multiplying by exactly 2 makes the job easier and cleaner for the VC5, and opens the door to other frequency doubling methods. Second, 25 factors as 5*5. This leads to some of the bandwidths already seen in the HiQSDR and supported by the polyphase FIR filter: 76.8MHz/(5*8*4) = 480 kHz and 76.8MHz/(5*8*2) = 960 kHz. Another possibility is 76.8MHz/(5*8) = 1.92 MHz. We could support 480 kHz easily; 960 kHz would require doubling the clock for the polyphase FIR (not too bad); and 1.92 MHz is an interesting possibility for the future.
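Going the other way, dividing 76.8 MHz by 8 and by CIC factors built from 5 and powers of two gives the rates listed above. A Python sketch; it does not imply which of these rates the firmware actually supports.

```python
SAMPLE_CLOCK_HZ = 76_800_000
FIR_DECIMATION = 8


def output_rate_hz(cic_decimation: int, clock_hz: int = SAMPLE_CLOCK_HZ) -> float:
    """Output rate = clock / (8 x CIC decimation)."""
    return clock_hz / (FIR_DECIMATION * cic_decimation)


# CIC factors built from 5 (since 25 = 5 * 5) and powers of two.
for cic in (200, 100, 50, 25, 20, 10, 5):
    print(f"CIC {cic:3d} -> {output_rate_hz(cic) / 1e3:g} kHz")
```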

The downside of 76.8 MHz is that the spur at Fs-Ftx will only be moved up by 3.072 MHz relative to 73.728 MHz, not by 6.144 MHz as it would be with 79.872 MHz. If we use a VC5, this is not that important, as we can use either with fairly simple multipliers of 2 or 2.08. Recent results without the IAMP appear to indicate that this spur is no longer a big concern. Please provide feedback: what do you think the final preferred frequency should be?
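The shift of the Fs-Ftx spur depends only on the sample clock, not on the TX frequency, as this short Python check shows. It assumes 73.728 MHz is the baseline the 3.072 and 6.144 MHz figures are measured from; the 28 MHz TX frequency is an arbitrary example.

```python
BASELINE_KHZ = 73_728  # assumed reference clock for the "moved up by" figures


def fs_minus_ftx_khz(fs_khz: int, ftx_khz: int) -> int:
    """Location of the Fs - Ftx spur for a given sample clock and TX frequency."""
    return fs_khz - ftx_khz


ftx_khz = 28_000  # example TX frequency; the shift does not depend on it
for fs_khz in (73_728, 76_800, 79_872):
    spur = fs_minus_ftx_khz(fs_khz, ftx_khz)
    shift = spur - fs_minus_ftx_khz(BASELINE_KHZ, ftx_khz)
    print(f"Fs={fs_khz} kHz: spur at {spur} kHz, moved up {shift} kHz")
```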

VersaClock 5 or Other Frequency Doubler

Since 76.8 is a simple doubling of 38.4, we could possibly use some other low phase noise frequency multiplication scheme. Poking around, I found this and this. The JFET doublers and the HCMOS doubler look attractive and relatively simple, but I am not sure what the final phase noise will be. I know it will be worse than the TCXO spec due to the doubling, and the TCXO specs for the parts I linked to earlier are not significantly better than what comes out of a VC5. (One might want to find a 38.4 MHz oscillator with the lowest phase noise.) There should be fewer spurs with these doubling techniques than with the VC5. Once you add up the price of one of these doublers, it works out to be not much different from the low-priced VC5 minus the cost of the Ethernet oscillator, which would be required without a VC5. Any non-VC5 doubler would have to offer a significant improvement in phase noise and spur performance over the VC5 to be considered. Does anyone have experience with and/or recommendations for such a doubler?
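How much worse a doubler makes things can be bounded: multiplying a clock's frequency by N multiplies its phase deviation by N, so phase noise rises by 20*log10(N) dB, about 6 dB for a doubler. The Python sketch models an ideal, noiseless multiplier; a real JFET or HCMOS doubler adds its own noise on top. The -140 dBc/Hz figure is hypothetical, not a datasheet value.

```python
import math


def multiplied_phase_noise(pn_dbc_hz: float, factor: float) -> float:
    """Phase noise (dBc/Hz) after an ideal, noiseless frequency multiplier."""
    return pn_dbc_hz + 20 * math.log10(factor)


tcxo_pn = -140.0  # hypothetical 38.4 MHz TCXO phase noise at some offset
print(f"x2:    {multiplied_phase_noise(tcxo_pn, 2):.2f} dBc/Hz")     # about 6 dB worse
print(f"x2.08: {multiplied_phase_noise(tcxo_pn, 2.08):.2f} dBc/Hz")
```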

The AD9866 has a built-in PLL for frequency doubling. Unfortunately, figure 77 in the datasheet suggests that using it for RX introduces quite a bit of phase noise degradation. I've been replicating data similar to figure 77 by using firmware that expects a 38.4 MHz clock and my VC5 evaluation board to generate the 38.4 MHz. I then use another HL as a signal source, fed through an attenuator to the RX of this 38.4 MHz unit. Granted, the TX phase noise, firmware and software processing not tuned for these types of measurements, and the phase noise of my VC5 EVB all contribute, but I am looking for large relative differences. So far I see some phase noise effects for the Si510, VC5 and AD9866 doubling, with little difference among them. This is from spot checking, and I am currently running more automated tests. If I had to rank them, the VC5 would be best, the Si510 next, and AD9866 doubling worst. But again the differences are not large, and it may be possible, and easy for some, to drive the AD9866 directly from a 38.4 MHz oscillator.

VCTCXO

Both oscillators I linked to earlier have the same small footprint and pinout. One of them, the 1.5ppm part, is a VCTCXO. I think it would be good to be able to use a VCTCXO as done in the Hermes, but with a digital loop. This $0.71 DAC could be used to set the control voltage. Since it uses an I2C interface, it can share that interface with the VC5 if present. It would be an option and would not need to be stuffed in every build. It would allow us to lock to an external 10 MHz input yet still maintain a simple 2x requirement.
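One way such a digital loop could work is as a frequency-locked loop: count VCTCXO cycles during a gate derived from the external 10 MHz input, compare with the expected count, and nudge the DAC code. The Python sketch below uses hypothetical numbers (12-bit DAC, a +/-10 ppm linear pull range, a 1 s gate, loop gain 0.5), none of which come from the parts linked above, and it does not model the I2C writes.

```python
# Hypothetical loop parameters: none of these come from the linked parts.
F_NOMINAL_HZ = 38_400_000
DAC_BITS = 12                          # assumed DAC resolution
PULL_RANGE_PPM = 20.0                  # assumed +/-10 ppm span over the DAC's full scale
PPM_PER_CODE = PULL_RANGE_PPM / 2**DAC_BITS
HZ_PER_CODE = F_NOMINAL_HZ * PPM_PER_CODE * 1e-6
REF_HZ = 10_000_000                    # external reference
GATE_REF_CYCLES = 10_000_000           # gate of 10e6 reference cycles = 1 s
MID_CODE = 2**(DAC_BITS - 1)


def vctcxo_hz(code: int, offset_ppm: float) -> float:
    """Linear VCTCXO model: intrinsic offset plus DAC pull around mid-scale."""
    ppm = offset_ppm + (code - MID_CODE) * PPM_PER_CODE
    return F_NOMINAL_HZ * (1 + ppm * 1e-6)


def lock(offset_ppm: float, steps: int = 20, gain: float = 0.5) -> list[tuple[int, int]]:
    """Integral frequency-locked loop; returns (dac_code, count_error) for each gate."""
    gate_s = GATE_REF_CYCLES / REF_HZ
    expected = round(F_NOMINAL_HZ * gate_s)
    code, history = MID_CODE, []
    for _ in range(steps):
        count = round(vctcxo_hz(code, offset_ppm) * gate_s)  # VCTCXO cycles seen in one gate
        error = count - expected                            # positive means running fast
        history.append((code, error))
        code -= round(gain * error / (HZ_PER_CODE * gate_s))  # lower the code if fast
        code = max(0, min(2**DAC_BITS - 1, code))           # clamp to the DAC range
    return history


for code, err in lock(3.0)[:5]:
    print(code, err)
```

With a 1 s gate, one count of error is about 0.026 ppm at 38.4 MHz, so a longer gate trades lock time for resolution.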

73,

Steve

KF7O

James Ahlstrom

Feb 2, 2016, 10:47:11 AM

to Hermes-Lite

Hello Steve,

I think the 76.8 MHz clock is a good option. As you say, there are more available sample rates, and it is a better fit for the 38.4 MHz oscillator. I do not think that moving the spur down by 3 MHz is a problem.

I like the idea of using the AD9866 multiplier. For Fossin=38.4, M=2, N=1 we get Fdac=153.6 and Fadc=76.8. Figure 77 does show greater phase noise, but any PLL will add phase noise. Both the VersaClock and Si570 are PLLs. The question is whether the AD9866 PLL is so inferior that an external PLL can produce a significant improvement. I doubt this.

I am glad you are measuring the phase noise to clarify the issues. But I don't think your measurement is fair to the AD9866. Your source is a PLL followed by the AD9866 PLL, whereas the direct clock case is an XO and a single PLL. Certainly it is hard to beat the simplicity of a common XO and nothing else.

Your lengthy post describing the clock issues is masterful. Way to go Steve!

Jim
N2ADR

Steve Haynal

Feb 7, 2016, 10:26:44 PM

to Hermes-Lite

Hi Jim,

I am generating some interesting plots using a hacked version of quisk to take noise floor measurements across 10 MHz in 4.8 kHz steps. I am still putting together the information in a more comprehensible fashion. I've made the following changes to make the test more legitimate:

I updated the firmware to work at 76.8 MHz and have been using that for several days.
I have the 38.4MHz oscillators and am driving a Hermes-Lite with them. The "user experience" feels about the same.
I modified my VersaClock 5 board to use a 38.4 MHz oscillator so that I can see what the simplest 2x looks like and compare it to the AD9866 2x.
I am still using a Si510 based Hermes-Lite for the signal source. Obviously this has its own clock phase noise, and the best I hope for are significant relative differences in measurements. I am using no interpolation on this TX HL as past measurements showed that had the least phase noise. I tried using another signal generator I have access to but it was noisier. A signal generator with known excellent phase noise properties is what I really need...

I too would like to use a straight 38.4 MHz oscillator/crystal, but I want to make sure there is no significant compromise. Although the AD9866, Si510 and VersaClock 5 all use PLLs, I'd expect the VersaClock 5 and Si parts to excel, as they are sold and marketed with low phase noise specifically in mind. Just as the IAMP on the AD9866 doesn't perform as well as the opamp you are using, the same may be true of the PLLs.

I plan to finish my measurements, do some head-to-head WSPR measurements (I am worried phase noise may be affecting signals buried deep in the noise), and then decide on the final clocking schematic.

73,

Steve

KF7O

Steve Haynal

Feb 13, 2016, 2:55:29 AM

to Hermes-Lite

Hi List,

I have posted my phase noise experiments comparing the AD9866 PLL and the VersaClock, both using a 38.4 MHz reference. The details are on the wiki page (right click to open larger versions of the plots), but the summary is that I see enough phase noise degradation with the AD9866 PLL that it is too big a risk not to have a VersaClock 5 available in V2, especially given the phase noise data we have on the VersaClock 5 from outside sources and its low price. But the results with the AD9866 PLL were still very usable, and I will definitely include a build option in V2 that supports this alternative, leaves off the VersaClock 5, and widens the range of oscillators builders can use. I have checked that board space should not be an issue. To support both options with a single firmware, we will standardize on 76.8 MHz and dedicate 1 FPGA pin (pulled high or low) to tell the firmware whether to enable the RX doubling PLL in the AD9866.

73,

Steve

KF7O

Steve Haynal

Feb 13, 2016, 2:57:08 AM

to Hermes-Lite

The direct wiki link is https://github.com/softerhardware/Hermes-Lite/wiki/Clock-Phase-Noise#versaclock-5-versus-hermes-lite-pll-frequency-doubling

James Ahlstrom

Feb 13, 2016, 10:25:14 AM

to Hermes-Lite

Hello Steve,

Another excellent piece of work. I am surprised that the AD9866 PLL performs so poorly, but as always, measurements win out. The VersaClock is cheap, and I suggest we standardize on it and leave no option for the AD9866 PLL alone. But I do not mind the option if you feel there is some benefit. I am just prejudiced against options.

Jim

N2ADR

Vasyl Kuzmenko

Jul 19, 2016, 8:55:16 AM

to Hermes-Lite

Hi All,
I wrote a plain FIR instead of a polyphase one.
The purpose is to use fewer FPGA resources; I sacrifice bandwidth, reducing it to 48 kHz.
The FIR uses 976 coefficients and the clock frequency is 76.8 MHz, so we can calculate 76800000/976 = 78689 samples/sec. This rate is fast enough to give us 48 kHz.
Is it possible to use the FPGA PLL to multiply a 100 MHz clock by 2-3, to 200-300 MHz, to overclock and speed up the FIR calculation? 200 MHz would give us 192 kHz bandwidth with low usage of FPGA multipliers and LEs.
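The throughput arithmetic above generalizes: a FIR that processes one tap per clock (with separate I and Q multipliers working in parallel) can produce at most clock/TAPS outputs per second. A Python sketch that ignores the few clocks of pipeline overhead per output; `taps_per_clock` is a hypothetical knob for splitting the taps across more multipliers.

```python
TAPS = 976


def fir_output_rate(clock_hz: float, taps: int = TAPS, taps_per_clock: int = 1) -> float:
    """Upper bound on output samples/s when each clock processes taps_per_clock taps.

    I and Q use separate multipliers in parallel, so one tap per clock covers both.
    Pipeline overhead (a few clocks per output) is ignored.
    """
    return clock_hz * taps_per_clock / taps


for clk_hz in (76.8e6, 153.6e6, 200e6):
    print(f"{clk_hz / 1e6:5.1f} MHz -> {fir_output_rate(clk_hz):8.0f} samples/s")
```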

Here is example code (it has not been tested in a real FPGA, but it gives the idea):

// Parameters used in the port list must be declared in the module header
// (plain Verilog does not allow localparam there, so all are parameters).
module fir1024 #(
    parameter ADDRBITS = 10,                                  // address bits for 18/36 x 1024 ROM/RAM blocks
    parameter MBITS    = 18,                                  // multiplier bits == input bits
    parameter MifFile  = "./FIRII/coefL8_single.mif",         // ROM coefficients
    parameter ABITS    = 24,                                  // adder bits
    parameter TAPS     = 976                                  // number of filter taps, max 2**ADDRBITS
) (
    input clock,                                              // sample (write) clock
    input clock_calc,                                         // calculation clock, may be faster than clock
    input we,                                                 // memory write enable
    input signed [MBITS-1:0] x_real,                          // sample to write
    input signed [MBITS-1:0] x_imag,
    output reg y_avail,                                       // high when Raccum/Iaccum hold a finished output
    output reg signed [ABITS-1:0] Raccum,
    output reg signed [ABITS-1:0] Iaccum
);

    reg [ADDRBITS-1:0] raddr, caddr;                        // read address for sample and coef
    wire [MBITS*2-1:0] q;                                    // I/Q sample read from memory
    reg  [MBITS*2-1:0] reg_q;
    wire signed [MBITS-1:0] q_real, q_imag;                  // I/Q sample read from memory
    wire signed [MBITS-1:0] coef;                            // coefficient read from memory
    reg  signed [MBITS-1:0] reg_coef;
    reg signed [MBITS*2-1:0] Rmult, Imult;                // multiplier result
    reg signed [MBITS*2-1:0] RmultSum, ImultSum;        // multiplier result
    reg [ADDRBITS:0] counter;                                // count TAPS samples
    reg  [ADDRBITS-1:0] waddr;                                // write sample memory address
    reg [2:0] counter_skip_calc;



    assign q_real = reg_q[MBITS*2-1:MBITS];
    assign q_imag = reg_q[MBITS-1:0];
    

    
    firrom18_1024 rom(caddr, clock_calc, coef);        // coefficient ROM 18 X 1024
    
    // sample RAM 36 x 1024; 36 bits == 18 bits I and 18 bits Q
    // x_real, x_imag: 18-bit real and imaginary parts of the input sample
    // dual-port RAM with separate read (clock_calc) and write (clock) clocks
    
    firram36_1024 ram({x_real, x_imag}, raddr, clock_calc, ~we, waddr, clock, we, q);      
    
    always @(posedge clock)
    begin
        if (we)
            begin
                waddr <= waddr + 1'd1;
                counter_skip_calc <= counter_skip_calc + 1'd1;  // wraps every 2**3 = 8 samples: decimation rate
            end
    end
    
    // Note: we and counter_skip_calc come from the clock domain; if clock_calc is
    // a different (faster) clock, they must be synchronized before use here.
    always @(posedge clock_calc)
    begin        // main pipeline here
        if (we)            // Wait until a new sample is written to memory
            begin
                counter <= TAPS[ADDRBITS:0] + 4;                 // count samples and pipeline latency
                raddr <= waddr;                                  // read address -> newest sample
                caddr <= 1'd0;                                   // start at coefficient zero
                Raccum <= 0;
                Iaccum <= 0;
                Rmult <= 0;
                Imult <= 0;
                y_avail <= 1'd0;
            end
        else if (counter_skip_calc == 3'd0)                     // calculate only for every eighth input sample
            begin
                reg_q <= q;                                      // register memory outputs (extra pipeline stage)
                reg_coef <= coef;
                // Multiply-accumulate once the pipeline has filled. The +4/+2 offsets
                // should be checked in simulation against the RAM/ROM read latency,
                // since reg_q/reg_coef add one register stage.
                if ((counter > 0) && (counter < (TAPS[ADDRBITS:0] + 2)))
                    begin
                        Rmult <= q_real * reg_coef;
                        Raccum <= Raccum + Rmult[35:12] + Rmult[11];  // round 36-bit product to 24 bits; plain truncation adds a DC offset (spur)
                        Imult <= q_imag * reg_coef;
                        Iaccum <= Iaccum + Imult[35:12] + Imult[11];
                    end
                // Step the addresses independently of the multiply condition: counter
                // starts at TAPS + 4, above the TAPS + 2 window, so it must count down regardless.
                if (counter > 0)
                    begin
                        counter <= counter - 1'd1;
                        raddr <= raddr - 1'd1;                   // move to prior sample
                        caddr <= caddr + 1'd1;                   // move to next coefficient
                    end
                y_avail <= (counter == 0);                       // output sample ready in Raccum (I) and Iaccum (Q)
            end
    end

endmodule
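The rounding in `Raccum + Rmult[35:12] + Rmult[11]` can be checked numerically. This Python sketch mirrors the two bit slices with arithmetic shifts (Python's `>>` floors negative numbers, like taking the top bits of a two's-complement value). Truncation always rounds toward minus infinity, so on average it subtracts half an LSB: a small DC offset, which after the mixer shows up as a spur at the tuned frequency. Adding bit 11 rounds to nearest and removes that bias.

```python
import random

SHIFT = 12  # Rmult[35:12] keeps the top 24 of 36 bits


def truncate(p: int) -> int:
    """Like Rmult[35:12]: floor division by 2**12."""
    return p >> SHIFT


def round_half_up(p: int) -> int:
    """Like Rmult[35:12] + Rmult[11]."""
    return (p >> SHIFT) + ((p >> (SHIFT - 1)) & 1)


def mean_error_lsb(fn, samples: list[int]) -> float:
    """Average error of fn relative to the exact value p / 2**SHIFT, in output LSBs."""
    return sum(fn(p) - p / 2**SHIFT for p in samples) / len(samples)


rng = random.Random(1)
products = [rng.randint(-2**35, 2**35 - 1) for _ in range(100_000)]
print(f"truncate: {mean_error_lsb(truncate, products):+.3f} LSB")
print(f"round:    {mean_error_lsb(round_half_up, products):+.3f} LSB")
```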

Steve Haynal

Jul 19, 2016, 11:31:40 AM

to Hermes-Lite

Hi Vasyl,

A plain FIR is one way to go, although I think you will have enough multipliers even in the smallest devices to divide the problem into phases. Maybe double the multipliers, etc.

On page 1-26 of this document, the maximum speed of a Cyclone IV 18-bit multiplier is given as 200 MHz. That is under ideal conditions: fully pipelined, with no logic on the input or output, and perfectly (perhaps manually) placed and tweaked. In the past I've hoped that we could run the FIR filters at 2x the clock, or 153.6 MHz. I've never met timing, although I didn't try too hard. I think that is still possible and would give you 96 kHz bandwidth. A multiple of the core 76.8 MHz clock is also better, as it makes interfacing with the rest of the design easier.

Have you tried simulating your Verilog? There are open source Verilog simulators, such as Icarus and cver, that are fairly easy to use. You could use MyHDL to simulate the original Verilog and your Verilog and make sure the end results are the same. You can even simulate and verify your Verilog at http://www.edaplayground.com/ ; this site also lists popular open source simulators in its pull-down menu.
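Whichever simulator is used, it helps to have a bit-accurate reference to compare against. A minimal Python golden model for one channel (run it separately for I and Q), assuming the same per-product rounding (drop 12 bits, add bit 11), a 24-bit wrapping accumulator, and one output per 8 input samples. The exact input sample on which the hardware starts each output is an assumption to align against the simulation; the function names are hypothetical.

```python
def wrap_signed(x: int, bits: int) -> int:
    """Two's-complement wraparound, like a bits-wide Verilog register."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half


def round_product(p: int, shift: int = 12) -> int:
    """Rmult[35:12] + Rmult[11]: drop `shift` bits, rounding half up."""
    return (p >> shift) + ((p >> (shift - 1)) & 1)


def fir_output(history: list[int], coefs: list[int], acc_bits: int = 24) -> int:
    """One output: coefs[0] times the newest sample, coefs[1] times the one before, etc."""
    acc = 0
    for k, c in enumerate(coefs):
        acc = wrap_signed(acc + round_product(history[-1 - k] * c), acc_bits)
    return acc


def decimating_fir(stream: list[int], coefs: list[int], decim: int = 8) -> list[int]:
    """One output per `decim` input samples, once the delay line is full."""
    return [fir_output(stream[: n + 1], coefs)
            for n in range(len(coefs) - 1, len(stream))
            if (n + 1) % decim == 0]
```

Feeding the same stimulus to this model and to the simulated Verilog, then diffing the outputs, catches pipeline-alignment and rounding mistakes early.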

73,

Steve

KF7O

Vasyl Kuzmenko

Jul 19, 2016, 12:37:40 PM

to Hermes-Lite

Hi Steve,
Thanks for the quick answer.
Right now the whole project takes 97% of the LEs and 51% of the multipliers on the EP4CE6E22; it uses 91% without the decimation block. A two-bank polyphase FIR probably will not fit, but besides the plain FIR I will try to develop 2- and 4-bank polyphase FIRs,
in addition to a FIR that uses minimal resources. Anyway, 96 kHz is realistic on the 6K-LE EP4CE6E22. Doubling, tripling, or quadrupling the multipliers was an option I thought about.

Thank you very much for the information about simulators. I don't have much experience, so I don't know how to simulate yet. I'm going to figure it out; this is the most important thing for now.
73!
Vasyl

Steve Haynal

Jul 20, 2016, 2:59:56 AM

to Hermes-Lite

Hi Vasyl,

It is interesting that you are running out of LEs with half of the multipliers still available. A very heavy user of LEs in this design is the cordic used to generate the LO. There are separate cordics for RX and TX; these could be combined if you don't need to support split TX and RX. You could also replace the cordic with an NCO to save LEs at the expense of using a couple more multipliers. I wrote an NCO in Hermes-Lite/rtl/ramnco that should be a direct replacement. It supports loading arbitrary waveforms for some predistortion experiments, but you could populate the RAM (which could be a ROM) with pure sine wave values. This will take fewer resources than a cordic. The example also shows how to verify that the NCO is equivalent to the cordic: it generates frequency-domain plots so you can be sure there are no bad spurs. I want to eventually replace the cordics in the HL2 RTL with these, but haven't gotten around to it yet.
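For comparison with the cordic, the core of a table-lookup NCO is a phase accumulator whose top bits address a waveform table. This Python sketch uses a 2048-entry, 12-bit table like ramnco, but it is a generic illustration, not a translation of ramnco; reading the sine table a quarter turn ahead gives cos from the same table. With a 32-bit accumulator the frequency resolution is f_clk/2**32, about 0.018 Hz at 76.8 MHz.

```python
import math

PHASE_BITS = 32
TABLE_BITS = 11          # 2048-entry table, as in ramnco
AMPLITUDE = 2047         # 12-bit signed samples

SINE_TABLE = [round(AMPLITUDE * math.sin(2 * math.pi * i / 2**TABLE_BITS))
              for i in range(2**TABLE_BITS)]


def tuning_word(f_out_hz: float, f_clk_hz: float) -> int:
    """Phase increment per clock: f_out / f_clk * 2**PHASE_BITS."""
    return round(f_out_hz / f_clk_hz * 2**PHASE_BITS) % 2**PHASE_BITS


def nco(ftw: int, n: int):
    """Yield n (cos, sin) pairs; cos reads the sine table a quarter turn ahead."""
    phase = 0
    quarter = 2**TABLE_BITS // 4
    for _ in range(n):
        addr = phase >> (PHASE_BITS - TABLE_BITS)   # top TABLE_BITS of the phase
        yield SINE_TABLE[(addr + quarter) % 2**TABLE_BITS], SINE_TABLE[addr]
        phase = (phase + ftw) % 2**PHASE_BITS
```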

73,

Steve

KF7O

Vasyl Kuzmenko

Jul 28, 2016, 8:52:47 AM

to Hermes-Lite

Hi Steve,
Your advice was very useful. The results:
I have written and successfully simulated the simplest plain FIR (it uses only two multipliers and very few LEs); let's call it FIR ver. 1.01. I guess only some small further improvements in resource usage are possible.
It can handle a 48k sample rate. So if it really works in hardware, then based on FIR ver. 1.01 I can write a double-clocked 153.6 MHz version and a 2-bank polyphase FIR (ver. 1.10 and ver. 1.20).
But right now I'm switching to NCO development. Actually, I spent today reading about how it basically works.
The goal is to write and simulate a simple NCO with a couple of multipliers to produce raw I and Q samples at the Fs frequency.

// File: ramnco.v
// Generated by MyHDL 0.9.0
// Date: Mon Jul 25 15:37:03 2016


`timescale 1ns/10ps

module ramnco (
    clk,
    intf_cos,
    intf_run,
    intf_addr,
    intf_we,
    intf_din,
    intf_phase,
    intf_sin
);
// RTL 

input clk;
output signed [11:0] intf_cos;
reg signed [11:0] intf_cos;
input intf_run;
input [10:0] intf_addr;
input intf_we;
input signed [11:0] intf_din;
input [31:0] intf_phase;
output signed [11:0] intf_sin;
reg signed [11:0] intf_sin;

reg [10:0] raddr;
reg [19:0] lfsr;
wire signed [11:0] rdata;
reg [0:0] state;
reg [32:0] phase_acc;

reg signed [11:0] wavetable [0:2048-1];




always @(posedge clk) begin: RAMNCO_WRITE
    if (intf_we) begin
        wavetable[intf_addr] <= intf_din;
    end
end



assign rdata = wavetable[raddr];


always @(posedge clk) begin: RAMNCO_FSM
    case (state)
        1'b0: begin
            intf_cos <= rdata;
            raddr <= (raddr + (2048 / 4));
            phase_acc <= (phase_acc + {1'b0, intf_phase});
            if (intf_run) begin
                state <= 1'b1;
            end
        end
        1'b1: begin
            intf_sin <= rdata;
            raddr <= phase_acc[32-1:(32 - 11)];
            lfsr <= {lfsr[0], (lfsr[19] ^ lfsr[0]), lfsr[18], lfsr[17], (lfsr[16] ^ lfsr[0]), lfsr[15], (lfsr[14] ^ lfsr[0]), lfsr[14-1:1]};
            state <= 1'b0;
        end
    endcase
end

endmodule

Here is the code generated by MyHDL. I have some questions, which are probably of general educational interest:
1) The module updates a linear feedback shift register (LFSR) but never uses it for dither. How does the LFSR affect the output signal? How should it affect it?
2) This module generates cos(Wt) and sin(Wt) one after another, for the I and Q channels respectively. But don't we actually need both sin and cos on EVERY clock cycle? Or does this already account for decimation? Or do we need a doubled clock for the NCO?
Where can I read more useful information about NCOs?
73!
Vasyl

Vasyl Kuzmenko

Jul 28, 2016, 3:53:12 PM

to Hermes-Lite

I now understand that the LFSR should generate a small random noise value that is added to phase_acc.

lfsr <= {lfsr[0], (lfsr[19] ^ lfsr[0]), lfsr[18], lfsr[17], (lfsr[16] ^ lfsr[0]), lfsr[15], (lfsr[14] ^ lfsr[0]), lfsr[14-1:1]};

This is basically a 20-bit-wide RNG (lfsr is declared as [19:0]). Which bits should we add to the phase, and what distribution does the RNG have, normal or uniform?
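The lfsr update above is a right-shifting Galois LFSR with 20 bits of state. This Python sketch transcribes it bit for bit and shows one hypothetical way to use it as phase dither: scale the state to span one table-address LSB (the 21 phase bits below an 11-bit address) and add it to the phase before truncation. The all-zero state maps to itself, so the register needs a nonzero seed. Whether this tap set is maximal-length is not checked here; if it is, the state visits every nonzero value once per period, so used as a number it is close to uniform rather than normal, although successive values are strongly correlated because each is a shift of the last.

```python
PHASE_BITS = 32
TABLE_BITS = 11
DROPPED_BITS = PHASE_BITS - TABLE_BITS     # 21 phase bits below the table address


def bit(x: int, i: int) -> int:
    return (x >> i) & 1


def lfsr_next(s: int) -> int:
    """Bit-for-bit transcription of the ramnco lfsr update (20-bit state)."""
    lsb = bit(s, 0)
    new = (s >> 1) & ((1 << 13) - 1)       # lfsr[13:1] -> bits 12..0
    new |= (bit(s, 14) ^ lsb) << 13
    new |= bit(s, 15) << 14
    new |= (bit(s, 16) ^ lsb) << 15
    new |= bit(s, 17) << 16
    new |= bit(s, 18) << 17
    new |= (bit(s, 19) ^ lsb) << 18
    new |= lsb << 19
    return new


def dithered_address(phase: int, lfsr: int) -> int:
    """Hypothetical dither: scale the 20-bit state to one address LSB, add, then truncate."""
    dither = lfsr << (DROPPED_BITS - 20)   # 20 bits -> spans the 21 dropped bits
    return ((phase + dither) % 2**PHASE_BITS) >> DROPPED_BITS
```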
Also, right now I do not understand some of the multiplication details:
The ADC input signal is 10 to 16 bits wide. An 18 x 18-bit multiplication gives 36 bits, but here it is used only as 12 input bits x 12 bits of NCO sin/cos = 24 bits. After the CIC, the signal must be 18 bits wide (the FIR uses 18-bit multipliers).
Decimation by 25-200 in the CIC reduces the signal by roughly 4-7 bits (a factor of 2**4 = 16 up to 2**7 = 128), to 24 - 4 = 20 bits or 24 - 7 = 17 bits. This already overflows 18 bits. Am I right? Or is there some attenuation in the NCO, for example a 10-bit sin/cos LUT?
What happens when we have a 16-bit ADC and multiply it by a 12-bit NCO sin/cos, giving a 28-bit-wide signal?
Thanks in advance!
73!

Steve Haynal

Jul 29, 2016, 12:04:37 AM

to Hermes-Lite

Hi Vasyl,

Please read the thread "Signal Cancellation for Spurs Works!" on this list. In particular, my posts of 11/22/15 and 11/27/15 in that thread contain plots showing the effect of dither in reducing spurs. You will want to generate plots similar to mine to verify that the largest spur amplitude is low. There are also links in that thread to information on NCOs. You do sometimes have to truncate and round to the desired bit length.
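On the bit widths in the question: a full-precision signed a x b product needs a + b bits, and a CIC decimator grows the word rather than shrinking it. Hogenauer's worst-case growth is ceil(N*log2(R*M)) bits for N stages, decimation R and differential delay M. In this Python sketch, the 5-stage figure is an assumption, not taken from the HL firmware. The extra bits are then dropped with rounding, as the FIR does with Rmult[35:12] + Rmult[11].

```python
import math


def product_bits(a_bits: int, b_bits: int) -> int:
    """Full-precision width of a signed a_bits x b_bits product."""
    return a_bits + b_bits


def cic_output_bits(in_bits: int, stages: int, decimation: int, diff_delay: int = 1) -> int:
    """Hogenauer's worst-case register width: in_bits + ceil(N * log2(R * M))."""
    return in_bits + math.ceil(stages * math.log2(decimation * diff_delay))


mixer_bits = product_bits(12, 12)           # 12-bit ADC x 12-bit NCO
for r in (25, 200):
    print(f"R={r}: CIC needs {cic_output_bits(mixer_bits, 5, r)} bits before rounding to 18")
```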

73,

Steve

KF7O
