Frequency synthesis

gabor

Jul 10, 2007, 10:03:57 AM

I seem to remember a number of threads about generating a fractional
frequency for such things as baud rate. One approach is to use DDS,
usually with a power of two divisor. For large numbers of bits you can
get good frequency resolution, but you always end up with cycle jitter
when your output frequency is not a power of two division of the input
frequency. Also, any fraction not reducible to n over a power of two
will not come out exact.

This code uses DDS with a variable divisor to allow jitter-free output
for integer division of the input frequency, and exact fractional
frequencies with the usual induced cycle jitter for other
frequencies. The output is roughly square, and its frequency must be
less than or equal to 1/2 the input frequency. If you want to use the code
to generate a clock enable instead of a square wave, the
output rate can go up to the input frequency as noted below.

module freq_synth
(
  clk,
  clr,
  m,
  d,
  q
);

input clk;        // Frequency reference in
input clr;        // Asynchronous reset
input [15:0] m;   // Frequency multiplier
input [15:0] d;   // Frequency divider
output q;         // Synthesized clock out

// This module takes the input reference frequency and
// generates an output frequency of m / (2*d) times that
// frequency. d must be greater than or equal to m.
// For 200 MHz input, the maximum output frequency is
// 100 MHz (d == m).

reg q;
reg [17:0] a, b, diff; // counters and comparators (2 extra bits)

always @*
  diff = b - a; // keep track of difference

always @ (posedge clk or posedge clr)
  if (clr)
    begin
      q <= 0;
      a <= 0;
      b <= 0;
    end
  else
    begin
      a <= a + m; // a counts up by multiplier (always)
      if (diff[17]) // if a gets ahead of b
        begin
          // count up by divider value and toggle the output
          b <= b + d;
          // Instead of complementing q, for a clock enable you could
          // set q to 1 here and clear it otherwise:
          q <= !q;
        end
    end

endmodule
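
The header comment's claim (the output toggles m/d times per reference clock, so the output frequency is m/(2*d) of the reference) can be checked with a small behavioral model. What follows is only a sketch in Python, not part of the original post: the function name and the `width` parameter are made up for illustration, and the update order is meant to mirror the nonblocking assignments in the module above.

```python
def dual_accumulator_toggles(m, d, cycles, width=18):
    """Return the clock indices at which q would toggle.

    Models two wrapping accumulators a and b of `width` bits and the
    sign bit of (b - a), evaluated on the values held before each clock edge.
    """
    mask = (1 << width) - 1
    a = 0
    b = 0
    toggles = []
    for n in range(cycles):
        diff = (b - a) & mask
        a_is_ahead = (diff >> (width - 1)) & 1  # same role as diff[17]
        a = (a + m) & mask                      # a always counts up by m
        if a_is_ahead:
            b = (b + d) & mask                  # b catches up by d
            toggles.append(n)                   # q toggles on this edge
    return toggles


if __name__ == "__main__":
    # Integer division (m = 1): evenly spaced toggles, no cycle jitter.
    print(dual_accumulator_toggles(1, 4, 12))
    # Fractional ratio: about 3 toggles per 7 clocks, with uneven spacing.
    print(len(dual_accumulator_toggles(3, 7, 7000)))
```

With m = 1 and d = 4 the model toggles at clocks 1, 5 and 9, i.e. every 4 clocks, which corresponds to a divide-by-8 square wave.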

John_H

Jul 10, 2007, 12:40:26 PM

Gabor,

Rather than having two accumulators and a difference calculation, why not use
a single accumulator?

While the accumulator doesn't overflow, the accumulator update is a<=a+m.

When the accumulator overflows, instead add m and subtract d at the same time,
where the m-d constant is stored as one value: a<=a+m_d where m_d==m-d.

The mux-input accumulator is a simple implementation in Xilinx FPGAs, giving
the maximum accumulator performance if the overflow is pipelined.

The most useful information for getting the best out of this arbitrary-modulus
accumulator is how to find the best fractional ratio for a desired
frequency. It's not obvious, but it is something software (or the developer)
would need to be able to figure out. I have used a spreadsheet for years to
get my "best ratios", but found out recently that I needed to extend it
to a 2-step process for "acceptable" accuracy with fewer bits. DDS
is great fun!
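
One possible way to automate the ratio search, sketched in Python. This is not the spreadsheet method described above (which is not shown in the thread); it simply leans on the standard library's `Fraction.limit_denominator`, which returns the closest fraction whose denominator does not exceed a bound. The function name, the 16-bit default, and the m/(2*d) output convention (taken from the freq_synth module earlier in the thread) are assumptions for illustration.

```python
from fractions import Fraction


def best_ratio(f_ref_hz, f_out_hz, bits=16):
    """Pick integers (m, d) so that f_ref * m / (2*d) approximates f_out.

    Assumes integer frequencies in Hz and that m and d must each fit
    in `bits` bits, with m <= d.
    """
    target = Fraction(2 * f_out_hz, f_ref_hz)  # wanted m/d
    if target > 1:
        raise ValueError("output must not exceed half the reference")
    approx = target.limit_denominator((1 << bits) - 1)
    return approx.numerator, approx.denominator


if __name__ == "__main__":
    # 16 x 115200 baud from a 200 MHz reference reduces to an exact ratio.
    print(best_ratio(200_000_000, 16 * 115200))
```

For that example the ratio reduces exactly to m = 288, d = 15625, so no approximation is needed. With a smaller `bits` value the function returns the nearest ratio that fits. A very low target with a small bound can round to m = 0, which a real tool would have to reject.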

Now how about developing a bit-serial DDS with master-clock edge placement
accuracy? For an n-bit accumulator, what wasn't obvious to me is how to
place the edge within those n bits using very few FPGA LUTs.
(The bit-serial approach is attractive for the Xilinx SRLs that turn a LUT
or two into one large shift register.)

- John_H

gabor

Jul 10, 2007, 5:54:41 PM

On Jul 10, 12:40 pm, "John_H" <newsgr...@johnhandwork.com> wrote:
> Gabor,
>
> Rather than having two accumulators and a difference calculation, why not use
> a single accumulator?
>

That was my initial thought, but it seemed like the dual accumulator
approach would be easier to meet the timing at 200 MHz in my Lattice
ECP (not ECP2) part.

> While the accumulator doesn't overflow, the accumulator update is a<=a+m.
>
> When the accumulator overflows, instead add m and subtract d at the same time,
> where the m-d constant is stored as one value: a<=a+m_d where m_d==m-d.
>

I think the LUT count works out to be the same if you actually
implement the subtractor for m-d instead of requiring the input to be
supplied in that form. The subtractor for m-d would obviously not need
to run fast, however.

> The mux-input accumulator is a simple implementation in Xilinx FPGAs, giving
> the maximum accumulator performance if the overflow is pipelined.
>

You don't really need to use overflow; just use the MSB of the
accumulator. When it is 0, the accumulator is non-negative, so add m_d.
When it is 1, the accumulator is negative, so add m. This is probably
the same idea, since the accumulator is presumed to have more bits than
the m and d inputs. In my case I had two more bits because
I wanted to pipeline the difference. But that broke the design
by always giving me two clock cycles of diff[17] in a row.
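
The sign-bit variant described above can be modelled in a few lines. This is a sketch only, in Python rather than Verilog. The thread does not say which branch drives the output, so the model assumes the tick happens on the cycles where m_d is added (the "overflow" cycles). The function name and the 18-bit default are hypothetical.

```python
def sign_bit_accumulator(m, d, cycles, width=18):
    """Return the clock indices on which the accumulator adds m - d.

    A single two's-complement accumulator of `width` bits:
    MSB 0 (non-negative) -> add m_d = m - d and emit a tick,
    MSB 1 (negative)     -> add m.
    """
    mask = (1 << width) - 1
    m_d = (m - d) & mask          # precomputed constant, stored as one value
    acc = 0
    ticks = []
    for n in range(cycles):
        negative = (acc >> (width - 1)) & 1
        if negative:
            acc = (acc + m) & mask
        else:
            acc = (acc + m_d) & mask
            ticks.append(n)       # assumed output event
    return ticks


if __name__ == "__main__":
    print(sign_bit_accumulator(1, 4, 12))         # evenly spaced ticks
    print(len(sign_bit_accumulator(3, 7, 7000)))  # long-run rate is m/d
```

Because the accumulator gains m every clock and loses d on every tick while staying bounded, the long-run tick rate is m/d per clock, the same rate as the dual-accumulator version.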

> The most useful information for getting the best out of this arbitrary-modulus
> accumulator is how to find the best fractional ratio for a desired
> frequency. It's not obvious, but it is something software (or the developer)
> would need to be able to figure out. I have used a spreadsheet for years to
> get my "best ratios", but found out recently that I needed to extend it
> to a 2-step process for "acceptable" accuracy with fewer bits. DDS
> is great fun!
>

Such software seems to exist, however not always in a cleanly usable
fashion. Cypress has some pretty sophisticated fitting software for
their phase-locked-loop chips (CyClocks). It is however tailored to
fit their products, so for a more general case you may still need to
roll your own.

> Now how about developing a bit-serial DDS with master-clock edge placement
> accuracy? For an n-bit accumulator, what wasn't obvious to me is how to
> place the edge within those n bits using very few FPGA LUTs.
> (The bit-serial approach is attractive for the Xilinx SRLs that turn a LUT
> or two into one large shift register.)
>

Yeah, bit serial would seem to be the holy grail for running at
or near the FPGA's maximum clock speed. Maybe on the next project
I'll have more time to play with that. :) Also it would be nice to
place the output on the closest edge (rising or falling) to its
ideal position. I've done that for fixed ratio designs. It seems
like it would be possible to do it here too, by looking at the
amount by which the accumulator overflowed in comparison to m/2.
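
To make the nearest-edge idea concrete, here is a speculative Python sketch, not something worked out in the thread. It assumes a plain modulo-d accumulator (add m, subtract d on overflow) and treats the residue left after an overflow as a measure of how late the registered clock edge is relative to the ideal edge: a residue r means the ideal edge was r/m of a clock period earlier. All names are hypothetical.

```python
from fractions import Fraction


def half_cycle_edges(m, d, cycles):
    """Return (ideal_time, placed_time) pairs, in units of clock periods.

    The placed time is the ideal time snapped to the nearest half clock
    period, i.e. to a rising or falling edge of the reference clock.
    """
    acc = 0
    edges = []
    for n in range(1, cycles + 1):      # n is the edge that registers the sum
        acc += m
        if acc >= d:
            acc -= d                    # residue after overflow, 0 <= acc < m
            late = Fraction(acc, m)     # how far the ideal edge precedes edge n
            snapped = Fraction(round(2 * late), 2)  # 0, 1/2 or 1 clock period
            edges.append((n - late, n - snapped))
    return edges


if __name__ == "__main__":
    for ideal, placed in half_cycle_edges(3, 7, 14):
        print(float(ideal), float(placed))
```

Comparing the residue against m/2 is what decides between the rising and the falling edge, and the remaining placement error is at most a quarter of a clock period. Note that a snapped value of 1 points at the previous rising edge, so real hardware would have to delay the whole output by a clock to make that choice available.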

Cheers,
Gabor


devices

Jul 10, 2007, 6:37:04 PM

"gabor" <ga...@alacron.com> wrote in message
news:1184076237....@w3g2000hsg.googlegroups.com...

> I seem to remember a number of threads about generating a fractional
> frequency for such things as baud rate. One approach is to use DDS,
> usually with a power of two divisor. For large numbers of bits you can
> get good frequency resolution, but you always end up with cycle jitter
> when your output frequency is not a power of two division of the input
> frequency. Also, any fraction not reducible to n over a power of two
> will not come out exact.
>
> This code uses DDS with a variable divisor to allow jitter-free output
> for integer division of the input frequency, and exact fractional
> frequencies with the usual induced cycle jitter for other
> frequencies. The output is roughly square, and its frequency must be
> less than or equal to 1/2 the input frequency. If you want to use the code
> to generate a clock enable instead of a square wave, the
> output rate can go up to the input frequency as noted below.
>

I find this very easy to implement and also very compact:

http://www.fpga4fun.com/SerialInterface2.html

The good explanation makes it easy to understand and to
customize. The only two parameters needed are the
accumulator's width (related to the multiplier) and the divider.
The use of a spreadsheet is essential in order to find an optimal
value. I try some width values - Wi - and choose the one that,
while keeping the wanted ratio, asks for a divider that is
closest to an integer. Suppose

W1 gives D1 = D(W1) = 34.5

and

W2 gives D2 = D(W2) = 34.1,

then I choose W2 / D2.
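
The spreadsheet search described above could be scripted roughly as follows. This is one reading of the procedure, not a reconstruction of the actual spreadsheet: it assumes a power-of-two accumulator of width W whose per-clock increment must be an integer, computes the ideal (fractional) increment for each candidate width, and reports how much frequency error the rounding introduces. The names and the ppm error measure are illustrative choices.

```python
from fractions import Fraction


def increment_candidates(f_clk_hz, f_out_hz, widths):
    """For each accumulator width, return (width, increment, error_ppm).

    The overflow rate of a W-bit accumulator stepped by `increment`
    every clock is f_clk * increment / 2**W.
    """
    rows = []
    for w in widths:
        ideal = Fraction(f_out_hz * 2**w, f_clk_hz)   # usually not an integer
        inc = round(ideal)                            # what the hardware can use
        actual = Fraction(inc * f_clk_hz, 2**w)
        err_ppm = float((actual - f_out_hz) / f_out_hz * 1_000_000)
        rows.append((w, inc, err_ppm))
    return rows


if __name__ == "__main__":
    # 16 x 115200 from a 20 MHz clock, trying a few accumulator widths.
    rows = increment_candidates(20_000_000, 16 * 115200, range(8, 21))
    for row in rows:
        print(row)
    print("smallest error:", min(rows, key=lambda r: abs(r[2])))
```

Picking the row with the smallest absolute error corresponds to choosing the width whose ideal increment lands closest to an integer in relative terms.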

John_H

Jul 10, 2007, 7:52:50 PM

"gabor" <ga...@alacron.com> wrote in message

news:1184104481....@n60g2000hse.googlegroups.com...

> On Jul 10, 12:40 pm, "John_H" <newsgr...@johnhandwork.com> wrote:
>> Gabor,
>>
>> Rather than having two accumulators and a difference calculation, why not
>> use
>> a single accumulator?
>>
>
> That was my initial thought, but it seemed like the dual accumulator
> approach would be easier to meet the timing at 200 MHz in my Lattice
> ECP (not ECP2) part.

<snip>

A quick run through SynplifyPro suggests the ECP (or ECP2) series won't
embed the mux into the adder structure like you can do in the Xilinx parts.
The timings suggested by SynplifyPro are just over 200 MHz for the Lattice
ECP/ECP2M parts and nearly 300 MHz for the Xilinx Spartan3E (using the
default size/speed for each part from the SynplifyPro pick-list).

I like that your diff value is a clock enable rather than something that
feeds the carry chain for either accumulator.

In the approach I suggested that works well with the Xilinx parts, the m-d
value would typically be the constant input rather than the d value so it
would use the same resources as the d value for storage. If the m and m-d
constants are used, the Xilinx approach is one carry chain for resources.
The Lattice device implementing this approach would need the carry chain and
the external mux. Your method delivers nearly the top speed for the Lattice
architecture with two carry chains and one difference.

It's interesting how tight functions can end up with large resource
differences based just on the architecture. I'm hoping to get some Lattice
designs going so I appreciate the exposure this little exercise has given
me. I'll need to shift some of my assumptions around and learn anew how
well things can optimize.

- John_H

gabor

Jul 11, 2007, 8:54:34 AM

On Jul 10, 6:37 pm, "devices" <me@home> wrote:
> I find this very easy to implement and also very compact:
>
> http://www.fpga4fun.com/SerialInterface2.html
>

Yes, this is the traditional DDS using a power of two
(in the non-parameterized case, 1024) as the divisor.

> The good explanation makes it easy to understand and to
> customize. The only two parameters needed are the
> accumulator's width (related to the multiplier) and the divider.
> The use of a spreadsheet is essential in order to find an optimal
> value. I try some width values - Wi - and choose the one that,
> while keeping the wanted ratio, asks for a divider that is
> closest to an integer. Suppose
>
> W1 gives D1 = D(W1) = 34.5
>
> and
>
> W2 gives D2 = D(W2) = 34.1,
>
> then I choose W2 / D2.

Another interesting point with baud rate dividers is the standard
use of the 16x clock for receiving. This is a holdover from the
original UART chips that did not have internal baud rate counters.
Usually in an FPGA you have a clock that has a high enough frequency
that a simple integer division provides enough accuracy for a 1x
baud rate. In the case of 115200 baud, for example, you can be
within 0.5% of the actual rate given an integer divide of any
frequency larger than 23 MHz, a pretty slow clock for modern
FPGAs. Multiply that by 16 and then you get into trouble with
the simple divider.
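
The arithmetic behind this is easy to check. The sketch below (Python, with a hypothetical function name) picks the nearest integer divisor and reports the resulting relative rate error, for a 1x clock and for a 16x clock. Rounding to the nearest divisor is an assumption here; a truncating divider would give somewhat different numbers.

```python
def integer_divider_error(f_clk_hz, baud, oversample=1):
    """Return (divisor, relative_error) for a plain integer clock divider.

    `oversample` is 1 for a 1x bit clock or 16 for the classic 16x clock.
    """
    target = baud * oversample
    divisor = max(1, round(f_clk_hz / target))
    actual = f_clk_hz / divisor
    return divisor, (actual - target) / target


if __name__ == "__main__":
    for clk in (20_000_000, 23_040_000, 200_000_000):
        for over in (1, 16):
            div, err = integer_divider_error(clk, 115200, over)
            print(f"{clk} Hz, {over}x: divide by {div}, error {err:+.3%}")
```

With a 20 MHz clock, for instance, the 1x divisor is 174 with an error of roughly 0.2%, while the 16x divisor is 11 with an error of well over 1%, which illustrates why the simple divider runs into trouble once the target rate is multiplied by 16.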

For my UARTs I generally use a simple integer divide to generate
the receive clock, where the divider gives a 1x clock output. At
the falling edge of the start bit I reset the counter to 1/2 of
the divisor. This provides a center-bit sampling pulse when the
count carries out. This method effectively combines the baud
generation and center sampling logic. Of course you'll need a
separate non-resetting counter to generate your transmit clock.
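
A behavioral sketch of that receive scheme, in Python rather than HDL. The frame format (one start bit, 8 data bits LSB first, one stop bit), the function names, and the test-line helper are assumptions made for illustration; the original post only describes the counter preload and the carry-out sampling.

```python
def make_line(byte, clocks_per_bit, idle=5):
    """Build a list of RX pin samples, one per system clock (test helper)."""
    frame = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    line = [1] * idle
    for bit in frame:
        line += [bit] * clocks_per_bit
    return line + [1] * idle


def receive_byte(line, divisor):
    """Decode one byte using a single 1x counter preloaded to divisor // 2."""
    count = 0
    prev = 1
    busy = False
    bits = []
    for level in line:
        if not busy:
            if prev == 1 and level == 0:  # falling edge of the start bit
                busy = True
                count = divisor // 2      # preload: first carry lands mid-bit
        else:
            count += 1
            if count == divisor:          # carry out = center-bit sample pulse
                count = 0
                bits.append(level)
                if len(bits) == 10:       # start + 8 data + stop
                    break
        prev = level
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        return None                       # framing error or incomplete frame
    return sum(b << i for i, b in enumerate(bits[1:9]))


if __name__ == "__main__":
    print(hex(receive_byte(make_line(0x55, 10), 10)))
```

Because every sample is taken near the middle of a bit, the model still decodes correctly when the sender's bit time differs from the receiver's divisor by a couple of percent.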

Cheers,
Gabor

devices

Jul 11, 2007, 12:02:23 PM

"gabor" <ga...@alacron.com> wrote in message

news:1184158474.2...@n60g2000hse.googlegroups.com...

I developed a UART in VHDL, but when I started with Verilog I didn't
feel like designing one again, so I chose micro-uart from
http://www.cmosexod.com/micro_uart.htm . As I use a slow
(20 MHz) clock and the micro-uart generates 16*baud, I experienced
the issue you talk about. But when I dropped the original
baud generator module and replaced it with the one I showed,
I had no trouble. Perhaps it's because I chose the best divider.
I also chose a large accumulator width (if that's what you
mean when you say power of two divisor).