ddr clock issues

David Ashley

Sep 18, 2006, 3:47:39 PM

The OpenCores DDR controller uses two DCMs to generate its clocks:

clk -> dcm0 -> clock used by the FDDRs that produce the true and negative DDR clocks
       feedback comes from the true DDR clock
       the FDDR inputs are hard-wired to "01" for the true clock
       and to "10" for the negative clock

clk -> dcm1 -> (0 clock)   bufg1 -> clock used for all DDR-related internal logic
            -> (270 clock) bufg2 -> clock used by the FDDRs on the DDR's data input lines
       feedback comes from the output of bufg1

dcm0 has a tunable phase-shift parameter, set to 30 ps. I've moved it
all the way to -530 ps with no failures, so it seems irrelevant.

I want to get rid of one of the DCMs; two seems excessive. Is it common
to use an FDDR to get a clock to the outside this way? The FDDR has
fixed inputs (input0 <= '0', input1 <= '1'), so its output is really
just a data selector: when the input clock is low you get input0, when
it is high you get input1. Why not route the clock through to the
outside directly?
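To make the "data selector" view concrete, here is a minimal behavioral sketch of an output DDR register, for simulation only and not the Xilinx primitive itself. The entity and port names are hypothetical: `d0` is driven after each rising edge of `c0`, `d1` after each rising edge of `c1`, and `c1` is assumed to be the inverse of `c0`. A real design instantiates the vendor primitive in the IOB (for example FDDRRSE in the Spartan-3 library, or ODDR2 on Spartan-3E). Which constant pair counts as "01" depends on how the source names the pins, so the sketch just shows that one pair reproduces the clock and the swapped pair its complement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Behavioral, simulation-only model of an output DDR flip-flop.
-- Hypothetical names; not a drop-in replacement for FDDRRSE/ODDR2.
entity ddr_out_model is
  port (
    c0 : in  std_logic;  -- clock for d0
    c1 : in  std_logic;  -- clock for d1, assumed to be "not c0"
    d0 : in  std_logic;  -- value driven after each rising edge of c0
    d1 : in  std_logic;  -- value driven after each rising edge of c1
    q  : out std_logic);
end entity ddr_out_model;

architecture behav of ddr_out_model is
begin
  process (c0, c1)
  begin
    if rising_edge(c0) then
      q <= d0;        -- first half of the cycle
    elsif rising_edge(c1) then
      q <= d1;        -- second half of the cycle
    end if;
  end process;
end architecture behav;
```

With `d0 => '1'` and `d1 => '0'` the output is a copy of `c0`; what the real FDDR adds is that this copy leaves the chip with the same clock-to-out delay as the DQ registers next to it.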

I've tried driving the DDR's clock from bufg1 (still going through the
FDDR), but it doesn't work reliably; I get flaky data.

Where can I find information about clock generation issues,
specifically related to DDR? I never would have come up with the scheme
that seems to actually work in this case. Is it possible to do this
with just one DCM?

Thanks--
Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

David Ashley

Sep 18, 2006, 5:40:03 PM

I found a Xilinx app note, xapp802.pdf, which has a nice block
diagram of an approach with just one DCM on page 3.
It is for Virtex, but I'd hope Spartan-3E would be the same...

-Dave

Gabor

Sep 19, 2006, 9:46:13 AM

David Ashley wrote:
[snip]

> I want to get rid of one of the DCM's, 2 seems excessive. Is it common
> to use an fddr to get a clock to the outside this way? That is, an fddr
> has fixed inputs (input0 <= '0', input1 <= '1') and so the fddr output
> is really just a data selector, when the input clock is low you get
> input0, when high you get input1. Why not route the clock through to
> the outside directly?
>
> I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
> but it doesn't work reliably, I get flaky data.
>
> Where can I find info about clock generation issues, specifically
> related to ddr.

The FDDR is used to generate the external signal with the same
clock-to-output delay as the associated data lines. Routing
a clock to an output buffer requires non-clock resources in
the Xilinx parts. The FDDR takes the global clock (very low
skew) directly from the dedicated routing, and its delay is matched
to the clock-to-out delay of the DDR flops on the DQ bus. So
if you use a DCM and global clock resources to generate the
internal clocks for DQ and the clock, you directly set the phase
relationship between the clock output and DQ. When you
try to route the clock through an output buffer you are at the
mercy of the router, and even if you get the design to work,
the timing may change when you rebuild due to changes in
seemingly unrelated sections of the design.
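Gabor's argument can be written as a small skew budget. The symbols are illustrative, not data-sheet names: $t_{\mathrm{BUFG}}$ is the global-clock insertion delay, $t_{\mathrm{IOB}}$ the clock-to-pad delay of an IOB DDR flip-flop, $\Delta t_{\varphi}$ the intended offset between the two DCM outputs, and $t_{\mathrm{route}}$ the delay from the global clock net through general routing to the pad, whatever the router happens to choose.

$$
\begin{aligned}
\text{clock via FDDR:}\quad t_{\mathrm{DQ}} - t_{\mathrm{CK}} &= (t_{\mathrm{BUFG}} + \Delta t_{\varphi} + t_{\mathrm{IOB}}) - (t_{\mathrm{BUFG}} + t_{\mathrm{IOB}}) = \Delta t_{\varphi} \\
\text{clock via fabric:}\quad t_{\mathrm{DQ}} - t_{\mathrm{CK}} &= (t_{\mathrm{BUFG}} + \Delta t_{\varphi} + t_{\mathrm{IOB}}) - (t_{\mathrm{BUFG}} + t_{\mathrm{route}}) = \Delta t_{\varphi} + t_{\mathrm{IOB}} - t_{\mathrm{route}}
\end{aligned}
$$

In the first case the common terms cancel and only the DCM phase setting remains; in the second, $t_{\mathrm{route}}$ does not cancel and can change every time the design is placed and routed again.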

David Ashley

unread,

Sep 19, 2006, 11:57:55 AM9/19/06

to

In experiments I had been able to get rid of the FDDRs on the
true and inverted DDR clock outputs, but I only did that to
see if it would work. It's pointless, since the FDDRs are part of
the IOBs anyway, and freeing them doesn't make them
available for any other function.

However I wasn't able to get rid of the 2nd DCM, and I'm
running out of ideas to try.

One thing of note: this is on the Spartan-3E starter board.
It supplies a 50 MHz clock. I run this through a DCM to produce
100 MHz, and that's used to feed the other two DCMs. I kind of
remember this is not a good idea?

Unfortunately (according to my understanding of the DCMs)
you can't both get a multiplied output clock from a DCM and
have the 0, 90, 180 and 270 degree phases of that clock. So I don't
know how to accomplish this other than by stringing DCMs
together, or by getting an external 100 MHz crystal oscillator and
putting it into the socket.
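For reference, here are the numbers for this clock chain, using $\Delta t = \frac{\varphi}{360^\circ}\,T$ to convert a phase shift $\varphi$ into time:

$$
f_{\mathrm{DDR}} = 2 \times 50\ \mathrm{MHz} = 100\ \mathrm{MHz}, \qquad T = \frac{1}{f_{\mathrm{DDR}}} = 10\ \mathrm{ns}
$$

$$
\varphi = 90^\circ \Rightarrow \Delta t = 2.5\ \mathrm{ns}, \qquad \varphi = 180^\circ \Rightarrow \Delta t = 5\ \mathrm{ns}, \qquad \varphi = 270^\circ \Rightarrow \Delta t = 7.5\ \mathrm{ns}
$$

The 90/180/270 outputs are phases of whatever clock the DCM receives, so they only become 2.5 ns steps of the 100 MHz clock if that DCM is fed the doubled clock.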

Tommy Thorn

Sep 19, 2006, 12:25:44 PM

David Ashley wrote:
> One thing of note -- this is on the spartan-3e starter board.
> It supplies a 50 mhz clock. I run this through a DCM to produce
> 100 mhz, and that's used to feed the other 2 DCM's. I kind of
> remember this is not a good idea?

The whole issue of DDR clock management and pin constraints is an area
I'm not too comfortable with. I wish X and A (Xilinx and Altera) would
include with their development boards a _simple_ example frobbing their
SDRAM. Just enough to show that it's working, not a complete
controller. I can design the logic for an SDRAM controller (DDR or SDR)
just fine, but it seems every FPGA (and board) has different clocking
methodology and constraint requirements.

David, I hope you find the solution and share it with us :-)

I assume there won't be too much difference between the ML401 (Virtex 4)
and the Spartan 3E starter kit.

Tommy

David Ashley

Sep 19, 2006, 12:57:39 PM

I will certainly share whatever I learn.

One thing just occurred to me. BTW, I don't have any test equipment:
no 'scope, no logic analyzer, nothing, just a crappy digital multimeter.
So I can't hook a scope up and look at the signals going into the DDR
itself.

For some reason I think the data going into the DDR is good. The
OpenCores controller does include logic to generate the DQS strobe
the DDR uses to latch input data. That approach would tend to
balance out timing problems: the same logic that drives the data
also drives the DQS strobe, so I suppose they should sink or swim
together.

But the OpenCores controller doesn't make use of the DQS strobe
generated by the DDR device itself. I'm only trying to run at 100 MHz,
and in that case Xilinx app notes say the timing is adequate, so the
DQS strobe isn't needed to capture data reliably. Maybe the timing
would get easier if the logic made use of the DQS strobe from the DDR.

I have a feeling adding some constraints would make the thing work
with a single DCM. Unfortunately I have no clue what constraints to
add, as I don't know what's going wrong (and don't know much about
constraints writing anyway).

To get around the lack of test equipment, when the thing wasn't
working before, I created a module called "vgatext" which outputs
a stable 96x40 text display to the VGA outputs. Then I set it up so
that, in the event of a DDR error, it displays the expected data vs.
the actual data, and saw it was an off-by-one problem. That was a
problem in my logic, not in the timing.

-Dave

Nico Coesel

Sep 19, 2006, 2:46:42 PM

David Ashley <da...@nowhere.net.dont.email.me> wrote:

>Open Cores DDR controller uses 2 DCM's to generate the clocks.
>
>clk -> dcm0 -> clock used for fddr to produce true + negative ddr clocks
> feedback comes from true ddr clock
> fddr has hard wired 01 inputs for true clock,
> 10 inputs for negative clock
>
>clk -> dcm1 -> (0 clock) bufg1 -> clock used for all ddr related
>internal logic
> -> (270 clock) bufg2 -> clock used for fddr's for
>DDR's data in lines
> feedback comes from the output of bufg1
>
>
>dcm0 has a tunable parameter, phase shift of 30 ps. I've moved this all
>the way to -530ps with no failure. It seems irrelevant.
>
>I want to get rid of one of the DCM's, 2 seems excessive. Is it common
>to use an fddr to get a clock to the outside this way? That is, an fddr has

All you need is a normal clock and a 90 degrees phase shifted clock.
The whole clocking outside the fpga thing is unnecessary. If you place
the output flipflops inside the IOBs and use an fddr in the IOB to
replicate the internal clock, all signals connected to the DDR memory
will have the same delay.

--
Reply to nico@nctdevpuntnl (punt=.)
You can find businesses and shops at www.adresboekje.nl

David Ashley

Sep 19, 2006, 3:35:03 PM

Nico Coesel wrote:
> All you need is a normal clock and a 90 degrees phase shifted clock.
> The whole clocking outside the fpga thing is unnecessary. If you place
> the output flipflops inside the IOBs and use an fddr in the IOB to
> replicate the internal clock, all signals connected to the DDR memory
> will have the same delay.

But the DDR spec says the DQS strobe for data written to the
DDR must be center-aligned. The DQS is in phase with the
DDR clock. That means the data must be put on the lines
a quarter of a clock cycle (half of a half-cycle) early for proper
alignment.

This requires a clock that is 270 degrees out of phase from the
DDR's clock; this is the clock used for the data lines going into
the DDR.
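The quarter-cycle argument in phase terms: take the DDR clock (and write DQS) edges at $0^\circ$ and $180^\circ$. Data launched from the $270^\circ$ clock changes at $270^\circ$ and, through the FDDR's inverted clock, at $270^\circ + 180^\circ \equiv 90^\circ$, so each bit is stable from $-90^\circ$ to $+90^\circ$ around a DQS edge:

$$
270^\circ \equiv -90^\circ \quad\Rightarrow\quad t_{\mathrm{launch}} = t_{\mathrm{DQS}} - \frac{T}{4} = t_{\mathrm{DQS}} - 2.5\ \mathrm{ns} \qquad (T = 10\ \mathrm{ns})
$$

Ignoring skew and setup/hold, that leaves a window of $\pm T/4$ centered on every DQS edge.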

I don't understand the "clocking outside the FPGA" you mention.
The FPGA currently has one 50 MHz external clock source. I
run that through a DCM to make it 100 MHz. Then, for the DDR
to work, I need two more DCMs: one makes the DDR clocks
(positive and negative), and the other is used for everything else.

-Dave

Gabor

Sep 19, 2006, 4:58:05 PM

David Ashley wrote:
[snip]

> But the open cores DDR doesn't make use of the DQS strobe generated
> by the DDR device itself. I'm only trying to run at 100 mhz. In that
> case xilinx app notes say the timing is adequate so the DQS strobe isn't
> needed to capture data reliably. Maybe the timing would get easier if
> the logic made use of the DQS strobe from the DDR.
>

I'm doing pretty much the same thing with Virtex 2 (similar
architecture to Spartan 3) on a proprietary board. This board has a
66.66 MHz clock that is doubled to run the DDR at 133 MHz (266 DDR). I
do not use the DQS inputs for sampling data. I did need to tweak
the delay in my DCMs to get reliable sampling. I did not use any
expensive test equipment for this; I just used the variable delay
mode of the DCM to run tests at various phases and centered
the final fixed value within the area that seemed to work.
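The last step of that procedure (fix the phase in the middle of the region that worked) reduces to picking the center of the longest passing run. A minimal sketch as a VHDL function, assuming the sweep results have already been collected into a vector with one pass/fail bit per phase step; the package and function names are hypothetical, and the stepping itself (the DCM's variable phase-shift mode) is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package phase_sweep_pkg is
  -- pass(i) = '1' if the memory test passed at phase-shift step i.
  -- Returns the middle step of the longest contiguous run of passes,
  -- or -1 if no step passed.
  function window_center (pass : std_logic_vector) return integer;
end package phase_sweep_pkg;

package body phase_sweep_pkg is
  function window_center (pass : std_logic_vector) return integer is
    variable best_start, best_len : integer := 0;
    variable run_start, run_len   : integer := 0;
  begin
    for i in pass'low to pass'high loop
      if pass(i) = '1' then
        if run_len = 0 then
          run_start := i;            -- a new passing run begins here
        end if;
        run_len := run_len + 1;
        if run_len > best_len then   -- remember the longest run so far
          best_len   := run_len;
          best_start := run_start;
        end if;
      else
        run_len := 0;                -- a failing step ends the run
      end if;
    end loop;
    if best_len = 0 then
      return -1;
    end if;
    return best_start + (best_len - 1) / 2;
  end function window_center;
end package body phase_sweep_pkg;
```

For example, passes at steps 2 through 6 give step 4. If the passing region wraps around the end of the sweep range, this simple version treats it as two separate runs.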

At 100 MHz I would expect the timing margins to be quite good
even in the slowest speed grade parts. I'm using Virtex 2 -5
speed grade in my 133 MHz design.

> I have a feeling adding some constraints would make the thing work
> with a single DCM. Unfortunately I have no clue what constraints to
> add, as I don't know what's going wrong (and don't know much about
> constraints writing anyway).
>

The problem with a single DCM is that you need to make up
for phase differences in the board routing. Signals to the DDR
memory arrive there some prop. delay after they leave the FPGA.
At the memory end they need to meet setup and hold time to
the clock as it arrives at the memory, usually at the same
board routing delay as the clock. So if your clock and data/
address/control outputs use the same internal clock, you
would need to use board routing or some other delay element
external to the FPGA to ensure hold time is met at the memory.

Then the data returning from the memory shows up 2 board
prop. delays from the driven clock, plus the clock to output
timing specified in the memory datasheet. So the sampling
point isn't exactly centered within the outgoing clock half-
period. So your sampling clock may need to be off by some
phase other than 90 degrees from the clock driving your
outputs. All of this is pretty hard to accomplish with one
DCM, IMHO. And just adding timing constraints without the
mechanism to meet them makes life miserable on the tools,
which usually fail miserably in response (they have only
internal routing delays to make up your requested timing).
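The read-side argument as a rough formula. The symbols are assumptions, not taken from a particular data sheet: $t_{\mathrm{pcb}}$ is the board flight time (taken equal for CK and DQ), $t_{\mathrm{AC}}$ the memory's DQ access time from CK, and $T$ the clock period; the FPGA's own output and input delays are ignored. Measured from the clock edge leaving the FPGA, read data reaches the FPGA pins at

$$
t_{\mathrm{arrive}} = t_{\mathrm{pcb}} + t_{\mathrm{AC}} + t_{\mathrm{pcb}} = 2\,t_{\mathrm{pcb}} + t_{\mathrm{AC}}
$$

and the middle of each half-period data eye is a further $T/4$ later, so the ideal sampling phase relative to the outgoing clock is

$$
\varphi_{\mathrm{sample}} = 90^\circ + 360^\circ \cdot \frac{2\,t_{\mathrm{pcb}} + t_{\mathrm{AC}}}{T}
$$

With hypothetical values $t_{\mathrm{pcb}} = 0.5$ ns, $t_{\mathrm{AC}} = 0$ and $T = 10$ ns this gives $90^\circ + 36^\circ = 126^\circ$ rather than $90^\circ$; and because $t_{\mathrm{AC}}$ is specified as a range, the real eye is narrower than $T/2$.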

David Ashley

Sep 19, 2006, 5:20:43 PM

David Ashley wrote:
> I will certainly share whatever I learn.

I got my simple write/read-verify system to work. I was able
to get rid of one of the DCMs, so I only need two.

DCM #1 takes the 50 MHz input, and I use its 2X output to
drive a clock buffer. This is the tclock signal. Feedback
comes from the clock buffer.

DCM #2 takes tclock and produces four-phase outputs.
The 0 and 270 outputs drive two clock buffers, and the
buffered 0 clock goes back into the DCM's feedback input.
These signals are sys_clk and sys_clk270.

FDDRs are used to produce the DDR's clock. Their inputs
are hard-wired to "01" for the true clock and "10" for
the negative clock. Both FDDRs are clocked by sys_clk
and inverted sys_clk. The inverter is implicit in the
FDDR configuration, so there is no delay penalty.

Here's the trick: the original OpenCores DDR controller
source sampled the data from the DDR on the rising
and falling edges of sys_clk. I instead push the sampling
out by 1/4 of a cycle:
rising_edge(sys_clk) replaced by falling_edge(sys_clk270)
falling_edge(sys_clk) replaced by rising_edge(sys_clk270)
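Why the swap works: put the rising edge of sys_clk at $0^\circ$. sys_clk270 is the same clock delayed by $270^\circ$, so

$$
\begin{aligned}
\text{rise}(\mathtt{sys\_clk270}) &= 270^\circ = \text{fall}(\mathtt{sys\_clk}) + 90^\circ \\
\text{fall}(\mathtt{sys\_clk270}) &= 270^\circ + 180^\circ = 450^\circ \equiv 90^\circ = \text{rise}(\mathtt{sys\_clk}) + 90^\circ
\end{aligned}
$$

Each capture therefore moves $90^\circ$ later, i.e. $T/4 = 2.5$ ns at 100 MHz; these are the same instants as the rising and falling edges of a $90^\circ$ clock.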

Then I made a slight tweak to get the sampled data back
into the sys_clk domain, as required elsewhere. It works
fine. I had a feeling the problem was on the sampling side,
since no special machinery existed to sample in the middle
of the data-valid window. The setup time was not being met.

Here's a sample of the code, before and after the fix:
-- **** CODE BEFORE FIX
process (sys_clk)
begin
  if rising_edge(sys_clk) then

    -- sample HI data word on the rising edge of sys_clk
    data_hi_q <= data;

    -- store HI and LO data words in the 32-bit output register
    data_out_q <= data_hi_q & data_lo2_q;

  end if;
end process;
-- ...
process (sys_clk)
begin
  if falling_edge(sys_clk) then

    -- sample LO data word on the falling edge of sys_clk
    data_lo1_q <= data;

    -- 1 clock of additional delay so the HI and LO words are
    -- stored together as a 32-bit word on the next rising edge
    data_lo2_q <= data_lo1_q;
  end if;
end process;

-- ***** CODE AFTER FIX

process (sys_clk270)
begin
  if falling_edge(sys_clk270) then

    -- sample HI data word on the falling edge of sys_clk270
    -- (1/4 cycle after the rising edge of sys_clk)
    data_hi_q <= data;

  end if;
end process;

process (sys_clk) -- (DA) fix to get back into sys_clk domain
begin
  if rising_edge(sys_clk) then
    -- store HI and LO data words in the 32-bit output register
    data_out_q <= data_hi_q & data_lo2_q;
  end if;
end process;

-- ...
process (sys_clk270)
begin
  if rising_edge(sys_clk270) then

    -- sample LO data word on the rising edge of sys_clk270
    -- (1/4 cycle after the falling edge of sys_clk)
    data_lo1_q <= data;

    -- 1 clock of additional delay so the HI and LO words are
    -- stored together as a 32-bit word on the next sys_clk rising edge
    data_lo2_q <= data_lo1_q;
  end if;
end process;
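One consequence of the fixed code worth checking: registers captured on sys_clk270 are read by sys_clk, and the two paths into data_out_q get different time budgets. With sys_clk rising at $0^\circ$ and its next rising edge at $360^\circ$:

$$
\begin{aligned}
\mathtt{data\_hi\_q}:\ 90^\circ \to 360^\circ &\Rightarrow \tfrac{3}{4}\,T = 7.5\ \mathrm{ns} \\
\mathtt{data\_lo2\_q}:\ 270^\circ \to 360^\circ &\Rightarrow \tfrac{1}{4}\,T = 2.5\ \mathrm{ns}
\end{aligned}
$$

Each window has to cover clock-to-out, routing and setup, so the data_lo2_q path is the tighter one at 100 MHz. The word pairing is unchanged from the original: data_out_q still combines the HI word from the current cycle with the LO word from the previous cycle.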

Hope this is of use to other people.

David Ashley

Sep 19, 2006, 5:42:15 PM

Gabor wrote:
> David Ashley wrote:
> [snip]

> I'm doing pretty much the same thing with Virtex 2 (similar
> architecture
> to Spartan 3) on a proprietary board. This board has a 66.66 MHz
> clock that is doubled to run the DDR at 133 MHz (266 DDR). I
> do not use the DQS inputs for sampling data. I did need to tweak
> the delay in my DCM's to get reliable sampling. I did not use any
> expensive test equipment for this, I just used the variable delay
> mode of the DCM to run tests at various phases and centered
> the final fixed value within the area that seemed to work.

See my other post in this thread for details. I got it working
by sampling data from the DDR on the 90-degree phase
clock, and now it works fine. No tweaking of the DCM was necessary,
and I'm only using one DCM for the DDR clocks (plus the one
that doubles 50 MHz to 100 MHz).

The DDR's DQS output transitions right when the data out
of the DDR becomes valid, but the DDR controller has to
transition DQS right in the middle of the valid window of the
data going to the DDR. This is hardly fair. I wish there
weren't even a DQS signal; it's just a PITA.

David Ashley

Sep 19, 2006, 6:20:00 PM

David Ashley wrote:
> Hope this is of use to other people.
> -Dave
>

I've gotten email asking for the source, so I put it up, it can
be found here:

http://www.xdr.com/dash/fpga/

It's targeted at a Linux build environment. It needs the unisim
library in the right place to build as is; otherwise, tweak the
Makefile.

It's pretty much an identical copy of the OpenCores DDR
controller, except that I removed one DCM and wrapped
it all in a synthesizable tester targeted at the
Spartan-3E starter board. The test just fills up memory
with a non-repeating pattern, then reads it back.
If the pattern matches, an LED stays lit. It keeps doing
this forever.

Nico Coesel

Sep 20, 2006, 2:29:36 PM

David Ashley <da...@nowhere.net.dont.email.me> wrote:

>Nico Coesel wrote:
>> All you need is a normal clock and a 90 degrees phase shifted clock.
>> The whole clocking outside the fpga thing is unnecessary. If you place
>> the output flipflops inside the IOBs and use an fddr in the IOB to
>> replicate the internal clock, all signals connected to the DDR memory
>> will have the same delay.
>
>But the DDR spec says the DQS strobe for data written to the
>DDR must be center aligned. The DQS is in phase with the
>DDR clock. That means the data must be put on the lines
>1/2 of 1/2 of a clock cycle early for proper alignment.
>
>This requires a clock that is 270 degrees out of phase from the
>DDR's clock. This is the clock used for the data lines going into
>the DDR..

Yes.

>I don't understand the "clocking outside the fpga" you mention.

AFAIK the OpenCores DDR controller uses some sort of scheme which
routes the clock to the outside and pulls it back in again. This is
totally unnecessary IMHO.

>The fpga currently has one 50 mhz external clock source. I
>run that through a DCM to make it 100 mhz. Then in order for
>the DDR to work I need to use two more DCM's. One is used
>to make the DDR clocks (positive and negative). The other is
>used for everything else.

For the DDR you'll need two DCMs: one to turn 50 MHz into 100 MHz and
one to get a clock which is 90 degrees out of phase. FDDRs have an
internal inverter on their clock inputs, so one clock is sufficient
to drive them.

Each DCM can also be used to create a divided clock and a 200 MHz
clock.

Nico Coesel

Sep 20, 2006, 3:10:23 PM

"Gabor" <ga...@alacron.com> wrote:

All these problems go away if you drive the control signals at half the
DDR clock frequency. This is not going to cost performance since all
DDR commands need 2 clock cycles to execute anyway. The only signal
that needs to be fast is CS (which also happens to be the least loaded
line in a larger memory system).

Clocking data into the memory uses DQS which has the same delay as the
DQ lines (if your PCB layout is routed as it is supposed to be).

>Then the data returning from the memory shows up 2 board
>prop. delays from the driven clock, plus the clock to output
>timing specified in the memory datasheet. So the sampling
>point isn't exactly centered within the outgoing clock half-
>period. So your sampling clock may need to be off by some
>phase other than 90 degrees from the clock driving your
>outputs. All of this is pretty hard to accomplish with one
>DCM, IMHO. And just adding timing constraints without the
>mechanism to meet them makes life miserable on the tools,
>which usually fail miserably in response (they have only
>internal routing delays to make up your requested timing).

If you delay DQS by the IOBDELAY and use this signal to clock DQ
(without IOBDELAY) into the IOB flipflops, then setup and hold timing
should be met (with the proper constraints). But beware, there are
severe limits on how the IOBs must be arranged and you may need to
match the FPGA speed with the memory speed.

David Ashley

Sep 20, 2006, 3:25:44 PM

Nico Coesel wrote:
> AFAIK the opencores ddr controller uses some sort of scheme which
> routes the clock to the outside and pulls it back in again. This is
> totally unnecessary IMHO.

Yep you're right, that was the feedback line for one of the DCM's.
Current design works but has no feedback from the outside as
you suggest. Original design was right on the edge as regards
sampling the DDR's output data.

> For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
> get a clock which is 90 degrees out of phase. fddrs have an internal
> inverter in their clock inputs so 1 clock to drive these is
> sufficient.

You're exactly right. Also DCM -> DCM seems to work ok, however
I'm ignoring the "locked" bit on the 50->100 DCM and the system
only pays attention to the locked bit on the 2nd DCM. This is
probably bad.

Gabor

Sep 21, 2006, 9:50:41 AM

David Ashley wrote:

>
> You're exactly right. Also DCM -> DCM seems to work ok, however
> I'm ignoring the "locked" bit on the 50->100 DCM and the system
> only pays attention to the locked bit on the 2nd DCM. This is
> probably bad.

Beware of "locked" bits on the Xilinx DCM's. Once locked, they
tend to continue to report locked even if the input clock goes
away. You need to look at the "status" outputs to get the
whole picture, and note that you must reset the DCM if
you want it to attempt to re-lock after lock is lost.

In the older parts (Spartan 2) with DLL's, the 2x clock output
drives a 1x clock when the DLL is not locked. On those parts
I actually use this "feature" to detect lock rather than using
the "locked" output of the DLL (they have no status bus).

Regards,
Gabor

Austin Lesea

Sep 21, 2006, 11:13:59 AM

Gabor,

In the DCM status, there is the "clock lost" bit. For the CLKFX, there
is also the "clock stopped" bit. The DCM is a digital synchronous state
machine, so loss of input clock means that the lock bit, which is a
state, will never change. These other two status bits are there to tell
you what happened (provide more information).
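Pulling the lock discussion together, here is a minimal sketch of reset handling for the cascaded pair (the 50 to 100 MHz DCM feeding the phase DCM). Everything in it is an assumption for illustration. The entity and port names are hypothetical. `dcm1_clkin_stopped` would come from the first DCM's STATUS bus (check the data sheet for which bit reports CLKIN stopped). `ref_clk` must keep running when the DCM input clock stops, and `RST_CYCLES` is a placeholder to be set to at least the minimum RST pulse width the data sheet specifies. Unlike the current design, which only watches the second DCM's LOCKED, this holds the second DCM in reset until the first reports lock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reset sequencer for two cascaded DCMs.
entity dcm_reset_seq is
  generic (
    RST_CYCLES : positive := 3);  -- placeholder for the minimum RST width
  port (
    ref_clk            : in  std_logic;  -- free-running, independent of the DCMs
    dcm1_locked        : in  std_logic;
    dcm1_clkin_stopped : in  std_logic;  -- from DCM #1's STATUS bus (assumed bit)
    dcm2_locked        : in  std_logic;
    dcm1_rst           : out std_logic;
    dcm2_rst           : out std_logic;
    clocks_ok          : out std_logic);
end entity dcm_reset_seq;

architecture rtl of dcm_reset_seq is
  signal count : natural range 0 to RST_CYCLES := 0;
  signal rst1  : std_logic := '1';  -- DCM #1 starts out in reset
begin
  process (ref_clk)
  begin
    if rising_edge(ref_clk) then
      if dcm1_clkin_stopped = '1' then
        -- input clock lost: LOCKED may still read '1', so start a new reset pulse
        count <= 0;
        rst1  <= '1';
      elsif count < RST_CYCLES then
        count <= count + 1;  -- keep RST high for RST_CYCLES edges
        rst1  <= '1';
      else
        rst1  <= '0';
      end if;
    end if;
  end process;

  dcm1_rst  <= rst1;
  -- hold the downstream DCM in reset until the upstream one is locked
  dcm2_rst  <= rst1 or not dcm1_locked;
  clocks_ok <= dcm1_locked and dcm2_locked and not rst1 and not dcm1_clkin_stopped;
end architecture rtl;
```

If the only clock on the board is the one feeding DCM #1, it cannot serve as `ref_clk` here, because it stops together with the input it is supposed to watch.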

Good post,

Austin

Nico Coesel

Sep 21, 2006, 12:52:51 PM

David Ashley <da...@nowhere.net.dont.email.me> wrote:

>Nico Coesel wrote:
>> As AFAIK the opencores ddr controller uses some sort of scheme which
>> routes the clock to the outside and pulls it back in again. This is
>> totally unnecessary IMHO.
>
>Yep you're right, that was the feedback line for one of the DCM's.
>Current design works but has no feedback from the outside as
>you suggest. Original design was right on the edge as regards
>sampling the DDR's output data.
>
>> For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
>> get a clock which is 90 degrees out of phase. fddrs have an internal
>> inverter in their clock inputs so 1 clock to drive these is
>> sufficient.
>
>You're exactly right. Also DCM -> DCM seems to work ok, however
>I'm ignoring the "locked" bit on the 50->100 DCM and the system
>only pays attention to the locked bit on the 2nd DCM. This is
>probably bad.

By the way, there is a Spartan-3 issue with daisy-chaining DCMs. See
the other thread about 'product lifetime'. ISE 7.1 (dunno about the
other ISE versions) will warn you about this when routing the design.