
DIFF_OUT buffer example


Brian Davis

Feb 15, 2006, 10:12:53 PM
Following up on John Providenza's question about the DIFF_OUT
buffer feature, I've put together a small example which builds a
complementary clock input buffer out of two normal IBUFGDS's.

Also for reference, I've copied my original notes about this handy
feature of the V2 & S3 families.

Brian

<from original post>

All the V2-ish differential input buffers have a complementary output
available, that can be used to create a 180 degree clock without
needing a DCM.

These can also be used just to invert a differential input without
needing any other logic (or board cuts & jumps).

Look at the DIFFS component in fpga_editor to see what's going on;
besides the normal 'phantom' route from the DIFFS to the DIFFM,
there's also a route from the DIFFM to a differential receiver in the
DIFFS that outputs the complement signal.

I first spotted these when they showed up in early versions of
XAPP622 as a hard macro.

Support & tool bugs for these have varied version to version,
see Answer Record 21958 for recent problems.

I've banged into various other problems in using them over the years;
if I get a chance this weekend, I'll try to dig up some old webcase
code showing how to create one out of two normal IBUF{G}DS's as
a workaround.

These can be used on regular IOB inputs as well as global clock
inputs, but you've generally needed to LOC the global input buffer
and bufg's to allowed sites to get this to work.

search for
ibufgds_diff_out
ibufds_diff_out

<end original post>

<diff_out_test.vhd>
--
--
-- diff_out buffer example
--
-- shows how to create complementary internal clocks using
-- IBUF{G}DS's with neither a DCM nor local inversion required
--
-- forwards a global clock input to output, output/2
--
-- substitutes two ibuf{g}ds's for ibuf{g}ds_diff_out component;
-- various tool revs have choked when using attributes on those
--
-- intended for V2-ish family members
--
-- !!! example LOC constraints specific to XC3S200-FG256 !!!
--
-- COMPLETELY UNTESTED; SYNTHESIZED WITH 6.3 & EXAMINED IN FPGA_EDITOR
--
-- Input Clocking:
-- this example doesn't use the resulting clock for DDR inputs,
-- but best (or at least easier to analyze) DDR input timing may
-- result when using CLB registers rather than DDR IOB input regs
--

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library unisim;
use unisim.vcomponents.all;


entity diff_out_test is
port
(
global_clk_in_p : in std_logic;
global_clk_in_n : in std_logic;

gclk_out_p : out std_logic;
gclk_out_n : out std_logic;

gclk_div2_p : out std_logic;
gclk_div2_n : out std_logic

);

end diff_out_test;


architecture arch1 of diff_out_test is

--
-- signals
--
signal gclk_iob_p : std_logic;
signal gclk_iob_n : std_logic;

signal gclk : std_logic;

signal gclk_p : std_logic;
signal gclk_n : std_logic;

signal gclk_out : std_logic;
signal gclk_div2 : std_logic;

signal g_toggle : std_logic;
signal g_toggle_not : std_logic;

--
-- LOC the components
--
attribute LOC : string;

--
-- see Answer Record 17697 for script to find local clock sites
-- (complementary clocks are trickier with local routing resources)
--

--
-- inputs have an IBUFDS comp in both DIFFM and DIFFS pin sites
-- to create the diff_out buffer variant
--
attribute LOC of GC1P : label is "D9";
attribute LOC of GC1N : label is "C9";

--
-- outputs need to LOC one OBUFDS to the P pin site
--
attribute LOC of FGC1 : label is "E10";

attribute LOC of FGC2 : label is "C12";

--
-- Note : to use one BUFG for IOB DDR logic clocking,
-- and another for internal logic clocking, intentionally
-- LOC the BUFG's to the sites having the shared routing
-- resources to the IBUFGDS ( may produce spurious ISE warning )
--
-- To locate the shared routing resources, see:
-- Answer Record 19947 for S3
-- Answer Record 11756 for V2
--
attribute LOC of GCLK_BUFGMUX_LOGIC : label is "BUFGMUX4";
attribute LOC of GCLK_BUFGMUX_P : label is "BUFGMUX5";
attribute LOC of GCLK_BUFGMUX_N : label is "BUFGMUX6";


begin

--
-- global clock input
-- two IBUFDS's with input nets swapped
-- in place of IBUFGDS_DIFF_OUT
--
GC1P : IBUFDS_LVDS_25
port map
(
I => global_clk_in_p,
IB => global_clk_in_n,
O => gclk_iob_p
);

GC1N : IBUFDS_LVDS_25
port map
(
I => global_clk_in_n,
IB => global_clk_in_p,
O => gclk_iob_n
);

--
-- note that the use of shared routing resources to two
-- BUFGMUX's from the same internal clock source requires
-- both LOCs and manual selection of the proper I0/I1 input
-- ( see comments near LOCs above )
--
GCLK_BUFGMUX_LOGIC : BUFGMUX
port map
(
S => '1',

I0 => open,
I1 => gclk_iob_p,

O => gclk
);

GCLK_BUFGMUX_P : BUFGMUX
port map
(
S => '0',

I0 => gclk_iob_p,
I1 => open,

O => gclk_p
);

GCLK_BUFGMUX_N : BUFGMUX
port map
(
S => '0',

I0 => gclk_iob_n,
I1 => open,

O => gclk_n
);

--
-- clock divider
--
gdiv2 : fd
port map
(
C => gclk,
D => g_toggle_not,
Q => g_toggle
);

g_toggle_not <= NOT g_toggle;

--
-- DDR output flops
--
GCLK_FWD : FDDRRSE
port map
(
Q => gclk_out,

C0 => gclk_p,
C1 => gclk_n,

CE => '1',
R => '0',
S => '0',

D0 => '1',
D1 => '0'
);

GCLK_DIV2_FWD : FDDRRSE
port map
(
Q => gclk_div2,

C0 => gclk_p,
C1 => gclk_n,

CE => '1',
R => '0',
S => '0',

D0 => g_toggle,
D1 => g_toggle
);

--
-- Output buffers
--
FGC1 : OBUFDS_LVDS_25
port map
(
I => gclk_out,

O => gclk_out_p,
OB => gclk_out_n
);

FGC2 : OBUFDS_LVDS_25
port map
(
I => gclk_div2,

O => gclk_div2_p,
OB => gclk_div2_n
);

end arch1;

Symon

Feb 16, 2006, 5:48:35 AM
"Brian Davis" <brim...@aol.com> wrote in message
news:1140059573.9...@g44g2000cwa.googlegroups.com...

> Following up on John Providenza's question about the DIFF_OUT
> buffer feature, I've put together a small example which builds a
> complementary clock input buffer out of two normal IBUFGDS's.
>
> Also for reference, I've copied my original notes about this handy
> feature of the V2 & S3 families.
>
> Brian
>
Hi Brian,
I'm struggling a little to see why I'd require a complementary clock. The
DDR output IOBs have inversion control on both clock inputs, so why not just
connect the normal clock to both pins and invert the appropriate one? Are
you saying that a local inversion affects the skew? I have seen big clock
nets' mark/space get affected by a lot of loads, is this the problem you're
addressing? AFAIK, all the clocked resources have programmable inversion so
what am I missing?
Cheers, Syms.


Brian Davis

Feb 16, 2006, 7:30:44 AM
Symon wrote:
>
> I'm struggling a little to see why I'd require a complementary clock. The
> DDR output IOBs have inversion control on both clock inputs, so why not just
> connect the normal clock to both pins and invert the appropriate one? Are
> you saying that a local inversion affects the skew?
>
Exactly, the local inversion feature introduces quite a bit of skew,
which can be avoided by using complementary internal clocks.
(excluding from this discussion V4 with internal diff clock nets)

The DIFF_OUT feature lets you get a low jitter complementary
DDR clock on-board without needing a DCM (with its inherent jitter).

It also can be used to invert a differential input right at the pad,
which I didn't show in the example, but that same input net swap
trick works with IBUFDS buffers too.
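
For illustration, the pad-level inversion described above might look
something like the following. This is an untested sketch: the entity,
port, and label names are made up for the example, and it assumes the
same IBUFDS_LVDS_25 primitive used in diff_out_test.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- hypothetical entity, for illustration only
entity diff_in_invert is
  port
  (
    data_in_p : in  std_logic;
    data_in_n : in  std_logic;
    data_inv  : out std_logic
  );
end diff_in_invert;

architecture arch1 of diff_in_invert is
begin

  --
  -- I/IB nets are swapped relative to the pad names, so O is
  -- the complement of the signal arriving on data_in_p/data_in_n
  --
  -- in the full example the swapped buffer (GC1N) carries its
  -- own LOC; a similar constraint may be needed here
  --
  INV_BUF : IBUFDS_LVDS_25
    port map
    (
      I  => data_in_n,
      IB => data_in_p,
      O  => data_inv
    );

end arch1;
```

No fabric inverter or LUT is involved; the only change from a normal
input buffer is which pad net feeds I and which feeds IB.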

From my past measurements of internal clocks (using clock forwarding)
it looks like the internal clock net rise/fall is quite asymmetric; so,
it's best to use the same edge sense (e.g. rising) of both complementary
clock phases.

If you use a DCM with duty cycle correction, it pre-skews the driver
so that both the threshold crossings line up again near 50%, but now
you're stuck with the DCM jitter and other baggage ( and at higher
input frequencies, eventually the duty cycle correction makes the
clock pulse sallying forth from the driver extremely narrow )

>
> I have seen big clock nets' mark/space get affected by a lot of loads,
> is this the problem you're addressing?
>

Yes, that too; the other trick shown in the example is how to keep
the two IOB DDR clock nets identically loaded by splitting the internal
logic clock loads out onto another BUFG net.

Brian

Tim

Feb 16, 2006, 7:49:51 AM
Symon wrote

> I'm struggling a little to see why I'd require a complementary clock. The
> DDR output IOBs have inversion control on both clock inputs, so why not
> just connect the normal clock to both pins and invert the appropriate one?
> Are you saying that a local inversion affects the skew?

from XAPP462, page 37:

The CLKx clock signal precisely triggers the DDR flip-flop's C0 input at the
start of the clock period. Similarly, the CLKx180 clock signal precisely
triggers the DDR flip-flop's C1 input halfway through the clock period. The
cost of this approach is an additional global buffer and global clock line,
but it potentially reduces the potential duty-cycle distortion by
approximately 300 ps.


Symon

Feb 16, 2006, 7:56:56 AM
"Brian Davis" <brim...@aol.com> wrote in message
news:1140093044.4...@z14g2000cwz.googlegroups.com...

> Exactly, the local inversion feature introduces quite a bit of skew,
> which can be avoided by using complementary internal clocks.
> (excluding from this discussion V4 with internal diff clock nets)
>
> The DIFF_OUT feature lets you get a low jitter complementary
> DDR clock on-board without needing a DCM (with its inherent jitter)
>
> It also can be used to invert a differential input right at the pad,
> which I didn't show in the example, but that same input net swap
> trick works with IBUFDS buffers too.
>
> From my past measurements of internal clocks (using clock forwarding)
> it looks like the internal clock net rise/fall is quite asymmetric; so,
> it's best to use the same edge sense of complementary clock phases.
>
> If you use a DCM with duty cycle correction, it pre-skews the driver
> so that both the threshold crossings line up again near 50%, but now
> you're stuck with the DCM jitter and other baggage ( and at higher
> input frequencies, eventually the duty cycle correction makes the
> clock pulse sallying forth from the driver extremely narrow )
>
>>
>> I have seen big clock nets' mark/space get affected by a lot of loads,
>> is this the problem you're addressing?
>>
> Yes, that too; the other trick shown in the example is how to keep
> the two IOB DDR clock nets identically loaded by splitting the internal
> logic clock loads out onto another BUFG net.
>
> Brian
>
Thanks Brian, that makes sense. I've seen similar clock skew effects before,
but never bad enough yet to need separate clocks. If I do, I'll remember
your neat solution! In fact, I've just remembered something that I had to
fix with a DCM doubler, I'll try this on it when I get time.
BTW, as it comes for free, I guess it's a complimentary complementary clock!
I'll get me coat...
Cheers, Syms.


Symon

Feb 16, 2006, 8:03:14 AM

"Tim" <t...@rockylogiccom.noooospam.com> wrote in message
news:dt1sdk$pll$1$8300...@news.demon.co.uk...

> from XAPP462, page 37:
>
> The CLKx clock signal precisely triggers the DDR flip-flop's C0 input at
> the start of the clock period. Similarly, the CLKx180 clock signal
> precisely triggers the DDR flip-flop's C1 input halfway through the clock
> period. The cost of this approach is an additional global buffer and
> global clock line, but it potentially reduces the potential duty-cycle
> distortion by approximately 300 ps.
>
Hi Tim,
OK, I guess that's why I've not had problems with using just one clock. Even
at >600Mbps I've got enough slack in my timing budget to cope with 300ps.
Thanks for the reference!
Cheers, Syms.


Brian Davis

Feb 16, 2006, 9:18:22 AM
Tim wrote:
>
> from XAPP462, page 37:
><snip>

>The cost of this approach is an additional global buffer and global
> clock line, but it potentially reduces the potential duty-cycle distortion
> by approximately 300 ps.
>
Thanks for pointing out that link.

One caution on XAPP462 v1.1 : the novice at Xilinx who wrote the
"Skew Adjustment" section (pp 32-34) got the descriptions and figures
completely backwards, and confused the terms 'skew' and 'delay'.

Pages 4-5 of XAPP259 give a much better description of the delay
element.

------

DESKEW_ADJUST = SYSTEM_SYNCHRONOUS :

Inserts a delay into the DCM FEEDBACK path, which makes the
output clock happen EARLIER. ( not later, as depicted in XAPP462 )

This increases setup, guarantees zero hold, and adds a temp
and VCCAUX affected delay element into the DCM deskew path.

DESKEW_ADJUST = SOURCE_SYNCHRONOUS :

Removes delay element from the DCM FEEDBACK path, which makes
the output clock happen LATER. ( not earlier, as depicted in XAPP462 )

This reduces setup time, increases hold time, but results in a
smaller overall input sampling window.

------

For DDR input applications, or for cascaded DCM's, you generally
want to be in SOURCE_SYNCHRONOUS mode (the latest few
revisions may do that automatically for DCM cascades)
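
As a rough sketch of where this setting lives, a DCM instantiation
with DESKEW_ADJUST set for source-synchronous use might look like the
following. This is untested; the entity and signal names are invented
for the example, the generic and port names are as I understand the
unisim DCM component, and older tool revisions may need the equivalent
attribute instead of, or in addition to, the generic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- hypothetical entity, for illustration only
entity dcm_deskew_sketch is
  port
  (
    clk_in  : in  std_logic;  -- assumed to come from a clock input buffer
    rst     : in  std_logic;
    clk_out : out std_logic;
    locked  : out std_logic
  );
end dcm_deskew_sketch;

architecture arch1 of dcm_deskew_sketch is

  signal clk0     : std_logic;
  signal clk0_buf : std_logic;

begin

  DCM1 : DCM
    generic map
    (
      CLK_FEEDBACK  => "1X",
      -- takes the delay element out of the feedback path
      DESKEW_ADJUST => "SOURCE_SYNCHRONOUS"
    )
    port map
    (
      CLKIN    => clk_in,
      CLKFB    => clk0_buf,
      RST      => rst,

      -- phase shift and spread spectrum inputs unused here
      DSSEN    => '0',
      PSCLK    => '0',
      PSEN     => '0',
      PSINCDEC => '0',

      CLK0     => clk0,
      LOCKED   => locked
    );

  -- feedback taken from the global clock net
  FB_BUFG : BUFG
    port map
    (
      I => clk0,
      O => clk0_buf
    );

  clk_out <= clk0_buf;

end arch1;
```

Changing the generic to "SYSTEM_SYNCHRONOUS" would put the delay
element back into the feedback path, per the description above.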

See also Answer Records 12406, 18079

Brian

Hal Murray

Feb 21, 2006, 5:07:36 AM
> Yes, that too; the other trick shown in the example is how to keep
>the two IOB DDR clock nets identically loaded by splitting the internal
>logic clock loads out onto another BUFG net.

Does this run into skew problems between the main clock and the IOB clock?

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.

Andy

Feb 21, 2006, 1:30:58 PM
Brian,

I posted a question about this technique in a response to a separate
thread (multiphase data extraction question).

I'm using this to gain access to the IDDRs associated with both pads of
a diff pair, so I can sample the input on four phases of a clock, with
very low skews in the data paths.

Do you recommend separate ibufds primitives, or a single
ibufds_diff_out primitive?

Andy Jones

Brian Davis

Feb 21, 2006, 7:21:02 PM
Andy wrote:
>
> Do you recommend separate ibufds primitives, or a single
> ibufds_diff_out primitive?
>

The only reason I started using two IBUFDS's instead of one
IBUFDS_DIFF_OUT was to avoid various tool bugs that dropped
placement, I/O standard, and termination attributes when applied
to the IBUFDS_DIFF_OUT components.

The IBUFDS_DIFF_OUT is really just two IBUFDS's in disguise
for V2/S3, but I haven't looked at the V4 implementation.
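
For comparison with the two-buffer version in diff_out_test.vhd, a
single-primitive version might be sketched as below. This is untested;
the entity and label names are hypothetical, the port names are as I
understand the unisim IBUFGDS_DIFF_OUT component, and the attributes
shown are the kind that some tool revisions reportedly dropped.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- hypothetical wrapper, for illustration only
entity diff_out_single is
  port
  (
    global_clk_in_p : in  std_logic;
    global_clk_in_n : in  std_logic;

    gclk_iob_p      : out std_logic;
    gclk_iob_n      : out std_logic
  );
end diff_out_single;

architecture arch1 of diff_out_single is

  attribute LOC        : string;
  attribute IOSTANDARD : string;

  -- example site taken from the XC3S200-FG256 LOCs in the full example
  attribute LOC        of GC1 : label is "D9";
  attribute IOSTANDARD of GC1 : label is "LVDS_25";

begin

  --
  -- one buffer with true and complement outputs, in place of
  -- the GC1P/GC1N pair with swapped input nets
  --
  GC1 : IBUFGDS_DIFF_OUT
    port map
    (
      I  => global_clk_in_p,
      IB => global_clk_in_n,

      O  => gclk_iob_p,
      OB => gclk_iob_n
    );

end arch1;
```

If the placement or I/O standard goes missing after map, checking the
result in fpga_editor and falling back to the two-buffer form is the
workaround described above.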

Brian

Brian Davis

Feb 21, 2006, 7:43:42 PM
Hal wrote:
>
>Does this run into skew problems between the main clock and the IOB clock?
>
The output DDR nets traverse loaded->unloaded, which shouldn't
be a problem ( except for the usual caveat about perhaps clocking
the falling edge data with a falling edge clock ahead of the IOB ).

DDR inputs traverse unloaded->loaded, which might require
opposite edge or 90/270 phasing.

IIRC, for fast V2 DDR inputs I used two differential local clock
inputs ( to work around limited local clock routing resources ),
DDR registers implemented in CLBs ( published IOB timing at the
time was obfuscated by the inclusion of DCM jitter in IOB setup/hold
numbers), and a global clock input driving a DCM to generate 90/270
phases to help reclock the two-wide data path phases into the
global clock domain.

maybe I should have used input latches instead :)

Brian

Sandro

Feb 22, 2006, 5:29:47 AM
Hi,

Maybe you all know this already, but if not, take a look at
<Path_Were_is_Xilinx>/Xilinx/vhdl/src/unisims/unisim_VITAL.vhd
which holds the VHDL VITAL source code for the unisim library, used
for simulation.

It covers DIFF_OUT, IDDR, and almost everything else.

Sandro
