I'd be receiving an LVDS clock pair @ 360 MHz, running part of the
internal logic at 360. This internal logic includes DSP48 slices (but
they need to be pipelined in the fabric, since I need more than a
48-bit 'C' input for the adder). Preliminary testing indicates that it
can go above 360 with light user intervention. One thing I'm cautious
about is that the rest of the logic runs much slower, at 90 MHz.
Initially I was thinking of using a /4 version of the clock, but Peter
Alfke's post regarding added skews due to loading differences in DCM
outputs is making me think about it carefully.
For output, I'd be using ODDR to multiplex the 360 MHz logic, to send
the data out at 360 MHz DDR (so the data can look like a 360 MHz
'clock'). Data is LVDS, and so is the forwarded LVDS clock pair @ 360
MHz. The receiving device will use both edges of the forwarded 360 MHz
clock to sample the data. Clock-to-output delay is not good, 3+ ns,
but since the clock will be forwarded and will incur effectively the
same delay as the data (other than IOB-to-IOB clock skew), as long as
I send out a 180-degree version of the internal 360 clock using ODDR,
it should be OK. Not sure what kind of SI issues there will be,
however.
I have an option of running it at 180Mhz if 360 is risky. External
device will be different. Am I playing too safe by going to 180? Will
360 be a challenge?
I'd appreciate feedback.
Questions about the clock enable:
- Is there an easy way to specify it as a clock enable, so the tool
knows about it for timing analysis, and so you don't have to specify
multi-cycle constraints for gobs of FFs?
- And how do you make the enable signal go on the global clock net?
# Group the FFs driven by the enable net, then relax their FF-to-FF
# paths to one 90 MHz period (4 x the 360 MHz period):
NET "ENABLE_NET_NAME" TNM=FFS "ENABLED_FFS";
TIMESPEC "TS1000" = FROM "ENABLED_FFS" TO "ENABLED_FFS" 11.1 ns;
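For what it's worth, the 11.1 ns figure is just one period of the 90 MHz enable rate, i.e. a 4x relaxation of the 360 MHz period. Plain arithmetic, no tool assumptions:

```python
# A clock enable asserted every 4th cycle of a 360 MHz clock gives the
# enabled FFs an effective 90 MHz update rate, so their FF-to-FF paths
# can be relaxed to one 90 MHz period.

fast_mhz = 360.0
divide   = 4
fast_period_ns    = 1e3 / fast_mhz            # ~2.78 ns
relaxed_period_ns = fast_period_ns * divide   # ~11.1 ns

print(f"360 MHz period : {fast_period_ns:.2f} ns")
print(f"relaxed (x{divide}) : {relaxed_period_ns:.1f} ns")
```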
>
> - And how do you make the enable signal go on the global clock net?
>
You ask someone from Xilinx! I've not yet started my V4 design. I just
remembered that from the marketing spiel we had.
Cheers, Syms.
See
http://www.xilinx.com/products/design_resources/conn_central/resource/ssio_resources.htm
for discussions on the 1 Gbit/s/pin Virtex-4 differential I/O, and you may
get a better feel for the margins you can get in your design.
If all your LVDS signals are at the same clock rate, the DCM skews
shouldn't be a problem, and the data could easily be registered back
into the slower domain if the skews were an issue.
Xilinx went to great lengths to make the Virtex-4 I/O capable of some pretty
high speeds without taxing the designer.
<fastgr...@yahoo.com> wrote in message
news:1115750937....@f14g2000cwb.googlegroups.com...
Altera claims their parts have a better LVDS 'eye' because they have
superior (i.e. less) pin capacitance. The capacitance gives a lot of ISI.
I've been bitten by Xilinx FPGA pin capacitance before, albeit in a slightly
different situation. I guess it's the inevitable consequence of trying to
fit every possible I/O standard onto every pin. If I'm reading the page
properly, Altera mitigate this by having different sets of pins which are
capable of different I/O standards. Separate 'Rocket I/Os' also solve this
problem.
Of course, the OP's data rate is significantly lower than the rate shown in
the link, so it should be no problem! ;-)
Cheers, Syms.
<snip>
> You might also like to look at this link.
>
http://www.altera.com/products/devices/stratix2/utilities/st2-signal_integrity.html?f=tchio&k=g3
>
> Altera claims their parts have a better LVDS 'eye' because they have
> superior (i.e. less) pin capacitance. The capacitance gives a lot of ISI.
<snip>
But doesn't the Altera information use external LVDS terminations and
monitor the waveform external to the part, rather than internal
terminations and the eye seen by the receiver? By looking outside a
package that has an embedded differential termination, isn't the data
skewed?
I am surprised at you. Their white paper clearly shows the simulation is
done with the external termination, and not the internal one.
Use the internal one, and the capacitance does not matter (do the sim
yourself if you do not believe me).
Of course, you really should use the LVDS where it is specified. If you
want 1.3 Gbs, use our MGTs .... oh, I forgot, they do not have 2-GX
parts ....
The fact that their LVDS works up to 1.3 Gbs in simulation is nice, but
can it be used in a real application on a real board?
We have our ML450 Network Board for V4 to demonstrate 1 Gbs DDR
interfaces, and it works just fine. Ask your FAE for a demo, or go
online and buy the board.
Austin
Quoting Symon's last response [1] to such a misleading,
incomplete, and inaccurate claim:
>>
>>Total bollocks
>>
And, just recently, some other Austin from Xilinx wrote [2]:
>>
>>Placing the cap at the receiver is really bad from a signal
>>integrity standpoint: it makes for a huge reflection
>>
Exactly the point I was trying to make a couple years back,
with which you disagreed so virulently. Rather than repeat my
explanation of why high input C is bad, just re-read [3].
[ I agree that if the cap is right at the receiver in a
point-point connection, all it will see is the filtered edge on
the initial transition; but ignoring the aftereffects of that
massive reflection is foolhardy. ]
>
> We have our ML450 Network Board for V4 to demonstrate 1 Gbs DDR
> interfaces, and it works just fine.
>
The last time you said that, I asked [4]:
>> Where in Xilinx's V4 documentation might one find these pictures
>>and eye diagrams, including real world vs. simulated waveforms at
>>the driver, receiver, and points in between ?
Also from that thread, I suggested some other measurements
to make on your "A vs. X" test platform:
>> Since you have that spiffy board at hand, I'd love to see
>> plots of the following:
>>
>> A) X vs. A ICCO for the "Hammer Test" at several toggle rates
>>
>> B1) X vs. A waveforms for a high speed single ended standard (xSTL)
>> B2) X vs. A ICCO for a high speed single ended standard (xSTL)
>>
>> C1) X vs. A waveforms for 1 Gbps differential LVDS
>> C2) X vs. A ICCO for 1 Gbps differential LVDS
>>
>> D) X vs. A differential TDR input waveforms into a DT termination
>> at 100, 200, 500 ps input edge rates
When might we see some published data on those, particularly items C &
D?
Brian
[1]
http://groups-beta.google.com/group/comp.arch.fpga/msg/a6252d7a3566ea3f?hl=en
[2]
http://groups-beta.google.com/group/comp.arch.fpga/msg/7ef4d001e4d8ff65?hl=en
[3]
http://groups-beta.google.com/group/comp.arch.fpga/msg/a044806f313848e6?hl=en
[4]
http://groups-beta.google.com/group/comp.arch.fpga/msg/3619e923a589ef59?hl=en
Get the ML450 board, or ask for the documentation.
Lots of scope shots are available (ask your FAE).
or, http://www.xilinx.com/publications/prod_mktg/pn0010778.pdf (page 2)
Or, go to one of our RocketLabs and measure it for yourself.
As for the reflection, the LVDS transmitter is also a 100 ohm
termination, so reflections are absorbed at the transmitter (when the
LVDS is properly done and meets the specifications, which ours do).
Anyone with an IBIS simulator can see all of the above happening, so I
really don't want to take this any further - demanding to see scope
shots of things is pretty pointless when the simulations are perfectly
good (when they are done correctly).
But, I am sure our Marketing Folks will be rolling out scope shots as
part of pitch-packs, etc. for those who are unable or unwilling to do
the SI engineering that their job requires of them.
Austin
No, you didn't give a bum steer... you were right initially. CE nets can
be put onto a global clock network. Look at the CLB switch box in FPGA
Editor again... each CE pin can be driven by a bounce pip (4 stubs in
the middle-right edge of the switch box), and these 4 bounces can all be
driven by the GCLK pips on the lower-left edge of the switch box.
There's your path.
Cheers,
Ajay Roopchansingh
Xilinx Inc
Here's a quote from National's LVDS manual.
"In a good design the connector contributes 2 pF to 3 pF, the trace
contributes 2 pF to 3 pF, and the device contributes 4 pF to 5 pF. The total
load in such a design is around 10 pF. The flexibility of programmable
devices comes at the cost of capacitance. National Bus LVDS products have an
I/O capacitance of 5 pF. The I/O capacitance of a programmable device is
approximately double or 10 pF. This increase in capacitance will lower the
loaded bus impedance, thereby reducing the available noise margin and
lowering the reliability of operation in the design."
http://www.national.com/appinfo/lvds/files/ownersmanual.pdf
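To put a number on National's "lower the loaded bus impedance" point, here's the classic distributed-loading formula, Zloaded = Z0 / sqrt(1 + Cd/C0), where Cd is the added load capacitance per unit length and C0 is the trace's intrinsic capacitance per unit length. The geometry numbers below are my assumptions for illustration, not from the National document:

```python
import math

# Effect of distributed capacitive loading on a multidrop bus:
#   Zloaded = Z0 / sqrt(1 + Cd/C0)
# Geometry values are illustrative assumptions, not datasheet numbers.

z0_ohm       = 50.0   # assumed unloaded trace impedance
c0_pf_per_in = 3.0    # assumed intrinsic trace capacitance per inch
load_pf      = 10.0   # ~10 pF per drop, per the National figure
spacing_in   = 2.0    # assumed drop-to-drop spacing on the bus

cd_pf_per_in = load_pf / spacing_in
z_loaded = z0_ohm / math.sqrt(1 + cd_pf_per_in / c0_pf_per_in)
print(f"loaded impedance ~ {z_loaded:.1f} ohms")
```

With those assumptions the loaded impedance drops to around 30 ohms, which is exactly the noise-margin erosion the quote is warning about.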
>
> Anyone with an IBIS simulator can see all of the above happening, so I
> really don't want to take this any further - demanding to see scope
> shots of things is pretty pointless when the simulations are perfectly
> good (when they are done correctly).
>
I don't want to see scope shots, I agree I want to see a simulation 'done
correctly'. I think I already have on Altera's website.
Best regards, Syms.
Make up your mind, Austin. On numerous occasions you have recommended
that people run simulations of I/O systems to see what should happen,
and you have recommended the IBIS models. To suggest that Altera does
not know how to run simulations is insulting. Enough!
Philip Freidin
===================
Philip Freidin
philip....@fpga-faq.org
Host for WWW.FPGA-FAQ.ORG
What is 12.5pF in series with 12.5pF?
Yes, that is right, 6.25pF differential load, not 12.5pF.
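Spelled out (the two single-ended pin capacitances appear in series across the differential pair):

```python
# Two capacitors in series: C = C1*C2 / (C1 + C2).
# Seen differentially, the two 12.5 pF single-ended pin capacitances
# are in series across the pair.

def series_c(c1_pf, c2_pf):
    """Capacitance of two capacitors in series, in pF."""
    return c1_pf * c2_pf / (c1_pf + c2_pf)

c_diff = series_c(12.5, 12.5)
print(f"differential load: {c_diff} pF")  # 6.25 pF
```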
Falling for the A FUD is especially embarrassing when you just repeat
things which are factually incorrect.
All these things are taken into account in the simulation.
Austin
1. They ignored the top comment lines of the IBIS model, which instruct
them how to model the package (since package modeling is incorrect in
IBIS 3.2).
2. They used an external resistor instead of the internal termination.
Run it right, or not at all.
Austin
All true.
I would suggest that you should model it as a true differential line,
rather than two single-ended 50 ohm lines, but it doesn't change
anything at all (you still end up being differentially terminated at the
receiver, with 6.25pF across 100 ohms).
The eye is plenty good for up to 1 Gbs (see the ML450).
It does not work up to 1.3 Gbs, because we didn't design it to work up
to there: that is what the MGTs are for.
If there is a 'beauty contest' for the 'best LVDS eye pattern', I will
admit we come in second (due to increased Cpin), but I will not admit
that it matters so far as use, function, or anything important is
concerned. The IDELAY feature, which allows independent skew adjustment
for each I/O pin (pair) to center the eye sampling point to within
+/-78ps, is a far more useful feature than having 'pretty eyes'.
Austin
Best wishes on getting Austin to stop with his
"but it's really half, differentially" handwaving.
I've tried before, with results similar to that
"but it goes to Eleven" bit from "Spinal Tap".
>
>a way to improve things is to drive this capacitance
>with a lower impedance
>
Also, when you've got plenty of drive margin, a differential
attenuator ahead of the FPGA (with internal termination) works
nicely to attenuate the reflection, and also makes for a convenient
differential probe point. If you have 6dB to spare, even the most
horrible of loads presents at least 12dB return loss, with the probe
seeing 1/4 the reflection voltage of the original circuit. (However,
the attenuator doesn't lower the drive impedance, as your suggestion
does.)
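To put numbers on that: a matched attenuator of A dB in front of any load improves the return loss seen looking into the pad by 2A dB, because the reflection traverses the pad twice. A quick sketch, assuming nothing beyond an ideal matched pad:

```python
# A matched pad of 'pad_db' in front of an arbitrary load improves
# return loss by 2*pad_db: the reflection passes through the pad
# twice (in and back out).

def min_return_loss_db(pad_db, load_rl_db=0.0):
    """Worst-case return loss into the pad; load_rl_db = 0 models a
    total reflection (the 'most horrible of loads')."""
    return load_rl_db + 2 * pad_db

def reflection_voltage_ratio(pad_db):
    """Fraction of the original reflection voltage seen at the probe."""
    return 10 ** (-2 * pad_db / 20)

print(min_return_loss_db(6.0))        # 12 dB minimum return loss
print(reflection_voltage_ratio(6.0))  # ~0.25, i.e. 1/4 the voltage
```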
Brian
>
>But, I am sure our Marketing Folks will be rolling out scope shots
>as part of pitch-packs, etc. for those who are unable or unwilling
>to do the SI engineering that their job requires of them.
>
Let's see if I've got this straight [1]:
A) Xilinx publicly posts in FPGA and SI forums touting their
real world X vs. A package testing, and asks for feedback [2]
B) Forum users post some suggested measurements, which a
certain Xilinx employee says they can make
C) Two months later, when asked when said measurements might
be published, the very same Xilinx employee cops an attitude
>
>Get the ML450 board, or ask for the documentation.
>
That would be the same manual (UG077 v1.2) that mentions a
HyperTransport-compliant DUT interface connector, without
pointing out that the specified V4 FPGA Cin is 5x the
allowed HyperTransport max Cin for a 1 Gbps part ???
As to why that matters: a HyperTransport test probe attempting
to monitor the input link to the FPGA can't function properly
because Cin reflections off the FPGA would prevent the probe from
properly clocking the data at the mid T-line probe sampling point.
There are ways around this, but life would be easier if Xilinx
actually bothered to meet the spec in the first place.
Lacking that, proper documentation of your part's shortcomings,
and how and when to work around them, would be appropriate.
Brian
[1] Speaking of those unable to perform the SI engineering that is
required of them : when might we expect publication of characterized
static DCI power and DCI impedance modulation limits for the five year
old Virtex2 FPGA family ?
[2]
http://groups-beta.google.com/group/comp.arch.fpga/msg/d1004ae1fdca9825?hl=en
All I am trying to point out is that the load is 6.25pF + 100 ohms, not
12.5pF + 100 ohms.
When folks wave their arms and state that 12.5pF is the LVDS load, they
are misstating it.
Simple point.
And once you do the simulations, or look at the actual waveforms, you
realize that it is mostly just a beauty contest. In communications
theory, excess bandwidth in the channel only adds to the error rate (due
to noise). Some band limiting is a good thing. Too much is a bad thing
(e.g. using the LVDS at 1.3 Gbs, where it wasn't designed to be used;
that is where our MGTs are to be used).
Austin
Sigh.
See below.
Austin
Brian Davis wrote:
> Austin,
>
>>Lots of scope shots are available (ask your FAE).
>>
>
> Then why not publish them, along with a comparison of IBIS/HSPICE
> simulations versus the real world measurements?
All I can say is that they are coming. It just takes a while. Right now
we have much more important things to do: tout our power advantage, our
static current advantage, our speed advantage, our MGT advantage, our
PPC advantage, our SI packaging breakthrough ...
Showing an IBIS simulation of a five year old interface is just not high
on our list -- too many customers use it, and are perfectly delighted
with it. We do not want to be defocused and stop pointing out the areas
where we are clearly superior.
>
>
>>But, I am sure our Marketing Folks will be rolling out scope shots
>>as part of pitch-packs, etc. for those who are unable or unwilling
>>to do the SI engineering that their job requires of them.
>>
>
>
> Let's see if I've got this straight [1]:
>
> A) Xilinx publicly posts in FPGA and SI forums touting their
> real world X vs. A package testing, and asks for feedback [2]
Sure.
>
> B) Forum users post some suggested measurements, which a
> certain Xilinx employee says they can make
I did. Yes.
>
> C) Two months later, when asked when said measurements might
> be published, the very same Xilinx employee cops an attitude
OK, so I was snippy. I am told that the measurements will be done, but
again, it isn't a high priority.
>
>
>>Get the ML450 board, or ask for the documentation.
>>
>
>
> That would be the same manual (UG077 v1.2) that mentions a
> HyperTransport compliant DUT interface connector, without
> pointing out that the specified V4 FPGA Cin is 5x the
> allowed HyperTransport max Cin for a 1 Gbps part ???
True: we are not an ASIC/ASSP. That is the one area where they win
(they can make these specs as tight as they please). But guess what?
We are growing, increasing sales, and ASICs are not. Our real
competition now is no longer other FPGA companies; it is the ASIC/ASSP
providers. We can supply features and circuits on technologies they
can't (yet). Who has 10 Gbs transceivers? Who has the lowest power
405PPC? Who has the lowest power/highest performance DSP48 blocks for
DSP applications? We do, they don't.
>
> As to why that matters: a HyperTransport test probe attempting
> to monitor the input link to the FPGA can't function properly
> because Cin reflections off the FPGA would prevent the probe from
> properly clocking the data at the mid T-line probe sampling point.
I claim that in a real system, with a compliant transmitter, there will
be sufficient return-loss matching to make the eye visible, and useful.
But I agree that in some cases, what you see is not what you get.
That can happen with a simple single-ended input pin, and is definitely
true above 1 Gbs, where observing it breaks it (often). I think that
there is a whole class of people out there who have to see it to believe
it. OK. But they should get used to the fact that none of the test
equipment is really fast enough to show them what they want to see. And
it is only getting worse.
>
> There are ways around this, but life would be easier if Xilinx
> actually bothered to meet the spec in the first place.
I already explained why we can't do that: supporting 35 I/O standards
on one pin means making some compromises.
>
> Lacking that, proper documentation of your part's shortcomings,
> and how and when to work around them, would be appropriate.
We have all that. That is what the user's guide is for. That is what
the datasheet is for. Should we place a billboard on 101 South that
states the IOB pin capacitance is ~12pF? It is already in the
datasheet. So are the MGT, PPC, DSP48, etc. What do you think we should
spend time on?
>
>
> Brian
>
>
> [1] Speaking of those unable to perform the SI engineering that is
> required of them : when might we expect publication of characterized
> static DCI power and DCI impedance modulation limits for the five year
> old Virtex2 FPGA family ?
I think all this is now covered between data sheets, user's guides, and
technical answers on our website. Let me know if there is something
missing between those three resources.
Generally speaking, if we don't specify it, then you are on your own to
use it there. For example, if you choose to set the resistance to 100
ohms, to match a 100 ohm single-ended line, we are not going to claim we
meet any standard (there isn't any), and we aren't going to spend time
characterizing all the silicon for it. I believe we state the range of
the resistance as 40 ohms to 150 ohms, but when you use it at anything
other than 50 ohms, you are required to check it out (I would run the
SPICE simulations -- you may request DCI SPICE models at impedances
other than 50 ohms; 40, 50, 68, and 75 are the ones we have, if I
recall correctly), as that is not any one of the 35 I/O standards that
we designed the IOB to support.
A small change, such as using the DCI at 68 ohms instead of 50 ohms, is
used by quite a few (to save power). You can characterize it if you
need to, and if you feel there is a benefit you can derive, but unusual
usage of a feature in an area where it was not intended to be used (not
specified) is not guaranteed.
I'm now leaning toward doing it - parts of the core @ 360, DDR data
output @ 360 (720 Mbps, effectively) along with a forwarded clock @
360. I'd be running simulations to make sure there isn't any big issue
at 720 Mbps, but since it's much lower than 1.2 Gbps, I'm optimistic.
Can't say Altera is out of the running, however. I just wanted to make
sure I could do it in some FPGA device before committing to the
interface.
Thanks again.
<snip>
>> There are ways around this, but life would be easier if Xilinx
>> actually bothered to meet the spec in the first place.
>
> Already explained why we can't do that: 35 IO standards in one pin has
> to make some compromises.
Perhaps it is time to make some pins less "jack of all trades, master
of none", and provide some with more focus ?
-jg
It is something we agonize over every time we look at a new family.
Should we add I/O-standard-specific IOBs? How many? How are they to be
organized?
What should the IO/CLB ratio be?
Or, should we continue with the present plans (if it ain't broke, don't
fix it)?
What business did we lose because we could not meet a customer's
requirement? How do we know we even lost any business at all?
We did add MGTs (and PPC's, EMAC's, DSP48's, ECC_BRAM's, FIFO_BRAM's,
etc), so it isn't like we are not looking at adding new things, or
mixing things up (the patented ASMBL architecture for example).
360 MHz, 720 Mbs DDR LVDS is now something that either X or A has
provided with their devices for over five years. One can argue the fine
points, but as a gross capability, it has been there for quite a while.
Austin
Which of the following posts regarding Cin is more helpful
for both Xilinx and its customers:
Austin [1]:
>
> Use the internal one, and the capacitance does not matter
> (do the sim yourself if you do not believe me).
>
Brian [2]:
>
> At no point have I claimed that the V2 inputs are unusable,
> but only that, in the presence of high speed drivers, extra
> engineering effort needs to be expended to both understand the
> impact of the V2 input capacitance on the interconnect, and
> find a work-around that is appropriate for the design at hand.
>
Austin wrote:
>
>When folks wave their arms and state 12.5pF is the LVDS load,
>they are miss-stating it.
>
The only I/O capacitance number published in your datasheet is a
single-ended parameter called Cin (or if you prefer, C_comp from
the IBIS files).
Quoting this published datasheet Cin value is perfectly valid,
and does not require "correction".
Comparing that number against the single ended Cin's of other
devices, or against a single ended spec, is also perfectly valid.
I have never said the differential load is 12.5 pF; it is clear
from my posts that I understand this, and also understand that
the assumption of Cdiff_effective = 1/2 Cin_single_ended applies
only to the differential components of the signals on the T-line.
I find it rather inconsistent that in past discussions of
Xilinx's newly onerous SSO limits for the current-mode output
drivers, you've been quite insistent that real-world paths are
NOT perfectly balanced.
Yet when discussing the effects of high Cin, you posit that
everything is perfectly balanced back to a perfect source
termination, so that a 50-60% voltage reflection off of your
input pins is never a problem.
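To sanity-check that 50-60% figure: the peak reflection magnitude from a shunt capacitor C in the middle of a matched line, driven by a linear edge of rise time tr, is roughly (tau/tr)*(1 - exp(-tr/tau)) with tau = Z0*C/2 (the cap sees the two line halves in parallel). The values below are my assumptions (differential Z0 = 100 ohms, Cdiff = 6.25 pF), not anyone's datasheet:

```python
import math

# Peak reflection magnitude from a shunt capacitor on a matched line,
# for a linear edge of rise time tr:
#   rho_peak = (tau/tr) * (1 - exp(-tr/tau)),  tau = (Z0/2) * C
# Values are illustrative assumptions, not datasheet numbers.

def peak_reflection(z0_ohm, c_farad, tr_sec):
    tau = (z0_ohm / 2) * c_farad
    return (tau / tr_sec) * (1 - math.exp(-tr_sec / tau))

z_diff = 100.0     # assumed differential line impedance, ohms
c_diff = 6.25e-12  # differential load (two 12.5 pF pins in series)

for tr_ps in (300, 400, 500):
    rho = peak_reflection(z_diff, c_diff, tr_ps * 1e-12)
    print(f"tr = {tr_ps} ps -> peak reflection ~ {rho:.0%}")
```

For 400-500 ps edges this lands right in the 50-60% range quoted above; faster edges make it worse.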
If only all FPGA input buffers could live happily ever after
there in Austin's world, where all connections are ideal
differential point-point links, all drivers have perfect back
terminations, and no probing or multidrops are ever allowed.
>
>In communications theory, excess bandwidth in the channel only adds
>to the error rate (due to noise). Some band limiting is a good thing.
>
And massive, coherent input reflections do not fit the AWGN
assumptions of most channel models, now do they?
Brian
p.s. As for your other post, I'll reply once I finish recovering
from a hard drive crash at home and can find my old files again.
[1]
http://groups-beta.google.com/group/comp.arch.fpga/msg/57bbb3ea78e194ed?hl=en
[2]
http://groups-beta.google.com/group/comp.arch.fpga/msg/a044806f313848e6?hl=en
Well, things are getting a little less busy with my day job, so I finally
have time to start replying again... I figured I'd start with an easy one.
> The fact that their LVDS works up to 1.3 Gbs in simulation is nice, but
> can it be used in a real application on a real board?
Yes. Stratix II has LVDS running at 1.3 Gbps reliably across process,
temperature, voltage. Beautiful eye diagrams. In simulation and on the
board. And as noted here
(http://www.altera.com/products/devices/stratix2/features/performance/st2-perf_improvements.html),
we will be increasing the spec to 1.25 Gbps in an upcoming version of
Quartus II.
BTW, our simulations line up very well with board measurements. We offer
accurate IBIS models that we proudly stand behind.
Regards,
Paul Leventis
Altera Corp.
According to our engineer who ran the sims, we did use on-chip termination
for both V4 and Stratix II. I read the whitepaper again
(http://www.altera.com/literature/wp/signal-integrity_s2-v4.pdf) and I can't
find anywhere where it says we didn't use on-chip termination.
> The fact that their LVDS works up to 1.3 Gbs in simulation is nice, but
> can it be used in a real application on a real board?
Sorry to hammer on this again, but the above mentioned whitepaper does show
some beautiful eye diagrams for SII and some ugly ones for V4. It also
shows how nicely our lab measurement (of 1.3 Gbps LVDS on Stratix II)
compares to the IBIS simulation.
That is what all of the wonderful features are for in V4 (SSIO, IDELAY,
DCM, etc.). All of the above go a long way to support the fabric. Even
though the fabric will run at 500 MHz, it is far easier to mux it down
to 200 MHz, or 100 MHz (using the built-in SSIO features), which makes
place and route easier, and also provides a lot of margin.
Just go buy the ML450 board (network interfaces), and you will get a
fully working platform to test out all of your ~ 400 MHz up to 500 MHz
DDR interfaces.
Austin
There is hard serializer/deserializer circuitry available for the
left and right LVDS I/O banks. These SERDES blocks allow you to
deserialize/serialize by any factor between 4x and 10x. For example,
you could bring in a 4x data bus running at 312.5 MHz. Or you can
bypass the SERDES block and use the DDR registers for a 2x SERDES. Or
bypass completely for 1x... but not at 1.25 Gbps. I don't know what
speed the SERDES/DDR I/O clock can or will run at when we update
this specification. I'm sure it will be published at the time.
We also have dedicated Dynamic Phase Alignment (DPA) circuitry for
source-synchronous applications. The DPA block enables you to
eliminate channel-to-channel and clock-to-channel skew. It achieves
this by selecting the best clock phase to use for each I/O pair,
centering the sampling window in the eye.
According to the data sheet, you can run the LVDS I/O up to 500 MHz in
the fastest speed grade part. That would get you 1 Gbps. More likely,
you would use the SERDES. For example, at 130 MHz and using x8
serialization, you get 1.04 Gbps per pair. Here is a link to the DPA
datasheet:
http://www.altera.com/literature/hb/stx2/stx2_sii52005.pdf
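The rate arithmetic in those examples, spelled out (the clock rates and factors come from the posts above; the 500 MHz case assumes DDR, i.e. both clock edges):

```python
# Per-pair LVDS rate arithmetic for the examples quoted above.

def serdes_rate_gbps(clock_mhz, factor):
    """Per-pair rate with an N:1 serializer clocked at clock_mhz."""
    return clock_mhz * factor / 1000.0

def ddr_rate_gbps(clock_mhz):
    """Per-pair rate using DDR registers (both clock edges)."""
    return 2 * clock_mhz / 1000.0

print(serdes_rate_gbps(312.5, 4))  # 1.25 Gbps, the 4x example
print(serdes_rate_gbps(130, 8))    # 1.04 Gbps, the 8x example
print(ddr_rate_gbps(500))          # 1.0 Gbps at the 500 MHz I/O spec
```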
John
"Paul Leventis" <paul.l...@utoronto.ca> wrote in message
news:1116463566.3...@g43g2000cwa.googlegroups.com...
Sorry for taking so long to reply.
> I want to send bits a_1, a_2, a_3, a_4 etc. on I/O LVDS_A
> I want to send bits b_1, b_2, b_3, b_4 etc. on I/O LVDS_B
> I use the serdes to do this. Can I ensure that a_n appears at (more or
> less)
> the same time as b_n? I.e. that the shift registers in the two serdes are
> aligned?
That's what the SERDES block is for. You just need to instantiate an
altlvds_rx (receiver) or altlvds_tx (transmitter) with the number of
channels you want in the link. All of the channels will share a common
PLL. Therefore, they share a common clock, and the enable pulses derived
from that clock.
And if you want to give the manual another stab ;-), I've been told that
volume 2, chapter 5 of the Stratix II handbook, "High-Speed Differential I/O
Interfaces with DPA in Stratix II Devices"
http://www.altera.com/literature/hb/stx2/stx2_sii52005.pdf is helpful.
Figures 5-2, 5-11 and 5-12 are most applicable in this case.