The Xilinx Virtex II Pro seems to go up to about 325MHz...
Thanks.
Dave
I doubt any FPGAs can count that fast ... not directly
- you could use 4 counters running with different clocks (shifted by 90
degrees) at 250 MHz ...
- you could use the RocketIO SerDes: deserialize your gate signal
and process the data words at a lower frequency.
bye,
Michael
Short answer is no.
Getting a divide-by-2 close to 1 GHz on a typical, room-temperature basis is
probably doable; check with Peter A. at Xilinx?
A full-margin 32-bit count and capture requires carry logic, and so is
going to be slower.
If you really want to measure time (or create pulse widths),
then FPGAs do have resources that can resolve time below 1 ns.
-jg
>starfire wrote:
>> Are there any FPGA parts available today that can contain a 32-bit,
>> free-running counter running at 1GHz and a 32-bit storage register to take a
>> snapshot of the count and read it to a slower external interface?
>>
>> The Xilinx Virtex II Pro seems to go up to about 325MHz...
I've done (tiny) Johnson counters in Virtex II Pro that would go to
more than twice that speed. They could be used as a prescaler for a
larger binary counter.
800MHz seems to be about the limit. Perhaps the OP should wait for
Virtex 4.
(What about Peter Alfke's frequency counter? Didn't he claim 1GHz in
V2P?)
Regards,
Allan.
My application is precise time-correlation readings between random input
pulses, starting with a reset/sync pulse. The idea is that a free-running
counter with 1 ns resolution is reset to zero on receipt of the reset/sync
pulse, and a snapshot of the count is taken as each of a series of pulses is
received (a separate 32-bit counter value for each pulse). A precise time
correlation could then be made from the sync to any input and from any input
to any other input. The next reset/sync pulse would normally arrive before
the counter could overflow (typically about 35 ms after the previous one).
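A quick check of the numbers, as a Python sketch (the names `TICK_NS` and `interval_ns` and the snapshot values are illustrative assumptions, not from the thread): at 1 ns per count a 32-bit counter wraps only every 2^32 ns, about 4.3 s, so a ~35 ms sync interval is nowhere near overflow, and every correlation is a plain subtraction.

```python
# Sketch only: assumes a 1 GHz (1 ns/count), 32-bit free-running counter
# that is reset by the sync pulse, as described in the question.
TICK_NS = 1.0
WIDTH = 32
MODULUS = 1 << WIDTH


def wrap_time_s(tick_ns: float = TICK_NS, width: int = WIDTH) -> float:
    """Seconds until a free-running counter of this width overflows."""
    return (1 << width) * tick_ns * 1e-9


def interval_ns(snap_a: int, snap_b: int) -> float:
    """Time from snapshot A to snapshot B. The modulo keeps the answer
    right even if the counter wrapped once in between."""
    return ((snap_b - snap_a) % MODULUS) * TICK_NS


# Hypothetical snapshot values, in counts since the last sync pulse.
# Because the counter is reset by the sync pulse, each snapshot already
# is the sync-to-pulse time; pulse-to-pulse times are differences.
snapshots = [1_000, 2_500_000, 34_999_999]
sync_to_pulse_ns = [s * TICK_NS for s in snapshots]
pulse_to_pulse_ns = [interval_ns(a, b) for a, b in zip(snapshots, snapshots[1:])]
```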
What resources are you referring to when you say FPGAs have resources that
can resolve time below 1 ns?
Dave
"Jim Granville" <no....@designtools.co.nz> wrote in message
news:SdXMc.280$zS6....@news02.tsnz.net...
A few months ago Xilinx announced that they had achieved 1 GHz performance
in the lab, so it's probably a couple of years away for production devices.
Leon
Sounds like a time-domain problem...
> What resources are you referring to when you say FPGAs have resources that
> can resolve time below 1 ns?
Consider a 250 MHz clock with 4 phases from a DLL/PLL: capturing on all
four resolves to 1 ns, yet nothing needs to toggle faster than 250 MHz.
Or, a long simple carry chain with many capture registers:
an edge can be captured to the quantization of the stage delay, so a chain
of 200 stages of 200 ps each spans 40 ns. This will need alternating
calibrate/measure cycles, as the delays are silicon-derived and so vary
with Vcc and temperature.
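A minimal Python model of the carry-chain idea, assuming ideal equal taps and a clean thermometer-code capture (real chains have uneven taps and "bubbles"). The tap count, the 210 ps "drifted" delay and the 4000 ps reference interval are hypothetical; the point is only that a count of taps means little until it is scaled by a freshly calibrated ps-per-tap figure.

```python
from typing import List

# Idealised carry-chain TDC model. Assumptions (not from the thread): all
# taps have the same delay and the capture is a clean thermometer code.


def capture(elapsed_ps: float, taps: int, tap_delay_ps: float) -> List[int]:
    """Chain snapshot when the capture clock fires: taps already passed read 1."""
    reached = min(taps, int(elapsed_ps // tap_delay_ps))
    return [1] * reached + [0] * (taps - reached)


def thermometer_count(bits: List[int]) -> int:
    """Number of leading 1s, i.e. how far along the chain the edge got."""
    n = 0
    for b in bits:
        if not b:
            break
        n += 1
    return n


def calibrate(ref_interval_ps: float, ref_count: int) -> float:
    """ps per tap from a reference interval of known length
    (the 'calibrate' half of alternating calibrate/measure)."""
    return ref_interval_ps / ref_count


TAPS = 200
NOMINAL_PS = 200.0   # 200 taps x 200 ps = 40 ns span
ACTUAL_PS = 210.0    # hypothetical: taps have slowed with Vcc/temperature

ps_per_tap = calibrate(4000.0, thermometer_count(capture(4000.0, TAPS, ACTUAL_PS)))
n = thermometer_count(capture(10_000.0, TAPS, ACTUAL_PS))
uncalibrated_ps = n * NOMINAL_PS    # trusts the nominal delay
calibrated_ps = n * ps_per_tap      # uses the fresh calibration
```

With these made-up numbers the uncalibrated reading of a 10 ns interval is off by 600 ps, the calibrated one by about 100 ps, which is roughly half a tap of quantisation.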
Some DLLs/DCMs allow finer phase adjustment than 4 steps, so an 8-phase
clock and 8 copies of 125 MHz counters/capture registers would resolve to
1 ns (on each input edge).
You will need to watch aperture and metastability effects in the cross-clock
domains, but the x8 copy scheme would allow you to check integrity,
as all counters should be within 1 count of one another.
So you might read [+1][+1][+1][Whoops][+0][+0][+0][+0]
[Whoops] is a wildly deviant value, which indicates that the sample edge
violated the [DeltaQt] + [DeltaDt] aperture time.
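The integrity check on the eight copies amounts to a majority vote. This Python sketch (the function name and sample values are hypothetical) flags any copy more than one count away from the median; it assumes fewer than half the copies are corrupted and ignores counter wrap-around.

```python
from typing import List


def aperture_violations(readings: List[int], max_spread: int = 1) -> List[int]:
    """Indices of counter copies that disagree with the majority.

    All phase-shifted copies should read within one count of each other
    (the +1/+0 split just shows which phases had already counted); a copy
    far from the median probably captured while its bits were changing.
    """
    ref = sorted(readings)[len(readings) // 2]
    return [i for i, r in enumerate(readings) if abs(r - ref) > max_spread]


# Hypothetical 8-phase capture around a base count of 1000:
# [+1][+1][+1][Whoops][+0][+0][+0][+0]
sample = [1001, 1001, 1001, 1536, 1000, 1000, 1000, 1000]
bad = aperture_violations(sample)   # -> [3]
```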
As a general indication of counter speed vs. width, these are from
a Lattice data sheet (not clear whether they are guaranteed or typical):
16-bit counter 360 MHz
32-bit counter 280 MHz
64-bit counter 180 MHz
-jg
Well, there are still the RocketIOs... you can easily reach 0.5 ns.
There was a thread in March - look at
Message-ID: <BC8772E1.5C19%pe...@xilinx.com>
(with RocketIO-X you could even go further..)
bye,
Michael
http://groups.google.com/groups?threadm=BC8772E1.5C19%25peter%40xilinx.com
will work better for most people, as very few news servers will hold a
message from March.
Regards,
Allan.
I am surprised that no one has mentioned that you can pipeline a counter
to get much higher speeds. This takes more logic and your capture
registers must also be pipelined, but you can get much higher speeds
this way. Each bit of the counter has two FF outputs, one is that bit
of the count and the other is the carry out to the next stage. So each
bit of the counter will be one clock behind the next lower bit. It only
requires a single stage of carry propagation, so longer counters do not
run slower. This will run at about the same speed as a toggle FF.
 Ci-1            ----          ----
 ---------------| &  |--------|D  Q|--- Ci
          +-----|    |        |    |
          |      ----   clk---|>   |
          +-------------+      ----
      ----              |
     |D  Q|-------------+-------- Bi
 clk |    |
 ----|>   |
      ----
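A cycle-level Python model of this pipelined counter, written to check the claim that each bit simply lags the one below it by a clock. The sketch does not show what drives the Bi flip-flop's D input; the model assumes Bi toggles when the registered carry Ci-1 is high (Bi XOR Ci-1), which is an assumption rather than something stated in the post. The hypothetical `deskew` helper shows why the capture registers must be pipelined too: bit i has to be taken i clocks later to rebuild a consistent count.

```python
from typing import List


def simulate_pipelined_counter(width: int, cycles: int) -> List[List[int]]:
    """Stage i keeps a count bit B[i] and a registered carry C[i]:
      C[i] <= C[i-1] AND B[i]     (the AND gate + FF in the sketch)
      B[i] <= B[i] XOR C[i-1]     (assumed toggle input, not drawn)
    Stage 0's carry-in is tied high. Returns the B vector before each clock."""
    b = [0] * width
    c = [0] * width
    history = []
    for _ in range(cycles):
        history.append(list(b))
        cin = [1] + c[:-1]                          # registered carry from below
        new_b = [b[i] ^ cin[i] for i in range(width)]
        new_c = [cin[i] & b[i] for i in range(width)]
        b, c = new_b, new_c                         # all FFs update on one edge
    return history


def deskew(history: List[List[int]], t: int) -> int:
    """Bit i lags by i clocks, so rebuild count t from bit i taken at t + i."""
    width = len(history[0])
    return sum(history[t + i][i] << i for i in range(width))


hist = simulate_pipelined_counter(width=8, cycles=400)
```

Only one AND and one flip-flop sit between any two registers, whatever the width, which is why the post says longer counters do not run slower.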
--
Rick "rickman" Collins
Well spotted.
One key to pushing FPGAs is that you can trade logic for speed. As
these devices have a ton of registers anyway, it does not matter if you
use 8x or 16x the minimum possible number of registers, if it means you
can get 4x the time precision.
-jg
> I think your safest bet is to sample the input with four staggered 250 MHz
> clocks, feeding four shift registers. Then differentiate the edges and move
> them into a common 250 MHz clock domain. Or you use 8 phases for an even
> safer circuit. Virtex-II can adjust the clock in 50 ps increments, and 250
> MHz is reasonable, while 125 MHz is easy, and definitely guaranteed to work.
> You can capture the data, store the arrival time of 512 input pulses in a
> BlockRAM and read the data out at your convenience.
> The trick of using MGTs has not been proven yet, but this thread rekindled
> my interest...
Good to hear that :) - thinking aloud...
What about a design that uses all the tricks to push time resolution as
far as possible (maybe with some FPGA family splits)?
* Pipelined counter (rickman) - what about a pipelined Gray counter?
* Multiple-phase counters/captures: x4 is simplest, x8 uses more
resources. What is the limit - x16 / x32 ...?
(The 'limit' would be where the routing/delay/capture uncertainty jitter
meets the minimum resolvable time - though, as with ADCs, you can generate
LSBs that need further averaging/filtering to be useful.)
With simple phased clocks, you can synchronise the sample/unknown pulse
in each domain, which allows binary pipeline capture.
Gray counters would need post-conversion for maths, but they do avoid
the aperture effects of the capture pulse arriving mid-transition, and so
they open the option of fractional-LSB extension via clock capture and
delay lines.
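Why Gray codes help with capture, in a short Python sketch (function names are illustrative): a capture that lands mid-transition picks up some bits old and some new. Binary 7 to 8 changes four bits, so such a "torn" read can be almost anything; adjacent Gray codes differ in one bit, so the worst case is the old or the new value. `gray_to_bin` is the post-conversion needed before doing arithmetic.

```python
def bin_to_gray(n: int) -> int:
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Post-conversion for maths: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def torn_capture(old: int, new: int, late_bits: int) -> int:
    """Model a capture inside the aperture: bits set in late_bits are
    caught with their new value, the rest still hold the old one."""
    return (old & ~late_bits) | (new & late_bits)


# Binary 7 -> 8 caught half-way: low two bits already new, the rest old.
torn_binary = torn_capture(7, 8, 0b0011)   # 4: neither 7 nor 8
```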
-jg
--
Luis Vaccaro
"Jim Granville" <no....@designtools.co.nz> wrote in message
news:02gNc.497$zS6....@news02.tsnz.net...
> Good to hear that :) - thinking aloud...
> What about a design that uses all the tricks to push time resolution as
> far as possible (maybe with some FPGA family splits)?
> * Pipelined counter (rickman) - what about a pipelined Gray counter?
> * Multiple-phase counters/captures: x4 is simplest, x8 uses more
> resources. What is the limit - x16 / x32 ...?
> (The 'limit' would be where the routing/delay/capture uncertainty jitter
> meets the minimum resolvable time - though, as with ADCs, you can generate
> LSBs that need further averaging/filtering to be useful.)
The most obvious limit here would be the amount of clock resources -
global clock lines (I suppose you don't want to use local clocks), and to
a lesser degree DCMs.
-g
> What about a design that uses all the tricks to push time resolution as
> far as possible (maybe with some FPGA family splits)?
> * Pipelined counter (rickman) - what about a pipelined Gray counter?
You could just as well use a shift register for the lower 4 bits of the
counter (16 FFs), and use the pulse from the last FF to increment
a (16x slower) normal counter. *Somehow*.
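A behavioural Python sketch of the shift-register prescaler idea, modelled here as a one-hot ring of 16 FFs (a Johnson counter would be another option). The class and helper names are hypothetical, and the model deliberately leaves open how the 16x slower counter meets timing, as well as the capture problem of reading ring and counter from the same clock cycle.

```python
from typing import List


class OneHotPrescaledCounter:
    """A 16-FF one-hot ring supplies the 4 LSBs; the pulse out of the last
    FF increments a normal counter running at 1/16 of the clock rate."""

    def __init__(self, stages: int = 16):
        self.stages = stages
        self.ring = [1] + [0] * (stages - 1)   # the single token starts at FF 0
        self.high = 0                          # the slow counter

    def clock(self) -> None:
        wrapped = self.ring[-1] == 1                   # pulse from the last FF
        self.ring = [self.ring[-1]] + self.ring[:-1]   # shift the token along
        if wrapped:
            self.high += 1

    def value(self) -> int:
        # a real capture must take ring and high from the same clock cycle
        low = self.ring.index(1)                       # one-hot -> binary
        return self.high * self.stages + low


def count_sequence(cycles: int, stages: int = 16) -> List[int]:
    """Values seen on successive clocks, starting from reset."""
    c = OneHotPrescaledCounter(stages)
    out = []
    for _ in range(cycles):
        out.append(c.value())
        c.clock()
    return out
```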
> * Multiple-phase counters/captures: x4 is simplest, x8 uses more
> resources. What is the limit - x16 / x32 ...?
> (The 'limit' would be where the routing/delay/capture uncertainty jitter
> meets the minimum resolvable time - though, as with ADCs, you can generate
> LSBs that need further averaging/filtering to be useful.)
Yes, you can use prescaler schemes, which can be shift register /
Johnson counters, but that presents problems on capture.
>
>>* Multiple-phase counters/captures: x4 is simplest, x8 uses more
>>resources. What is the limit - x16 / x32 ...?
>>(The 'limit' would be where the routing/delay/capture uncertainty jitter
>>meets the minimum resolvable time - though, as with ADCs, you can generate
>>LSBs that need further averaging/filtering to be useful.)
>
>
> The most obvious limit here would be the amount of clock resources -
> global clock lines (I suppose you don't want to use local clocks), and to
> a lesser degree DCMs.
The DCMs look to have all the nice logic, but a rather limited number
of taps for this type of time-extension (pity).
Their advantage is they are there already, and are easy to deploy,
and if this is your prime usage, who cares if all the DCMs are used?
You can, of course, make a similar fine-time device using the FPGA
itself, but that becomes a much more process- and tool-dependent path.
But it would be interesting, and may approach 100ps in resolution.
-jg
> Yes, you can use prescaler schemes, which can be shift register /
> Johnson counters, but that presents problems on capture.
Indeed, but not that big I guess.
> > The most obvious limit here would be the amount of clock resources -
> > global clock lines (I suppose you don't want to use local clocks), and to
> > a lesser degree DCMs.
> The DCMs look to have all the nice logic, but a rather limited number
> of taps for this type of time-extension (pity).
> Their advantage is they are there already, and are easy to deploy,
> and if this is your prime usage, who cares if all the DCMs are used?
Emphasis on 'to a lesser degree'. Even the small devices have 4 DCMs,
each of which should supply you with four clocks (0/90/..). Starting
with the 2vp20 you get 8 of them, but by then you are already running
out of global clock nets (8/16).
> You can, of course, make a similar fine-time device using the FPGA
> itself, but that becomes a much more process- and tool-dependent path.
> But it would be interesting, and may approach 100ps in resolution.
It does.
"Process dependence" is a really nice understatement *g* I'm thinking
more along environmental conditions, the exact placement you choose,
etc. But again, it's not much help if you can't distribute your clocks.
"Tools" are down to fpga_editor & co, too.
regards,
-g
I suggest a synchronous design running at 250 MHz (synchronous counter,
transfer to BlockRAM etc) augmented with a small "prescaling" front-end.
The input line gets clocked into four flip-flops in parallel, each clocked
on a different quadrant of the 250 MHz clock. Using the flip-flop clock
polarity option, this requires only two global lines driven by one DCM.
Now that we have captured the input edge in 4 flip-flops, we have to figure
out where it was captured first. For that, we must move the four staggered
signals into the same clock domain, and we should move any signal only by a
quarter clock per step (to avoid excessively tight delay requirements). This
takes half a dozen flip-flops, followed by a 1-of-4 decoder that defines the
position of the leading edge, and is used as the two LSBs for the timer.
This circuit would have problems if two pulses arrive within 4 ns, but I
hope that is physically impossible.
Counter trickery is really not necessary. It's all synchronous to 250 MHz.
It's only the sub-one-nanosecond resolution that requires some trickery.
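This front end, modelled behaviourally in Python to show how the 1-of-4 decode becomes the two LSBs of a 1 ns timestamp. The function names are hypothetical and the model is idealised: no setup/hold, metastability or routing skew, and no quarter-clock-per-step transfer into the common domain, which is where the real design work lies.

```python
from typing import List, Optional

PERIOD_NS = 4.0   # 250 MHz system clock
PHASES = 4        # 0/90/180/270 degrees, i.e. 1 ns apart


def phase_samples(edge_ns: float, cycle: int) -> List[int]:
    """What the four phase flip-flops hold for one system-clock cycle,
    for an input that rises at edge_ns and stays high."""
    step = PERIOD_NS / PHASES
    return [int(cycle * PERIOD_NS + k * step >= edge_ns) for k in range(PHASES)]


def timestamp(edge_ns: float, max_cycles: int = 1 << 16) -> Optional[int]:
    """Coarse 250 MHz count in the upper bits, 1-of-4 decode as the 2 LSBs."""
    for cycle in range(max_cycles):
        s = phase_samples(edge_ns, cycle)
        if any(s):                    # first cycle in which the edge shows up
            k = s.index(1)            # earliest phase that saw it
            return (cycle << 2) | k   # units of 1 ns (shift of 2 because PHASES == 4)
    return None
```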
Peter Alfke
>
Here is a real-world example: it gives an indication of what is
technically possible (not sure if this is entirely in an FPGA), and of why
the time domain is easier to push than the frequency domain.
http://www.pendulum.se/Text.htm/CNT-90.htm
This specs 100 ps resolution, 12 digits/s and 300 MHz standard,
with prescalers to a few GHz.
12 digits/s is equivalent to reading at 100 kHz and getting 7-digit
results on each reading.
-jg
Peter Alfke wrote:
> Here are my thoughts for a fairly simple implementation. If I recall, the
> original post asked for a report of the arrival time of input pulses (let's
> assume rising edges) with a resolution of 1 ns.
(snip)
If I understand the way it is done in some experiments requiring
sub nanosecond timing, they call it a TDC, time to digital converter,
and it seems to work by generating a ramp signal, and digitizing
it with a flash A/D converter at the trigger point. Maybe a 100MHz
counter, and the sawtooth/ADC to get the low order bits. It might
take some calibration, but numbers in the 100ps range seem to be
easily obtained. 100MHz and a 6 bit ADC would give
10ns/64 or 156.25ps.
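The arithmetic in that last sentence, as a small Python helper (names hypothetical; assumes an ideal, calibrated ramp that starts at each clock edge and spans exactly one clock period over the ADC's full scale):

```python
def tdc_lsb_ps(clock_hz: float, adc_bits: int) -> float:
    """Fine-time step when one clock period of ramp spans the ADC's full scale."""
    return 1e12 / clock_hz / (1 << adc_bits)


def tdc_time_ps(coarse_count: int, adc_code: int,
                clock_hz: float = 100e6, adc_bits: int = 6) -> float:
    """Coarse clock count plus the fraction of a period read from the ramp ADC."""
    period_ps = 1e12 / clock_hz
    return coarse_count * period_ps + adc_code * tdc_lsb_ps(clock_hz, adc_bits)
```

For example, 3 clock periods plus ADC code 10 gives 3 x 10 ns + 10 x 156.25 ps = 31.5625 ns.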
-- glen
In some TDCs, the ramp is generated by trigger signal and sampled by the
steady clock. If the ramp came from the clock, you'd have large
uncertainties near the corners; the harmonics to get a "nice" corner are
also huge and outside the range of affordable A/D converters. By designing
to guarantee a (reasonably) linear ramp, the trigger-signal's start ramp can
be sampled in 2 spots on the ramp far from the "corner" giving the precise
delta voltage for one sampling clock period. The stop ramp can also be
sampled in 2 spots on the ramp and should have the same delta voltage. The
voltage difference between these two ramp sample pairs relative to the
voltage difference for one clock cycle will give you the offset in time
relative to one clock cycle. But I hate precision analog beyond a few 10s
of MHz.
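One way to turn the two-point ramp description into numbers, as a Python sketch. It assumes the ramp's starting voltage (`baseline_v`) is known, a simplification of the start/stop comparison described above; the two samples on the straight part give volts per clock period directly, so the slope need not be known in advance. All sample times, slopes and voltages are made up.

```python
def ramp_start_ns(k1: int, v1: float, k2: int, v2: float,
                  baseline_v: float, period_ns: float) -> float:
    """When did a linear ramp leave baseline_v? v1 and v2 are samples taken
    on clock edges k1 and k2, both on the straight part of the ramp."""
    volts_per_period = (v2 - v1) / (k2 - k1)
    periods_since_start = (v1 - baseline_v) / volts_per_period
    return (k1 - periods_since_start) * period_ns


def ramp(t_ns: float, start_ns: float, slope_v_per_ns: float,
         baseline_v: float = 0.1) -> float:
    """Synthetic ramp voltage at time t_ns (hypothetical test signal)."""
    return baseline_v + slope_v_per_ns * (t_ns - start_ns)


T = 10.0   # 100 MHz sampling clock
start = ramp_start_ns(3, ramp(30, 12.3, 0.05), 5, ramp(50, 12.3, 0.05), 0.1, T)
stop = ramp_start_ns(10, ramp(100, 87.65, 0.05), 12, ramp(120, 87.65, 0.05), 0.1, T)
interval = stop - start   # ~75.35 ns
```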
I prefer sinusoids. Generating a reference sine and cosine pair at a high
frequency (say 200 MHz so nice A/Ds can be used), the two sinusoids can be
sampled with a dual-channel A/D using the incoming signal as the trigger to
get a sine/cos voltage pair. As long as the maximum amplitudes are measured
or otherwise calibrated and the phase offset is close enough to 90 degrees
(which can be calibrated downstream), the phase of the incoming signal comes
straight from the arctan( sin/cos ) without ambiguity since the signs of the
sine and cosine dictate the quadrant. The components to produce the clean
sin/cos pair are widely available since many RF systems use I/Q
modulation/demodulation or other "quadrature" techniques. With 10 bit A/D
converters, the phase resolution is about 11 bits or about 2.5 ps resolution
at 200 MHz; at this point the reference jitter and A/D aperture uncertainty
will be major factors in the error budget. The incoming signal can
transition as fast as the A/Ds can sample.
It's a pretty system and FPGAs can do a great job with cartesian to polar
conversion.
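The sin/cos scheme in a few lines of Python, assuming calibrated amplitudes and an exact 90 degree offset (the function names and the synthetic `sample` helper are illustrative). `math.atan2` takes both signs into account, which is what removes the quadrant ambiguity; `step_ps` checks the ~2.5 ps figure as 11 bits of phase over a 5 ns period.

```python
import math


def phase_to_time_ps(v_sin: float, v_cos: float,
                     amp_sin: float, amp_cos: float,
                     ref_hz: float = 200e6) -> float:
    """Position of the trigger within one reference period, from the
    simultaneously sampled sine and cosine."""
    phase = math.atan2(v_sin / amp_sin, v_cos / amp_cos)   # -pi .. pi
    phase %= 2 * math.pi                                   # 0 .. 2pi
    return phase / (2 * math.pi) * 1e12 / ref_hz


def sample(t_ps: float, amp_sin: float = 0.9, amp_cos: float = 1.1,
           ref_hz: float = 200e6):
    """Synthetic A/D readings for a trigger at t_ps (no noise or jitter)."""
    w = 2 * math.pi * ref_hz * t_ps * 1e-12
    return amp_sin * math.sin(w), amp_cos * math.cos(w)


# 11 bits of phase at 200 MHz: 5000 ps / 2048, roughly 2.4 ps per step
step_ps = 1e12 / 200e6 / 2 ** 11
```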
(snip of TDC description)
> In some TDCs, the ramp is generated by trigger signal and sampled by the
> steady clock. If the ramp came from the clock, you'd have large
> uncertainties near the corners; the harmonics to get a "nice" corner are
> also huge and outside the range of affordable A/D converters. By designing
> to guarantee a (reasonably) linear ramp, the trigger-signal's start ramp can
> be sampled in 2 spots on the ramp far from the "corner" giving the precise
> delta voltage for one sampling clock period. The stop ramp can also be
> sampled in 2 spots on the ramp and should have the same delta voltage. The
> voltage difference between these two ramp sample pairs relative to the
> voltage difference for one clock cycle will give you the offset in time
> relative to one clock cycle. But I hate precision analog beyond a few 10s
> of MHz.
That sounds like a better description of the one I was trying
to describe. It is sometimes used for high resolution timing
of photomultiplier tube pulses. You could use the semi-analog
method to measure the time relative to a steady clock, and count
the number of clock cycles in between.
The main point I was trying to make was that there are some
partly analog methods that can get timing resolution finer than
affordable digital methods. There are some that claim 50ps.
(snip of sin/cos TDC description)
-- glen
: (snip of TDC description)
:> In some TDCs, the ramp is generated by trigger signal and sampled by the
...
For TDCs, look at http://www.acam.de
--
Uwe Bonnes b...@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
I've been thinking about this again, because the OP's question was
physics related, so from their PoV: does using mixed settings in the
posedge/negedge option of the FFs introduce a systematic error because
the inverter on half of the FFs causes additional delay? Or is there
no additional delay (because of how the FFs are built in silicon or
some other reason)?
regards,
-g
If you simply invert, then you have both the delay mismatch issue,
and also any duty cycle variation from 50.00% becomes a time skew.
If you use 4 possible phases from the DCM, presumably that removes the
duty cycle issue, but you will still have LSB step errors in
precise timing. My understanding of the DCM is these are << 1ns,
(but they are not zero).
-jg
How closely matched are the IOs, assuming you stay within the same bank?
1ns is only about 140mm of trace on FR4
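The 140 mm figure follows from v = c / sqrt(eps_eff). A quick Python check, assuming an eps_eff of about 4.5 for stripline in FR4 (real values vary with the laminate and frequency; microstrip has a lower eps_eff because part of the field is in air, so it needs more length per ns):

```python
import math

C_MM_PER_NS = 299.792458   # speed of light in vacuum, mm per ns


def trace_mm_per_ns(eps_eff: float) -> float:
    """Trace length giving 1 ns of delay for a given effective dielectric constant."""
    return C_MM_PER_NS / math.sqrt(eps_eff)


def trace_ps_per_mm(eps_eff: float) -> float:
    """The same relation the other way round: delay per mm of trace."""
    return 1000.0 / trace_mm_per_ns(eps_eff)


# Assumed eps_eff ~4.5 (stripline in typical FR4)
mm_for_1ns = trace_mm_per_ns(4.5)   # roughly 141 mm
```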
-Lasse
it's about the same as the distance around a FF1152 package ;)
-Lasse
I think it's reasonably common to use a "long" trace on a PCB
as a short delay. But suppose the delay is "timing-critical"?
How stable is FR4 over temperature? Humidity? Is there anything
else that influences the delay on an existing board?
How repeatable is the delay from batch to batch? I think the
delay only depends upon the dielectric constant and that probably
depends upon the ratio of glass to plastic. Are there layout
patterns that make it easier for the board house to make the
same result consistently?
The use of pcb traces for circuits (delays, filters, etc.) has a long
tradition. I have heard that the characteristics can be held to +/- 3%
easily (almost without thinking as long as the materials used to make
the board are specified: layer thicknesses, pre-preg thicknesses,
copper thickness).
Cell phones are just one miracle that use a lot of pcb 'components'.
Austin
> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> :> In some TDCs, the ramp is generated by trigger signal and sampled by the
> For TDCs, look at http://www.acam.de
I looked for some TDC descriptions before posting,
but I didn't find a good one. Within the above site,
http://www.acam.de/Content/English/tdc_method.html
seems to be the one with an actual description. This is
a little different from the one I knew about before, but
interesting anyway. They call it fully digital, though
it depends on the delay through a series of inverters.
-- glen
Hi,
I am currently developing the opposite, and came to the conclusion that it
might not be good to use an FPGA, because routing delays might not be equal
on all paths. My suggestion is to use a Cypress Roboclock to get four
phases, and use a CPLD to modulate the pulse. The CPLD path delays are
expected to be almost the same as long as you take care not to use more
than five PTs (product terms).
So in my application I have a 100 MHz clock with four phases (1.25 ns
increment), and the FFs trigger on both rising and falling edges, giving
an effective 800 MHz resolution.
My interest is to know whether I can achieve the same precision
(<100 ps) in an FPGA?
Regards
Thomas
PS: It is a PWM modulator for a digital amplifier.
Thomas Rudloff wrote:
(snip)
> My suggestion is to
> use a cypress Roboclock to get
> four phases and use a CPLD to modulate the pulse. The CPLD path delays
> are expected to be almost
> the same as long as you take care not to use more than five PTs.
I just learned about the roboclock last week. According to the
sheet I have it only goes to 80MHz. Maybe there are newer ones that
go faster.
> So in my application I have a 100MHz clock with four phases (1.25ns
> increment) and the FF triggering
> at rising and falling edges giving 800MHz resolution.
> My interest is to know whether I can achieve the same precision
> (<100ps) in an FPGA?
The IOB FF's avoid the routing delay to internal FF's.
I don't know about 100ps, though.
-- glen
"almost the same" depends on what precision matters....
>
> So in my application I have a 100MHz clock with four phases (1.25ns
> increment) and the FF triggering
> at rising and falling edges giving 800MHz resolution.
>
> My interest is to know whether I can achieve the same precision
> (<100ps) in an FPGA?
>
> Regards
> Thomas
>
> PS: It is a PWM modulator for a digital amplifier.
It is not clear what you are trying to do. From the application, more
than your questions, it sounds like you wish to control an edge position
to a precision of ~100ps ? On what lower Frequency ?
You can use multiple-phase clocks to improve timing precision beyond
1/fclk, and the consensus is that getting to 1 ns is doable.
If your modulator frequency is high, you can also use rate-multiplier
edge modulation, to give better audio-band precision ?
To get to 100ps is going to push away from the realm of clock edges,
and into the realm of silicon delay lines.
I think there are test modes in the DLLs, and carry chains are the user
fabric with the most speed.
Philip F. made this comment in another thread re Virtex4
> Looks like a 32 bit counter hits 360 MHz, in a -11, with preliminary
> speed files. Gotta love the 41.5 ps/bit carry chain.
That suggests a time granularity of under 50 ps will be doable in the
next-generation devices; these delays need continuous calibration, as they
will be Vcc/temperature/process dependent.
-jg
> Thomas Rudloff wrote:
> <snip>
>
>> Hi,
>>
>> I am currently developing the opposite and came up that it might not
>> be good to use an FPGA because of
>> routing delays that might not be equal on all paths. My suggestion is
>> to use a cypress Roboclock to get
>> four phases and use a CPLD to modulate the pulse. The CPLD path
>> delays are expected to be almost
>> the same as long as you take care not to use more than five PTs.
>
>
> "almost the same" depends on what precision matters....
>
I expect this to be within 100 ps when properly floorplanned.
>>
>> So in my application I have a 100MHz clock with four phases (1.25ns
>> increment) and the FF triggering
>> at rising and falling edges giving 800MHz resolution.
>>
>> My interest is to know whether I can achieve the same precision
>> (<100ps) in an FPGA?
>>
>> Regards
>> Thomas
>>
>> PS: It is a PWM modulator for a digital amplifier.
>
>
> It is not clear what you are trying to do. From the application, more
> than your questions, it sounds like you wish to control an edge
> position to a precision of ~100ps ? On what lower Frequency ?
>
OK, I wasn't quite clear. I want to control the edge with a selected
phase, in 1.25 ns increments. The error between the different phases
should not be more than 100 ps. So I have an effective sampling rate of
800 MHz and a maximum error of 100 ps.
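The edge placement described here can be sketched in Python as rounding a requested edge time onto a 1.25 ns grid of eight sub-phases (four phases, each used on both clock edges). The names and the per-sub-phase skew values are hypothetical; the skew list is where the routing mismatch in question would show up.

```python
from typing import List, Optional, Tuple

CLOCK_NS = 10.0   # 100 MHz
SUBSTEPS = 8      # four phases x (rising + falling edge) -> 1.25 ns grid


def edge_select(t_ns: float) -> Tuple[int, int]:
    """Split a requested edge time into (clock cycle, sub-phase 0..7),
    rounding to the nearest 1.25 ns slot."""
    ticks = round(t_ns / (CLOCK_NS / SUBSTEPS))
    return divmod(ticks, SUBSTEPS)


def edge_time_ns(cycle: int, sub: int, skew_ps: Optional[List[float]] = None) -> float:
    """Where the edge really lands, including a (hypothetical) fixed path
    skew for each of the eight sub-phases."""
    skew = skew_ps[sub] if skew_ps else 0.0
    return cycle * CLOCK_NS + sub * (CLOCK_NS / SUBSTEPS) + skew / 1000.0


skews = [0, 40, -30, 60, -20, 90, 10, -50]   # ps, made-up mismatch per path
```

The 100 ps requirement is then simply that no entry of the skew list exceeds 100 ps in magnitude.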
> You can use multiple-phase clocks to improve timing precision beyond
> 1/fclk, and the consensus is that getting to 1 ns is doable.
>
That's what it is.
> If your modulator frequency is high, you can also use rate-multiplier
> edge modulation, to give better audio-band precision ?
>
I do not think that I can get the same precision. As long as the error
is predictable I can correct it (noise shaping).
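The noise-shaping remark can be illustrated with first-order error feedback, a standard noise-shaping structure. This is a generic Python sketch, not the design discussed here; the names and skew values are assumptions, and the targets are treated as edge positions within each switching period. Each edge goes to the nearest grid slot, and the leftover, including any known, repeatable per-slot skew, is added to the next request, so the average edge position tracks the target even though every single edge is quantised.

```python
from typing import List, Optional


def noise_shaped_edges(targets_ns: List[float], step_ns: float = 1.25,
                       skew_ns: Optional[List[float]] = None) -> List[float]:
    """First-order error feedback onto a grid of step_ns slots."""
    skew_ns = skew_ns or [0.0] * 8
    err = 0.0
    out = []
    for t in targets_ns:
        want = t + err                                      # request plus leftover
        q = round(want / step_ns)                           # nearest grid slot
        actual = q * step_ns + skew_ns[q % len(skew_ns)]    # what the hardware does
        err = want - actual                                 # residual fed forward
        out.append(actual)
    return out
```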
> To get to 100ps is going to push away from the realm of clock edges,
> and into the realm of silicon delay lines.
>
The 100ps are not the resolution. It's the phase error.
> I think there are test modes in the DLLs, and carry chains are the user
> fabric with the most speed.
>
> Philip F. made this comment in another thread re Virtex4
> > Looks like a 32 bit counter hits 360 MHz, in a -11, with preliminary
> > speed files. Gotta love the 41.5 ps/bit carry chain.
>
> that suggests a time granularity of sub 50ps will be doable in the next
> generation devices - these delays need continuous calibration, as they
> will be Vcc/Temp/Process dependent.
>
It's interesting. But can I keep the different on-chip routing delays
of the different phases within 100 ps of each other?
The DLL will surely give me the resolution. But if one delay is some
100 ps longer than another, that difference appears as phase error
on the output for some patterns.
The simplest way is to use a SERDES, but I cannot use BGA chips. And
since I need several, using external parts takes too much PCB space. This
could be an option for the OP if only one is needed.
Regards
Thomas
>
>
> Thomas Rudloff wrote:
>
> (snip)
>
>
>> My suggestion is to use a cypress Roboclock to get
>> four phases and use a CPLD to modulate the pulse. The CPLD path
>> delays are expected to be almost
>> the same as long as you take care not to use more than five PTs.
>
>
> I just learned about the roboclock last week. According to the
> sheet I have it only goes to 80MHz. Maybe there are newer ones that
> go faster.
>
Yup, it's the CY7C994 IIRC. I do not have the datasheets at home. There
is a part that goes up to 200MHz.
>> So in my application I have a 100MHz clock with four phases (1.25ns
>> increment) and the FF triggering
>> at rising and falling edges giving 800MHz resolution.
>
>
>> My interest is to know whether I can achieve the same precision
>> (<100ps) in an FPGA?
>
>
> The IOB FF's avoid the routing delay to internal FF's.
Yup, but I need to OR the output of different FFs. But maybe an external
gate could be an option.
>
> I don't know about 100ps, though.
>
It is only the difference of the different paths.
Now I follow...
The key question is: can you generate (eg) a 4 Phase clock, from a
200 MHz source, to give 4 phases each 90 degrees apart (1.25 ns), with a
time-precision of +/-100ps on each edge ?
I believe you can do this with the std DCM (so do not need a Cypress
Clock chip?), but Peter A. might be able to better advise the
Family/speed grade to do this ?
Since I think you want a pulse-modulation from this finer-time scheme,
you will need to add the delay skews of the logic that decides which
edge to act on.
There is an Altera app note that specified the delays in each path of
their LUT, and they are NOT all the same. Xilinx quote a single larger
value, but it's not clear if that is because they are actually tightly
matched, or if their SW cannot track the path deltas, so they take the
worst one.
-jg