Is there a best implementation of an asynchronous FIFO?
Any suggestions will be appreciated!
Best regards,
Davy
I guess it depends on what you're looking for.
At minimum, it should *work* ...
Beyond that, it is a compromise among resources, speed, features (such
as almost-empty/almost-full flags), perhaps reliability, and so on.
Sylvain
I designed the crucial asynchronous empty arbitration logic, and it
works perfectly: We tested it by writing data at ~200 MHz into the
FIFO and reading it out at ~500 MHz, and the asynchronous empty-detect
logic worked flawlessly for all of the >10^14 operations before we
stopped the test after a week.
No real FIFO application will probably ever go empty 200 million times
a second...
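As a rough sanity check on the operation count quoted above, here is a
small Python sketch. It assumes exactly 200 MHz and exactly seven days of
continuous running, with one empty event per write because the read side
is faster; the constant names are only illustrative.

```python
# Rough count of write operations (and hence empty events) in a one-week test.
# Assumptions: exactly 200 MHz write clock, exactly 7 days, no pauses.
WRITE_RATE_HZ = 200e6
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

total_writes = WRITE_RATE_HZ * SECONDS_PER_WEEK
print(f"{total_writes:.3e} writes in one week")
```

That comes to roughly 1.2 x 10^14, consistent with the ">10^14" figure.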
The high performance is due to very fast and compact full-custom logic,
and our long experience in analyzing and dealing with the effects of
metastability.
Peter Alfke, Xilinx Applications (posting from home)
Why stop after 1 week? Sounds like the sort of app that would be nice
to have spinning in the corner of the lab forever....
Did you also test the full detect, or is that expected to be the same
by symmetry?
> No real FIFO application will probably ever go empty 200 million times
> a second...
> The high performance is due to very fast and compact full-custom logic,
> and our long experience in analyzing and dealing with the effects of
> metastability.
So does that mean devices without this full-custom logic can expect
lower performance, and if so, how much lower?
[eg Spartan 3 / 3E ?]
-jg
For some strange reason (fixed in "Virtex-5") there is a
one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
FULL is not as important as EMPTY, since a properly designed system
should never overflow the FIFO, whereas it might be nice to empty it
completely. (I often use the savings-account analogy).
Yes, using the fabric to implement the FIFO controller might limit the
speed to 250 MHz.
The reasons for the "hard" FIFO controller were:
Higher performance, guaranteed reliable operation without user
involvement, and saving fabric resources as well as power consumption.
The same reasoning will be used for future "hard" subfunctions. It's
the best way to increase speed, functionality, and user-friendliness.
How else can we improve by a factor of 2 or even more?
Peter Alfke
"Peter Alfke" <al...@sbcglobal.net> wrote in message
news:1129501822.6...@f14g2000cwb.googlegroups.com...
So Peter, what do those of us with lowly Spartan-II FPGAs do if we
want, say, a 16x9 FIFO?
-Dave
A very small 16-by-N sync FIFO is easy in the Spartan-II using N SRL16Es
(and a 5-bit counter).
Peter Wallace
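To make the SRL16E suggestion concrete, here is a behavioural sketch in
Python (not HDL) of a 16-deep synchronous FIFO built from an addressable
shift register plus an occupancy counter that runs from 0 to 16, hence 5
bits. The class name `Srl16Fifo` and its `clock()` interface are made up
for illustration, and the N one-bit SRL16Es are modelled as one list of
N-bit words.

```python
class Srl16Fifo:
    """Behavioural model: N parallel SRL16Es plus a 5-bit occupancy counter."""
    DEPTH = 16

    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.taps = [0] * self.DEPTH   # taps[0] holds the newest word
        self.count = 0                 # 0..16, which needs 5 bits

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.DEPTH

    def clock(self, write=False, data=0, read=False):
        """One rising clock edge; returns the word read, or None."""
        do_read = read and not self.empty
        do_write = write and not self.full
        # The oldest word sits at address count-1 of the shift register.
        out = self.taps[self.count - 1] if do_read else None
        if do_write:
            # Shift every word one tap deeper and load the new one at tap 0.
            self.taps = [data & self.mask] + self.taps[:-1]
        self.count += int(do_write) - int(do_read)
        return out


fifo = Srl16Fifo(width=9)              # the 16x9 case asked about
for word in (0x101, 0x0AA, 0x055):
    fifo.clock(write=True, data=word)
first = fifo.clock(read=True)          # oldest word comes out first
```

Writing only shifts the data and bumps the counter, and reading only
decrements the counter, so no read or write pointers are needed. This
only works when both sides share one clock.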
RAUL
How do you model metastability, which needs sub-femtosecond resolution?
How do you model that an asynchronous FIFO generates its EMPTY flag in
time, even under the most adverse timing conditions between the two
incoming clocks?
Those have been things that kept me awake at night :-(
Peter Alfke
Usually in RTL simulations you don't even want to model things like that.
The most important thing is to get fast simulation times for the whole
design. At least in the past, the Xilinx models were overly complex for
pure RTL simulations, and users usually had to write their own simulation
models to get the speed.
The correctness of the async fifos must come from the design, reviews
etc. It's impossible to simulate all the cases.
Of course, netlist simulations need timing-accurate models, but that is
a small part of simulation. It is usually done to check timing
constraints and to catch synthesis bugs (if formal verification tools
are not part of the user's toolset). Asynchronous portions are almost
impossible to simulate. Nowadays there are also formal tools that check
clock domain crossing correctness etc. Those tools can even inject
errors during simulation that could be caused by metastability (the
places are found by the formal portion).
--Kim
There is no need to simulate metastability. The RTL simulations are
functional. All conditions of empty and full have been verified with
directed and random behavior over long simulations with clocks sliding
past each other. The FIFOs are as asymmetrical as 128 bits in and 16
bits out and with clocks as different as 37.125 MHz and 100 MHz.
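For a feel of how two such clocks slide past each other, here is a small
Python sketch. It assumes ideal, jitter-free clocks at exactly 37.125 MHz
and 100 MHz that both start with an edge at time zero; `clk_a` and
`clk_b` are placeholder names, not signals from any actual design.

```python
from fractions import Fraction

F_A_KHZ = 37_125     # clk_a: 37.125 MHz
F_B_KHZ = 100_000    # clk_b: 100 MHz (10,000 ps period)

# One clk_a period measured in clk_b periods, as an exact fraction.
ratio = Fraction(F_B_KHZ, F_A_KHZ)

# Phase of each clk_a edge within the clk_b period, over one full repeat.
phases = {(k * ratio) % 1 for k in range(ratio.denominator)}

# Spacing between adjacent phase offsets, in picoseconds.
step_ps = Fraction(10_000, ratio.denominator)
print(ratio, len(phases), float(step_ps))
```

Under those assumptions the edge alignment repeats every 297 cycles of
the slower clock and visits 297 distinct phase offsets about 33.7 ps
apart, so a simulation with ideal clocks samples a finite grid of
alignments rather than every possible one.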
The simulations have been proven correct in the lab on Virtex-2 Xilinx
FPGAs running for several hours with real data.
ModelSim PE's code profiler said that time was being spent mostly in
the Xilinx FIFOs.
RAUL
Xilinx synchronization FIFO problems show up in just a few minutes.
There used to be a problem with the previous release of their core
generator; the latest one works fine in the lab. Just make sure you
always have the latest version of the core generator to avoid headaches
in the lab that would not show in simulation.
RAUL
I think the thread was about async FIFOs. For these, SRL16s are of no
use. Your best bet is the COREgen FIFO using distributed memory for
shallow (x15 or x31) FIFOs or block memory for deeper ones.
You may want to browse a number of papers on my web page for coding
guidelines and coding styles related to multi-clock design and
asynchronous FIFO design.
At the web page: www.sunburst-design.com/papers
Look for the San Jose SNUG 2001 paper:
Synthesis and Scripting Techniques for Designing Multi-Asynchronous
Clock Designs
Look for the San Jose SNUG 2002 paper:
Simulation and Synthesis Techniques for Asynchronous FIFO Design
Look for the second San Jose SNUG 2002 paper (co-authored with Peter
Alfke of Xilinx):
Simulation and Synthesis Techniques for Asynchronous FIFO Design with
Asynchronous Pointer Comparisons
Peter likes the second FIFO style better but the asynchronous nature of
the design does not lend itself well to timing analysis and DFT.
I prefer the more synchronous style of the first FIFO paper.
I hope to have another FIFO paper on my web page soon that uses Peter's
clever quadrant-based full-empty detection with a more synchronous
coding style.
We spend hours covering multi-clock and Async FIFO design in my
Advanced Verilog Class. These are non-trivial topics that are poorly
covered in undergraduate training. I have had engineers email me to
tell me that their manager told them to run all clock-crossing signals
through a pair of flip-flops and everything should work! WRONG!
Regards - Cliff Cummings
Verilog & SystemVerilog Guru
www.sunburst-design.com
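One reason a pair of flip-flops per signal is not enough for multi-bit
values such as FIFO pointers: several bits of a binary counter can change
on the same clock edge, and the receiving domain may capture a mix of old
and new bits. A widely used remedy is to pass pointers across the
boundary in Gray code, where only one bit changes per increment. The
Python sketch below (not Verilog) illustrates that property for an
assumed 4-bit pointer; the function names are illustrative only.

```python
def bin_to_gray(n):
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_bin(g):
    """Gray code back to binary by XOR-folding the shifted value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

WIDTH = 4                      # assumed pointer width for the illustration
SIZE = 1 << WIDTH

# Worst case number of bits toggling on one increment, including wrap-around.
worst_binary = max(bits_changed(i, (i + 1) % SIZE) for i in range(SIZE))
worst_gray = max(
    bits_changed(bin_to_gray(i), bin_to_gray((i + 1) % SIZE))
    for i in range(SIZE)
)
print("binary:", worst_binary, "gray:", worst_gray)
```

With a binary pointer, up to all four bits toggle at once (for example
7 to 8), whereas the Gray-coded pointer never changes more than one bit,
so a late-sampled pointer is at worst one count old rather than an
unrelated value.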