The memory on an FPGA is limited, but the latency of external memory is
still a bottleneck, especially because the local bus has to share its
resources. Does anybody know of a development board/FPGA with reduced
external memory latency?
---------------------------------------
This message was sent using the comp.arch.fpga web interface on
http://www.FPGARelated.com
Hi,
you don't specify what "reduced latency" is,
reduced compared to what ? What is your goal ?
Can you speak a bit about your application ?
BTW latency is (roughly) tied to
- the price of the components
- their age
- the required capacity
all 3 are very closely related.
If you only need a few megabytes,
some parts (quite expensive) are very fast :
there are synchronous SRAMs, some with double data rate,
used in the telecom industry, that go above 200MHz
in pipelined mode. Some recent Altera/Xilinx FPGAs go
even faster on expensive reference boards, IIRC.
Look at these manufacturers : ISSI, GSI, IDT, Cypress,...
Example of an interesting part that I found with
a broker : GSI's GS8322Z36B-225 has 1M words of 36 bits,
capable of 225MHz (cycle time below 5ns). Some newer
parts are even faster (350MHz ?) and have dedicated
data buses for read and write. Now, the price may be
a problem, not only the part itself but also the
PCB technology that the BGA packaging requires...
If you need a ready-made solution, you'll get what you
pay for (and pay a lot). But there are probably
many ways to turn your problem around so it does not
kill your budget : for example if your application
can be designed to use cache optimisations, strip-mining
or space-time locality, then your onchip SRAM could be enough.
But I suppose that if large SRAMs exist, it is because
not all algorithms can be tweaked this way :-)
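To make the strip-mining idea concrete, here is a minimal software sketch (Python, not HDL). The function name, the sum-of-squares workload and the strip length are all made up for illustration; the only point is that a single strip is "resident" at any time, standing in for a small on-chip buffer.

```python
def strip_mined_sum_of_squares(data, strip_len):
    """Process `data` in strips of at most `strip_len` items.

    `local` stands in for a small on-chip buffer: only one strip
    is resident at a time, whatever the total size of `data`.
    """
    total = 0
    for start in range(0, len(data), strip_len):
        local = data[start:start + strip_len]  # "load" one strip from external memory
        total += sum(x * x for x in local)     # work only on the local copy
    return total


if __name__ == "__main__":
    # 10 items processed through a 4-item "on-chip" buffer
    print(strip_mined_sum_of_squares(list(range(10)), 4))
```

The result does not depend on the strip length, only the amount of local storage does.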
hope it helps,
yg
--
http://ygdes.com / http://yasep.org
latency depends mostly on memory TYPE
0 latency (async) memory costs MUCH more than dynamic memory per bit
Antti
I'll tell a little bit more. The development board I have now is the XUPV2
with the Virtex-2 Pro XC2VP30 FPGA. The purpose of the project is to make
real-time video processing possible at high resolutions, so the
internal memory will not suffice. But I heard that the latency of the local
bus is too high to make that possible, so I hoped somebody could advise me on
another board or FPGA. Or am I just badly informed about those latencies?
> 0 latency (async) memory
there can not be "0 latency" :-)
it's better to measure this in nanoseconds.
> costs MUCH more than dynamic memory per bit
sure. it's always a trade-off...
and the reduced cost of DRAM is offset by
the complexity of driving the complex signals :-/
> Antti
yg
PS: is your server back online ?
and your email ?
Video processing typically doesn't need access to large memories in
order to do the processing. The processing operations are relatively
local. For that, I'm sure you'll find that the Virtex has sufficient
memory. The larger, slower, external memory is used for the bulk
storage. In order to process the data, you move it from the external
memory into the internal memory, process it and store it back in the
external memory.
In that scenario, memory latency is generally not an issue, only clock
frequency.
> But I heard that the latency of the local
> bus is too high to make that possible. So I thought anybody could advice me
> another board or FPGA. Or am I just bad informed about those latencies?
>
You seem to be misinformed about the requirements that you have for
your own video processing and how one would implement that function in
hardware.
Kevin Jennings
>On Dec 20, 9:48 am, "Ghostboy" <Ghost...@dommel.be> wrote:
>> The purpose of the project is to make
>> video processing possible in real-time and with high resolutions. So the
>> internal memory will not suffice.
>
>Video processing typically doesn't need access to large memories in
>order to do the processing. The processing operations are relatively
>local. For that, I'm sure you'll find that the Virtex has sufficient
>memory. The larger, slower, external memory is used for the bulk
>storage. In order to process the data, you move it from the external
>memory into the internal memory, process it and store it back in the
>external memory.
>
>In that scenario, memory latency is generally not an issue, only clock
>frequency.
>
>> But I heard that the latency of the local
>> bus is too high to make that possible. So I thought anybody could advice me
>> another board or FPGA. Or am I just bad informed about those latencies?
>>
>
>You seem to be misinformed about the requirements that you have for
>your own video processing and how one would implement that function in
>hardware.
>
>Kevin Jennings
>
Ghostboy wrote:
> Thanks
> But what if I need to buffer a frame with a resolution of 1024*768 and it
> will constantly be updated?
You seem to be confusing latency, access time and bandwidth.
Video streams are fairly regular, so latency issues can
easily be "masked", "shadowed", pipelined... with
some FIFOs here and there, for example.
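As a rough rule of thumb (my assumption, not a datasheet figure): to keep a steady pixel consumer fed while one memory request is outstanding, the FIFO has to hold at least the words consumed during that latency. A minimal sketch, with made-up latency numbers:

```python
import math


def min_fifo_depth(latency_ns, consume_mhz):
    """Words consumed while one memory request is outstanding.

    latency_ns  : worst-case request-to-data latency (hypothetical value)
    consume_mhz : rate at which the consumer drains the FIFO, in MHz
    """
    return math.ceil(latency_ns * consume_mhz / 1000)


if __name__ == "__main__":
    # e.g. a hypothetical 200ns worst-case latency at a 30MHz pixel clock
    print(min_fifo_depth(200, 30))
```

This only covers latency; it assumes the memory has enough bandwidth to refill the FIFO in the first place.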
Now, the speed is another issue, but it can easily be overcome:
just do the math and read the datasheets.
1024*768*3 (assuming 888 RGB) = 2,359,296 bytes, about 2.3MB
If you need double buffering (if you can't exploit
some smart-ass pointer techniques) then the requirement is about 5M bytes.
With a kit that has 8MB of RAM, you're fairly safe.
Now, the bandwidth: let's say you need to read and write
the whole buffer for every frame, 30 times per second:
2,359,296*2*30 = about 141MB/s
That's more than what a plain 32-bit/33MHz PCI bus can handle (133MB/s peak),
but on a 32-bit wide memory bus it only takes about 35MHz.
Account for refresh cycles, bus turnaround cycles,
blanking times, inefficient packing of the RGB components (0RGB aligned
to 32-bit boundaries) and other stuff, and you need around 60MHz.
Most decent and recent kits should do this out of the box.
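The back-of-the-envelope math above can be redone with a few lines of Python. The function names are mine, and the "one read plus one write of the whole frame per frame period" traffic model is the same assumption as in the text; refresh, turnaround and blanking overheads are not modelled.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one frame in bytes."""
    return width * height * bytes_per_pixel


def bus_clock_mhz(width, height, bytes_per_pixel, fps, bus_bytes, passes=2):
    """Minimum bus clock (MHz) with one transfer per clock and no overhead.

    passes=2 means the whole frame is read once and written once per frame.
    """
    bytes_per_second = frame_bytes(width, height, bytes_per_pixel) * passes * fps
    return bytes_per_second / bus_bytes / 1e6


if __name__ == "__main__":
    print(frame_bytes(1024, 768, 3))           # one packed RGB frame, in bytes
    print(bus_clock_mhz(1024, 768, 3, 30, 4))  # packed RGB on a 32-bit bus
    print(bus_clock_mhz(1024, 768, 4, 30, 4))  # 0RGB padded to 32 bits per pixel
```

With packed RGB this gives about 35.4MHz, and about 47.2MHz with one pixel per 32-bit word, before any of the overheads listed above.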
Even this old board
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=150398039709
(I'm not the seller) has 6MB of fast asynchronous SRAM
nicely organised in 24-bit words, so no byte is wasted on 32-bit alignment.
It can store a bit more than 2 frames of 768Kpx.
Access time is 33ns, so let's say it can work at pixel speed
(27 or 30MHz) with 96 bits each time: that's 360M bytes per second.
With no refresh cycle, no burst, no address multiplex and no
DRAM bank to manage...
OK I cheat : this board is obviously designed for video applications.
And I have not checked the schematics.
But if only I could make Altera's tool work on my computer... :-(
Have fun,
> so let's say it can work at pixel speed
> (27 or 30MHz) with 96 bits each time : that's 360M Bytes per second.
no, more than that : we can expect about 60MHz, more than 700MB/s
which should be more than enough :-)
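For reference, the peak figures quoted here are just bus width times clock. A tiny sketch (the function name is mine; the 96-bit width and the 30/60MHz clocks are the numbers from the posts, not checked against the board's schematics):

```python
def peak_bandwidth_mb_s(bus_bits, clock_mhz):
    """Peak bandwidth in MB/s for one transfer of `bus_bits` per clock."""
    return bus_bits / 8 * clock_mhz


if __name__ == "__main__":
    print(peak_bandwidth_mb_s(96, 30))  # 12 bytes per access at 30MHz
    print(peak_bandwidth_mb_s(96, 60))  # 12 bytes per access at 60MHz
```

That is 360MB/s at 30MHz and 720MB/s at 60MHz, against the roughly 141MB/s needed for 1024*768 RGB at 30 frames per second with one read and one write per pixel.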
With that little tidbit of information you should be able to compute
how much external memory your system will require and nothing more.
I'll repeat one more time, video processing does not typically work on
more than a relatively small area of the image at a given time. The
video data for that relatively small area can generally be held in the
internal memory of an FPGA. Your system would move the data from the
large, slow external memory into the FPGA's faster internal memory,
process it and then send it back out (presumably to memory,
maybe someplace else).
The amount of internal memory you need is defined by the processing
algorithm, not the dimensions of the video frames that you're trying
to process.
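As an illustration only, here is a software model (Python, not HDL) of a 3x3 box filter that consumes a frame one row at a time and never holds more than three rows. The filter choice and all names are hypothetical; the point is that the local storage scales with the line width and the kernel height, not with the number of lines in the frame.

```python
from collections import deque


def line_buffer_bits(width, bits_per_pixel, lines):
    """Storage needed to hold `lines` full video lines, in bits."""
    return width * bits_per_pixel * lines


def box3x3_stream(rows):
    """Yield 3x3 box-filtered rows (valid region only) from a stream of rows.

    Only three rows are held at any time, regardless of frame height.
    """
    window = deque(maxlen=3)  # stands in for on-chip line buffers
    for row in rows:
        window.append(row)    # the oldest row drops out automatically
        if len(window) == 3:
            top, mid, bot = window
            yield [
                sum(top[x - 1:x + 2] + mid[x - 1:x + 2] + bot[x - 1:x + 2]) // 9
                for x in range(1, len(row) - 1)
            ]


if __name__ == "__main__":
    frame = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(list(box3x3_stream(frame)))
    print(line_buffer_bits(1024, 24, 2))  # two 1024-pixel lines of 24-bit RGB
```

Two buffered lines of 1024 pixels at 24 bits come to 49,152 bits (6KB), whatever the frame height.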
KJ