
feeding a FIFO from PCI


John Larkin

Apr 12, 2008, 12:38:30 PM
Hi,

I'm working on a proposal to design a box that will control a
scientific gadget. Our box will output frequency sweeps, arbitrary
waveforms, a couple of dozen voltages that can be changed/ramped per
user desires, and some discrete logic levels and triggers.

One architecture would pack an Intel-cpu SBC and a custom board in a
2U rack box. The SBC would talk gigabit ethernet to the customer's
system and PCI to our board.

Something like this, maybe:

http://us.kontron.com/index.php?id=226&cat=527&productid=1726

Our board would have a PCI interface driving a biggish FIFO, say 8k
deep by 48 bits wide, inside an FPGA. A simple state machine/latch/mux
thing repacks the 32-bit pci transfers into the input of the 48-bit
wide fifo. The output side of the FIFO would be driving a fairly
simple state machine; each fifo word has an opcode field and a data
field, with different opcodes feeding various devices connected to the
physics... dds synthesizers, ttl outputs, whatever. The state machine
that unloads the fifo would run at 128 MHz, but one opcode is WAIT, so
we can slow down operations to match the realtime needs of the
experiment and reduce the average fifo feed rate.

OK, we finally get to a question: If we run some flavor of Linux on
the SBC, what's a good strategy for keeping the fifo loaded? Assuming
that we have the recipe for an entire experimental shot in program
ram, some tens of megabytes maybe, we could...

1. Have the fifo logic interrupt the cpu when the fifo is, say, half
empty. The isr would compute how empty the fifo actually is at that
instant and set up a short dma transfer to top it off. (A rough sketch
of this appears after the list.)

2. A task (or isr) would be run periodically, a thousand times per
second might work, and it would be responsible for topping off the
fifo, either dma or maybe just poking in the data in a loop.

3. Best, if possible: set up a single DMA transfer to do the entire
shot. That involves a dma controller that understands that the target
is sometimes busy, and retries after getting bounced. I know the pci
bus has hooks for split transfers, but I don't know if standard
Intel-type dma controllers can work in this mode.

4. If it's a dual-core cpu, is it hard (under Linux) to assign one cpu
to just do the fifo transfers?

5. Other ideas?
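
(For concreteness, here is a minimal sketch of option 1 as a Linux
ISR. Everything in it is hypothetical: the register offsets, the
FIFO_FREE counter, and the PIO pokes standing in for the short DMA a
real driver might set up instead.)

#include <linux/interrupt.h>
#include <linux/io.h>

#define FIFO_FREE 0x04  /* invented: how many 32-bit slots are free */
#define FIFO_DATA 0x08  /* invented: write port feeding the FIFO    */

struct shot {
    void __iomem *regs; /* ioremap()ed BAR of the FPGA or PLX chip */
    u32 *buf;           /* the whole experiment recipe in RAM      */
    size_t pos, len;    /* progress through it, in 32-bit words    */
};

/* Runs when the board raises its half-empty interrupt: ask the board
 * how empty the FIFO actually is at this instant, then top it off. */
static irqreturn_t fifo_half_empty_isr(int irq, void *dev_id)
{
    struct shot *s = dev_id;
    u32 free_slots = readl(s->regs + FIFO_FREE);

    while (free_slots > 0 && s->pos < s->len) {
        writel(s->buf[s->pos++], s->regs + FIFO_DATA);
        free_slots--;
    }
    return IRQ_HANDLED;
}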

The problem is not unlike feeding a sound card. How does that work?
Maybe we could start with a sound card driver and hack that?

Any suggestions for resources? Books, model drivers, references,
people we could hire to write the drivers for us? Being a hardware
guy, mostly analog at that, I don't know much about this stuff.

Right now, I only have to write up a plausible proposed architecture,
but if we get the job, p>=0.5 maybe, we'll have to actually do it.

Thanks!

John


Jamie

Apr 12, 2008, 1:35:00 PM
John Larkin wrote:

If you want my opinion: I've been down that road using DMA controllers
and trying to keep up with the technology as it changes. I found that
having a self-contained, stand-alone buffer on the board is best, using
a common link like RJ-45 Ethernet via TCP to get the information into
the buffers; that's a much better approach, provided timing-critical
synchronization isn't a problem for you. Using a 1 Gb or even a 100 Mb
link should yield good results. With that, you can support any platform
that has an Ethernet interface.

If you want to go the route of Linux or FreeBSD, you can compile those
as embedded systems, in which case the full attention of the system can
stay with your PCI card. However, I would still use your own on-board
cache. The better sound cards, for example, buffer a minimum of 512
bytes in their own onboard cache before transporting it to the system,
because other hardware activities can stall the board.

That is just my opinion of course.

http://webpages.charter.net/jamie_5

rickman

Apr 12, 2008, 1:34:26 PM
On Apr 12, 12:38 pm, John Larkin
<jjlar...@highNOTlandTHIStechnologyPART.com> wrote:
> Hi,
>
> I'm working on a proposal to design a box that will control a
> scientific gadget. Our box will output frequency sweeps, arbitrary
> waveforms, a couple of dozen voltages that can be changed/ramped per
> user desires, and some discrete logic levels and triggers.
>
> One architecture would pack an Intel-cpu SBC and a custom board in a
> 2U rack box. The SBC would talk gigabit ethernet to the customer's
> system and PCI to our board.
>
> Something like this, maybe:
>
> http://us.kontron.com/index.php?id=226&cat=527&productid=1726
>
> Our board would have a PCI interface driving a biggish FIFO, say 8k
> deep by 48 bits wide, inside an FPGA. A simple state machine/latch/mux
> thing repacks the 32-bit pci transfers into the input of the 48-bit
> wide fifo. The output side of the FIFO would be driving a fairly
> simple state machine; each fifo word has an opcode field and a data
> field, with different opcodes feeding various devices connected to the
> physics... dds synthesizers, ttl outputs, whatever. The state machine
> that unloads the fifo would run at 128 MHz, but one opcode is WAIT, so
> we can slow down operations to match the realtime needs of the
> experiment and reduce the average fifo feed rate.

> 5. Other ideas?

I don't see the real need to tightly couple the FIFO to your Intel
computer. Why not give the FPGA its own memory, either DRAM or Flash
with some "offline" interface to load the entire experiment into it?
You can still put your board in the 2U box with the Intel computer,
but give it a USB port or something similar which wouldn't be too hard
to program up via an MCU (making the MCU and DRAM look like a storage
device should be a good solution). Or if you have any reason to
separate your board and the Intel computer, you could have the data
transferred via an SD card or similar.

Certainly a PCI interface is not rocket science, but talking to your
board in real time is likely a bit more work than other solutions. I
did the FPGA design for a similar project to record and playback high
speed data. We used a DRAM buffer on the card and had to stream data
to/from a RAID hard drive. The software *was* the long pole in the
tent, even using a real time version of NT. So speaking from
experience, you might be better off giving the FPGA plenty of RAM/
Flash and a less direct connection to the Intel computer.

Are your experiments likely to need more than a few gig of
instructions?

Fred Bloggs

Apr 12, 2008, 2:12:56 PM

You're in effect putting a microprogrammed CPU in that FPGA, so what's
the big deal with it fetching the new FIFO data for itself from an
external store? You can design the architecture for that any way you
want -- within the constraints of the OS, that is -- some kind of
dual-port job. Seems better than the micromanaging you propose...

Didi

Apr 12, 2008, 2:59:13 PM
John Larkin wrote:
> ...
>... Our box will output frequency sweeps, arbitrary

> waveforms, a couple of dozen voltages that can be changed/ramped per
> user desires, and some discrete logic levels and triggers.
>
> One architecture would pack an Intel-cpu SBC and a custom board in a
> 2U rack box. The SBC would talk gigabit ethernet to the customer's
> system and PCI to our board.
>
> ...
> ...
>
> 5. Other ideas?
>

Since the echo you are getting so far indicates the latency using an
x86 may well be too high under Linux or whatever, I can suggest doing
some tiny DPS thing for you - with PCI and Ethernet. Latency then is no
issue, and tcp/ip, filesystem/disk etc. come with it. If you settle for
100 Mbps Ethernet, it is quite easy for me - I can reuse some of the
MPC5200 designs I have. 1 Gbps will take some other part and more than
3-4 months, though. I am not sure I can beat the cost & time of someone
writing the thing for you under Linux while I do the whole thing, but I
am willing to try hard to do so; the time has come when I want to make
all that stuff I have more popular than it is now.

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Original message: http://groups.google.com/group/comp.arch.embedded/msg/3b61a5c2043b8477?dmode=source

John Larkin

Apr 12, 2008, 3:21:08 PM
On Sat, 12 Apr 2008 14:12:56 -0400, Fred Bloggs <nos...@nospam.com>
wrote:


No, the fpga state machines, the ones that load and unload the fifo,
will be very simple. Ditto the dds synthesizers (which we have already,
from another product) and dac drivers.

>so what's
>the big deal with it fetching the new FIFO data for itself from an
>external store, and you can design the architecture for that any way you
>want, within the constraints of the OS that is, some kind of dual port
>job. Seems to be better than the micromanaging you propose...

Both using a local-to-the-FPGA sequence store (which would have to be
wide DRAM... lots of pins) and having our board bus-master and suck
data out of the CPU ram are a lot more parts and engineering. And
we'd still need the PCI interface to load the local ram.

The SBC already has a gig of dram, all the refresh stuff, and the DMA
controllers. And two CPUs. Given that we don't expect very high
average transfer rates, probably around a megabyte per second (we need
to understand the physics and quantify that better), the architecture
I suggested is probably the simplest. The physics pretty much limits
how fast the experiment can ever go, so if we have reasonable margin
we'd be good forever.

We could buy an FPGA pci soft core (or use one of the public ones) or
even just use a PLX chip to handshake the PCI transactions for the
fpga. We're a small company, and can't put a massive project team on
this, so we prefer a simple architecture that has a high probability
of working soon.

John

Paul Hovnanian P.E.

Apr 12, 2008, 3:35:06 PM

Having the board generate an interrupt and initiate a DMA transfer seems
more elegant than depending on a polled design.


> 3. Best, if possible: set up a single DMA transfer to do the entire
> shot. That involves a dma controller that understands that the target
> is sometimes busy, and retries after getting bounced. I know the pci
> bus has hooks for split transfers, but I don't know if standard
> Intel-type dma controllers can work in this mode.

There are dedicated PCI controllers that have all the smarts needed to
manage DMA transfers. All you'd need is some handshaking between your
FPGA and a PCI controller chip.

It's been a while since I played with any of this, so I don't have part
numbers handy. But it wasn't rocket science 5 years ago.

The other way to do this is to put the PCI controller function (DMA
management, etc.) in your FPGA. Google for 'open cores', 'open source
hardware', etc. and 'PCI controller'. If there's enough room in your
FPGA, it would simplify the board design to do it this way.

--
Paul Hovnanian pa...@hovnanian.com
-----------------------------------------------------------------------
Procrastinators: The leaders for tomorrow.

rickman

Apr 12, 2008, 5:53:50 PM
On Apr 12, 3:21 pm, John Larkin wrote:

I would again suggest that you can *simplify* the project by putting
RAM on your board instead of a *real time* PCI interface.

You can add a GB of DRAM to your board with just a very few chips; I
haven't looked at the available RAM chips lately, but I believe they
are beyond the Gbit level. Or you can use a module and plug in
whatever size you need. You can lose the PCI interface by using
something serial, which can be done with a single MCU chip.
Check out the Luminary Micro parts with Ethernet including the PHY.
PCI is not really all that fast, and getting any real speed out of it
will take a fair amount of programming effort.

The simple MCU receives the data over USB or Ethernet and fills the
RAM through an interface in the FPGA. The FPGA reads the RAM as
needed for playback. Your post says you *don't* need high average
transfer rates to the FIFO, so you wouldn't need high transfer rates
from the RAM. So you could use an 8 bit memory interface and clock it
slowly if you wanted.

This is all very simple stuff to implement and get working. DRAMs
handle their own refresh if you just give them a strobe periodically.
Your app reads consecutively, so if it reads fast enough, you don't
even need refresh. If it is not reading fast enough you have plenty
of time to do a refresh.

I have worked with SDRAM in exactly this application before. If you
are interested in doing this, I can offer help with the FPGA and RAM
design.


> We could buy an FPGA pci soft core (or use one of the public ones) or
> even just use a PLX chip to handshake the PCI transactions for the
> fpga. We're a small company, and can't put a massive project team on
> this, so we prefer a simple architecture that has a high probability
> of working soon.

To think of this as a "massive" effort seems like an overestimate.
Like I said in my other post, I worked on a design very similar to
this, and the hard part was in the software. To make the durn computer
run anything remotely like real time took a lot of effort. Your
original post did not mention a real time version of Linux, which would
almost certainly be required.

Maybe your team is better at the software than hardware. But
typically a well defined and constrained hardware design effort goes
much better than a software effort that requires the cooperation of a
large, complex operating system.

Hal Murray

Apr 12, 2008, 6:11:00 PM

>One architecture would pack an Intel-cpu SBC and a custom board in a
>2U rack box. The SBC would talk gigabit ethernet to the customer's
>system and PCI to our board.

2U is ugly in that you can't get full height PCI cards without
using a riser kludge to turn the card sideways.

I think PCI cards fit in 3U. There is a short size that fits in 2U.
(or you can cross your fingers on the riser stuff.)


>Our board would have a PCI interface driving a biggish FIFO, say 8k
>deep by 48 bits wide, inside an FPGA. A simple state machine/latch/mux
>thing repacks the 32-bit pci transfers into the input of the 48-bit
>wide fifo. The output side of the FIFO would be driving a fairly
>simple state machine; each fifo word has an opcode field and a data
>field, with different opcodes feeding various devices connected to the
>physics... dds synthesizers, ttl outputs, whatever. The state machine
>that unloads the fifo would run at 128 MHz, but one opcode is WAIT, so
>we can slow down operations to match the realtime needs of the
>experiment and reduce the average fifo feed rate.

>OK, we finally get to a question: If we run some flavor of Linux on
>the SBC, what's a good strategy for keeping the fifo loaded? Assuming
>that we have the recipe for an entire experimental shot in program
>ram, some tens of megabytes maybe, we could...

>3. Best, if possible: set up a single DMA transfer to do the entire
>shot. That involves a dma controller that understands that the target
>is sometimes busy, and retries after getting bounced. I know the pci
>bus has hooks for split transfers, but I don't know if standard
>Intel-type dma controllers can work in this mode.

I think that's what you want to do. It comes for free. I think
it will all make sense if you read the PCI specs. Or maybe
just the specs for the PCI interface block you are going to use.

Ignoring pipeline problems, the host side of a DMA read request
doesn't know how much data the device wants. It just gets
an op-code that says read or read-cache-line. Once data
starts flowing, either side can say I'm-done-now. If the device
(still) wants more data, it starts over with bus arbitration.
The host may say "done" to let another device have a turn
or to cross a page boundary or ...


The DMA section of the FPGA will run in chatter mode. When
there is room in the FIFO for another cache block, it will
ask for more data. When the FIFO is near-full, it will stop
asking. You have to leave enough room in the FIFO to hold all
the data in the pipeline.


One quirk: the driver has to allocate a chunk of physically
contiguous memory. That probably has to happen early in the
boot-up so you still have a chunk of contiguous memory
to grab.

--
These are my opinions, not necessarily my employer's. I hate spam.

Hal Murray

Apr 12, 2008, 8:46:00 PM
>I would again suggest that you can *simplify* the project by putting
>RAM on your board instead of a *real time* PCI interface.

But he doesn't need any real time software. All he needs
is a big buffer in memory. A DMA engine on the card will
grab data from memory whenever the FIFO has room.

>You can add a GB of DRAM to your board that with just a very few
>chips, I haven't looked at the available RAM chips lately but I
>believe they are beyond the Gbit level. Or you can use a module and
>plug in what ever size you need. You can lose the PCI interface by
>using something serial which can be done with a single MCU chip.
>Check out the Luminary Micro parts with Ethernet including the PHY.
>PCI is not really all that fast and getting any real speed out of it
>will take a fair amount of programming effort.

Yes, a PC is overkill for just grabbing the data and buffering it
for the FPGA. There may well be better overall designs that don't
use a PC or PCI.

On the other hand, the PCI part of the board design is only ~40 or
~70 wires. I think the PCI logic is roughly as complicated as the
DRAM interface logic. (handwave)

John Larkin

Apr 12, 2008, 10:00:40 PM

We could put a PowerQUICC or a Blackfin on the board. But then we'd
need dram for the sequence storage, or we'd have to interface the
cpu's ram to the fpga some fast way, and we'd have to do the gigabit
ethernet and the tcp/ip stack and all that. We can get that stuff,
already done, with a 2 GHz dual-core CPU, for under $400.

The sbc has a lot of stuff already done. It will run Linux the day we
open the box.

John

rickman

Apr 12, 2008, 10:00:50 PM
On Apr 12, 8:46 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
Murray) wrote:

For me the issue is not the complexity of the hardware because I think
that is in the noise for this project. The issue is the complexity of
interfacing an FPGA to a bank of memory and to an MCU which has either
USB or Ethernet connectivity compared to the complexity of interfacing
an FPGA to a PCI bus and developing the software to support whatever
transactions will be happening over the PCI bus. I guess if you have
designed PCI bus DMA hardware and software before, then this is not a
real issue. The experience I had was that the hardware for the FPGA
and DRAM was done and working 100% on schedule. The software had
significant complications and was the limiting factor in the project
schedule.

When you say that you don't need "real time software", I am missing
something. Once started, does the DMA run to completion by itself?
Maybe I am not up to speed with current software techniques on the PC,
but I thought even DMA required real time response to keep it queued
up and running. As far as allocating a block of memory to buffer the
data, I have no understanding of what it takes to allocate a buffer of
half a GB or more of contiguous memory. But like I said, I am not so
familiar with this approach.

I am, however, familiar with memory interfaces. They are well
specified in maybe a dozen pages vs. the hundreds of pages for the PCI
bus and the virtually unlimited amount of documentation (or lack
thereof) for the operating system and writing drivers for DMA.

To me the issue is that even if the DRAM hardware is about the same
complexity as the PCI bus hardware, it just seems like everything else
is a lot less complex by offloading the memory buffer onto the board.
The hard parts of this project are the real time issues. It just seems
so much simpler to keep all of the real time aspects on the board *in
100% controllable hardware* and AWAY from the Intel CPU, the shared
PCI bus, DMA controllers, and some rather arcane software.

Hal Murray

Apr 12, 2008, 11:24:20 PM

>When you say that you don't need "real time software", I am missing
>something. Once started, does the DMA run to completion by itself?
>Maybe I am not up to speed with current software techniques on the PC,
>but I thought even DMA required real time response to keep it queued
>up and running. As far as allocating a block of memory to buffer the
>data, I have no understanding of what it takes to allocate a buffer of
>half a GB or more of contiguous memory. But like I said, I am not so
>familiar with this approach.

The basic idea is that you give the FPGA a pointer and length.
It reads memory a cache block at a time as it needs it. When
it's done, it sets a status bit and maybe generates an interrupt.

The only difference between this design and a typical
disk or network transfer is that this one will be much larger.

You might have to give it a clump of pointer/length pairs,
either stored in memory or on chip.

You could give it each piece of the clump one at a time,
but that gets you into the time constraints.

I haven't actually written the code (driver or FPGA) to do this.
I've worked on projects that did similar things.

It's possible I'm overlooking something critical. Maybe allocating
huge (as compared to big) chunks of memory is hard. I'm sure
a good kernel wizard can do it one way or the other. If nothing
else, you hack the very early part of the kernel to put some memory
in its back pocket until you ask for it. Ugly, but effective.

Didi

Apr 12, 2008, 11:36:32 PM
John Larkin wrote:
> ....

> We could put a powerQuicc or a Blackfin on the board. But then we'd
> need dram for the sequence storage, or we'd have to interface the
> cpu's ram to the fpga some fast way, and we'd have to do the gigabit
> ethernet and the tcp/ip stack and all that. We can get that stuff,
> already done, with a 2 GHz dual-core CPU, for under $400.
>
> The sbc has a lot of stuff already done. It will run Linux the day we
> open the box.
>

I know PCs are cheap. But you are after something more - which will
take some programming and latency spec meeting. I have no idea how
viable the thing is and how much time & cash it will cost you.
If I were to do it on a 5200 or other similar part, I would likely do
the actual Internet --> FIFO thing within a few days; how long it takes
to get DPS running on the particular platform is harder to predict - if
a 5200 is used, a week to a month, I would say.

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

Original message: http://groups.google.com/group/comp.arch.embedded/msg/9a636587b48980f3?dmode=source

Jure Newsgroups

Apr 13, 2008, 3:48:39 PM

"Hal Murray" <hal-u...@ip-64-139-1-69.sjc.megapath.net> wrote in message
news:KNOdnTzMponprpzV...@megapath.net...

Would this product work?

http://www.strategic-test.com/ultrafast2_pci-x_cards/uf2-7000.htm

I'll check in my files at work for other possible products I have seen
before.

Thanks, Jure Z.

Nico Coesel

Apr 13, 2008, 4:54:44 PM
John Larkin <jjla...@highNOTlandTHIStechnologyPART.com> wrote:

With PCI, there is no DMA like DMA used to be. A lot of people get
confused here. PCI is about pushing data into memory areas in a fast
way. The idea behind PCI is that you set up a transfer from one memory
area to another and are told when the transfer is ready.

>4. If it's a dual-core cpu, is it hard (under Linux) to assign one cpu
>to just do the fifo transfers?
>
>5. Other ideas?

Yes. Make the card a PCI master. You can prepare a buffer, lock the
buffer for PCI access, tell the card where to fetch the data, and off
it goes. An interrupt when the buffer is nearly done, so the driver can
prepare a new buffer, is what it takes to feed the next buffer into the
card.
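
(A rough sketch of that bus-master handoff, with invented register
offsets; the real handshake depends entirely on the DMA engine in the
card. The dma_handle values would come from something like
pci_map_single() or dma_alloc_coherent(), and 32-bit DMA addresses
are assumed.)

#include <linux/io.h>
#include <linux/types.h>

#define DMA_ADDR 0x10   /* invented: where the card fetches from */
#define DMA_LEN  0x14   /* invented: how many bytes to fetch     */
#define DMA_GO   0x18   /* invented: start the transfer          */

struct feed {
    void __iomem *regs;     /* ioremap()ed BAR of the card */
    struct {
        dma_addr_t dma_handle;
        u32 len;
    } bufs[2];              /* simple ping-pong pair */
    unsigned next;          /* which buffer is up    */
};

/* Hand the card the next buffer: called once to start the shot, then
 * again from the "buffer nearly done" interrupt, so the card never
 * runs dry. */
static void kick_next_buffer(struct feed *f)
{
    unsigned i = f->next++ & 1;

    writel((u32)f->bufs[i].dma_handle, f->regs + DMA_ADDR);
    writel(f->bufs[i].len, f->regs + DMA_LEN);
    writel(1, f->regs + DMA_GO);
}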

PCI is designed to do burst transfers. If you don't use burst
transfers, the bandwidth will decrease dramatically; worse, the
CPU will have to wait for each transfer to finish, which consumes huge
amounts of CPU cycles.

--
Programmeren in Almere?
E-mail naar nico@nctdevpuntnl (punt=.)

Joel Koltner

Apr 13, 2008, 7:25:55 PM
John,

"John Larkin" <jjla...@highNOTlandTHIStechnologyPART.com> wrote in message
news:6t1204lakho2cj19k...@4ax.com...


> We could buy an FPGA pci soft core (or use one of the public ones) or
> even just use a PLX chip to handshake the PCI transactions for the
> fpga.

FYI, I've used the old PLX9054 (before PCI Express took over the world), and it
was a *very* nice chip. The board was, essentially, a frame grabber with 4GB
of DRAM going through an FPGA containing its own "2D slicing" DMA engine (so
that a camera looking at multiple logical "windows" could have each window
appear as a contiguous stream of pixels) which fed the DMA engine in the
PLX9054. From the end-user's perspective, then, what would happen would be:

1) User would request a particular frame buffer that would already have been
set up such that on the "local bus" (the address/data bus connecting the
PLX9054 and the FPGA) sequential addresses would grab the correct pixels. The
user would want that frame buffer transferred into a contiguous buffer in
their own user-mode memory space.
2) The device driver for the frame grabber would ask Windows for all the
*physical* addresses of that user's frame buffer, since of course in many
cases Windows had run off and used a large number of discontiguous physical
pages to create the user's (virtual) contiguous buffer.
3) For the benefit of the PLX9054, the device driver builds a "scatter-gather"
list in the PC's memory, where each list entry just contains information such
as the number of bytes to transfer, the physical address to transfer to, the
local bus address to transfer from, and whether or not this is the last entry
in the list.
4) The device driver writes to the appropriate control registers in the
PLX9054... and it does the rest! Poof! (An interrupt was generated when it
finished.)

In other words, the PLX9054 would start walking through the scatter-gather
list, automatically creating read requests on the local bus and write requests
on the PCI bus as needed, keeping its own internal FIFOs full (it had some
modest-sized ones... maybe 64 or 128 bytes? -- I've forgotten), and breaking
the write requests into multiple pieces as needed to keep the PCI bus protocol
happy. On quality motherboards, we got ~80 MB/s, which was considered pretty
decent given the 33MHz/32-bit PCI bus architecture of the day.
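
(To make step 3 concrete, here is roughly what one scatter-gather
entry might look like as a C struct. The field layout is illustrative
only; the PLX9054's actual descriptor format is defined in its
datasheet.)

#include <stdint.h>

/* One entry in the scatter-gather list the driver builds in PC
 * memory. The chip walks the chain by itself, entry by entry. */
struct sg_entry {
    uint32_t pci_addr;   /* physical PC address to transfer to     */
    uint32_t local_addr; /* local-bus (FPGA-side) address to read  */
    uint32_t byte_count; /* number of bytes this entry moves       */
    uint32_t next;       /* physical address of the next entry,
                            with a flag bit marking the last one   */
};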

It was really pretty impressive. The only caveat was that it couldn't
transfer more than 16MB or thereabouts in one complete setup, so in software
we just broke apart any larger transfers into multiple 16MB transfers (since
transferring 16MB took about 200ms anyway, the additional overhead of a few
microseconds setting up the next transfer was negligible).

I imagine the sequence of steps above is quite similar in Linux. Although
I've never written a Linux device driver, I've been told that they're actually
simpler in many ways than Windows device drivers. If you end up using
Windows, it's absolutely worthwhile to drop the ~$3k or so to send the guy
who's going to write the device driver to the week-long classes by, e.g., OSR
to learn how to do so.

My main point here is that going with a chip such as those from PLX gives you
one heck of a lot of power that would otherwise take a LOT of time and effort
to implement yourself. Although for a high-volume project it probably makes
sense to go with a soft PCI Core for the FPGA, for low volumes I'm a big
believer in using someone else's "all in one" IC.

---Joel


Hal Murray

Apr 13, 2008, 7:41:24 PM

>One architecture would pack an Intel-cpu SBC and a custom board in a
>2U rack box. The SBC would talk gigabit ethernet to the customer's
>system and PCI to our board.
>
>Something like this, maybe:
>
>http://us.kontron.com/index.php?id=226&cat=527&productid=1726

Several odds and ends...

There are several Linux distributions targeted at running
without a hard disk. That avoids the heat, space, and
the unreliability of a hard disk.

Here is one. There are others.
http://www.linuxonastick.com/
Almost everything gets copied to RAM at boot time.
/etc is still on disk, and maybe a few others.
If you want files preserved across reboots, you have
to think about it.


There are Flash disk modules that plug into 40/44-pin IDE sockets
(no ribbon cables); they work well with the above. The 40-pin versions
need power, typically from an IDE connector.
Google for >disk on module<.


Modern FPGAs don't get along with 5V PCI. You can save yourself
a pile of kludgery if your target is 3V PCI.
I think 66 MHz PCI is 3V. The board above is 5V.


If your box has room for an old/big CD drive (rather than the modern
thin ones), you can get LCD modules that will fit in that slot.

That lets you display the MAC address (for use with BOOTP) or
key in an IP Address to get your box off the ground. After
that you can use ssh/web or whatever. No keyboard or display
required at all. (They might be handy for debugging, but ssh
generally works fine for me.)

John Larkin

Apr 13, 2008, 10:45:37 PM

Yup, I'm leaning towards using a PLX chip as the PCI interface. I
didn't know they were that smart!

I suspect we can persuade Linux and our application to make the shot
program (the opcodes we poke into the fpga FIFO) physically contiguous
in real memory.

Thanks

John

krw

Apr 13, 2008, 10:58:11 PM
In article <r4h5045s5dlicuohi...@4ax.com>,
jjla...@highNOTlandTHIStechnologyPART.com says...

They'll save you a TON of work. PCI isn't easy, though PLX makes it
(relatively) easy. I also highly recommend the MindShare books as a
reference.

> I suspect we can persuade Linux and our application to make the shot
> program (the opcodes we poke into the fpga FIFO) physically contiguous
> in real memory.

--
Keith

Nial Stewart

Apr 14, 2008, 6:06:31 AM
John,

I might have missed something, but if you need a FIFO size of 8K x 48 bits
you'll get that into most reasonably sized FPGAs these days (Cyclones or
Spartans). Otherwise hang a DRAM off it as rickman says.

If your data rate to the FPGA is only 1MByte/second you could almost get
away with one of the FTDI USB 1.1 interfaces (FT245R gives an 8 bit
fifo output).

If this isn't sufficient, there are a couple of easy-to-implement
microcontrollers about that give you a FIFO interface at USB 2.0 full-speed
data rates.

The host PC would need to do more of the work, but if this is acceptable,
then an embedded SBC seems like complete overkill?


Nial.


Didi

Apr 14, 2008, 8:10:52 AM
Nial Stewart wrote:
> ...

> If your data rate to the FPGA is only 1MByte/second you could almost get
> away with one of the FTDI USB 1.1 interfaces (FT245R gives an 8 bit
> fifo output).

If the data rate is only 1-2 Mbytes/S it can be done in a single
MPC5200, with some (64M, 128M?) DDRAM and a flash chip. The FIFOs are
on chip, and one of the serial ports can be set to do 16 or 32 Mbps; it
will only have to be deserialized - some CPLD or whatever
(6 x 74hct164....?). The entire BOM would be $50 to $100 for prototype
quantities. With DPS running on it, net access over the 100 Mbps
Ethernet port is there. Want a HDD? Connect one to the 5200 ATA port.
But this is me; the rest of the world seems to consider designs which
do not rely on some intel/linux/MS monstrosity illegal nowadays.

Dimiter

P.S. A similar thing (doing approx. 1.8 Mbytes/S) is at
http://tgi-sci.com/y2demo/

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

Original message: http://groups.google.com/group/comp.arch.embedded/msg/f06c1735c181b6cb?dmode=source

Joel Koltner

Apr 14, 2008, 12:20:55 PM
"Didi" <d...@tgi-sci.com> wrote in message
news:e7faffe3-e733-4033...@q10g2000prf.googlegroups.com...

> But this is me, the rest of the world seems to consider designs which
> do not rely on some intel/linux/MS monstrosity illegal nowadays.

It's just that finding people to code on those platforms is easier, and --
if you need the complexity of a GUI or full TCP/IP stack or USB host anyway --
there's a huge amount of code, completely free, that you can leverage if you
just "toe the company line" and use one of those "monstrosities." A couple
bucks extra in hardware is a lot easier to get funded than a few more
programmers and another year of development time, after all (especially when
finding GOOD programmers is somewhat tricky these days!)

I bet plenty of people ask why your software doesn't "look" like standard
Windows, don't they?


Didi

Apr 14, 2008, 1:07:51 PM
Joel Koltner wrote:
> "Didi" <d...@tgi-sci.com> wrote in message
> ...
> > But this is me, the rest of the world seems to consider designs which
> > do not rely on some intel/linux/MS monstrosity illegal nowadays.
>
> It's just that finding people to code on those platforms is easier and -- if
> you need the complexity of a GUI or full TCP/IP stack
> ....

You get that under DPS as well - with a lot of functionality as a
bonus and with a few uS worst-case IRQ latency (try *that* on the
wintel things). More, it fits in a fraction of the flash space the
wintelinux alternatives take.

> I bet plenty of people ask why your software doesn't "look" like standard
> Windows, don't they?

Actually they typically think this is windows running underneath at
first sight...

As for programmers interfacing to this or that system, the overhead is
negligible if you use DPS. For this particular application - moving
data from a tcp connection into some FIFOed hardware - the job to do
on behalf of DPS can be explained in a message shorter than this one.

So it is all about plain human nature - practically all people I know
are too scared to make a step aside from the known path; it takes some
threat to their existence to venture that. It does not matter how
tempting those nice things under that tree look, danger may be lurking
around if the place is so desolated.... :-).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

Original message: http://groups.google.com/group/comp.arch.embedded/msg/0a16cc12dffb1ba5?dmode=source


Joel Koltner

Apr 14, 2008, 1:24:27 PM
"Didi" <d...@tgi-sci.com> wrote in message
news:fd9dc69f-eb74-45d8...@p25g2000pri.googlegroups.com...

> you get that under DPS as well - with a lot of functionality as a
> bonus and
> with a few uS worstcase IRQ latency (try *that* on the wintel things).

Sure, it's just that for every one person with DPS experience (where's their
web site, btw?) you'll probably find at least 100 people with Windows or
Linux experience.

> Actually they typically think this is windows runing underneath on the
> first
> sight...

My first guess would have been DOS with an extender. :-)

> So it is all about plain human nature - practiclly all people I know
> are
> too scared to make a step aside of the known path, it takes some
> threat to their existance to venture that.

With most people, yes, this is absolutely true. Businesses as entities are
the same way -- they'll keep using crappy tools that are well-known to them
because the thought of changing to something else just looks too risky... and
no one wants to be the fall guy if it turns out their evaluation of the
alternatives was wrong.

---Joel


Didi

Apr 14, 2008, 2:44:35 PM
Joel Koltner wrote:
> ...
> > you get that under DPS as well - with a lot of functionality as a
> > bonus and
> > with a few uS worstcase IRQ latency (try *that* on the wintel things).
>
> Sure, it's just that for anyone with DPS experience (where's their web site,
> btw?) you'll probably find at least 100 people with Windows or Linux
> experience.

DPS is less popular than that. It is all my work, my property etc.; it
comes on my products. I have been wanting to put a cheap product with
it on the market, but I just got a setback and may be unable to do it
this year yet again....

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

Original message: http://groups.google.com/group/comp.arch.embedded/msg/6911f3aa16da2c35?dmode=source

Joel Koltner

Apr 14, 2008, 3:18:12 PM
Hi Didi,

"Didi" <d...@tgi-sci.com> wrote in message

news:018c2551-78ca-4749...@1g2000prf.googlegroups.com...


> DPS is less popular than that. It is all my work, my property etc.

You wrote the TCP/IP stack from scratch? That's a lot of effort! Good work.

> Comes
> on my products. I have been wanting to put a cheap product with it on
> the market but I just got a setback and may be unable to do it this
> year
> yet again....

I hope you'll be able to do it eventually; I'm sure many people would be
interested.

---Joel


Didi

Apr 14, 2008, 3:37:53 PM
Joel Koltner wrote:
> Hi Didi,
>
> ....

> > DPS is less popular than that. It is all my work, my property etc.
>
> You wrote the TCP/IP stack from scratch? That's a lot of effort! Good work.

Thanks. It took me about 6 months to do (including the DNS and some
high level clients), and it is not such a great part of DPS (about 10%).

>
> I hope you'll be able to do it eventually; I'm sure many people would be
> interested.

So do I hope (hope, that is)! ... :-). I will eventually; it won't
take that much - I know I'll do it. As for those interested, I wish I
knew a bit more.... :-).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

Original message: http://groups.google.com/group/comp.arch.embedded/msg/4ccece298795c6b8?dmode=source

Robert Adsett

Apr 14, 2008, 9:07:42 PM
In article <fd9dc69f-eb74-45d8-820e-5d75b5...@p25g2000pri.googlegroups.com>,
Didi says...

> It does not matter how tempting
> those nice things under that tree look, danger may be lurking around
> if the place is so desolated.... :-).

If the place is so desolated, it is dangerous. Such places are away
from support. Also, there is extra danger on first approach, since the
hazards are unfamiliar.

I rather suspect many in this group prefer the hazards and beauties of
the new, unfamiliar (and desolate) to the dangers of the crowds. If
nothing else, for the thrill of treading less travelled paths.

Is the metaphor broken yet :)

Robert
** Posted from http://www.teranews.com **

James Waldby

Apr 15, 2008, 12:05:17 AM
On Sat, 12 Apr 2008 09:38:30 -0700, John Larkin wrote:
[re "box that will control a scientific gadget" possibly using "an
Intel-cpu SBC and a custom board" where "SBC would talk gigabit
ethernet to the customer's system and PCI to our board." with SBC
like http://us.kontron.com/index.php?id=226&cat=527&productid=1726

> Our board would have a PCI interface driving a biggish FIFO, say 8k deep
> by 48 bits wide, inside an FPGA. [...]


> OK, we finally get to a question: If we run some flavor of Linux on the
> SBC, what's a good strategy for keeping the fifo loaded? Assuming that
> we have the recipe for an entire experimental shot in program ram, some
> tens of megabytes maybe, we could...

...


> 3. Best, if possible: set up a single DMA transfer to do the entire
> shot. That involves a dma controller that understands that the target is
> sometimes busy [...]

If the linux kernel is given a "mem=n" parameter at boot time, it will use
only n bytes of memory, which leaves the balance of memory to be contiguously
allocated later with "dmabuf = ioremap(...)" (see "Allocating the DMA Buffer"
and "Do-it-yourself allocation" in Chap. 13 of Linux Device Drivers, by
Rubini and Corbet; eg http://www.xml.com/ldd/chapter/book/ch13.html .)
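
(A minimal sketch of that do-it-yourself allocation, along the lines
of the LDD chapter cited above. The mem=256M split and the addresses
are made-up numbers; the physical base, HIDDEN_BASE, is what the
board's DMA engine would be pointed at.)

#include <linux/init.h>
#include <linux/io.h>
#include <linux/errno.h>

/* Boot with mem=256M on a 512MB machine; the top 256MB is then
 * invisible to the kernel and free for the shot buffer. */
#define HIDDEN_BASE 0x10000000UL   /* physical 256MB boundary */
#define HIDDEN_SIZE 0x10000000UL   /* the hidden 256MB        */

static void __iomem *shotbuf;      /* CPU-side mapping */

static int __init shotbuf_init(void)
{
    shotbuf = ioremap(HIDDEN_BASE, HIDDEN_SIZE);
    if (!shotbuf)
        return -ENOMEM;
    /* The driver fills the buffer through shotbuf; the card's DMA
     * engine is given the physical address HIDDEN_BASE. */
    return 0;
}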

If high memory isn't usable by the DMA controller, you could build a
kernel with your device driver using preallocated fixed buffers,
rather than loading the driver as a module.

User code can access the buffer as a memory-mapped file; see
eg http://www.scs.ch/~frey/linux/memorymap.html for background, and see
eg http://linux.die.net/man/2/mmap for notes on some flags that can lock
the mapped pages in memory, make them private, map to fixed location, etc.
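
(For instance, assuming the driver exposes the buffer through a device
node; /dev/shotbuf and the 64MB size are placeholders:)

#define _GNU_SOURCE     /* for MAP_LOCKED */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 64UL << 20;                /* a 64MB shot, say  */
    int fd = open("/dev/shotbuf", O_RDWR);  /* hypothetical node */
    if (fd < 0)
        return 1;

    /* MAP_SHARED so writes reach the driver's buffer;
     * MAP_LOCKED keeps the pages resident in memory. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_LOCKED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* ... fill p[] with the shot program, then tell the driver to
     * start the DMA ... */

    munmap(p, len);
    close(fd);
    return 0;
}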

> 4. If it's a dual-core cpu, is it hard (under Linux) to assign one cpu
> to just do the fifo transfers?

Root can use cpusets, described at http://lwn.net/Articles/127936/
to allocate CPUs and memory nodes, or can use system calls as
described in http://linux.die.net/man/2/sched_setaffinity to
control which CPUs given processes will run on.
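
(E.g., a minimal sketch of pinning the FIFO-feeder process to the
second core, leaving core 0 for Linux and the Ethernet side:)

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to CPU 1; pid 0 means "myself". */
static int pin_to_cpu1(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(1, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}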

rickman

Apr 20, 2008, 11:14:26 PM
On Apr 12, 11:24 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
Murray) wrote:
> >When you say that you don't need "real time software", I am missing
> >something. Once started, does the DMA run to completion by itself?
> >Maybe I am not up to speed with current software techniques on the PC,
> >but I thought even DMA required real time response to keep it queued
> >up and running. As far as allocating a block of memory to buffer the
> >data, I have no understanding of what it takes to allocate a buffer of
> >half a GB or more of contiguous memory. But like I said, I am not so
> >familiar with this approach.
>
> The basic idea is that you give the FPGA a pointer and length.
> It reads memory a cache block at a time as it needs it. When
> it's done, it sets a status bit and maybe generates an interrupt.

I understand perfectly how DMA works and all the permutations. But
you are glossing over all of the difficulty of getting physical
addresses from an OS and getting the software to let the hardware use
them.

> It's possible I'm overlooking something critical. Maybe allocating
> huge (as compared to big) chunks of memory is hard. I'm sure
> a good kernel wizard can do it one way or the other. If nothing
> else, you hack the very early part of the kernel to put some memory
> in it's back pocket until you ask for it. Ugly, but effective.

That is the sort of stuff that I would not want to deal with. I am
much more comfortable with offloading the entire real time function
onto the dedicated hardware and keeping the PC out of the time
critical loop. When it is so easy to add a little bit of RAM to the
board, I don't get why anyone would want to have to make a complex OS
run a real time task???

But like I said before, we are all comfortable with different
things.

Didi

Apr 21, 2008, 4:56:13 AM
rickman wrote:
> .....

> > It's possible I'm overlooking something critical. Maybe allocating
> > huge (as compared to big) chunks of memory is hard. I'm sure
> > a good kernel wizard can do it one way or the other. If nothing
> > else, you hack the very early part of the kernel to put some memory
> > in it's back pocket until you ask for it. Ugly, but effective.
>
> That is the sort of stuff that I would not want to deal with. I am
> much more comfortable with offloading the entire real time function
> onto the dedicated hardware and keeping the PC out of the time
> critical loop. When it is so easy to add a little bit of RAM to the
> board, I don't get why anyone would want to have to make a complex OS
> run a real time task???

This is sane reasoning I have heard more than once while discussing
different projects (sometimes hypothetical). At this point my question
usually is "so what is the bloody PC for in the design then?". The
usual answer is something like "well, but they are so cheap and so
popular..."...

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

Original message: http://groups.google.com/group/comp.arch.embedded/msg/e6afa044f684b6e1?dmode=source

rickman

Apr 21, 2008, 9:22:56 AM
On Apr 21, 4:56 am, Didi <d...@tgi-sci.com> wrote:
> rickman wrote:
> > .....
> > > It's possible I'm overlooking something critical. Maybe allocating
> > > huge (as compared to big) chunks of memory is hard. I'm sure
> > > a good kernel wizard can do it one way or the other. If nothing
> > > else, you hack the very early part of the kernel to put some memory
> > > in it's back pocket until you ask for it. Ugly, but effective.
>
> > That is the sort of stuff that I would not want to deal with. I am
> > much more comfortable with offloading the entire real time function
> > onto the dedicated hardware and keeping the PC out of the time
> > critical loop. When it is so easy to add a little bit of RAM to the
> > board, I don't get why anyone would want to have to make a complex OS
> > run a real time task???
>
> This is a sane reasoning I have heard more than once while discussing
> different projects (sometimes hypothetical). At this point my question
> usually is "so what is the bloody PC for in the design then?" . The
> usual
> answer is something like "well but they are so cheap and so
> popular..."...

The PC is there as the UI. If the job is using standard interfaces,
then it could be worth a bit of software work to avoid having to build
hardware. But this project is building a special board with an FPGA!
Heck, unless they have highly specialized interfaces to the project
they want to control, they could even do the whole thing on a
standard, COTS evaluation board if it has enough memory! There are
tons of boards available which include high speed interfaces like
Ethernet and USB and will support this project as-is other than a
specialized interface on the project end.

Then you can use the PC as a UI and don't even need to build
hardware!
