
Handling interrupts on PCI-based slave-mode DMAs.


Clifford van Dyk

May 16, 2003, 6:44:50 AM
Hi

I am setting up a DMA using the system processor's DMA engine (as
opposed to the adapter card's bus master DMA engine). Please could
someone describe the recommended way to attach ISRs to the interrupts
which are associated with the system DMA engine.

Thanks,
Clifford

Phil Barila

May 16, 2003, 12:50:24 PM
"Clifford van Dyk" <cva...@netscape.net> wrote in message
news:3ec4c12a$0$2...@hades.is.co.za...

Why? System DMA is only for ISA; it isn't supposed to work on PCI at
all. Additionally, it's glacially slow, as it's designed to manage DMA for
floppy drives and serial ports and other such slow devices.

The short answer is: don't even try to do what you are doing. If your PCI
card has a bus-master (BM) engine, use it. If it doesn't, the CPU will have to do all the
work, but it will still be faster than the system DMA engine.

Phil
--
Philip D. Barila
Seagate Technology, LLC
(720) 684-1842
As if I need to say it: Not speaking for Seagate.
E-mail address is pointed at a domain squatter. Use reply-to instead.


Maxim S. Shatskih

May 16, 2003, 9:46:03 AM
The system DMA engine is not accessible on PCI unless you use the thing
called Distributed DMA, which is probably not supported in Windows,
since it is for DOS only.

Also, interrupts are not associated with DMA engines in any way; they
are device-specific.

Max

"Clifford van Dyk" <cva...@netscape.net> wrote in message
news:3ec4c12a$0$2...@hades.is.co.za...

Clifford van Dyk

May 19, 2003, 5:33:54 AM
Hi

It seems strange that system-mastered DMAs are not supported on PCI,
considering that DMA engines are available on most (if not all) system
controllers/host bridges, and these usually have associated interrupts.

The Windows 2000 DDK seems to provide functions for supporting slave
(system-mastered) DMAs through the HAL (as opposed to bus master DMAs,
which use the adapter's bus master DMA engine), including functions for
reading back the current DMA count for "slave DMAs". This implies that a
polled approach should be used rather than interrupts, but this seems
very inelegant...

Does anyone have a good source of knowledge on this topic that they can
recommend to me?

Regards,
Clifford

Maxim S. Shatskih

May 19, 2003, 6:49:07 AM
> It seems strange that system-mastered DMA's are not supported on PCI,

Nothing strange: there are no DREQ/DACK wires on the PCI slot. Nothing
bad either, since PCI has a far better bus-master DMA facility: wider,
faster and free from idiotic legacy limitations.

There is the so-called Distributed DMA (D-DMA), which is an emulation of
system DMA on a PCI card. It is implemented in the south (PCI-ISA) bridge,
which contains the system (aka ISA) DMA logic. When one of the system
DMA channels is set to D-DMA, the south bridge reflects any writes to
that channel's ports as PCI transactions to the card's bus-master
engine. The card must have the necessary descriptor in its config
space for this; the BIOS enumerates all such cards and sets up the
south bridge for D-DMA at startup.

The only users of this technology are some soundcards which emulate
the SB16 at the hardware level so that old DOS games (with SB16 support
hard-coded into them) can run with the soundcard. Since the SB16 is a
system DMA device, emulating it on PCI is a complex task. Some other
soundcards used EMM386 for this (incompatible with some badly programmed
games which use unreal mode).

Anyway, all of this is irrelevant for Windows, since Windows has full
access to the card's real PCI bus-master engine. Windows
does not need D-DMA.

> controllers/host bridges, and these usually have associated interrupts.

The system DMA controller never had any "associated interrupts". It does
not generate any IRQs. Only the particular hardware (like an SB16 or an ECP
parallel port) connects to both the IRQ wires and system DMA.

> The Windows 2000 DDK seems to provide functions for supporting slave
> (system mastered) DMA's through the HAL

For ISA hardware.

>(as opposed to bus master DMAs,

For PCI hardware.

> reading back the current DMA count for "slave DMAs". This implies that a
> polled approach should be used, rather than interrupts

This does not imply anything; interrupts can still be used.
HalReadDmaCounter is for things like soundcards, which can use the DMA
byte count as a "master clock". A movie playing in Media Player runs
on this clock rather than the OS clock, since otherwise sound and
video would fall out of sync.
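To make the "master clock" idea concrete, here is a minimal C sketch that turns a bytes-remaining DMA count into a playback position. It assumes an auto-initializing (circular) DMA buffer whose counter counts down from the buffer size, and that the driver itself counts how many times the buffer has wrapped; the counter value would come from HalReadDmaCounter/ReadDmaCounter, which is not called here. The helper names are hypothetical, not DDK APIs.

```c
#include <stdint.h>

/* Hypothetical helpers. bytes_remaining is assumed to be what a
 * ReadDmaCounter-style call reports for an auto-initializing buffer:
 * it counts down from buffer_bytes as data goes out. */

/* Bytes already consumed in the current pass through the buffer. */
uint32_t dma_bytes_played(uint32_t buffer_bytes, uint32_t bytes_remaining)
{
    return buffer_bytes - bytes_remaining;
}

/* Total bytes played: completed passes (counted by the driver, e.g. in
 * its ISR) plus the position within the current pass. */
uint64_t stream_bytes_played(uint64_t wraps, uint32_t buffer_bytes,
                             uint32_t bytes_remaining)
{
    return wraps * buffer_bytes + dma_bytes_played(buffer_bytes, bytes_remaining);
}

/* Convert a byte count into milliseconds of audio; for 16-bit stereo,
 * bytes_per_frame is 4. This is the clock a player can sync video to. */
uint64_t bytes_to_ms(uint64_t bytes, uint32_t sample_rate,
                     uint32_t bytes_per_frame)
{
    return bytes * 1000u / ((uint64_t)sample_rate * bytes_per_frame);
}
```

For example, at 44.1 kHz 16-bit stereo (4 bytes per frame), 176,400 bytes played corresponds to 1,000 ms.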

IIRC DirectSound exposes the "DMA clock" of the soundcard for user
apps.

Max


Tom Thornhill

May 19, 2003, 7:25:56 AM

--
Tom

q: Can I help you program it?
a: No, you suck. How can I tell? You said 'program'. Someone who might be
able to help would ask, "Can I help you code it?"
from http://www.fastlane.net/~gpwossum/dl/pex_faq.txt

"Clifford van Dyk" <cva...@netscape.net> wrote in message news:3ec8a503$0$2...@hades.is.co.za...


> Hi
>
> It seems strange that system-mastered DMA's are not supported on PCI,
> considering that DMA engines are available on most (if not all) system
> controllers/host bridges, and these usually have associated interrupts.

It's a bus thing - on ISA, the device can assert a DMA request line which
is connected to the ISA DMA controller which does the slave DMA.

On PCI, when a device wants to DMA it requests control of the bus and then
does the DMA itself - it has onboard address counters. The problem with System
DMA on PCI is that (most) PCI devices don't have a way to contact the ISA
DMA controller.

I say most because there is a workaround called Distributed DMA, where the
PCI device snoops on accesses to the ISA DMA controller so that it can emulate
it. I think some motherboards also have a header with ISA DMA cycles on it
so that PCI cards can use them. Both of these are kludges designed to let PCI
cards be register-compatible with ISA Soundblasters.

As far as I know, these cards also support normal PCI DMA, so Windows
drivers can use this.

>
> The Windows 2000 DDK seems to provide functions for supporting slave
> (system mastered) DMA's through the HAL (as opposed to bus master DMAs,
> which use the adapters master DMA engine) including functions for
> reading back the current DMA count for "slave DMAs". This implies that a
> polled approach should be used, rather than interrupts, but this seems
> very inelegant...

On ISA, you use an interrupt to tell you when the transfer is done.

I think ReadDmaCounter is for streaming - you allocate a contiguous common buffer
below 16MB, copy some data into it and start the DMA. When the device has transferred
some portion of the buffer it interrupts and you can use ReadDmaCounter to find
out how much data has been sent, so you can copy some more data into the unused
portion. ISA Soundblaster drivers used this - see here.

http://faculty.petra.ac.id/irwankj/ap2/sb16doc.html
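The refill step described above can be sketched in C, assuming a circular common buffer read by an auto-initializing DMA channel and a counter value that reports the bytes still to be transferred. The struct and function names are hypothetical, not DDK APIs, and synchronization with the ISR is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ring-buffer state for an auto-initializing system DMA
 * channel: the DMA engine reads the buffer circularly and the driver
 * refills the part that has already been consumed. */
struct dma_ring {
    uint8_t *buf;       /* common buffer (below 16 MB for ISA DMA) */
    uint32_t size;      /* buffer length in bytes */
    uint32_t write_pos; /* next byte the driver will fill */
};

/* Offset of the next byte the DMA engine will read, derived from a
 * bytes-remaining count such as ReadDmaCounter would report. */
uint32_t dma_read_pos(const struct dma_ring *r, uint32_t bytes_remaining)
{
    return (r->size - bytes_remaining) % r->size;
}

/* Bytes the driver may overwrite without clobbering unplayed data.
 * One byte is kept free so that read_pos == write_pos means "empty". */
uint32_t dma_ring_free(const struct dma_ring *r, uint32_t read_pos)
{
    return (read_pos + r->size - r->write_pos - 1) % r->size;
}

/* Copy up to len bytes of new data into the free region, wrapping at the
 * end of the buffer. Returns the number of bytes actually copied. */
uint32_t dma_ring_refill(struct dma_ring *r, uint32_t bytes_remaining,
                         const uint8_t *src, uint32_t len)
{
    uint32_t free_bytes = dma_ring_free(r, dma_read_pos(r, bytes_remaining));
    uint32_t n = len < free_bytes ? len : free_bytes;
    uint32_t first = r->size - r->write_pos; /* room before the wrap */

    if (first > n)
        first = n;
    memcpy(r->buf + r->write_pos, src, first);
    memcpy(r->buf, src + first, n - first);  /* wrapped part, may be 0 */
    r->write_pos = (r->write_pos + n) % r->size;
    return n;
}
```

In an interrupt-driven driver, the ISR (or its DPC) would read the counter and call something like dma_ring_refill each time the device signals that part of the buffer has gone out.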

>
> Does anyone have a good source of knowledge on this topic that they can
> recommend to me?

I still don't know why you want to use slave DMA on a PCI device.
If at all possible, use native PCI DMA; otherwise you can use the legacy
HAL functions to do it. You'll need to make sure that your PCI device is
actually capable of slave DMA on all the machines you support, which is
non-trivial (e.g. will Distributed DMA work behind all PCI-to-PCI bridges?)

Clifford van Dyk

May 19, 2003, 8:39:49 AM
Hi

Thanks Max/Tom. I appreciate the detailed explanation that both of you
have provided.

What I am struggling to understand is why the north bridge, which has a
master/target PCI interface, should be incapable of performing a bus
master DMA to a target peripheral, with interrupts being generated in
the north bridge. Not all PCI devices have master capability, and it
seems strange that in the case of an adapter card that is a target only,
there is no way of performing a DMA transfer to/from the device. I would
expect this to be supported at least at the hardware level. Is it then
the Windows HAL that does not allow this?

The motivation behind this is that I would like to distinguish between a
burst transaction and a single read/write on the PCI bus from the target
adapter's perspective. A burst transaction to this peripheral implies
DMA from the system controller, and it seems odd that the system
controller is the only PCI master in the system that does not support this.

Please could you comment on this.

Regards,
Clifford

Tom Thornhill wrote:

Clifford van Dyk

May 19, 2003, 9:49:45 AM
The term "slave mode" is a bit of a misnomer on my part - I am trying to
implement a bus master DMA where the CPU is the bus master and is
DMA'ing to/from a target peripheral. If this can be achieved without DMA
(some other means of issuing a read/write multiple operation on the PCI
bus from the system host) this would solve my problem.

Thanks.

Regards,
Clifford

Alexander Grigoriev

May 19, 2003, 10:54:43 AM
_Any_ access from CPU to the PCI device implies that CPU is a bus master and
the device is a slave. It's called PIO access.

"Clifford van Dyk" <cva...@netscape.net> wrote in message

news:3ec8e0fa$0$2...@hades.is.co.za...

Clifford van Dyk

May 19, 2003, 11:17:37 AM
Hi

What I would like to do is efficiently perform the PCI operation of a
read/write multiple (i.e. single address, multiple data cycle) initiated
by the CPU's PCI bus master. Whether it occurs as a PIO access initiated
by the CPU's PCI bus master or as a DMA initiated by the CPU's PCI bus
master is irrelevant, as long as this operation is performed, rather
than multiple single reads/writes. My original assumption (and I have
yet to determine whether it is true or false) was that this could only
be performed using a DMA initiated by the CPU's PCI bus master.

Please advise if you know how this may be accomplished. Thanks.

Regards,
Clifford

Tom Thornhill

May 19, 2003, 8:40:26 PM

"Clifford van Dyk" <cva...@netscape.net> wrote in message news:3ec8f592$0$2...@hades.is.co.za...

> Hi
>
> What I would like to do is efficiently perform the PCI operation of a
> read/write multiple (i.e. single address, multiple data cycle) initiated
> by the CPU's PCI bus master. Whether it occurs as a PIO access initiated
> by the CPU's PCI bus master or as a DMA initiated by the CPU's PCI bus
> master is irrelevant, as long as this operation is performed, rather
> than multiple single reads/writes. My original assumption (and I have
> yet to determine whether it is true or false) was that this could only
> be performed using a DMA initiated by the CPU's PCI bus master.

So you have some memory on the device, and you want to do a
PCI burst transfer to read it quickly?

READ_PORT_BUFFER_ULONG should do the trick on a read;
it ends up being a rep movsd on x86, and that's about as much
of a hint to the chipset that you want a burst transfer as you can give it.

Actually, for most PCI devices this is probably better than bus
master DMA, because you don't have the overhead of programming device
registers or letting the HAL set up the transfer.

--
Tom Thornhill
Ridgecrop Consultants Ltd
UK: 07810 572982 / +44 7810 572 982


Clifford van Dyk

May 20, 2003, 2:36:41 AM
Thanks, Tom.

I assume then that there is no standard (HAL-based) method for
performing burst transactions from the CPU (south bridge) to targets on
the PCI bus? I am beginning to believe that this is true, although it
seems contradictory that the PCI bus, which is burst-based, cannot
be used to perform bursting (single address, multiple data accesses) to
target devices. I need to perform a minimum of a 128-bit burst.

Regards,
Clifford.

Michael S

May 20, 2003, 9:29:24 AM
"Tom Thornhill" <fakemail...@reply.to.group.com> wrote in message news:<babtho$kom$1$8300...@news.demon.co.uk>...

You probably meant READ_REGISTER_BUFFER_ULONG/WRITE_REGISTER_BUFFER_ULONG.
RtlCopyMemory should do too.

AFAIK, READ_PORT_BUFFER_ULONG ends up being rep insd on x86. Since
insd doesn't increment the source address, it can't produce a PCI burst;
a PCI burst by definition increments both source and destination
addresses. Did I miss something?
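The port-versus-register distinction can be illustrated with a toy model in C: record the bus addresses each buffered read would touch, then measure the longest run of sequential DWORD addresses, which is the most a bridge could merge into one burst. This is a simplified model, not real hardware behaviour; the function names are hypothetical, and whether a given chipset actually merges sequential reads is, as the rest of the thread shows, not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Like READ_PORT_BUFFER_ULONG / rep insd: every read hits the same port. */
void port_buffer_addresses(uint32_t port, uint32_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = port;
}

/* Like READ_REGISTER_BUFFER_ULONG / rep movsd: the source address
 * advances by one DWORD per read. */
void register_buffer_addresses(uint32_t base, uint32_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = base + 4u * (uint32_t)i;
}

/* Longest run in which each address is the previous one plus 4, i.e.
 * the longest stretch a bridge could in principle issue as one burst. */
size_t longest_burst_candidate(const uint32_t *addr, size_t count)
{
    if (count == 0)
        return 0;
    size_t best = 1, run = 1;
    for (size_t i = 1; i < count; i++) {
        run = (addr[i] == addr[i - 1] + 4u) ? run + 1 : 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

In this model a port-buffer read never yields a run longer than 1, while a register-buffer read of n DWORDs yields a single run of n.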

Back to the OP's question: a PCI design which relies on a minimum burst
length > 1 for its correctness (as opposed to performance) is broken.
You can enforce a maximum burst length in hardware (target disconnect) or
in software (make the next access within the same BAR go to an address
other than the previous address + 4). However, there is no way to ensure
a minimum burst length unless you have full control of both sides of the
transaction _and_ all bridges in between.
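The software-side cap on burst length can be sketched in C by ordering the DWORD accesses so that no more than max_run consecutive addresses are sequential. The helper names are hypothetical, and, as the post says, this only caps the burst length; nothing on the CPU side can force a minimum.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `order` with the DWORD offsets 0, 4, ..., 4*(n-1) so that at most
 * max_run consecutive accesses are address-sequential. Blocks of max_run
 * ascending offsets are emitted from the highest block down, so every
 * block boundary steps backwards and cannot extend a burst.
 * Requires max_run >= 1; order must have room for n entries. */
void capped_burst_order(size_t n, size_t max_run, uint32_t *order)
{
    size_t k = 0;
    size_t blocks = (n + max_run - 1) / max_run;

    for (size_t b = blocks; b-- > 0; ) {
        size_t start = b * max_run;
        size_t end = start + max_run < n ? start + max_run : n;
        for (size_t i = start; i < end; i++)
            order[k++] = 4u * (uint32_t)i;
    }
}

/* Longest run in which each offset is the previous one plus 4. */
size_t longest_sequential_run(const uint32_t *addr, size_t count)
{
    if (count == 0)
        return 0;
    size_t best = 1, run = 1;
    for (size_t i = 1; i < count; i++) {
        run = (addr[i] == addr[i - 1] + 4u) ? run + 1 : 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

For 8 DWORDs with max_run = 2, the access order is 24, 28, 16, 20, 8, 12, 0, 4.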

Tom Thornhill

May 20, 2003, 12:25:40 PM
> "Tom Thornhill" <fakemail...@reply.to.group.com> wrote in message news:<babtho$kom$1$8300...@news.demon.co.uk>...
> >
> > READ_PORT_BUFFER_ULONG should do the trick on a read,
> > it ends up being a rep movsd on x86, and that's a about as much
> > of a hint to the chipset that you want a burst transfer as you can give it.
> >
>
> You probably meant READ_REGISTER_BUFFER_ULONG/WRITE_REGISTER_BUFFER_ULONG.
> RtlCopyMemory should do too.

Yes I did. Cut'n'paste error.

>
> AFAIK, READ_PORT_BUFFER_ULONG ends up being rep inpsd on x86. Since
> inpsd doesn't increment source address, it can't produce a PCI burst -
> a PCI burst by definition increments both source and destination
> addresses. Did I miss something ?

No, that's a very good point - you can't do burst transfers to IO space
for exactly that reason.

I once worked with a PCI card that aliased one of its IO ports, the one used
to read data, across 4K of memory in an additional BAR; the driver
would do rep movsd's to read the data if it found the additional BAR.

Interestingly, this was a device designed c. 10 years ago, so even
then enough PCI chipsets must have been able to turn a rep movsd
into a burst to make this addition worthwhile.
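A toy C model of the aliasing trick, assuming a device whose data FIFO is readable at every DWORD of a 4 KB memory window. The driver walks the window with incrementing addresses, which a bridge may merge into bursts, while the data still comes out in FIFO order. All names are hypothetical, and the "empty FIFO" return value is an arbitrary choice for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW_BYTES 4096u /* size of the aliased memory BAR, per the post */

/* Toy device: a data FIFO that appears both as one I/O port and,
 * aliased, at every DWORD of a 4 KB memory window. */
struct fifo_dev {
    const uint32_t *data;
    size_t len;
    size_t head;
};

/* Any DWORD offset inside the window pops the next FIFO word, just as
 * a read of the single data port would. Returns all ones when empty
 * (an arbitrary choice for this model). */
uint32_t window_read(struct fifo_dev *d, uint32_t offset)
{
    (void)offset; /* every DWORD of the window is an alias of the port */
    return d->head < d->len ? d->data[d->head++] : 0xFFFFFFFFu;
}

/* Driver side: step up through the window as rep movsd would, so the
 * bus addresses increment while the data arrives in FIFO order. The
 * offset wraps at 4 KB; a real driver would issue one copy per chunk. */
void drain_fifo(struct fifo_dev *d, uint32_t *dst, size_t count)
{
    uint32_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        dst[i] = window_read(d, offset);
        offset = (offset + 4u) % WINDOW_BYTES;
    }
}
```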

>
> Back to OP question: the PCI design which relies on minimal burst
> length >1 for its correctness (as opposed to performance) is broken.

Yeah, that's true too - even if you do everything right, you still might
not get a burst on all systems.

--
Tom Thornhill
http://www.ridgecrop.demon.co.uk/


Tom Thornhill

May 20, 2003, 1:11:17 PM
"Clifford van Dyk" <cva...@netscape.net> wrote in message news:3ec9ccfa$0$2...@hades.is.co.za...

> Thanks, Tom.
>
> I assume then that there is no standard (HAL-based) method for
> performing burst transactions from the CPU (south bridge) to targets on
> the PCI bus?
> I am beginning to believe that this is true, although it
> seems contradictory that the the PCI bus, which is burst-based, cannot
> be used to perform bursting (single address, multiple data accesses) to
> target devices.

rep movsd should be OK, but I found this
http://www.pcisig.com/reflector/msg01281.html

which seems to imply that rep movsd doesn't cause a burst on Intel
chipsets; you need to bus master to do it. Bummer.

> I need to perform a minimum of a 128-bit burst.

Why? The only reason I can see for this would be if you need to stream data
at a rate that you can't reach unless you burst, and the device has neither
buffers to handle underruns nor PCI bus mastering support. Hardware
engineers strike again!

Phil Barila

May 20, 2003, 3:23:27 PM
"Clifford van Dyk" <cva...@netscape.net> wrote in message
news:3ec9ccfa$0$2...@hades.is.co.za...

> Thanks, Tom.
>
> I assume then that there is no standard (HAL-based) method for
> performing burst transactions from the CPU (south bridge) to targets on
> the PCI bus? I am beginning to believe that this is true, although it
> seems contradictory that the the PCI bus, which is burst-based, cannot
> be used to perform bursting (single address, multiple data accesses) to
> target devices. I need to perform a minimum of a 128-bit burst.

1) If you haven't already done so, it would be educational to read the PCI
spec. Mindshare has an excellent book on the PCI bus as well.
2) Those resources will show you that the PCI bus is device-centric. That
means that it's architected around the device initiating data movement,
although the host has to set it up. This means that data transfer without a
bus master has to be driven by the host, and the only means the host has
for this is PIO. There's no stricture in the PCI spec that prevents a host
bridge from bursting reads initiated by the host CPU; however, there isn't
any way to tell the host bridge how many bytes to read, so the host bridge
isn't going to do such a thing. You would have to have a bus master engine
in the host bridge, and host bridge designers rightly assume that if you
want high performance, you will have your own bus master engine. So the
common bridges don't have one, and I'm not aware of any that do, though
it's technically possible (I think).
3) If possible, add a bus-mastering engine to your hardware. If
that's not possible, then you'll have to live with CPU-driven PIO.

Michael S

May 20, 2003, 8:54:17 PM
"Tom Thornhill" <fakemail...@reply.to.group.com> wrote in message news:<badnjk$377$1$8300...@news.demon.co.uk>...

> "Clifford van Dyk" <cva...@netscape.net> wrote in message news:3ec9ccfa$0$2...@hades.is.co.za...
> > Thanks, Tom.
> >
> > I assume then that there is no standard (HAL-based) method for
> > performing burst transactions from the CPU (south bridge) to targets on
> > the PCI bus?
> > I am beginning to believe that this is true, although it
> > seems contradictory that the the PCI bus, which is burst-based, cannot
> > be used to perform bursting (single address, multiple data accesses) to
> > target devices.
>
> rep movsd should be OK, but I found this
> http://www.pcisig.com/reflector/msg01281.html
>

Your link is certainly wrong about CPU-initiated PCI writes: they are
not limited to short bursts.
The article is outdated and doesn't take into account the big disparity in
peak bandwidth between the CPU's FSB and PCI (x48 for the
latest and greatest 800MHz FSB), the deep store buffers in both the CPU
and the chipset, and the non-blocking OoO microarchitecture of
modern processors. The article has been wrong since the 440BX chipset, and
maybe even earlier. I remember very well that it was possible to achieve 90+
MB/s sustained transfer rate from a 440BX to a PCI video card. Assuming 4
cycles of overhead per burst, 90 MB/s means an average burst length of 8.5.
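The burst-length figure can be checked with a little arithmetic. This C sketch assumes a 32-bit, 33 MHz PCI bus (4 bytes per data phase) and decimal megabytes, neither of which the post states. A burst of L data phases moves 4L bytes in L + overhead clocks, so solving rate = 4Lf / (L + overhead) for L gives L = rate * overhead / (4f - rate): about 8.6 at 33 MHz (8.3 at 33.33 MHz), in line with the 8.5 quoted. The second helper reproduces the x48 ratio, assuming an 800 MT/s, 64-bit FSB. Both function names are hypothetical.

```c
/* Average burst length (data phases) needed to sustain rate_bytes_per_s
 * on a 32-bit PCI bus clocked at clock_hz, if every burst costs
 * overhead_clocks extra clocks. Only meaningful for rate < 4 * clock_hz. */
double required_burst_length(double rate_bytes_per_s, double clock_hz,
                             double overhead_clocks)
{
    double peak = 4.0 * clock_hz; /* bytes/s with no overhead at all */
    return rate_bytes_per_s * overhead_clocks / (peak - rate_bytes_per_s);
}

/* Ratio of peak bandwidth of a 64-bit FSB to that of 32-bit PCI. */
double fsb_to_pci_ratio(double fsb_transfers_per_s, double pci_clock_hz)
{
    return (8.0 * fsb_transfers_per_s) / (4.0 * pci_clock_hz);
}
```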

I don't know about CPU-initiated PCI read transfers; I have never tried it
myself. However, I think that the article is wrong about reads too.
Most of the same factors which apply to writes also apply to
reads. The only important exceptions are the lack of "load buffers" in
the CPU and the fact that sooner or later the CPU will need the result of
the load instruction. Still, deep instruction reordering buffers should
allow a modern CPU to issue several 64-bit load transactions to the FSB
before it stalls waiting for the results to arrive. Probably, rep
movsd is not the best way to do it. An unrolled MMX or SSE2 loop (eight
loads from PCI followed by eight stores to main memory) can
produce better results.
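A plain C stand-in for the unrolled loop suggested above, since the post gives no code: eight 64-bit loads from the device window are issued before the eight stores to system memory. The function name is hypothetical; it uses ordinary C rather than MMX/SSE2 intrinsics, `src` is assumed to point at an already-mapped device BAR, and whether the loads actually overlap on the bus depends on the CPU, the memory type of the mapping and the chipset.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy count64 64-bit words from a device window to system memory in
 * groups of eight: all eight loads are issued before any store, so the
 * CPU is never waiting on a store between device reads. `src` is
 * volatile so the compiler keeps every device read. */
void copy_from_device_unrolled(volatile const uint64_t *src, uint64_t *dst,
                               size_t count64)
{
    size_t i = 0;

    for (; i + 8 <= count64; i += 8) {
        uint64_t a = src[i + 0], b = src[i + 1], c = src[i + 2], d = src[i + 3];
        uint64_t e = src[i + 4], f = src[i + 5], g = src[i + 6], h = src[i + 7];

        dst[i + 0] = a; dst[i + 1] = b; dst[i + 2] = c; dst[i + 3] = d;
        dst[i + 4] = e; dst[i + 5] = f; dst[i + 6] = g; dst[i + 7] = h;
    }
    for (; i < count64; i++) /* leftover words, one at a time */
        dst[i] = src[i];
}
```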

The bottom line:
Long burst writes from CPU to PCI are surely possible.
Short to medium-length burst reads are probably possible, at least
with faster chipsets.

Clifford van Dyk

May 21, 2003, 2:58:05 AM
Actually I was hoping that I could use this to allow for the concept of
a 128-bit word when writing into the target interface of the PCI-to-XXX
bridge chip I am designing. Unfortunately the PCI bus doesn't seem to
have allowed for a wider transfer width than bus width :-(. As with many
things in life, nothing is for free! Thanks anyway for your advice - PCI
steering committee strikes again? :-)

Clifford
