DMA Limit due to Limited Map Registers (256 or 16)

BruteForce

Oct 9, 2007, 7:51:41 AM

to FireAPI

The notion of a DMA limit was introduced in ubCore 5.21 and in the
FireAPI 5.21 release that goes with it.

Windows uses a concept called "Map Registers" in order to accommodate
devices with 32-bit PCI addressing capability on systems that have
more than 4GB of addressable PCI space.

You should be aware that the Intel chipset reserves some PCI address
space for the devices installed on a system, usually between 0.5GB and
1GB. So if your system has 4GB of RAM, then you end up needing more
than 32 bits to address the whole range.

Some 32-bit systems can't do that. You install 4GB of RAM, but if you
go to Task Manager you see that the system for example reports only
3.5GB or 3GB of RAM. On these systems, "Visible RAM" + "Intel chipset
reserved space" <= 4GB.

Some more advanced 32-bit systems have the Physical Address Extension
(PAE) capability, which allows the operating system to address more
than 4GB of PCI address space. This can also happen on many 64-bit
systems running a 32-bit OS.

All 64-bit systems running a 64-bit OS can address more than 4GB of
PCI address space, if present.

All 1394 chips available today have a 32-bit register for providing
the PCI physical address of the target memory location for DMA reads/
writes, so how do these devices work on those systems?

The answer is that Windows provides the "Map Register" concept. Map
registers are a transparent concept. Drivers always use them for DMA,
but on x86 systems with up to 4GB addressable PCI space they do
nothing. On the other systems, map registers do something :-)
The bigger the DMA transfer is, the more map registers you need (the
figures below work out to one map register per 4KB of transfer).

Windows sets a limit on the usage of map registers by device drivers.
On x86 systems this limit is so high that it is practically not there.
When map registers are active, however, the limit is pretty low.
Our testing indicates that on x64 systems (with an x64 OS) with less
than 4GB of PCI addressable space, there is practically no limit on
the number of map registers. When more than 4GB of PCI addressable
space exists, however, Windows x64 imposes a limit of 256 map
registers, which means a maximum DMA transfer of 1MB.
x86 systems with PAE are in big trouble, since the developers of
Windows decided on a mere 16 map registers in this case, which means a
maximum DMA transfer of 64KB.

To summarize:

Windows 32-bit x86 CPU          - PCISpace<=4GB --> unlimited map registers
Windows 64-bit x64 CPU          - PCISpace<=4GB --> unlimited map registers
Windows 64-bit x64 CPU          - PCISpace>4GB  --> 256 map registers (1MB)
Windows 32-bit x86 CPU with PAE - PCISpace>4GB  --> 16 map registers (64KB)
Windows 32-bit x64 CPU with PAE - PCISpace>4GB  --> 16 map registers (64KB)
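
The 1MB and 64KB figures above are consistent with each map register
covering one 4KB page. A minimal sketch of that arithmetic follows. The
4KB-per-register figure is an assumption inferred from the numbers in
this post, the function name is made up for illustration, and buffer
alignment effects are ignored:

```c
#include <stdint.h>

/* Assumption: one map register covers one 4KB page.
   This matches 256 -> 1MB and 16 -> 64KB as quoted in the post. */
#define MAP_REGISTER_SPAN_BYTES 4096u

/* Hypothetical helper: largest single DMA transfer that fits in the
   given number of map registers, ignoring buffer alignment. */
static uint64_t max_dma_bytes(uint32_t map_registers)
{
    return (uint64_t)map_registers * MAP_REGISTER_SPAN_BYTES;
}
```

With this, max_dma_bytes(256) gives 1048576 (1MB) and max_dma_bytes(16)
gives 65536 (64KB).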

Systems with 16 map registers are practically unusable for heavy iso
operations under the current DMA implementation in ubCore. What would
be required is a "Common Buffer DMA" implementation, which, however,
is very resource intensive and not guaranteed to work all the time,
since the drivers would have to allocate big buffers of physically
contiguous pages each time an adapter channel is opened.

FireAPI and ubCore define the "DMA limit" as the maximum DMA transfer
permitted on a system. You can retrieve this by calling
C1394QueryInformation with the OID_DMA_LIMIT object identifier.
FireCommander in ubCore 5.21 implements the DMALIMIT command to show
you the DMA limit.

When the DMA limit is smaller than one image frame, you have to
break the frame into multiple iso requests. It is suggested that you
break your isochronous transfers into requests of at most
((DMA_LIMIT/2)-4KB) bytes, so that two requests can be programmed to
the adapter at a time. When the first request completes, the adapter
will be processing the second one while the driver software is
preparing the third one, and so on. This way you don't run the risk of
losing isochronous packets.
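
As a rough illustration of the ((DMA_LIMIT/2)-4KB) rule of thumb, here
is a small sketch. The function and macro names are made up for this
example, and the DMA limit is assumed to be a byte count obtained
elsewhere (for instance via OID_DMA_LIMIT); it is not code from FireAPI:

```c
#include <stdint.h>

/* The 4KB subtracted in the rule of thumb. */
#define ISO_GUARD_BYTES 4096u

/* Hypothetical helper: suggested maximum size of one iso request, so
   that two requests fit under the DMA limit at the same time.
   Returns 0 if the DMA limit is too small for the rule to apply. */
static uint32_t iso_request_size(uint32_t dma_limit_bytes)
{
    uint32_t half = dma_limit_bytes / 2;
    return (half > ISO_GUARD_BYTES) ? half - ISO_GUARD_BYTES : 0u;
}
```

For a 1MB DMA limit this gives 520192 bytes per request, and for a 64KB
limit it gives 28672 bytes per request.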

UBDCAM.SYS and the Firei DLL interface have been updated in ubCore
5.21 to detect the DMA limit and break each frame into multiple iso
requests according to the ((DMA_LIMIT/2)-4KB) rule of thumb. The
actual implementation does not generate (N-1) requests of
((DMA_LIMIT/2)-4KB) bytes plus an Nth request with whatever remains,
because this often produces a very small iso request which, for some
obscure timing-related reason, leads to lost iso packets and image
jitter. Instead, the code uses ((DMA_LIMIT/2)-4KB) to decide the
number of iso requests per frame, and then makes those requests
roughly even with one another. This provides the best possible iso
behaviour.
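
The even-splitting idea can be sketched as follows. This is not the
UBDCAM.SYS code, just one way to do it under stated assumptions: the
function names are hypothetical, sizes are in bytes, and any rounding
to iso packet boundaries that a real driver would probably need is left
out:

```c
#include <stdint.h>

/* Hypothetical helper: number of iso requests needed for one frame when
   no request may exceed max_request bytes (round up). */
static uint32_t iso_request_count(uint32_t frame_bytes, uint32_t max_request)
{
    if (frame_bytes == 0 || max_request == 0)
        return 0;
    return frame_bytes / max_request + (frame_bytes % max_request != 0);
}

/* Hypothetical helper: size of request `index` (0-based) when the frame
   is split into `count` roughly even requests. The first
   (frame_bytes % count) requests carry one extra byte, so the sizes
   always add up to frame_bytes. */
static uint32_t iso_request_bytes(uint32_t frame_bytes, uint32_t count,
                                  uint32_t index)
{
    if (count == 0 || index >= count)
        return 0;
    uint32_t base = frame_bytes / count;
    uint32_t extra = frame_bytes % count;
    return base + (index < extra ? 1u : 0u);
}
```

As a worked example, take a 614400-byte frame (640x480 at 2 bytes per
pixel) on a system with a 64KB DMA limit, so each request may be at
most 28672 bytes. The naive split would be 21 requests of 28672 bytes
plus a small 12288-byte one. The even split uses the same 22 requests,
but sized 27928 bytes (the first 6) and 27927 bytes (the remaining 16).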