Failing on DMA transfers


laus...@gmail.com

Jan 1, 2013, 7:19:05 AM
to ata-sc...@lists.apple.com
Happy New Year, dear list.

I'm developing on 10.8.2, and my driver inherits from IOSCSIParallelInterfaceController.

I/O transactions succeed with this scheme (which is of course too expensive for real use):
1) In the ProcessParallelTask() routine or its callees: allocate and prepare a physically contiguous kernel buffer the size of the data transfer with inTaskWithPhysicalMask(). For the kSCSIDataTransfer_FromInitiatorToTarget direction, copy the data from the buffer received via GetDataBuffer() into the allocated one, pass the physical address and length of the allocated buffer to the hardware (as if it were a single SG frame), and do the DMA transfer. For kSCSIDataTransfer_FromTargetToInitiator, copy the data back, call complete on the allocated buffer, and free it before calling CompleteParallelTask().

I/O transactions fail with each of these schemes:
1) Retrieve the IODMACommand via GetDMACommand() and call prepare(GetDataBufferOffset(), GetRequestedDataTransferCount(), false, false) on it. Generate segments like this (reduced to the 64-bit SG version to keep the post short; resource freeing is omitted too):

bool SGL() {
    IODMACommand::Segment64 *segments;
    UInt32 numSeg;
    UInt64 offset = 0;

    numSeg = MaxSGLCountObtainedFromHW;
    segments = IONew(IODMACommand::Segment64, numSeg);

    while (offset < TransferLen) // Tried doing only single genIOVMSegments() call too
        if (cmd->gen64IOVMSegments(&offset, segments, &numSeg) != kIOReturnSuccess)
            return false;

    for (int i = 0; i < numSeg; i++) {
        // Point hardware to segments[i].fIOVMAddr and segments[i].fLength
    }

    // Tell SGL length to hw

    return true;
}

Do the DMA transfer, then call cmd->complete(false, false) before CompleteParallelTask().

2) Do the same, but allow only one segment per transfer:
kIOMaximumSegmentCount(Read/Write)Key = 1

bool SGL() {
    ...
    while (offset < TransferLen) {
        numSeg = 1;
        ...
    }
    ...
}

3) Bad scheme. Don't generate segments, just try this:
IOByteCount length;
IOPhysicalAddress paddr;
paddr = GetDataBuffer()->getPhysicalSegment(0, &length);
Then, again, "pass physical address and length to hardware and do DMA transfer".
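
For reference, gen64IOVMSegments() treats both the offset and the segment count as in/out parameters: on return the offset has advanced past the bytes covered, and the count holds the number of segments actually written. The loop in scheme 1 never resets numSeg and overwrites the same array on every pass. A minimal stand-alone model of that contract follows. It does not use IOKit: Segment64 here is a stand-in with the same field names, and MockDMACommand, genSegments() and buildSGL() are hypothetical names invented for the sketch.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for IODMACommand::Segment64 (same field names, hypothetical type).
struct Segment64 {
    uint64_t fIOVMAddr;
    uint64_t fLength;
};

// Hypothetical mock of a prepared DMA command. genSegments() mimics the
// in/out contract of gen64IOVMSegments(): *offset advances past the bytes
// covered, and *numSeg is overwritten with the number of segments produced.
struct MockDMACommand {
    std::vector<Segment64> ranges;  // physical ranges backing the transfer

    bool genSegments(uint64_t *offset, Segment64 *out, uint32_t *numSeg) const {
        uint64_t pos = 0;       // byte offset where the current range starts
        uint32_t produced = 0;
        for (const Segment64 &r : ranges) {
            uint64_t end = pos + r.fLength;
            if (*offset < end && produced < *numSeg) {
                uint64_t skip = *offset - pos;  // resume inside a range
                out[produced++] = { r.fIOVMAddr + skip, r.fLength - skip };
                *offset = end;
            }
            pos = end;
        }
        *numSeg = produced;
        return true;
    }
};

// Collect the whole scatter/gather list, asking for at most maxSeg entries
// per call. Returns false if the generator stops making progress.
bool buildSGL(const MockDMACommand &cmd, uint64_t transferLen,
              uint32_t maxSeg, std::vector<Segment64> &sgl) {
    std::vector<Segment64> batch(maxSeg);
    uint64_t offset = 0;
    while (offset < transferLen) {
        uint32_t numSeg = maxSeg;  // reset the in/out count before every call
        if (!cmd.genSegments(&offset, batch.data(), &numSeg) || numSeg == 0)
            return false;
        // Consume this batch before the next call overwrites the array.
        sgl.insert(sgl.end(), batch.begin(), batch.begin() + numSeg);
    }
    return true;
}
```

In a real driver the consumed batch would be written into the hardware's SG frames rather than appended to a vector.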

I know the constraints of my hardware.

Here is the SG frame my hardware uses:
struct {
    UInt32 Address;
    UInt32 Length;
} SGE32;
struct {
    UInt64 Address;
    UInt32 Length;
} SGE64;

Transfer size isn't limited, so the only length limit is what the "Length" field can store.
I report the following constraints:
kIOMaximumSegmentCount(Read/Write)Key = MaxSGLCountObtainedFromHW
kIOMaximumSegmentByteCount(Read/Write)Key = UINT_MAX
kIOMinimumSegmentAlignmentByteCountKey = 1 // no alignment requirement
kIOMaximumSegmentAddressableBitCountKey = IOPhysSize

And here's InitializeDMASpecification():
cmd->initWithSpecification(IOPhysSize == 64 ? kIODMACommandOutputHost64 : kIODMACommandOutputHost32,
    IOPhysSize, IOPhysSize == 64 ?
    UINT_MAX, // Max size per segment
    IODMACommand::kMapped,
    UINT_MAX, // Total max of transfer size
    1);

But I fail every time with any of the last three schemes. The best I've achieved with them is working small transfers, e.g. SCSI inquiry data and reads/writes of small amounts of data.
What might I be doing wrong?

Thanks!

laus...@gmail.com

Jan 4, 2013, 9:14:59 AM
to Chris Sarcone, ata-sc...@lists.apple.com
Hello, Chris.

Thank you for the fast reply.

> That's horribly inefficient. You do not need to allocate a physically contiguous buffer if your device supports S/G operations, which you allude to later in your email.

You're right, I only used it to make sure that the driver basically works. I would never do this in the final solution.

> When you are called by your superclass for ReportHBASpecificTaskDataSize(), you should think of this as your per-task specific "playground area" for commands and S/G segments. ... you can use GetHBADataPointer() to manipulate this per-task data and GetHBADataDescriptor() to get the IOMemoryDescriptor for it.

Yes, I understand and use this concept.

> And you call initWithSpecification() passing &MyController::OutputSegment as your segment function...

Thanks, I missed that I can use a custom segment function.
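
As a rough illustration of such a segment function, the sketch below writes each generated segment straight into the 64-bit SG frame layout shown earlier in the thread. It is stand-alone and does not use IOKit: Segment64 is a stand-in type, and the function drops the IODMACommand target argument that a real segment function receives as its first parameter. The length check reflects the point about fLength and UINT32_MAX; frame packing and byte order for the real hardware are left unaddressed.

```cpp
#include <cstdint>

// Stand-in for IODMACommand::Segment64 (same field names, hypothetical type).
struct Segment64 {
    uint64_t fIOVMAddr;
    uint64_t fLength;
};

// The 64-bit SG frame from the original post.
struct SGE64 {
    uint64_t Address;
    uint32_t Length;
};

// Sketch of a custom output-segment function. "segments" points at the
// per-task SG frame array; "segmentIndex" selects the frame to fill.
bool OutputSegment(Segment64 segment, void *segments, uint32_t segmentIndex) {
    // The frame's Length field is only 32 bits wide, so refuse anything larger.
    if (segment.fLength > UINT32_MAX)
        return false;

    SGE64 *sgl = static_cast<SGE64 *>(segments);
    sgl[segmentIndex].Address = segment.fIOVMAddr;
    sgl[segmentIndex].Length = static_cast<uint32_t>(segment.fLength);
    return true;
}
```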

> If you advertise the proper maximum byte count, then the fLength should never be greater than UINT32_MAX

Can I also limit the total transfer size somehow, or is transfer fragmentation up to me? kIOMaximumByteCount(Read/Write)Key makes no sense.

P.S.: One more question. There is a paragraph in "I/O Kit Fundamentals": "If your DMA engine does complicated things, such as performing partial I/Os or synchronizing multiple accesses to a single IOMemoryDescriptor, you should write your driver assuming that the memory will be bounced. You don’t need to add code that checks for bouncing, because IODMACommand functions, such as synchronize, are no-ops when they are unnecessary."
Does this mean that I'm freed from doing pre/post read/write synchronization operations?

Chris Sarcone

Jan 4, 2013, 12:14:29 PM
to laus...@gmail.com, ata-sc...@lists.apple.com
Hello --

>> And you call initWithSpecification() passing &MyController::OutputSegment as your segment function...

> Thanks, I missed that I can use a custom segment function.

>> If you advertise the proper maximum byte count, then the fLength should never be greater than UINT32_MAX

> Can I also limit the total transfer size somehow, or is transfer fragmentation up to me? kIOMaximumByteCount(Read/Write)Key makes no sense.

You can limit data transfers in the following ways (via ReportHBAConstraints()):

kIOMaximumSegmentCountReadKey, (required)
kIOMaximumSegmentCountWriteKey, (required)

These are the maximum number of SG elements your hardware can process in a single command. Some hardware can only process, say, 128 segments of 4K data per pass of the DMA engine (or the driver writer has made a wired memory tradeoff to cap the maximum).


kIOMaximumSegmentByteCountReadKey, (required)
kIOMaximumSegmentByteCountWriteKey, (required)

This is the maximum size per segment. Segments can be any size up to and including the size reported here, but in normal cases they are either 4K (VM page size) or they are very large contiguous groups of pages which represent the entire transfer (unless the reported segment size is less than the transfer size, in which case it will have multiple max segment byte count S/G elements). An example of the latter is when VT-d is enabled.

kIOMinimumSegmentAlignmentByteCountKey, (required)

Alignment constraints for a segment. If your DMA engine can only handle, say, 4-byte alignment, IODMACommand will "bounce" the targeted DMA memory if the original isn't within this constraint.

kIOMaximumSegmentAddressableBitCountKey, (required)

Constraint the HBA may have on where it can DMA. Some DMA controllers only support 32-bit memory. Some only support 40-bit. You get to choose based on your hardware's support. IODMACommand will "bounce" it if it isn't within this constraint.

kIOMinimumHBADataAlignmentMaskKey, (required)

This constraint is for the alignment requirement of the "playground area" for each per-task HBA data. By default, we use 16-byte alignment, but if your DMA engine requires page alignment, this is where you can modify the default behavior.
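
To make the alignment and addressable-bit constraints concrete, here is a simplified stand-alone model of the bounce decision. It is not IODMACommand's actual logic: the function name needsBounce is hypothetical, and the sketch assumes the alignment applies to the start address only and that the whole range must fit inside the addressable window.

```cpp
#include <cstdint>

// Returns true when a physical range violates the reported constraints,
// i.e. when (in this simplified model) the memory would have to be bounced.
//   addressBits : value reported for kIOMaximumSegmentAddressableBitCountKey
//   alignment   : value reported for kIOMinimumSegmentAlignmentByteCountKey
bool needsBounce(uint64_t addr, uint64_t len,
                 uint32_t addressBits, uint64_t alignment) {
    if (len == 0 || alignment == 0)
        return false;                   // nothing to check in this model

    if (addr % alignment != 0)
        return true;                    // start address is misaligned

    uint64_t last = addr + len - 1;     // last byte touched by the DMA
    if (last < addr)
        return true;                    // range wraps around 64 bits

    // Any bit set above the addressable window puts the range out of reach.
    if (addressBits < 64 && (last >> addressBits) != 0)
        return true;

    return false;
}
```

For example, a page at 0x100000000 is out of reach for a 32-bit DMA engine but fine for a 40-bit one.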


These constraints are honored by two different layers in OS X. The first is the UBC (VFS cluster layer). When a filesystem comes online, the VFS layer records these constraints and will actually "shape" I/O requests such that they fit within these constraints to the best of its ability. For example, say a 16MB I/O was requested from userspace via pread(): if you advertise a max segment count of 256 you will likely see I/Os 1MB in size (256 * 4K). You will see a few 1MB I/O transfers start and then as you complete I/Os, more will be sent down in a pipeline fashion. The second layer that honors these constraints is the IOBlockStorageDriver. It has to be aware of I/Os via the /dev/disk and /dev/rdisk path, so it may break I/Os up at that layer.
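
The arithmetic behind the 16MB example can be written out as a small sketch. The helper names are hypothetical, and the 4K segment size is the VM page size assumed in the example above; real I/O shaping may produce different sizes.

```cpp
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // 4K VM page, as in the example above

// Largest single I/O when every segment is one page.
constexpr uint64_t maxShapedIOBytes(uint64_t maxSegmentCount,
                                    uint64_t segmentBytes = kPageSize) {
    return maxSegmentCount * segmentBytes;
}

// Number of I/Os a request is split into (rounded up); 0 if perIOBytes is 0.
constexpr uint64_t ioCount(uint64_t requestBytes, uint64_t perIOBytes) {
    return perIOBytes == 0 ? 0 : (requestBytes + perIOBytes - 1) / perIOBytes;
}

// 256 segments * 4K = 1MB per I/O, so a 16MB pread() becomes 16 I/Os.
static_assert(maxShapedIOBytes(256) == 1024 * 1024, "256 * 4K is 1MB");
static_assert(ioCount(16 * 1024 * 1024, maxShapedIOBytes(256)) == 16,
              "16MB splits into sixteen 1MB I/Os");
```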

Finally, aside from these constraints, if you have additional constraints that cannot be expressed as segment counts, segment sizes, or segment alignments, you can also call setProperty() directly for properties that IOBlockStorageDriver (and the UBC) will honor such as kIOMaximumByteCountReadKey (in case you want to enforce a maximum byte count you would ever "see" requested). Typically, you wouldn't see a DMA controller express things this way, as they typically deal with things like segments, but we have known a few that are just broken and require special attention. YMMV and the properties are there should you require them. You would not use ReportHBAConstraints() for a property like that, just use setProperty() directly.
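
The split between the two mechanisms can be modelled as a simple check. This is a stand-alone sketch, not IOKit code: keys are represented by their macro names as plain strings, the constraints dictionary is a std::map, and misplacedKeys() is a hypothetical helper. The accepted set is the list of keys enumerated above.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Keys named above as belonging in ReportHBAConstraints().
const std::set<std::string> &hbaConstraintKeys() {
    static const std::set<std::string> keys = {
        "kIOMaximumSegmentCountReadKey",
        "kIOMaximumSegmentCountWriteKey",
        "kIOMaximumSegmentByteCountReadKey",
        "kIOMaximumSegmentByteCountWriteKey",
        "kIOMinimumSegmentAlignmentByteCountKey",
        "kIOMaximumSegmentAddressableBitCountKey",
        "kIOMinimumHBADataAlignmentMaskKey",
    };
    return keys;
}

// Returns the entries that do not belong in the constraints dictionary and
// should be set with setProperty() instead.
std::vector<std::string>
misplacedKeys(const std::map<std::string, uint64_t> &constraints) {
    std::vector<std::string> misplaced;
    for (const auto &entry : constraints) {
        if (hbaConstraintKeys().count(entry.first) == 0)
            misplaced.push_back(entry.first);
    }
    return misplaced;
}
```

A dictionary containing kIOMaximumByteCountReadKey would be flagged by this check.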


> P.S.: One more question. There is a paragraph in "I/O Kit Fundamentals": "If your DMA engine does complicated things, such as performing partial I/Os or synchronizing multiple accesses to a single IOMemoryDescriptor, you should write your driver assuming that the memory will be bounced. You don’t need to add code that checks for bouncing, because IODMACommand functions, such as synchronize, are no-ops when they are unnecessary."
> Does this mean that I'm freed from doing pre/post read/write synchronization operations?

Yes, IODMACommand handles all of this for you if you specify the constraints properly.

Thanks,

-- Chris


------------------

6 Infinite Loop

M/S 306-2MS

Cupertino CA 95014

phone: (408) 974-4033

fax:   (408) 862-7577

email: sar...@apple.com


laus...@gmail.com

Jan 5, 2013, 7:18:00 AM
to Chris Sarcone, ata-sc...@lists.apple.com
Hello.

Chris, thank you very much for the answers and the explanatory information on keys and layers.

My main problem was that I tried to set kIOMaximumByteCountReadKey from the ReportHBAConstraints() method. I hope this thread will keep anyone reading the list from making the same blunder in a similar situation.

Not to praise, but to mention: the whole stack is better than what I've had to deal with in other operating systems.

Chris Sarcone

Jan 5, 2013, 1:13:38 PM
to laus...@gmail.com, ata-sc...@lists.apple.com
Hi --

> My main problem was that I tried to set kIOMaximumByteCountReadKey from the ReportHBAConstraints() method.

I'll refer you to the HeaderDoc for that method:

@abstract Called to report the I/O constraints for this controller.
A list of valid keys includes:


kIOMaximumByteCountReadKey is not listed there.

If you think we should add it or if the docs should be more explicit or explain things better, please file a bug report at http://bugreporter.apple.com.


> Not to praise, but to mention: the whole stack is better than what I've had to deal with in other operating systems.

If you wouldn't mind enumerating the things you found better and the things you've found more difficult, that may help us to improve the support and documentation for future HBA developers.

laus...@gmail.com

Jan 8, 2013, 9:30:18 AM
to Chris Sarcone, ata-sc...@lists.apple.com
Hi, Chris.

> A list of valid keys includes:
> kIOMaximumByteCountReadKey is not listed there

Sorry, that's my inattention then.

> If you wouldn't mind enumerating the things you found better and the things you've found more difficult, that may help us to improve the support and documentation for future HBA developers.

Sure. Here are the things I find better:
1) No limit on DMA transfer size (I suppose), plus the ability to report a limit to the system if required;
2) DMA buffer bouncing checks are usually no-ops;
3) Well-thought-out SGE constraints.

Something that isn't an issue but would be nice to see implemented: a RAID card management and sensors framework.

Back to the old topic:

> Segments can be ... or they are very large contiguous groups of pages which represent the entire transfer ... An example of the latter is when VT-d is enabled.

Am I right that this was introduced with 10.8.2, from which point VT-d is used as the system mapper, as DART was on G5 machines back in the day?