At first, I noticed that there is a significant performance hit when
not using the RAW_IO pipe policy. The documentation for
WinUsb_SetPipePolicy states that WinUSB's queuing and error handling
are bypassed when using this policy, but I could find no further
discussion of that queuing and error handling.
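For reference, enabling the policy is straightforward (a minimal
sketch; hInterface and pipeId come from WinUsb_Initialize and
WinUsb_QueryPipe, and error handling is omitted):
---
#include <windows.h>
#include <winusb.h>

/* Enable RAW_IO (policy type 0x07) on one pipe, bypassing WinUSB's
   queuing and error handling. */
BOOL EnableRawIo(WINUSB_INTERFACE_HANDLE hInterface, UCHAR pipeId)
{
    UCHAR enable = TRUE;
    return WinUsb_SetPipePolicy(hInterface, pipeId, RAW_IO,
                                sizeof(enable), &enable);
}
---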
After setting the pipe policy to RAW_IO, I found that the MSDN
documentation states that calls to WinUsb_ReadPipe and
WinUsb_WritePipe must satisfy the following conditions:
* The buffer length must be a multiple of the maximum endpoint packet size.
* The length must be less than what the host controller supports.
There is a comment in the WinUSB "How to" guide that states that it is
only for read requests that buffers must be a multiple of the maximum
packet size. In contrast, the MSDN documentation states that this
constraint applies to both read and write requests.
I tested the behavior and the "How to" guide is correct. I can make
calls to WinUsb_WritePipe with an odd-sized buffer and things work as
expected. The MSDN documentation should be updated to state the
restriction only for read requests.
The larger issue is that the read constraint imposes an onerous burden
on the application programmer. There are many situations in which a
partial read is required. As such, I had to work around this by
submitting temporary buffers and copying the result after the
operation completes. This has a clear performance disadvantage over
simply reading a partial packet directly from the driver.
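For illustration, the workaround amounts to something like this (a
sketch only; maxPacket comes from the WINUSB_PIPE_INFORMATION
returned by WinUsb_QueryPipe, and error handling is trimmed):
---
#include <windows.h>
#include <winusb.h>
#include <stdlib.h>
#include <string.h>

/* Read "want" bytes under RAW_IO by rounding the request up to a
   multiple of the endpoint's MaxPacket and copying the result. */
BOOL ReadPartial(WINUSB_INTERFACE_HANDLE h, UCHAR pipeId, ULONG maxPacket,
                 PUCHAR dest, ULONG want, PULONG got)
{
    ULONG rounded = ((want + maxPacket - 1) / maxPacket) * maxPacket;
    PUCHAR tmp = (PUCHAR)malloc(rounded);     /* RAW_IO-legal length */
    BOOL ok = tmp && WinUsb_ReadPipe(h, pipeId, tmp, rounded, got, NULL);
    if (ok)
        memcpy(dest, tmp, *got < want ? *got : want); /* the extra copy */
    free(tmp);
    return ok;
}
---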
It is worth noting that Linux and Mac OS X do not impose this
constraint, thus allowing the application to simply read as many bytes
as desired from the USB stream.
Is it possible to remove this constraint in the next WinUSB update?
Thanks,
Paarvai
There is no buffering anywhere in the USB driver path. Your buffer is sent
to the host controller driver, where it is filled and returned back to you.
The device has no idea how much data was requested. A device simply
receives a "send data now" signal. If you asked for 128 bytes, and the
device sends 512 bytes, that's called "babble". It is a USB protocol
violation.
If you need buffered access to your device data stream, then you have to
provide the buffers, just as you describe. This is nothing new. It's been
true for USB devices forever.
>It is worth noting that Linux and Mac OS X do not impose this
>constraint, thus allowing the application to simply read as many bytes
>as desired from the USB stream.
I can't speak for Mac OS, but I am experienced with Linux USB coding. The
exact same restrictions apply: there is no buffering. If you ask for 128
bytes and the device transmits 512, that's a protocol violation. Even the
libusb library, which is not part of the operating system, provides no
buffering.
>Is it possible to remove this constraint in the next WinUSB update?
So, let me get this straight. You want to use "raw IO" so that you can
avoid buffering, but you really want it to buffer?
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.
I think WinUSB also supports this, just not in raw mode.
--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
ma...@storagecraft.com
http://www.storagecraft.com
Maybe an example will make the scenario more clear:
1) There is a device that sends exactly 768 bytes. This will arrive
as two packets: 512 bytes + 256 bytes.
2) The application would like to allocate a buffer of exactly 768
bytes to receive this data.
3) Under the RAW_IO pipe policy, there cannot be a partial packet for
the read. Therefore, a buffer of 1024 bytes is required.
My low-level library code does not own the buffer's memory, relying
upon the application code to provide a properly sized buffer. The
application code is not concerned with USB packet sizing. It simply
wishes to receive 768 bytes and provides a buffer to the library that
is sized accordingly.
The current workaround is for the low-level code to allocate an
additional aligned 1024-byte buffer and copy the result into the
application buffer when complete. This has the unnecessary
performance penalty of memory allocation and copying. While this
example is for a small packet, the penalties accumulate over larger
packets and many iterations.
In contrast, the most efficient usage would be the following (assume
that "char *buffer" is the application-supplied buffer of length 768):
1) Receive the first 512 bytes into &buffer[0].
2) Receive the remaining 256 bytes into &buffer[512].
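In terms of code, the desired calls would look like the following
sketch; the second call is exactly what RAW_IO rejects today, since
256 is not a multiple of the 512-byte MaxPacket (h and pipeId are
assumed from earlier setup):
---
ULONG got = 0;
/* Step 1: one full 512-byte packet into the start of the buffer. */
WinUsb_ReadPipe(h, pipeId, (PUCHAR)&buffer[0], 512, &got, NULL);
/* Step 2: the remaining 256 bytes at offset 512. RAW_IO currently
   rejects this call because 256 is not a multiple of MaxPacket. */
WinUsb_ReadPipe(h, pipeId, (PUCHAR)&buffer[512], 256, &got, NULL);
---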
Please note that this works as expected on both Linux and Mac OS X.
Therefore, there is no inherent USB controller limitation. Rather, it
seems that the WinUSB driver has an unnecessary restriction that
prevents step #2 from working with RAW_IO.
Is it possible to remove this constraint in the next WinUSB update?
Best Regards,
Paarvai
"Tim Roberts" wrote:
> Paarvai Naai <Paarv...@discussions.microsoft.com> wrote:
> >...
> >* The buffer length must be a multiple of the maximum endpoint packet size.
> >* The length must be less than what the host controller supports.
> >...
> >The larger issue is that the read constraint imposes an onerous burden
> >on the application programmer. There are many situations in which a
> >partial read is required. As such, I had to work around this by
> >submitting temporary buffers and copying the result after the
> >operation completes. This has a clear performance disadvantage over
> >simply reading a partial packet directly from the driver.
>
> There is no buffering anywhere in the USB driver path. Your buffer is sent
> to the host controller driver, where it is filled and returned back to you.
> The device has no idea how much data was requested. A device simply
> receives a "send data now" signal. If you asked for 128 bytes, and the
> device sends 512 bytes, that's called "babble". It is a USB protocol
> violation.
> ...
> [snip]
Then go away from RAW_IO; that is simplest.
The WinUSB documentation does not sufficiently describe how the
buffering mechanism works. Furthermore, I cannot inspect the WinUSB
source code to characterize the latency behavior. Therefore, the only
choice is to use RAW_IO.
Our application is a mature product that has been working for years on
Windows and Linux. We have strict constraints on throughput and
latency. We would like to transition the Windows version to WinUSB
for better maintenance. However, we need to be in control of USB
communications at the lowest level.
This approach works perfectly on other operating systems. It works
almost perfectly on WinUSB. If WinUSB removed the limitation on read
packets, its features would be brought up to the same level as the
Linux and Darwin user-mode USB stacks.
Best Regards,
Paarvai
"Paarvai Naai" <Paarv...@discussions.microsoft.com> wrote in message
news:954972E9-69D1-4F11...@microsoft.com...
This means that we do not want to introduce any unnecessary
inefficiencies. Our current "copy buffer" workaround for the WinUSB
issue is probably sufficient on a fast machine. However, we don't
know how it will manifest on slower machines.
Besides the straight performance numbers, it is also a code
maintainability issue. Both Linux and Darwin allow us to walk the
pointer in a single buffer, reading the precise amount of data
required with no extra copying. Only WinUSB requires that we align
the reads, requiring the inelegant workaround described above. The
mere fact that the workaround works is not a reason to say it is not
a problem.
Is it possible for somebody from Microsoft to comment on this? I
suspect that it should be relatively easy to remove the limitation of
packet alignment for RAW_IO reads. I would be interested in hearing
your thoughts.
Thanks,
Paarvai
There are no other ways. USB transfers are aligned, period.
Non-raw traffic will result in the same "copy buffer" being done for you
by WinUSB.
> Besides the straight performance numbers, it is also a code
> maintainability issue. Both Linux and Darwin allow us to walk the
> pointer in a single buffer, reading the precise amount of data
> required with no extra copying.
Then the copying is done in the system USB library, the same way as in
WinUSB in non-raw mode.
>Only WinUSB requires that we align the reads,
Switch off the raw mode and enjoy the same functionality as in UNIXen.
> I suspect that it should be relatively easy to remove the limitation of
> packet alignment for RAW_IO reads.
No, it is just plain impossible. "Raw" means - no extra processing, exactly as
on USB wire. This means the alignment requirement.
If you do not want this requirement - go away from raw mode, you will have the
same functionality as in both UNIXen.
Latency constraints on a bulk pipe???
> almost perfectly on WinUSB. If WinUSB removed the limitation on read
> packets
As you can understand, this is the natural limitation on non-buffered
USB traffic.
So, you can either get rid of RAW_IO, or implement your own buffering.
How is your Windows kernel-mode driver implemented? Using its own
buffering, I believe?
That said, a 40 MB/s memory copy is nothing for modern systems. Those
from 2005 could achieve that USB throughput easily under WinXP.
"Paarvai Naai" <Paarv...@discussions.microsoft.com> wrote in message
news:8EA56C65-B6E4-498C...@microsoft.com...
Regardless of whether a 40 MB/s memory copy is significant on a modern
computer, it is simply code bloat to have something that would not be
necessary if the underlying problem were fixed.
Paarvai
As for your question about the kernel driver, there is no buffering
involved. Rather, it simply populates the appropriate fields in the
Windows URB structure:
http://msdn.microsoft.com/en-us/library/ms793340.aspx
http://msdn.microsoft.com/en-us/library/ms793345.aspx
In particular, the TransferBufferLength field is simply set to the
requested size, which does not have to be a multiple of 512.
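For illustration, the relevant part of the kernel driver boils down to
something like this sketch (Urb, PipeHandle, and Buffer are assumed to
be set up already; this is not our literal source):
---
/* Build a bulk IN request for exactly 768 bytes. The length is not
   rounded up to the 512-byte MaxPacket, and USBD_SHORT_TRANSFER_OK
   permits the device to send less than was requested. */
UsbBuildInterruptOrBulkTransferRequest(
    Urb,
    sizeof(struct _URB_BULK_OR_INTERRUPT_TRANSFER),
    PipeHandle,
    Buffer,                /* caller's buffer                      */
    NULL,                  /* no MDL in this sketch                */
    768,                   /* TransferBufferLength: the exact size */
    USBD_TRANSFER_DIRECTION_IN | USBD_SHORT_TRANSFER_OK,
    NULL);
---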
Paarvai
This is incorrect. Please refer to the EHCI specification:
http://www.intel.com/technology/usb/ehcispec.htm
In section 4.10.3 on page 83, it states:
---
The maximum number of bytes a device can send is Maximum Packet
Size. The number of bytes moved during an OUT transaction is either
Maximum Packet Length bytes or Total Bytes to Transfer, whichever is
less.
...
The PID Code field indicates an IN and the device sends more than the
expected number of bytes (e.g. Maximum Packet Length or Total Bytes
to Transfer bytes, whichever is less) (e.g. a packet babble). This
results in the host controller setting the Halted bit to a one.
---
Section 3.5.3 on page 42 has a description of the relevant data
structures.
Therefore, for both OUT and IN packets, partial packet transfers are
fully supported by the EHCI hardware. This makes sense, since Linux
and Darwin both have the same behavior as I previously described. In
fact, the underlying Windows EHCI driver also has the same behavior. It
seems that WinUSB is introducing the alignment limitation.
Is it possible for somebody at Microsoft with visibility into the
source code to comment on this issue?
Thanks,
Paarvai
"Paarvai Naai" <Paarv...@discussions.microsoft.com> wrote in message
news:28A343A0-2B3D-4F1D...@microsoft.com...
Now, the restriction on MaxPacket alignment is not imposed by the lower USB
stack. It is in fact imposed by the WinUSB driver itself. This is to
prevent malicious or malfunctioning software from causing a babble condition
on the bus.
"Paarvai Naai" <Paarv...@discussions.microsoft.com> wrote in message
news:AF300D3D-3917-4AB1...@microsoft.com...
Is it possible for the WinUSB or EHCI driver to automatically unstall
the controller in the specific case of babble on an endpoint for which
the user has requested less than MaxPacket bytes?
In the meantime, I will test Linux and Darwin to see how they behave
in this situation and post the results to this discussion group.
Thanks,
Paarvai
Oh, the device is absolutely allowed to send a partial packet. That is
fully supported. If you supply a 1024-byte buffer and the device sends
768, the request will be completed with 768 bytes. A short packet
terminates a transfer.
>Is it possible for the WinUsb or EHCI driver to automatically unstall
>the controller in the specific case of babble on an endpoint for which
>the user has requested less than MaxPacket bytes?
No, that's silly. A stall on an endpoint affects ONLY that endpoint.
Look, if the limitations of WinUSB aren't acceptable for you, then for
goodness sakes, just throw WinUSB in the trash and write a kernel driver.
Kernel USB drivers are not that hard to write.
WinUSB is a good solution for many USB problems. However, it is NOT the
solution for EVERY USB problem.
>> There are no other ways. USB transfers are aligned, period.
>...
>> No, it is just plain impossible. "Raw" means - no extra processing,
>> exactly as on USB wire. This means the alignment requirement. If
>> you do not want this requirement - go away from raw mode, you will
>> have the same functionality as in both UNIXen.
>
>This is incorrect. Please refer to the EHCI specification:
I am curious to know what part of Maxim's statement you believe is
contradicted by this section.
>In section 4.10.3 on page 83, it states:
>
>---
>The maximum number of bytes a device can send is Maximum Packet
>Size. The number of bytes moved during an OUT transaction is either
>Maximum Packet Length bytes or Total Bytes to Transfer, whichever is
>less.
>...
>The PID Code field indicates an IN and the device sends more than the
>expected number of bytes (e.g. Maximum Packet Length or Total Bytes
>to Transfer bytes, whichever is less) (e.g. a packet babble). This
>results in the host controller setting the Halted bit to a one.
>---
>
>Therefore, for both OUT and IN packets, partial packet transfers are
>fully supported by the EHCI hardware.
!!! What? How did you possibly come to that conclusion? What those
paragraphs say exactly match what the rest of us have been saying: when a
device sends more data than the request asked for, that's "babble", which
is a protocol violation.
"Partial packet transfers" are supported, yes, but only in the sense that
the DEVICE is allowed to send less than the maximum packet size.
>This makes sense, since Linux
>and Darwin both have the same behavior as I previously described.
No, it doesn't.
>Is it possible for somebody at Microsoft with visibility into the
>source code to comment on this issue?
This newsgroup is not an official Microsoft support channel. If you want
an official word, you will have to call Microsoft support and pay for a
support event.
Please see the RAW_IO description on the following page:
http://msdn.microsoft.com/en-us/library/aa476450.aspx
Best Regards,
Paarvai
> >> There are no other ways. USB transfers are aligned, period.
> >...
> >> No, it is just plain impossible. "Raw" means - no extra processing,
> >> exactly as on USB wire. This means the alignment requirement. If
> >> you do not want this requirement - go away from raw mode, you will
> >> have the same functionality as in both UNIXen.
> >
> >This is incorrect. Please refer to the EHCI specification:
>
> I am curious to know what part of Maxim's statement you believe is
> contradicted by this section.
Actually, the entire statement is incorrect.
Here is the detailed explanation:
1) Neither read nor write transfer requests that are submitted to the
EHCI host controller are required to be aligned to MaxPacket. This is
as per the EHCI specification and was confirmed by Randy in this thread.
This directly contradicts the above statement "Raw means - no extra
processing, exactly as on USB wire. This means the alignment
requirement."
2) Neither the Linux nor the Darwin user-mode USB driver enforces an
alignment requirement on the user-mode application. Furthermore, they
do not internally allocate aligned buffers to satisfy a fictitious
MaxPacket alignment requirement. Since we have visibility into the
source code, we are able to directly confirm this behavior. This
directly contradicts the above statement "If you do not want this
requirement - go away from raw mode, you will have the same
functionality as in both UNIXen."
3) Given the above two points, it is certainly not "plain impossible."
> >Therefore, for both OUT and IN packets, partial packet transfers are
> >fully supported by the EHCI hardware.
>
> !!! What? How did you possibly come to that conclusion? What those
> paragraphs say exactly match what the rest of us have been saying: when a
> device sends more data than the request asked for, that's "babble", which
> is a protocol violation.
First of all, Maxim's post was not talking about babble. The only
posts that mention babble are yours and Randy's. However, in Randy's
post, he confirms that there is an artificial restriction at the
WinUSB level to avoid babble.
---
Now, the restriction on MaxPacket alignment is not imposed by the
lower USB stack. It is in fact imposed by the WinUSB driver
itself. This is to prevent malicious or malfunctioning software from
causing a babble condition on the bus.
---
I am not concerned with babble. Rather, I already know in advance
that our device will send a partial packet. I should be allowed to
request exactly that much data through WinUSB, just as we can in our
kernel driver. Of course, given Randy's response, I would like to
understand the consequences of "malicious or malfunctioning software"
in more detail.
> "Partial packet transfers" are supported, yes, but only in the sense
> that the DEVICE is allowed to send less than the maximum packet
> size.
Of course the device can send less than MaxPacket bytes in a packet.
However, the issue in question is whether the host-side software
(driver, application, etc.) should be allowed to request a non-aligned
number of bytes from the host controller.
> >This makes sense, since Linux
> >and Darwin both have the same behavior as I previously described.
>
> No, it doesn't.
I'm happy to clarify this point further.
Both Linux and Darwin allow the host-side software to request a
non-aligned number of bytes from the host controller. If the device
responds with less than or equal to the amount requested, everything
goes smoothly with no additional overhead. If a babble condition
occurs, both operating systems gracefully return an error condition so
that the application can deal with it appropriately.
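To make the comparison concrete, here is roughly the equivalent call
with libusb on Linux (a sketch assuming the libusb-1.0 API; handle and
the endpoint address are placeholders):
---
#include <libusb-1.0/libusb.h>

/* "handle" is an open libusb_device_handle; 0x81 is our bulk IN
   endpoint. */
unsigned char buffer[768];
int transferred = 0;
int rc = libusb_bulk_transfer(handle, 0x81, buffer, sizeof(buffer),
                              &transferred, 1000 /* ms timeout */);
if (rc == LIBUSB_ERROR_OVERFLOW) {
    /* The device sent more than requested (babble); only this
       transfer fails, and the application can recover. */
}
---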
As it stands now, WinUSB does not allow this under RAW_IO, thus
requiring an extra buffer allocation and copy for non-aligned packets.
It seems like a minor issue, but it is a perfectly legitimate topic of
discussion for this newsgroup. (We are already in the process of
discussing why non-RAW_IO doesn't work well for us on a different
thread.)
> >Is it possible for somebody at Microsoft with visibility into the
> >source code to comment on this issue?
>
> This newsgroup is not an official Microsoft support channel. If you want
> an official word, you will have to call Microsoft support and pay for a
> support event.
This posting was explicitly filed as a suggestion. We already have a
workaround (allocate + copy) for this issue. Our purpose in
discussing the matter is the hope that Microsoft can improve its
products for the benefit of all developers.
In this vein, it is in Microsoft's best interests to be involved in
the community's discussions. From time to time, when the issue
becomes sufficiently technical, they do reply to postings. This is to
be expected, since they are the only ones with visibility into the
source code. We appreciate Randy's responses to this thread.
Best Regards,
Paarvai
You are correct that the EHCI specification states that only the
particular Queue Head will be stalled.
However, we do not know how the Windows EHCI driver behaves when this
happens. Randy's post seems to suggest that there are cross-endpoint
implications. Otherwise, why would he have expressed a concern for
malicious software affecting the system?
WinUSB only allows a single process to use a device. Therefore, any
behavior that is contained within a single device should not have
security implications for other processes. If the process wants to
continue using the endpoint, it can issue a WinUsb_ResetPipe to clear
the halt condition. Furthermore, when the process terminates, the
Queue Heads are most likely unlinked and the stall will no longer be
a concern.
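For example, recovery from user mode would be a sketch like this (h,
pipeId, and the buffer are assumed from earlier setup):
---
ULONG got = 0;
if (!WinUsb_ReadPipe(h, pipeId, buf, len, &got, NULL)) {
    /* e.g. the endpoint was halted after babble; clear the halt so
       the pipe can be used again. */
    WinUsb_ResetPipe(h, pipeId);
}
---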
Could Randy or someone else at Microsoft possibly comment on this?
> Look, if the limitations of WinUSB aren't acceptable for you, then for
> goodness sakes, just throw WinUSB in the trash and write a kernel driver.
> Kernel USB drivers are not that hard to write.
>
> WinUSB is a good solution for many USB problems. However, it is NOT the
> solution for EVERY USB problem.
You seem to have experience with USB driver development. I assure you
that our team also has extensive experience with driver development on
multiple platforms, down to both the device and host controller
hardware. As I had mentioned in a previous post, we already have a
working kernel-mode driver. Since we use user-mode drivers on Linux
and Darwin, we would like to use a user-mode driver on Windows to
achieve a more consistent development and end-user experience.
Our intention on these forums is to use our cross-platform experience
to provide constructive input that can improve WinUSB. Our experience
with the other platforms seems to suggest that what we are asking is
reasonable. We are perfectly willing to be convinced otherwise, but
would appreciate if we could be given some solid technical
justification for why this particular design decision was made.
I don't quite understand your unwillingness to discuss the limitations
that we bring up. We (and probably also you) do not have access to
either WinUSB or the Windows EHCI driver source code. Therefore, all
of our information comes from the sparse documentation provided on
MSDN, and is necessarily second-hand. It is only fair to allow us to
continue this process of inquiry until a resolution is reached, rather
than simply attempting to shut down the discussion by telling us to
write a kernel driver (which we already have).
Best Regards,
Paarvai
(Compare with HeapAlloc: it allocates not the exact amount requested,
but at least that amount. WDF has the method WdfCommonBufferCreate,
rather than just using ExAllocate....)
Regards,
--PA
"Paarvai Naai" <Paarv...@discussions.microsoft.com> wrote in message
news:954972E9-69D1-4F11...@microsoft.com...
I knew there was a reason why I put the restriction in WinUsb, but I
couldn't remember. I had to do a bit of research to refresh my memory.
Babble is the condition in which a device sends data that the host isn't
expecting. In this case, it is because the buffer programmed into the
host controller wasn't big enough to receive all of the data that the device
sent. If the buffer is a multiple of MaxPacket for the endpoint, then this
condition will never happen.
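In essence, the restriction boils down to a check like this (a
simplified sketch, not the actual source):
---
/* Simplified sketch: reject IN buffers that are not MaxPacket-aligned,
   since the final short chunk programmed into the controller could
   babble. */
if ((bufferLength % pipeInformation.MaximumPacketSize) != 0)
    return FALSE;   /* the caller sees an invalid-parameter error */
---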
There are three scenarios that I can think of that would result from the
babble condition. For these scenarios, let's assume that the remaining
buffer in the controller was 1 byte in size, and the endpoint's MaxPacket is
512.
1. Probably the most common result is that the packet would be dropped, the
transfer would be failed, and the endpoint would be halted. This isn't bad
as it only affects that endpoint.
2. The host controller may have seen that it only needed to move 1 byte,
and only reserved enough time for that 1 byte and immediately moved onto the
next endpoint in the schedule. This would cause the IN or OUT token to be
smashed as both the device and the host were transmitting at the same time.
This may be bad, but it would probably be recoverable, unless of course that
IN or OUT was for an ISOCH endpoint, in which case there would be data loss.
3. The host controller may have decided that it had time to squeeze in a 1
byte transaction right before the end of the frame. The time that the
device took to send all 512 bytes crossed the frame boundary. USB Hubs are
required to monitor for this condition. When detected, the hub is required
to disable the port. This is the critical error that I am concerned about.
In this case, if the device is multi-function (composite) the other
functions on the device would obviously be affected as well, which is
unacceptable.
It is primarily because of the 3rd case that the restriction is in WinUsb.
WinUsb does everything it can to isolate the damage that can be done by the
user-mode client to its function only. Unfortunately, in this case, it is
coming at the cost of performance in your scenario. There may be some things
that could be done in future releases of WinUsb, though they wouldn't help
you in the short run.
"Paarvai Naai" <Paarv...@discussions.microsoft.com> wrote in message
news:428E063F-9CAE-4440...@microsoft.com...
> I agree that this is a very valid and fruitful discussion.
I sincerely appreciate your taking the time to thoroughly address our
inquiry.
> 2. The host controller may have seen that it only needed to move 1 byte,
> and only reserved enough time for that 1 byte and immediately moved onto the
> next endpoint in the schedule. This would cause the IN or OUT token to be
> smashed as both the device and the host were transmitting at the same time.
> This may be bad, but it would probably be recoverable, unless of course that
> IN or OUT was for an ISOCH endpoint, in which case there would be
> data loss.
This is probably not going to happen since the host controller will
most likely wait for an EOP (which will be *after* the babble in this
scenario) before moving onto the next queue head in the async queue.
> 3. The host controller may have decided that it had time to squeeze in a 1
> byte transaction right before the end of the frame. The time that the
> device took to send all 512 bytes crossed the frame boundary. USB Hubs are
> required to monitor for this condition. When detected, the hub is required
> to disable the port. This is the critical error that I am concerned about.
> In this case, if the device is multi-function (composite) the other
> functions on the device would obviously be affected as well, which is
> unacceptable.
This is a very interesting scenario and not one that we had
considered. Thank you for explaining the rationale behind the design
decision.
> It is primarily because of the 3rd case that the restriction is in WinUsb.
> WinUsb does everything it can to isolate the damage that can be done by the
> user-mode client to its function only. Unfortunately, in this case, it is
> coming at the cost of performance in your scenario. There may be some things
> that could be done in future releases of WinUsb, though they wouldn't help
> you in the short run.
Currently, the performance penalty is only for non-aligned transfers
and occurs once per transfer, and as such is not very severe. Our
desire for WinUSB to accept non-aligned transfers stemmed more from a
code maintenance perspective. However, we can understand why you have
decided to restrict reads to be aligned. If you do come up with a
different solution in the future, I'd be happy to try it out.
The other issue of performance when not using RAW_IO may still be of
interest to you (although we are happy to continue to use RAW_IO mode
for our application). I will be in touch sometime next week regarding
the results of that comparison on the "Buffered performance in WinUSB"
thread.
Best regards,
Paarvai
You are correct. I came at this thread from the viewpoint of "this person
is trying to find a way to get a job done," which is a somewhat different
point of view from the one you describe.
With very few exceptions, "second-hand" is all you will ever get. To the
development teams, XP and Vista are ancient history. They're working on
improvements and design changes for Windows 7 and beyond, 3 or 4 years
in the future. Unless you encounter an outright bug in WinUSB that
prevents you from making some major release and shows evidence that it
might do the same to others, you should not expect this kind of change to
happen. There's just no one assigned to this kind of maintenance coding.
Perhaps the best you can hope for is the legacy of this thread, so that
others in the future can stumble upon the path you have pioneered.