[pyOCD] How to improve the pyocd flash download speed? (#23)


ECNU3D

Apr 29, 2014, 8:48:55 AM
to mbedmicro/pyOCD

Hi all, I'm working on the pyOCD flash download speed issue. I compared the download speed over CMSIS-DAP using Keil and pyOCD:

  1. Downloading a 500KB bin file to the mbed LPC1768:
    Keil -> CMSIS-DAP debugger -> target LPC1768 takes about 50s
    pyOCD (Windows & Linux) -> CMSIS-DAP debugger -> target LPC1768 takes about 110s

  2. Using a Python profiling tool, we can see that pyOCD spends most of its time in the USB write and read functions:
    Total time: 109.656s
    USB read & write: 45.946+14.38+23.016+20.969 = 104.311s, so pyOCD spends more than 95% of its run time in the USB read/write functions.

  3. Timing the pyOCD run with the time command gives: User 2.14, System 0.72, Real 102.44. Very little of the time was spent in user mode or kernel mode, so most of it must be I/O wait time.

Based on this analysis, I changed the behavior of dapTransferBlock and split the USB endpoint write and read into two threads to avoid some of the I/O wait time. However, this gained only about a 10% speed improvement. Does anyone have a clue about the cause of pyOCD's low download speed?
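
The arithmetic in this post can be re-checked with a few lines of Python. This is only a sketch that reuses the figures quoted above; the variable names are illustrative and nothing is measured independently.

```python
# Figures quoted in the post; nothing here is measured independently.
IMAGE_KB = 500
KEIL_SECONDS = 50
PYOCD_SECONDS = 110
PROFILE_TOTAL_S = 109.656
USB_CALL_TIMES_S = [45.946, 14.38, 23.016, 20.969]
USER_S, SYSTEM_S, REAL_S = 2.14, 0.72, 102.44

# Share of the profiled run spent inside the USB read/write calls.
usb_total_s = sum(USB_CALL_TIMES_S)
usb_share = usb_total_s / PROFILE_TOTAL_S

# Share of wall-clock time that was actual CPU work (user + system).
cpu_share = (USER_S + SYSTEM_S) / REAL_S

# Effective download rates for the 500KB image.
keil_rate = IMAGE_KB / KEIL_SECONDS
pyocd_rate = IMAGE_KB / PYOCD_SECONDS

print(f"USB read/write: {usb_total_s:.3f} s ({usb_share:.1%} of the profiled run)")
print(f"CPU time: {cpu_share:.1%} of wall-clock time")
print(f"Keil: {keil_rate:.2f} KB/s, pyOCD: {pyocd_rate:.2f} KB/s")
```

The small CPU share is what points at I/O wait rather than Python overhead as the bottleneck.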



Bogdan Marinescu

Apr 29, 2014, 11:29:03 AM
to mbedmicro/pyOCD
Hi,

Thanks for your analysis. I think this might have something to do with the
DAP interface enumerating as a HID device and maybe the way we're
configuring its descriptors. Additional information can be found for
example here:

http://janaxelson.com/forum/index.php?topic=983.0

I don't think anyone went into very deep investigations about the
performance issues yet.

Thanks,
Bogdan




Martin Kojtal

Apr 30, 2014, 2:40:47 AM
to mbedmicro/pyOCD

@ECNU3D, can you also test OpenOCD? How fast is it with your binary?

ECNU3D

May 7, 2014, 3:52:28 AM
to mbedmicro/pyOCD

Closed #23.

ECNU3D

May 7, 2014, 3:52:28 AM
to mbedmicro/pyOCD

@0xc0170, I've tested it, and the result is less than 1KB/s, much slower than pyOCD, which I did not expect. I suspect there has not been any performance tuning for CMSIS-DAP on the OpenOCD side. (Which I think is a great advantage for pyOCD, as it's the fastest gdb server for CMSIS-DAP under Linux :+1: )

One possible reason is that OpenOCD uses a different flash algorithm: when the flash load region is not contiguous, OpenOCD's download speed drops considerably (with any kind of debugger, not only CMSIS-DAP), which I don't think happens with the flash algorithms of Keil and pyOCD.

@bogdanm Another thing: I checked the URL you gave above. The solution there involves modifying the HID descriptor, which I think is a matter for the CMSIS-DAP firmware (to send fewer HID packets, the HID packet size needs to be enlarged). But the mbed CMSIS-DAP firmware is based on an ARM release, and I'm not sure it's safe to modify that.

ECNU3D

May 7, 2014, 3:52:39 AM
to mbedmicro/pyOCD

Reopened #23.

ECNU3D

May 7, 2014, 3:54:06 AM
to mbedmicro/pyOCD

Sorry for the mis-operation above :(



Martin Kojtal

May 7, 2014, 10:24:48 AM
to mbedmicro/pyOCD

Feel free to modify the CMSIS-DAP firmware, it's available on GitHub. Contributions are welcome!

OpenOCD - I should also mention that Kinetis Design Studio contains its own tweaked OpenOCD which, according to my information, is worth testing as well. However, I assume it won't differ much in speed.

c1728p9

Nov 29, 2014, 1:27:30 AM
to mbedmicro/pyOCD

Because HID limits the transfer speed to 64KB/s (one 64-byte command every ms) due to the interrupt endpoints, is there any alternative usb device type that could be used (even just for speed testing)? The only portable one that comes to mind is the CDC, which (correct me if I'm wrong) uses bulk endpoints. For testing, the pyOCD interface layer could be swapped out, and special mbed firmware could be created.
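
The 64KB/s figure follows from one 64-byte report per 1 ms full-speed frame. A minimal sketch of that ceiling, and of what it would mean for the 500KB image from the first post; it assumes 1 KB = 1024 bytes for the image size and ignores all protocol and flash-programming overhead.

```python
REPORT_BYTES = 64          # HID report size used by CMSIS-DAP in this thread
FRAMES_PER_SECOND = 1000   # full-speed USB: one frame every 1 ms
REPORTS_PER_FRAME = 1      # one interrupt transaction per endpoint per frame

# Best-case rate in one direction, in bytes per second.
hid_ceiling = REPORT_BYTES * REPORTS_PER_FRAME * FRAMES_PER_SECOND

# Best-case time to move a 500KB image at that ceiling (assumes 1 KB = 1024 bytes).
image_bytes = 500 * 1024
best_case_s = image_bytes / hid_ceiling

print(f"HID ceiling: {hid_ceiling} bytes/s")
print(f"500KB at the ceiling: {best_case_s:.1f} s")
```

Even this ideal case is far from the roughly 110 s measured earlier in the thread, so the HID ceiling alone does not explain the slow downloads.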

Andrii Anpilogov

Nov 29, 2014, 4:56:07 AM
to mbedmicro/pyOCD, mbednotifications
FYI: Latest openocd reports 6-7kb/s for nrf51.



> On 29 Nov 2014, at 15:22, mbednotifications <notifi...@github.com> wrote:
>
> Have you looked at the HID writes and verified that 64 bytes are being
> packed into each transfer? What about every packet containing data? The
> flash download sequence reported from basic test is on the order of 2k-4k.
> Far from the max throughput in a standard configuration. I'd recommend
> looking at this and tracking down before changing out the interface class.
>

Tony Wang

Nov 29, 2014, 7:02:09 AM
to mbedmicro/pyOCD, mbednotifications
Yes, and the Keil CMSIS-DAP interface reaches about double the speed of pyOCD. I
think we can start by narrowing the speed gap between Keil and pyOCD.

c1728p9

Nov 29, 2014, 5:19:12 PM
to mbedmicro/pyOCD, mbednotifications

@mbednotifications I took a look at the USB traffic a couple of months ago. Reads and writes packed as much data as they could into each packet (they used 56 of the 64 bytes for data). One thing I did notice is that the interrupt IN endpoint was polled on average 2 times before the data read/written came back (3 ms total for each read/write). This might indicate that the mbed firmware could be optimized, but I didn't investigate it further. I figured it might be easier to see the performance hit of HID (if any) by testing without it. I know flashing over mass storage is significantly faster (~5 seconds to program 128KB on a KL25 Freedom board).

@anpilog Do you know what speed pyOCD achieves using the same mbed firmware? If it differs a lot, the USB traffic could be analyzed to determine how to make it faster.
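
The observation above (56 data bytes per 64-byte packet, about 3 ms per read/write) puts a rough bound on throughput when transfers are issued one at a time. A small sketch of that estimate; the function name is illustrative, and the model assumes strictly sequential transfers with no queuing.

```python
def sequential_rate(payload_bytes, ms_per_transfer):
    """Bytes per second when each transfer must finish before the next starts."""
    return payload_bytes * 1000 / ms_per_transfer


PACKET_BYTES = 64
PAYLOAD_BYTES = 56      # data bytes per packet, as observed in the USB trace
MS_PER_TRANSFER = 3     # observed time for one read or write

rate = sequential_rate(PAYLOAD_BYTES, MS_PER_TRANSFER)
packet_efficiency = PAYLOAD_BYTES / PACKET_BYTES

print(f"Sequential estimate: {rate / 1000:.1f} kB/s")
print(f"Payload share of each packet: {packet_efficiency:.1%}")
```

Under these assumptions the turnaround time per transfer, not the packet size, dominates: cutting 3 ms to 1 ms would triple the rate, while filling the remaining 8 bytes would gain about 14%.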

Tony Wang

Nov 29, 2014, 7:40:45 PM
to mbedmicro/pyOCD, mbednotifications
I tested OpenOCD and pyOCD with the same mbed firmware on the LPC1768, and OpenOCD is a little slower than pyOCD. pyOCD can reach 3~4 KB/s. I don't think using the nRF would make a big difference.

c1728p9

Dec 21, 2014, 1:24:06 AM
to mbedmicro/pyOCD, mbednotifications

I created some hacked up firmware which used the CDC serial port instead of HID. There was a slight speed increase. I had it configured so each packet was still 64 bytes. Without having the DAP process the commands the data rate was almost exactly 64KB/s in both directions which seemed slower than it should be for bulk endpoints. Here are the data rates from a KL25 freedom board:
Read 128KB
CDC 1MHz - 26.7 KB/s
CDC 10MHz - 50.5 KB/s
HID 1MHz - 17.5 KB/s
HID 10MHz - 25.4 KB/s

Flash 128KB
CDC 1MHz - 131.000000 kbytes flashed in 11.435000 seconds ===> 11.462352 kbytes/s
CDC 10MHz - 131.000000 kbytes flashed in 9.102000 seconds ===> 14.400352 kbytes/s
HID 1MHz - 131.000000 kbytes flashed in 17.158000 seconds ===> 7.639119 kbytes/s
HID 10MHz - 131.000000 kbytes flashed in 14.824000 seconds ===> 8.841878 kbytes/s
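
To put the "slight speed increase" in numbers, the CDC and HID figures above can be compared directly. This sketch only divides the values reported in this post (read rates in KB/s, flash times in seconds); the dictionary layout is illustrative.

```python
# Values copied from the benchmark above (KL25 Freedom board).
read_kb_per_s = {
    "1MHz": {"CDC": 26.7, "HID": 17.5},
    "10MHz": {"CDC": 50.5, "HID": 25.4},
}
flash_seconds = {
    "1MHz": {"CDC": 11.435, "HID": 17.158},
    "10MHz": {"CDC": 9.102, "HID": 14.824},
}

# Speedup of CDC over HID: higher rate for reads, shorter time for flashing.
read_speedup = {clk: r["CDC"] / r["HID"] for clk, r in read_kb_per_s.items()}
flash_speedup = {clk: t["HID"] / t["CDC"] for clk, t in flash_seconds.items()}

for clk in read_kb_per_s:
    print(f"{clk}: read x{read_speedup[clk]:.2f}, flash x{flash_speedup[clk]:.2f}")
```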

c1728p9

Feb 22, 2015, 6:44:56 PM
to mbedmicro/pyOCD, mbednotifications

I have two pull requests up to increase flash programming speeds if anyone is interested in trying them.

The first pull request increases the data transfer rate by allowing packets to be queued, and by creating a dedicated reader thread for pyusb. Testing on my local machine this brings data rates from ~25KB/s to ~45KB/s.
#99
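
The idea of queued packets plus a dedicated reader thread can be sketched as follows. This is not the code from the pull request: `FakeProbe` and `QueuedInterface` are hypothetical names, and the probe is a stand-in that simply echoes each command, so the example runs without hardware or pyusb.

```python
import queue
import threading


class FakeProbe:
    """Hypothetical stand-in for a USB probe: echoes every packet it is sent."""

    def __init__(self):
        self._pending = queue.Queue()

    def write(self, packet):
        self._pending.put(packet)

    def read(self):
        # Blocks until a response is available, like a blocking USB read.
        return self._pending.get()


class QueuedInterface:
    """Lets the caller send several commands before collecting responses."""

    def __init__(self, probe, max_in_flight=4):
        self._probe = probe
        self._responses = queue.Queue()
        # Limits how many commands may be outstanding on the probe at once.
        self._slots = threading.Semaphore(max_in_flight)
        self._reader = threading.Thread(target=self._read_loop, daemon=True)
        self._reader.start()

    def _read_loop(self):
        # Dedicated reader: always has a read pending, so responses are
        # collected as soon as the probe produces them.
        while True:
            data = self._probe.read()
            if data is None:  # shutdown sentinel (works because FakeProbe echoes)
                return
            self._responses.put(data)
            self._slots.release()

    def write(self, packet):
        self._slots.acquire()  # blocks only when too many commands are in flight
        self._probe.write(packet)

    def read(self):
        return self._responses.get()

    def close(self):
        self._probe.write(None)
        self._reader.join()


iface = QueuedInterface(FakeProbe())
commands = [bytes([i]) * 4 for i in range(16)]
for command in commands:
    iface.write(command)          # queue commands without waiting for each reply
responses = [iface.read() for _ in commands]
iface.close()
print(responses == commands)
```

The writer never waits for an individual reply; it only stalls when `max_in_flight` commands are outstanding, which is the property that hides the per-transfer turnaround time.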

The second pull request updates the flash programming code to only program pages as necessary. Mass erase programming will skip pages that would have been programmed to 0xFF (since they will already have been erased). Sector erase programming skips pages that are already the same.
#104
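
The page-skipping rules described for the second pull request can be illustrated with a short sketch. It is not the pull request's code: the function names are made up for this example, and it assumes the current flash contents have already been read back and cover the same range as the new image.

```python
ERASED_BYTE = 0xFF


def split_pages(data, page_size):
    """Cut an image into consecutive pages of page_size bytes."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def pages_after_mass_erase(new_data, page_size):
    """After a mass erase, pages that are all 0xFF are already correct."""
    return [
        (index, page)
        for index, page in enumerate(split_pages(new_data, page_size))
        if page.count(ERASED_BYTE) != len(page)
    ]


def pages_for_sector_erase(new_data, current_data, page_size):
    """With sector erase, only pages that differ from flash need programming."""
    new_pages = split_pages(new_data, page_size)
    old_pages = split_pages(current_data, page_size)
    return [
        (index, new)
        for index, (new, old) in enumerate(zip(new_pages, old_pages))
        if new != old
    ]


PAGE_SIZE = 4  # tiny page size, just for the example
image = b"\x01\x02\x03\x04" + b"\xff" * 4 + b"\xaa\xbb\xcc\xdd"
in_flash = b"\x01\x02\x03\x04" + b"\x00" * 4 + b"\xaa\xbb\xcc\x00"

mass_erase_pages = [i for i, _ in pages_after_mass_erase(image, PAGE_SIZE)]
sector_erase_pages = [i for i, _ in pages_for_sector_erase(image, in_flash, PAGE_SIZE)]
print(mass_erase_pages, sector_erase_pages)
```

In the example, the mass-erase path skips the all-0xFF middle page, while the sector-erase path skips the first page because flash already holds the same bytes.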

Tim

May 26, 2016, 8:36:23 AM
to mbedmicro/pyOCD, mbednotifications, Mention

These data rates are embarrassingly low for USB (even 'full speed'). nRF51 programming is about 10 times slower with pyocd-flashtool than nrfjprog. If we are allowed to change the USB interface it would be better to use a bulk endpoint.

I assume HID was used because you don't need to write a kernel driver? If so a better alternative is to set up the device as a WinUSB Device. Basically you add a few USB string descriptors and Windows will automatically install your device using the WinUSB driver (works on Windows 7 and later). Then you can use libusb to access it. I'm not sure how it would work on OSX but it could fall back to HID if necessary.

I have code to turn mBed boards into WinUSB devices if people are interested. Is the USB CMSIS-DAP actually a standard or have the mBed people just made it up?



Sam Grove

May 27, 2016, 11:52:21 PM
to mbedmicro/pyOCD, mbednotifications, Mention

@Timmmm What boards are you using, and which version of the firmware? There have been quite a few improvements, and there is still room for a few more I think.



Tim

May 28, 2016, 10:44:36 AM
to mbedmicro/pyOCD, mbednotifications, Mention

Using nRF51-DK, latest firmware from Nordic. There's an mBed/CMSIS-DAP version and a JLink version (go here, click Downloads then scroll down to nRF5x-OB-JLink/mBed-IF). Using the mBed one with pyocd-flashtool takes about ... 15 seconds maybe to flash the chip. JLink takes about 2.

Using chip erase for both because Nordic say it is faster.

Dawei Shang

Nov 4, 2016, 2:19:44 AM
to mbedmicro/pyOCD, mbednotifications, Mention

@Timmmm You said, "I have code to turn mBed boards into WinUSB devices if people are interested."

I am interested.

Can you tell me how to turn a board into a WinUSB device?

Does it mean modifying the firmware to use a WinUSB descriptor, or modifying the host software?

Thank you!

Tim

Nov 4, 2016, 6:42:53 PM
to mbedmicro/pyOCD, mbednotifications, Mention

It's a work in progress, but it does work. The relevant bits are in WinUSBDevice.cpp/h. Check out the Readme.md for lots of wordy unfinished information. Also this excellent wiki is well worth reading.

Tim

Nov 4, 2016, 6:47:33 PM
to mbedmicro/pyOCD, mbednotifications, Mention

By the way, I should say that the code just reads an analogue input every 100 ms and sends it over an interrupt endpoint. I have code to connect to it and read it. It's not on the web anywhere so I'll just paste it here (sorry for the length!).

The device is found via its interface GUID, rather than vendor ID/product ID. The WinUSB driver is automatically installed when you plug it in.

Device.h

#pragma once

#include <Windows.h>   // must come before the other Windows SDK headers
#include <initguid.h>
#include <winusb.h>
#include <usb.h>

#include <vector>
#include <string>

class UsbKnob
{
public:
    UsbKnob();

    bool Open(std::wstring path);
    void Close();

    // Query the device speed. Returns 0 for failure, or LowSpeed (1), FullSpeed (2), or HighSpeed (3).
    UCHAR QuerySpeed();

    // This blocks forever until a value is sent. It returns 0 on success.
    int ReadValue(double& value);

private:
    UsbKnob(const UsbKnob&) = delete;
    UsbKnob& operator=(const UsbKnob&) = delete;


    HANDLE mDeviceHandle = INVALID_HANDLE_VALUE;
    WINUSB_INTERFACE_HANDLE mWinUsbInterfaceHandle = nullptr;

    UCHAR mPipeId = 0;

};

struct UsbKnobInfo
{
    std::wstring path;
};

std::vector<UsbKnobInfo> EnumerateKnobs();

// Device Interface GUID.
// Used by all WinUsb devices that this application talks to.
// Must match "DeviceInterfaceGUIDs" registry value obtained via USB.
DEFINE_GUID(GUID_DEVINTERFACE_Iso, 0xa451588c, 0x7230, 0x4076, 0x84, 0x56, 0x9e, 0x54, 0x41, 0x65, 0xe9, 0x0c);

Device.cpp

#include "stdafx.h"

#include "Device.h"

#include <SetupAPI.h>


#include <tchar.h>
#include <strsafe.h>
#include <fstream>
#include <cstdint>   // uint32_t
#include <cstdio>    // printf


UsbKnob::UsbKnob()
{
}

bool UsbKnob::Open(std::wstring path)
{
    Close();

    mDeviceHandle = CreateFile(path.c_str(),
        GENERIC_WRITE | GENERIC_READ,
        FILE_SHARE_WRITE | FILE_SHARE_READ,
        NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,
        NULL);

    HRESULT hr;

    if (mDeviceHandle == INVALID_HANDLE_VALUE)
    {
        hr = HRESULT_FROM_WIN32(GetLastError());
        return false;
    }

    BOOL bResult = WinUsb_Initialize(mDeviceHandle, &mWinUsbInterfaceHandle) != 0;

    if (bResult == FALSE)
    {
        hr = HRESULT_FROM_WIN32(GetLastError());
        Close();
        return false;
    }

    // Find the interrupt pipe.
    USB_INTERFACE_DESCRIPTOR usbInterface;

    bResult = WinUsb_QueryInterfaceSettings(mWinUsbInterfaceHandle, 0, &usbInterface);

    if (bResult == FALSE)
    {
        hr = HRESULT_FROM_WIN32(GetLastError());
        Close();
        return false;
    }

    for (UCHAR i = 0; i < usbInterface.bNumEndpoints; i++)
    {
        WINUSB_PIPE_INFORMATION_EX pipe;
        bResult = WinUsb_QueryPipeEx(mWinUsbInterfaceHandle, 0, i, &pipe);

        if (bResult == FALSE)
        {
            HRESULT hr = HRESULT_FROM_WIN32(GetLastError());
            Close();
            return false;
        }

        if ((pipe.PipeType == UsbdPipeTypeInterrupt) && USB_ENDPOINT_DIRECTION_IN(pipe.PipeId))
        {
            mPipeId = pipe.PipeId;
        }
    }

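    // Fail if the interface has no interrupt IN endpoint to read from.
    if (mPipeId == 0)
    {
        Close();
        return false;
    }

    return true;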
}

void UsbKnob::Close()
{
    if (mWinUsbInterfaceHandle != nullptr)
    {
        WinUsb_Free(mWinUsbInterfaceHandle);
        mWinUsbInterfaceHandle = nullptr;
    }

    if (mDeviceHandle != INVALID_HANDLE_VALUE)
    {
        CloseHandle(mDeviceHandle);
        mDeviceHandle = INVALID_HANDLE_VALUE;
    }
}

UCHAR UsbKnob::QuerySpeed()
{
    if (mWinUsbInterfaceHandle == nullptr)
    {
        return 0;
    }

    UCHAR deviceSpeed = 0;
    ULONG length = sizeof(UCHAR);
    // WinUsb_QueryDeviceInformation expects the WinUSB interface handle, not the file handle.
    BOOL bResult = WinUsb_QueryDeviceInformation(mWinUsbInterfaceHandle, DEVICE_SPEED, &length, &deviceSpeed);
    if (bResult == FALSE)
    {
        printf("Error getting device speed: %lu.\n", GetLastError());
        return 0;
    }

    return deviceSpeed;
}

int UsbKnob::ReadValue(double& value)
{
    if (mWinUsbInterfaceHandle == nullptr)
        return -1;

    UCHAR buffer[1024];

    ULONG transferred = 0;

    BOOL bResult = WinUsb_ReadPipe(mWinUsbInterfaceHandle, mPipeId, buffer, sizeof(buffer), &transferred, nullptr);
    if (bResult == FALSE)
    {
        //      hr = HRESULT_FROM_WIN32(GetLastError());
        return HRESULT_FROM_WIN32(GetLastError());
    }

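    printf("Transferred: %lu\n", transferred);
    if (transferred == 4)
    {
        // Assemble the little-endian 32-bit value; cast first so the shifts operate on unsigned values.
        uint32_t t = static_cast<uint32_t>(buffer[0]) |
            (static_cast<uint32_t>(buffer[1]) << 8) |
            (static_cast<uint32_t>(buffer[2]) << 16) |
            (static_cast<uint32_t>(buffer[3]) << 24);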
        value = t / double(0xFFFFFFFF);
        return 0;
    }

    return -1;
}

std::vector<UsbKnobInfo> EnumerateKnobs()
{
    PSP_DEVICE_INTERFACE_DETAIL_DATA detailData = NULL;

    std::vector<UsbKnobInfo> knobs;

    // Enumerate all devices exposing the interface
    HDEVINFO deviceInfoSet = SetupDiGetClassDevs(&GUID_DEVINTERFACE_Iso, NULL, NULL, DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);

    if (deviceInfoSet == INVALID_HANDLE_VALUE)
    {
        // hr = HRESULT_FROM_WIN32(GetLastError());
        return knobs;
    }

    SP_DEVICE_INTERFACE_DATA interfaceData;
    interfaceData.cbSize = sizeof(SP_DEVICE_INTERFACE_DATA);

    // Walk every interface in the result set, starting at index 0
    for (int idx = 0;; ++idx)
    {
        BOOL bResult = SetupDiEnumDeviceInterfaces(deviceInfoSet, NULL, &GUID_DEVINTERFACE_Iso, idx, &interfaceData);
        if (bResult == FALSE)
        {
            if (GetLastError() == ERROR_NO_MORE_ITEMS)
                break;
            //      hr = HRESULT_FROM_WIN32(GetLastError());
            break;
        }

        // Get the size of the path string
        // We expect to get a failure with insufficient buffer
        ULONG requiredLength = 0;
        bResult = SetupDiGetDeviceInterfaceDetail(deviceInfoSet, &interfaceData, NULL, 0, &requiredLength, NULL);

        if (bResult == FALSE && GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        {
            //      hr = HRESULT_FROM_WIN32(GetLastError());
            break;
        }

        // Allocate temporary space for SetupDi structure
        detailData = (PSP_DEVICE_INTERFACE_DETAIL_DATA)LocalAlloc(LMEM_FIXED, requiredLength);

        if (detailData == nullptr)
        {
//          hr = E_OUTOFMEMORY;
            break;
        }

        detailData->cbSize = sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA);
        ULONG length = requiredLength;

        // Get the interface's path string
        bResult = SetupDiGetDeviceInterfaceDetail(deviceInfoSet, &interfaceData, detailData, length, &requiredLength, NULL);

        if (bResult == FALSE)
        {
//          hr = HRESULT_FROM_WIN32(GetLastError());
            LocalFree(detailData);
            break;
        }

        // SetupDiGetDeviceInterfaceDetail ensured DevicePath is NULL-terminated.
        UsbKnobInfo info;
        info.path = detailData->DevicePath;
        knobs.push_back(info);

        LocalFree(detailData);
    }
    SetupDiDestroyDeviceInfoList(deviceInfoSet);
    return knobs;
}

Use it in the obvious way, like this:

    std::vector<UsbKnobInfo> knobs = EnumerateKnobs();
    printf("Found %zu knobs\r\n", knobs.size());

    if (knobs.size() != 1)
    {
        printf("Need 1 knob only.\r\n");
        return 1;
    }

    UsbKnob knob;
    if (!knob.Open(knobs[0].path))
    {
        printf("Couldn't open knob.\r\n");
        return 1;
    }

    for (;;)
    {
        double val = -1.0;
        int ret = knob.ReadValue(val);
        printf("Returned %d; Read value: %f\n", ret, val);
    }

Good luck!

Russ Butler

Nov 23, 2016, 3:28:24 PM
to mbedmicro/pyOCD, mbednotifications, Mention

Hi @Timmmm, thanks for the information on WinUSB. This won't be able to replace CMSIS-DAP over HID at least for DAPLink, but may be possible as an alternative. Can WinUSB be used as part of a composite USB device (for example CDC + Mass storage + WinUSB), or does it apply to the whole device?

Chris Reed

Nov 23, 2016, 5:47:48 PM
to mbedmicro/pyOCD, mbednotifications, Mention

@c1728p9 I've been thinking about this, too, where we could add a USB bulk pipe as an alternative transport. DAPLink would default to HID. A vendor specific command could be used to switch the CMSIS-DAP command stream over to the bulk pipe. Or perhaps the bulk pipe is always enabled and simply used if detected in the configuration. I'd prefer to not move away from CMSIS-DAP commands since most CMSIS-DAP commands would benefit from the reduced latency.

Btw, libusb is available for macOS via homebrew, but I've never had success using it with the pyDAPAccess libusb backend.

Tim

Nov 24, 2016, 11:57:08 AM
to mbedmicro/pyOCD, mbednotifications, Mention

@c1728p9 According to the quick search I just did it can work with just one part of a composite device. I might see if I can modify the mBed CMSIS-DAP code to add a WinUSB interface.

Tim

Dec 1, 2016, 3:50:25 PM
to mbedmicro/pyOCD, mbednotifications, Mention

Well I bought a DAPlink (the DipDap; great name btw), but unfortunately it seems that DAPlink doesn't support gcc and you need a £3k Keil licence to compile it, so... good luck guys!

Kyle Manna

Dec 1, 2016, 6:48:18 PM
to mbedmicro/pyOCD, mbednotifications, Mention

I'd be interested in using such feature enhancements on Linux, is it possible to do them in a portable way? I'm not familiar with WinUSB, but it doesn't sound portable.

Russ Butler

Dec 1, 2016, 7:28:50 PM
to mbedmicro/pyOCD, mbednotifications, Mention

It's a bummer that you are blocked on this @Timmmm. Is there anything I can do to help you? If you are able to create the right USB descriptor for a composite device using WinUSB with mbed, I could probably pull it into DAPLink. Switching this codebase over to GCC is definitely something that needs to be done, but it is a long way off since it requires a lot of work. An attempt was made at this earlier this year, but because the version of RTX that DAPLink uses doesn't have GCC support, it couldn't be done. Once RTX is updated, GCC support should be straightforward to add.

Tim

Dec 2, 2016, 4:27:24 AM
to mbedmicro/pyOCD, mbednotifications, Mention

@kylemanna: Don't worry, Linux isn't disadvantaged by using a WinUSB device. Maybe I should explain it just to be clear.

When you plug in a USB device it provides a description of its interfaces and what 'class' they are. There are standard classes, and a vendor-defined class. Normally, if you use the standard classes the OS will automatically install a driver for you. If you use a vendor class you have to do it manually (or get it into Microsoft's online driver database).

Because of the hassle of driver installation, many people choose to (ab)use the standard HID class for non-HID-related things. It was meant for things like keyboards and mice, but actually you can send arbitrary messages over it. This is what DAPlink currently does. The problem is that HID uses an interrupt endpoint, which is very low latency, but also low bandwidth (coincidentally the kind of transport you want for keyboards and mice!).

DAPlink doesn't really care about latency, but bandwidth is more important, otherwise flashing takes ages. For that you should really use a bulk endpoint. That is the type used by USB memory sticks. It has much higher bandwidth. The LPC11U35 used on the DipDap has USB Full Speed (12 Mb/s) - USB High Speed (480 Mb/s) is relatively uncommon on microcontrollers. Anyway, the maximum bandwidth of USB Full Speed using bulk transfers is around 1.2 MB/s, which should be plenty for the tiny firmwares we want to transfer (see the USB 2 spec, table 5-9).
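
The "around 1.2 MB/s" figure can be reconstructed from the frame budget. A rough sketch: it assumes a full-speed frame carries 1500 bytes (12 Mb/s for 1 ms) and that each bulk transaction costs 13 bytes of protocol overhead, which is my reading of the USB 2.0 tables rather than something stated in this thread.

```python
FRAME_BYTES = 1500           # assumption: 12 Mbit/s * 1 ms / 8 bits
BULK_OVERHEAD_BYTES = 13     # assumption: per-transaction protocol overhead
MAX_PACKET_BYTES = 64        # largest full-speed bulk packet
FRAMES_PER_SECOND = 1000

# How many whole 64-byte bulk transactions fit in one frame.
transfers_per_frame = FRAME_BYTES // (MAX_PACKET_BYTES + BULK_OVERHEAD_BYTES)

# Payload bytes per second if the bus carries nothing else.
bulk_bytes_per_s = transfers_per_frame * MAX_PACKET_BYTES * FRAMES_PER_SECOND

# Best-case time to move a 128 KB image (1 KB = 1024 bytes).
image_seconds = 128 * 1024 / bulk_bytes_per_s

print(f"{transfers_per_frame} transfers/frame, {bulk_bytes_per_s} bytes/s")
print(f"128 KB image: {image_seconds:.2f} s at best")
```

This is a bus-level ceiling only; the debug probe, the SWD clock, and flash programming time all sit underneath it.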

So ideally we want to use a vendor-class bulk endpoint with a libusb-compatible driver. As I said, this means we need to manually install a driver, which is a bit of a pain. On Windows you can do that with Zadig, or you make a .inf file and the user has to add it via the control panel. 'WinUSB devices' avoid this faff. They basically add some Microsoft-specific metadata (called WCID; Windows Compatible ID) to the USB descriptors that tells Windows to automatically install the WinUSB driver for that device when you plug it in.

WinUSB is a generic user-space USB driver. If you access your USB device via WinUSB you can basically do any USB communication with it. There's really not much reason to use vendor-specific kernel-space USB drivers any more, unless you really need to do kernel-related things, for example if you're making a USB graphics card. If you're familiar with libusb, then WinUSB is basically Microsoft's version. In fact libusb can use WinUSB as a backend on Windows.

So in the ideal solution, DAPlink would contain the WCID data it needs to be identified as a WinUSB device, Windows would automatically install WinUSB as a driver for it (just for the bulk vendor interface; the HID interface will have to stick around for compatibility and will continue to use the native HID driver), and then pyOCD can use libusb to access it.

On Linux I believe you don't need to install any drivers to use libusb and you can just ignore the WCID data. You might need to set up some udev rules but Linux users are used to inconvenience so they won't mind! :-P

Sorry that turned into a bit of an essay!

Tim

Dec 2, 2016, 4:27:48 AM
to mbedmicro/pyOCD, mbednotifications, Mention

@c1728p9 Ok I'll see if I can make a rough edit to the code without compiling it.

Kyle Manna

Dec 2, 2016, 12:57:18 PM
to mbedmicro/pyOCD, mbednotifications, Mention

@Timmmm As long as something like libusb fills the void on Linux I'm happy and look forward to your work! I was afraid a lot of effort would be invested in building an interface that works fast on Windows, only to then have to write more middleware to make it work on Linux and macOS.

As a timely note, have you seen this: http://hackaday.com/2016/12/02/black-magic-probe-the-best-arm-jtag-debugger/ ?

It seems that they moved the gdb JTAG server to the debug probe which I would imagine would have even lower latency:

But the BMP’s killer feature is that it runs a GDB server on the probe. It opens up a virtual serial port that you can connect to directly through GDB on your host computer. No need to hassle around with OpenOCD configurations, or to open up a second window to run [texane]’s marvelous st-util. Just run GDB, target extended-remote /dev/ttyACM0 and you’re debugging.

This could perhaps be an even simpler implementation, assuming the USB CDC drivers in Windows/Linux/macOS are up to the task for latency and throughput.

Tim

Dec 6, 2016, 4:44:12 AM
to mbedmicro/pyOCD, mbednotifications, Mention

Quick update: I've modified the USB descriptors so that it has an extra vendor-defined interface with bulk in and bulk out endpoints, but sadly I have discovered that the LPC11U35 only supports 10 endpoints (5 in and 5 out). Two are used for endpoint 0 (mandatory), two for USB Mass Storage Class (MSC), two for HID, and 4 for CDC (Serial). So there are none left for my custom interface. :-(

What do people think? Ditch HID? Is there anything apart from pyOCD that relies on it?
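
The endpoint budget described in this post can be laid out explicitly. The sketch below only restates the counts given above; the dictionary keys are illustrative labels.

```python
TOTAL_ENDPOINTS = 10  # LPC11U35: 5 IN + 5 OUT, as stated in the post

endpoints_in_use = {
    "control (endpoint 0)": 2,
    "mass storage (MSC)": 2,
    "HID": 2,
    "CDC (serial)": 4,
}

NEEDED_FOR_BULK_INTERFACE = 2  # one bulk IN + one bulk OUT

free_now = TOTAL_ENDPOINTS - sum(endpoints_in_use.values())
free_without_hid = free_now + endpoints_in_use["HID"]

print(f"Free endpoints today: {free_now}")
print(f"Free endpoints if HID is dropped: {free_without_hid}")
print(f"Bulk interface fits without HID: {free_without_hid >= NEEDED_FOR_BULK_INTERFACE}")
```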

Russ Butler

Dec 6, 2016, 10:00:07 AM
to mbedmicro/pyOCD, mbednotifications, Mention

Could you disable the HID endpoints for testing? It would be good to know if you can still use the other interfaces - MSD and CDC with a WinUSB device.

There are a couple of programs which use HID. Offhand I know IAR Embedded Workbench and Keil uVision use CMSIS-DAP over HID. I think Atmel Studio does as well.

Even so, it might be possible with a secondary configuration descriptor or interface descriptor to achieve backwards compatibility.

Chris Reed

Dec 6, 2016, 12:40:07 PM
to mbedmicro/pyOCD, mbednotifications, Mention

We could have a CMSIS-DAP vendor specific command that triggers a reenumeration with the bulk endpoints enabled and HID disabled. Considerably more complicated, but it would work around the lack of endpoints.

Russ Butler

Jan 4, 2017, 10:50:09 AM
to mbedmicro/pyOCD, mbednotifications, Mention

@Timmmm did you have any luck getting this working? Also, as another potential option, data could be sent over the control endpoint, without the need for any dedicated endpoints. This would allow the code to bypass the 64-byte limit, and would also probably allow packets to be sent faster than every 1 ms. What do you think?

Tim

Jan 5, 2017, 6:31:40 AM
to mbedmicro/pyOCD, mbednotifications, Mention
Sadly I got distracted by other projects but it is still on my todo list!

But yes actually using control transfers sounds like a great idea -
according to the USB 2 spec (Table 5-3) you can get up to about 15 MB/s
which should be plenty. Bit of a hack but it should also be much easier to
implement and means we can keep all the other interfaces.


Tim

Jan 5, 2017, 4:39:05 PM
to mbedmicro/pyOCD, mbednotifications, Mention
Wait scratch that. Table 5-3 is high speed; full speed is table 5-2 which
says control transfers go up to about 800 kB/s which is ok I guess. Bulk
and interrupt both go to about 1.2 MB/s. I guess HID imposes additional
limits.
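
The full-speed figures quoted here (about 800 kB/s for control, about 1.2 MB/s for bulk and interrupt) are bus-wide limits and can be reproduced from per-transaction overhead. A sketch under assumptions: a 1500-byte frame budget, and overheads of 45 bytes for control and 13 bytes for interrupt and bulk, which is how I read the USB 2.0 tables.

```python
FRAME_BYTES = 1500        # assumption: full-speed frame budget in bytes
FRAMES_PER_SECOND = 1000

# Assumed per-transaction protocol overhead, in bytes.
OVERHEAD_BYTES = {"control": 45, "interrupt": 13, "bulk": 13}


def bus_limit(kind, payload_bytes=64):
    """Return (transfers per frame, payload bytes per second) for the whole bus."""
    per_frame = FRAME_BYTES // (payload_bytes + OVERHEAD_BYTES[kind])
    return per_frame, per_frame * payload_bytes * FRAMES_PER_SECOND


limits = {kind: bus_limit(kind) for kind in OVERHEAD_BYTES}
for kind, (per_frame, rate) in limits.items():
    print(f"{kind:9s}: {per_frame:2d} transfers/frame, {rate} bytes/s")
```

These numbers describe what the bus can carry in total; a single interrupt endpoint is serviced far less often than that, which is where the gap to the observed HID speed comes from.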

Russ Butler

Jan 5, 2017, 4:58:37 PM
to mbedmicro/pyOCD, mbednotifications, Mention

@Timmmm I think the limitation comes from the endpoint descriptors where the polling rate is specified. I believe the fastest polling rate is 1ms for full speed devices.

Tim

Jan 6, 2017, 5:33:30 PM
to mbedmicro/pyOCD, mbednotifications, Mention

I checked and bInterval is already set to 1 (poll every frame) for the HID endpoints.

Tim

Jan 7, 2017, 1:12:44 PM
to mbedmicro/pyOCD, mbednotifications, Mention

@c1728p9 Or do you mean that DAPlink doesn't queue packets? I guess if every packet is acked before the next one is sent you'd be limited to something like 32 kB/s which would explain why it is so slow.
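
The 32 kB/s guess corresponds to a simple model: one 64-byte report per frame at most, and a command/response round trip that spans two frames. A sketch of how the number of packets in flight changes the result; the two-frame round trip is an assumption for illustration, not a measured value.

```python
def hid_rate(report_bytes, in_flight, round_trip_frames, frames_per_second=1000):
    """Estimated bytes/s for one HID endpoint with a given queue depth."""
    # An endpoint moves at most one report per frame, however deep the queue is.
    reports_per_frame = min(1.0, in_flight / round_trip_frames)
    return report_bytes * reports_per_frame * frames_per_second


ROUND_TRIP_FRAMES = 2  # assumption: one frame out, one frame back

unqueued = hid_rate(64, in_flight=1, round_trip_frames=ROUND_TRIP_FRAMES)
queued = hid_rate(64, in_flight=4, round_trip_frames=ROUND_TRIP_FRAMES)

print(f"One packet at a time: {unqueued / 1000:.0f} kB/s")
print(f"Four packets queued:  {queued / 1000:.0f} kB/s")
```

In this model queuing recovers the full one-report-per-frame rate, but cannot exceed it.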

Chris Reed

Jan 7, 2017, 2:19:09 PM
to mbedmicro/pyOCD, mbednotifications, Mention

Packet queuing is functional in DAPLink. We use it in pyOCD to improve download speeds quite a bit.

Russ Butler

Jan 8, 2017, 12:19:30 PM
to mbedmicro/pyOCD, mbednotifications, Mention

@Timmmm I was trying to elaborate on the additional limits you referenced. You mentioned that interrupt transfers can go up to 1.2 MB/s. The limiting factor here though is the endpoint descriptor polling rate.

Chris Reed

Oct 19, 2018, 4:41:09 PM
to mbedmicro/pyOCD, mbednotifications, Mention

Closing this issue. High speed USB is one option for improving performance while sticking with HID. And in the coming months we will be supporting CMSIS-DAPv2 that uses bulk endpoints and WinUSB to enable userland access on Windows (see #431).

Chris Reed

Oct 19, 2018, 4:41:10 PM
to mbedmicro/pyOCD, mbednotifications, Mention

Closed #23.
