Trying to debug the "opencbm stops working after cbmctrl status command" problem

Martin Thierer

Mar 29, 2020, 4:18:13 PM
to ZoomFloppy Users
I'm trying to build an xum1541 adapter to interface my 1541-II floppy drive, using an arduino pro micro clone with an atmega32u4.

I ran into a problem that I suspect to be the same as mentioned in the beginning of this thread and this issue in the github repository that introduced the pro micro support for the xum1541 firmware.

The symptoms are like this: When I issue a "cbmctrl status" command with the disc drive connected and a disc in the drive, the command seems to work fine the first time after plugging the adapter in. But when I run the command a second time, the program hangs and the adapter keeps blinking. On the other hand, multiple consecutive "cbmctrl dir" commands work just fine.

I thought I'd give this a shot, but it turned out to be much harder than I expected...

Here's what I found out so far:

The reason why the adapter locks up on the second try is that for some reason the first 4 bytes of the talk command (09 13 02 00) are not received by the xum1541 firmware. But the 2 data bytes (48 6f) are received, and they are then processed by USB_ReadBlock() in USB_BulkWorker(), because Endpoint_Read_Stream_LE() seems to be happy to read only 2 bytes instead of the 4 that are expected. The "48 6f" plus two garbage bytes are then fed to usbHandleBulk(), which can't make sense of 0x48 as a command and bails out with an error. That's why the status response is never sent, while on the host xum1541_wait_status() keeps waiting for that response. So the host blocks and never sends XUM1541_SHUTDOWN, which in turn is why the LED keeps blinking.
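
To make the failure path above concrete, here is a minimal sketch of a length check on the command header. It is not the xum1541 firmware code: parse_cmd_header(), struct cmd_header and CMD_HEADER_LEN are names made up for illustration, and the only facts taken from the thread are that a command header is 4 bytes long and that a 2-byte packet (48 6f) was accepted in its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The thread describes a 4-byte command header, e.g. 09 13 02 00. */
#define CMD_HEADER_LEN 4

/* Hypothetical container for a received header (illustration only). */
struct cmd_header {
    uint8_t bytes[CMD_HEADER_LEN];
};

/*
 * Copy a command header out of a receive buffer.
 * Returns 0 on success and -1 if fewer than CMD_HEADER_LEN bytes arrived,
 * so a short packet is rejected instead of being padded with garbage.
 */
int parse_cmd_header(const uint8_t *buf, size_t received, struct cmd_header *out)
{
    assert(buf != NULL && out != NULL);

    if (received < CMD_HEADER_LEN)
        return -1;  /* short read: this cannot be a complete command header */

    memcpy(out->bytes, buf, CMD_HEADER_LEN);
    return 0;
}
```

With such a check, the two data bytes would be reported as a short read rather than being interpreted as command 0x48. It would not fix the underlying loss of the first packet, it would only make the symptom easier to spot.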

At the moment I'm completely puzzled why the four command bytes are ignored by the atmega32u4. I hooked up a logic analyzer to the USB data lines and I can see the bytes being on the bus and being acknowledged. But neither the RXOUTI nor the RWAL bit is set in the MCU. That only happens a bit later, when the two data bytes arrive. The atmega32u4 datasheet states that RWAL doesn't get set "if STALLRQ is set, or in case of error", but I found no indication of STALLRQ being set, and the datasheet doesn't specify which "errors" might lead to RWAL not being set. I checked a few status bits but haven't found anything that explains this behaviour. I suspected that maybe a USB interrupt consumes the data in the background, but there seems to be no interrupt.

This is on a computer running linux with libusb 1.0.23 and the xum1541 variant with the 7406 buffer ic (but I don't think that matters).

Any idea why the first message to the OUT endpoint might get lost (and what could reset that condition so the next message with the data bytes goes through)? I think the problem isn't actually with the start of the second transfer but the end of the first one, because a "cbmctrl dir" after a "cbmctrl status" also fails, while consecutive "cbmctrl dir" commands are no problem. I just haven't found yet what difference in these two commands leads to the different outcomes.

Martin

Spiro Trikaliotis

Mar 31, 2020, 11:21:30 AM
to ZoomFloppy Users
Hello Martin,

* On Sun, Mar 29, 2020 at 01:18:12PM -0700 Martin Thierer wrote:

> I thought I'd give this a shot, but it turned out to be much harder than I
> expected...

Thank you for your efforts!

> The reason why the adapter locks up on the second try is that for some reason
> the first 4 bytes of the talk command (09 13 02 00) are not received by the
> xum1541 firmware. But the 2 data bytes (48 6f) are [...]

Do you have the possibility to check if the first bytes (09 13 02 00)
are not received, OR if they are handled by the previous command? That
is, is it possible that for some reason, the firmware believes the
previous command has not finished yet and handles these bytes as being
part of the previous command?

> At the moment I'm completely puzzled why the four command bytes are ignored by
> the atmega32u4. I hooked up a logic analyzer to the USB data lines and I can
> see the bytes being on the bus and being acknowledged.

I am not an expert on USB. But: Can it be that the USB stack thinks that
these bytes belong to a previous command, or, at least, that they
acknowledge something that has happened before?

> This is on a computer running linux with libusb 1.0.23 and the xum1541 variant
> with the 7406 buffer ic (but I don't think that matters).

Are you using a SparkFun Pro Micro setup? I have one here, too, and I am
currently thinking how I can create a setup so I can see the console
debugging output.

Regards,
Spiro.

--
Spiro R. Trikaliotis
http://www.trikaliotis.net/

Spiro Trikaliotis

Mar 31, 2020, 11:55:51 AM
to ZoomFloppy Users
Hello again Martin,

* On Tue, Mar 31, 2020 at 05:21:26PM +0200 I wrote:

I have had a short look at the xum1541 sources:

Without actually knowing the details (!), I would have a look if the calls to
functions like usbIoDone() and usbIoInit() are balanced.

If they are, I would instrument all functions in xum1541/ieee.c in order
to find out what is called with each command, and have a look at these.

Martin Thierer

Mar 31, 2020, 2:36:22 PM
to ZoomFloppy Users
Hi Spiro,

thanks for sharing your thoughts!


> The reason why the adapter locks up on the second try is that for some reason
> the first 4 bytes of the talk command (09 13 02 00) are not received by the
> xum1541 firmware. But the 2 data bytes (48 6f) are [...]

> Do you have the possibility to check if the first bytes (09 13 02 00)
> are not received, OR if they are handled by the previous command? That
> is, is it possible that for some reason, the firmware believes the
> previous command has not finished yet and handles these bytes as being
> part of the previous command?

that's what I thought at first, too, but I no longer do. In the logic analyzer trace I can see that after the command data is on the bus, it just isn't picked up by USB_BulkWorker() called from the main loop, because Endpoint_IsReadWriteAllowed() doesn't become true. But it *does* a bit later, when the two data bytes arrive. I don't think there's a different place in the main loop where data is read from the OUT endpoint. I think all the usbXXX() functions in commands.c are only called from usbHandleBulk(), but when the problem happens the code doesn't even get there. As I wrote in my first post, I thought it might be read from a USB ISR, but I found no evidence of that, either. The code seems to use interrupts only for control messages.

> Are you using a SparkFun Pro Micro setup? I have one here, too, and I am
> currently thinking how I can create a setup so I can see the console
> debugging output.

I use a cheap pro micro from aliexpress. I don't know if it differs from the sparkfun version. But if it doesn't it should be pretty easy for you to reproduce the problem. Doesn't your board lock up when you issue two "cbmctrl status" commands in a row?

At the moment I suspect that the communication is sometimes messed up by the code in EVENT_USB_Device_ConfigurationChanged() when it is called the second time. The comment for USB_ResetConfig() suggests that the author was aware of a similar problem. Seems like the code was developed for an at90usb162 mcu. Maybe the atmega32u4 used in the pro micro board has some subtle differences in its USB handling.

My plan is to go through the LUFA code and the atmega32u4 datasheet and try to understand what exactly happens in the "ConfigurationChanged" code.

Martin

Spiro Trikaliotis

Mar 31, 2020, 5:39:26 PM
to ZoomFloppy Users
Hello Martin,

* On Tue, Mar 31, 2020 at 11:36:22AM -0700 Martin Thierer wrote:

> that's what I thought at first, too, but I no longer do. In the logic analyzer
> trace I can see that after the command data is on the bus, it just isn't picked
> up by USB_BulkWorker() called from the main loop,
[...]

Ok, so you are very deep into the code. That's good.

> Are you using a SparkFun Pro Micro setup? I have one here, too, and I am
> currently thinking how I can create a setup so I can see the console
> debugging output.
>
> I use a cheap pro micro from aliexpress. I don't know if it differs from the
> sparkfun version. But if it doesn't it should be pretty easy for you to
> reproduce the problem. Doesn't your board lock up when you issue two "cbmctrl
> status" commands in a row?

No, I do not have this problem here!

That's the weird thing: I know that some people have this problem, but I
do not know what exactly triggers it.

Oh, are you on Windows or on Linux (or MacOS)? Are you using libusb1 or
libusb0? Are you using the latest git master?

> My plan is to go through the LUFA code and the atmega32u4 datasheet and try to
> understand what exactly happens in the "ConfigurationChanged" code.

Please note that the LUFA code is from 2009 (091223)! We did not update it since
then. The latest one seems to be 170418.

Trying to use a newer LUFA has failed for me so far, because some
things have changed. It might be a better solution than trying to
debug this issue, if you believe that the problem might be in LUFA.

Spiro Trikaliotis

Apr 1, 2020, 11:37:24 AM
to ZoomFloppy Users
Hello,

* On Tue, Mar 31, 2020 at 11:39:22PM +0200 I wrote:

> Please note that the LUFA code is from 2009 (091223)! We did not update it since
> then. The latest one seems to be 170418.
>
> Trying to use a newer LUFA has failed for me so far, because some
> things have changed. It might be a better solution than trying to
> debug this issue, if you believe that the problem might be in LUFA.

I tried to work on updating LUFA to the last version (170418). It can be
found on the branch lufa-170418 on github.

Unfortunately, after flashing it, the device does not answer anymore. So
I have a problem, and I do not know what it is at the moment. That's why
I did not add compiled HEX files, because I do not want people to brick
their device with it.

Anyone who has more LUFA experience than me is free to have a look. ;)

Martin Thierer

Apr 1, 2020, 2:06:12 PM
to ZoomFloppy Users
I made some progress: When I put both the USB_ResetConfig() and the Endpoint_ConfigureEndpoint() calls in EVENT_USB_Device_ConfigurationChanged() behind a flag, so that they only get called on the first "set configuration" message after the device was plugged in, the problem is gone. So far I've only tested "cbmctrl status" and "cbmctrl dir" and made a disc image with d64copy, so there might be other things that no longer work now.


>     Are you using a SparkFun Pro Micro setup? I have one here, too, and I am
>     currently thinking how I can create a setup so I can see the console
>     debugging output.
>
> I use a cheap pro micro from aliexpress. I don't know if it differs from the
> sparkfun version. But if it doesn't it should be pretty easy for you to
> reproduce the problem. Doesn't your board lock up when you issue two "cbmctrl
> status" commands in a row?

> No, I do not have this problem here!
>
> That's the weird thing: I know that some people have this problem, but I
> do not know what exactly triggers it.
>
> Oh, are you on Windows or on Linux (or MacOS)? Are you using libusb1 or
> libusb0? Are you using the latest git master?

That's interesting. I thought the people that are not able to reproduce it are using different boards. I'm using git master from github on linux with libusb 1.0.23. 

Are you by any chance on Windows? Because the comment for USB_ResetConfig() suggests that only linux and macos send multiple "set configuration" messages. So if windows doesn't send them (because maybe it caches the last value and doesn't send the message if it hasn't changed) it would be plausible that the problem isn't triggered there. If it otherwise works on windows it might also suggest that the usb reset in EVENT_USB_Device_ConfigurationChanged() isn't necessary at all.

Right now I still think that there's a bug somewhere that's only triggered by the multiple usb resets, so I'll keep investigating. Reading the atmega32u4 datasheet I can't see why it shouldn't be possible to do a reset. But if I can't come up with a proper fix, my current hack might be an option :)

> My plan is to go through the LUFA code and the atmega32u4 datasheet and try to
> understand what exactly happens in the "ConfigurationChanged" code.

> Please note that the LUFA code is from 2009 (091223)! We did not update it since
> then. The latest one seems to be 170418.
>
> Trying to use a newer LUFA has failed for me so far, because some
> things have changed. It might be a better solution than trying to
> debug this issue, if you believe that the problem might be in LUFA.

I think updating LUFA opens an even bigger can of worms and I doubt it would help in this case. If it really turns out to be a bug in LUFA (right now I doubt it), it's probably easier to patch the old version as a first step and maybe update LUFA later.

Martin
 

Martin Thierer

Apr 1, 2020, 2:09:45 PM
to ZoomFloppy Users
> I tried to work on updating LUFA to the last version (170418). It can be
> found on the branch lufa-170418 on github.
>
> Unfortunately, after flashing it, the device does not answer anymore. So
> I have a problem, and I do not know what it is at the moment. That's why
> I did not add compiled HEX files, because I do not want people to brick
> their device with it.

What do you mean "the device does not answer anymore"? Are you talking about the pro micro? Does it just not work as a XUM1541 device or do you think you bricked it? If the latter: How did you flash it? At least the bootloader should still work?

Martin

Spiro Trikaliotis

Apr 1, 2020, 3:04:47 PM
to ZoomFloppy Users
Hello Martin,

* On Wed, Apr 01, 2020 at 11:09:45AM -0700 Martin Thierer wrote:

> What do you mean "the device does not answer anymore"? Are you talking about
> the pro micro? Does it just not work as a XUM1541 device or do you think you
> bricked it?

I bricked it.

> If the latter: How did you flash it? At least the bootloader should
> still work?

I programmed it via the bootloader. Unfortunately, the bootloader did
not respond, either. I do not know why this happened.

However, I used another Arduino as "Arduino as ISP" to read it out.
The bootloader was still intact, so I am not sure why it did not react.
I deleted the firmware, leaving only the bootloader, and everything
works again as expected.

But, of course, I want to prevent others from this experience,
especially if they do not know how to fix it.

Spiro Trikaliotis

Apr 1, 2020, 3:09:57 PM
to ZoomFloppy Users
Hello Martin,

* On Wed, Apr 01, 2020 at 11:06:12AM -0700 Martin Thierer wrote:
> I made some progress: When I put both the USB_ResetConfig() and
> the Endpoint_ConfigureEndpoint() calls in
> EVENT_USB_Device_ConfigurationChanged() behind a flag, so they get only
> called on the first "set configuration" message after the device was plugged
> in, the problem is gone. I've only tested "cbmctrl status" and "cbmctrl dir"
> and made a disc image with d64copy. So there might be other things that no
> longer work now.

Can you show a diff of what you have done exactly?

> That's the weird thing: I know that some people have this problem, but I
> do not know what exactly triggers it.
[...]
> That's interesting. I thought the people that are not able to reproduce it are
> using different boards. I'm using git master from github on linux with libusb
> 1.0.23. 

It does not seem to depend on the board. I even have had a report from
someone where an original ZoomFloppy never worked. She sent it to me, it
worked here on Windows and Linux, but when she got it back, it did not work
there.

That's weird, and we never found a solution to it.

> Are you by any chance on Windows?

I can test on Windows, but my main development machine is Linux, Debian
Buster at the moment.

> Because the comment for USB_ResetConfig()
> suggests that only linux and macos send multiple "set configuration"
> messages. So if windows doesn't send them (because maybe it caches the last
> value and doesn't send the message if it hasn't changed) it would be plausible
> that the problem isn't triggered there. If it otherwise works on windows it
> might also suggest that the usb reset in
> EVENT_USB_Device_ConfigurationChanged() isn't necessary at all.

I believe I have had more reports from Windows users than from Linux
users. This might be because more people are using Windows, though.


> I think updating LUFA opens an even bigger can of worms and I doubt it would
> help in this case. If it really turns out to be a bug in LUFA (right now I
> doubt it), it's probably easier to patch the old version as a first step and
> maybe update LUFA later.

To me, it is not completely implausible. When the version of LUFA that
we are currently using came out in 2009, m32u4 boards were not very
common. So, it is a possibility in my opinion.

Martin Thierer

Apr 1, 2020, 3:39:49 PM
to ZoomFloppy Users
Hi Spiro,


> Can you show a diff of what you have done exactly?

This change is all I need to make it work (diff -w here for brevity, I'll attach the full patch as a file):
 
diff --git a/xum1541/main.c b/xum1541/main.c
index 0680308..7e113a8 100644
--- a/xum1541/main.c
+++ b/xum1541/main.c
@@ -105,11 +105,14 @@ EVENT_USB_Device_Disconnect(void)
     board_set_status(STATUS_INIT);
 }
 
+static int endpoints_configured = 0;
+
 void
 EVENT_USB_Device_ConfigurationChanged(void)
 {
     DEBUGF(DBG_ALL, "usbconfchg\n");
 
+    if (endpoints_configured == 0) {
         // Clear out any old configuration before allocating
         USB_ResetConfig();
 
@@ -123,6 +126,9 @@ EVENT_USB_Device_ConfigurationChanged(void)
         Endpoint_ConfigureEndpoint(XUM_BULK_OUT_ENDPOINT, EP_TYPE_BULK,
             ENDPOINT_DIR_OUT, XUM_ENDPOINT_BULK_SIZE, ENDPOINT_BANK_DOUBLE);
 
+        endpoints_configured = 1;
+    }
+
     // Indicate USB connected and ready to start event loop in main()
     board_set_status(STATUS_READY);
     device_running = true;

>     That's the weird thing: I know that some people have this problem, but I
>     do not know what exactly triggers it.
[...]
> That's interesting. I thought the people that are not able to reproduce it are
> using different boards. I'm using git master from github on linux with libusb
> 1.0.23. 

> It does not seem to depend on the board. I even have had a report from
> someone where an original ZoomFloppy never worked. She sent it to me, it
> worked here on Windows and Linux, but when she got it back, it did not work
> there.
>
> That's weird, and we never found a solution to it.

> Are you by any chance on Windows?

> I can test on Windows, but my main development machine is Linux, Debian
> Buster at the moment.

> Because the comment for USB_ResetConfig()
> suggests that only linux and macos send multiple "set configuration"
> messages. So if windows doesn't send them (because maybe it caches the last
> value and doesn't send the message if it hasn't changed) it would be plausible
> that the problem isn't triggered there. If it otherwise works on windows it
> might also suggest that the usb reset in
> EVENT_USB_Device_ConfigurationChanged() isn't necessary at all.

> I believe I have had more reports from Windows users than from Linux
> users. This might be because more people are using Windows, though.

Ok, so it's more complex than I hoped. Maybe there's some sort of timing issue.
 
Martin
xum1541.patch

Spiro Trikaliotis

Apr 2, 2020, 11:14:00 AM
to ZoomFloppy Users
Hello,

* On Wed, Apr 01, 2020 at 09:04:44PM +0200 I wrote:

> > What do you mean "the device does not answer anymore"? Are you talking about
> > the pro micro? Does it just not work as a XUM1541 device or do you think you
> > bricked it?
>
> I bricked it.

This is not completely true. Reprogramming it with the firmware again,
it did not respond, but I could reprogram it via USB again (after doing
a double-reset to go into the bootloader).

I do not know why this did not work the last time.

In the meantime, the commit 51a08db fixes the most prominent problems.
The new LUFA does not activate interrupts by itself anymore, but the
firmware has to do it. Additionally, the USB version was given wrong (1,
10, 0) instead of (1, 1, 0).

https://github.com/OpenCBM/OpenCBM/commit/51a08db3c255bfd5b4360feae9eb2bb7bbf47b56

Now, xum1541cfg finds the device and gives the features correctly.
However, doing an operation (like cbmctrl reset) results in an -ENOPIPE
answer on the USB bus (and, BTW, a crash of OpenCBM). I will investigate
further.

Spiro Trikaliotis

Apr 3, 2020, 4:35:33 PM
to ZoomFloppy Users
Hello,

the LUFA-170418 update is working now.

It is in the lufa-170418 branch on github.

You might want to give it a try. Does your problem with the consecutive
calls still occur?

I did not compile the firmware, please do on your own until I have some
feedback that it works as expected.

Martin Thierer

Apr 4, 2020, 3:30:25 AM
to ZoomFloppy Users
Hi Spiro,

thanks for your work!

I gave the version from the lufa-170418 branch a quick try and it seems to work like the old version. I guess that's both a good and a bad thing :)

What worked with the old version still seems to work, but unfortunately cbmctrl still hangs after the first "cbmctrl status". (Multiple "cbmctrl dir" still work). I haven't checked if there's a difference in what is happening on the usb bus. My "fix" still works, though. (I still don't consider it a real fix, I think it just somehow doesn't trigger the real bug).

Martin 

Spiro Trikaliotis

Apr 5, 2020, 3:22:22 PM
to ZoomFloppy Users
Hello Martin,

* On Wed, Apr 01, 2020 at 12:39:49PM -0700 Martin Thierer wrote:
 
> +static int endpoints_configured = 0;
> +
>  void
>  EVENT_USB_Device_ConfigurationChanged(void)
>  {
>      DEBUGF(DBG_ALL, "usbconfchg\n");
>  
> +    if (endpoints_configured == 0) {
>          // Clear out any old configuration before allocating
>          USB_ResetConfig();

Can you just comment out USB_ResetConfig() (and leave everything else
as-it-is, without your patch!)

Does it solve your problem? Does it introduce any other problems you are
aware of?

Having a look at USB_ResetConfig(), the whole function looks suspicious
to me.

The configuration of the EndPoint seems to be necessary, though, if I
look at the provided sample BulkVendor of LUFA.

Spiro Trikaliotis

Apr 5, 2020, 3:26:07 PM
to ZoomFloppy Users
Hello Martin,

* On Sat, Apr 04, 2020 at 12:30:25AM -0700 Martin Thierer wrote:

> thanks for your work!

At least, the xum1541 firmware still compiles and works with the latest
LUFA git master. So, we are future-proof. ;)

> What worked with the old version still seems to work, but unfortunately cbmctrl
> still hangs after the first "cbmctrl status". (Multiple "cbmctrl dir" still
> work). I haven't checked if there's a difference in what is happening on the
> usb bus. My "fix" still works, though. (I still don't consider it a real fix, I
> think it just somehow doesn't trigger the real bug).

As the BulkVendor example does not even use Endpoint_ResetEndpoint() (=
Endpoint_ResetFIFO(), as it is called in the older LUFA) and
Endpoint_ResetDataToggle(), nor Endpoint_IsStalled() or
Endpoint_ClearStall(), I suspect using them might be the problem in the
first place.

Martin Thierer

Apr 5, 2020, 4:08:09 PM
to ZoomFloppy Users
> Can you just comment out USB_ResetConfig() (and leave everything else
> as-it-is, without your patch!)
>
> Does it solve your problem? Does it introduce any other problems you are
> aware of?

I'm pretty sure I've already tried that, but I just checked (with the new LUFA version), and no, that doesn't help. It doesn't seem to make it worse; at least "cbmctrl dir" still works. Which might suggest that USB_ResetConfig() really isn't needed at all.

I also tried (not now but before) to leave either USB_ResetConfig() or the two Endpoint_ConfigureEndpoint() calls out of the "if (endpoints_configured == 0)" block, and that doesn't work either. Both must be skipped for the patch to work.

Martin

Martin Thierer

Apr 5, 2020, 4:13:28 PM
to ZoomFloppy Users
> As the BulkVendor example does not even use Endpoint_ResetEndpoint() (=
> Endpoint_ResetFIFO(), as it is called in the older LUFA) and
> Endpoint_ResetDataToggle(), nor Endpoint_IsStalled() or
> Endpoint_ClearStall(), I suspect using them might be the problem in the
> first place.

I also tried to comment out some of the calls in USB_ResetConfig() and, if I remember right, even just calling Endpoint_ResetFIFO() was enough to trigger the problem. All that does is set the reset bit for the specified endpoint in UERST. The datasheet of the atmega32u4 says about that:

An endpoint can be reset at any time by setting in the UERST register the bit corresponding to the endpoint
(EPRSTx). This resets:
- the internal state machine on that endpoint
- the Rx and Tx banks are cleared and their internal pointers are restored
- the UEINTX, UESTA0X and UESTA1X are restored to their reset value
The data toggle field remains unchanged.
The other registers remain unchanged.
The endpoint configuration remains active and the endpoint is still enabled.

Sounds like a valid thing to do to me. Of course, the fifos are cleared, but at the time of the configuration message the endpoints should be idle anyway.

That said, I would be willing to believe that resetting the endpoints just messes things up and should not be done, but then why does "cbmctrl dir" work? That really puzzles me.

Right now I'm trying to strip down the communication in both cbmctrl and the firmware, in the hope of making it simple enough to spot the problem. I have code that just sends 4 bytes back and forth, which should be simple enough to work, but that still locks up on the second invocation.

I've already gone through a lot of suspicions about what might be the cause, but so far they have all turned out to be wrong...

Martin

Spiro Trikaliotis

May 12, 2020, 1:14:38 PM
to ZoomFloppy Users
Hello,

* On Wed, Apr 01, 2020 at 12:39:49PM -0700 Martin Thierer wrote:
>
> Ok, so it's more complex than I hoped. Maybe there's some sort of timing issue.

due to a discussion on the German Forum64, I got some more insight into
the "opencbm stops working after cbmctrl status".

There was one user that had exactly the same problem on an original ZF. I
gave him a modified firmware, with lufa-170418 and the patch by Martin,
and it worked for him. Let's call this firmware v08-PATCHED.

Then, he gave me another hint: He had his problems running opencbm in an
lxc container (cf. https://linuxcontainers.org/)

This made me curious. I wanted to find out if installing lxc changes
anything. Indeed, it did, even for me!

But, it was surprising to me:

1. ZoomFloppy with V07: worked
2. ZoomFloppy with V08: worked
3. ZoomFloppy with V08-PATCHED: did NOT work!

That is, my behaviour is exactly the opposite to what you, Martin, have
encountered!

For me, it makes me believe with even more confidence that the issue is
some timing problem!

Regards,
Spiro.

--
Spiro R. Trikaliotis
http://spiro.trikaliotis.net/

Martin Thierer

May 12, 2020, 2:43:47 PM
to ZoomFloppy Users
Hi Spiro,

I actually found the problem a few weeks ago and it's not a timing issue...

Turns out USB bulk transfers have this thing called data toggle. The toggle is either 0 or 1 and gets inverted with every ack that's either sent or received on an endpoint. When the receiving end receives a packet with a toggle value it doesn't expect, it just ignores the packet. The rationale behind this behaviour is to safeguard against a possible loss of an ack packet. If the receiver acks a packet but the ack doesn't make it to the sender, the sender will just send it again and the receiver will ignore the packet (which would otherwise be a duplicate).

Of course this means that the data toggles of the sending and the receiving end have to be in sync. And that's where the problem starts...
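
The toggle mechanism described above can be modelled in a few lines. This is a simplified sketch for one direction of one endpoint, not code from LUFA, libusb or the kernel; struct toggle_link and link_send() are names made up for illustration, and the model assumes the receiver acks a packet with an unexpected toggle while discarding its data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One direction of one endpoint: each side tracks a 1-bit toggle. */
struct toggle_link {
    unsigned host_toggle;    /* toggle the host puts on its next data packet */
    unsigned device_toggle;  /* toggle the device expects on the next packet */
};

/*
 * The host sends one data packet. Returns true if the device accepts the data.
 * ack_lost simulates an ack that never reaches the host.
 */
bool link_send(struct toggle_link *l, bool ack_lost)
{
    bool accepted;

    assert(l != NULL);

    accepted = (l->host_toggle == l->device_toggle);
    if (accepted)
        l->device_toggle ^= 1u;  /* receiver advances after taking the data */

    /* A packet with the wrong toggle is acked too, but its data is dropped. */
    if (!ack_lost)
        l->host_toggle ^= 1u;    /* sender advances only when it sees the ack */

    return accepted;
}
```

Starting from 0/0, a send with a lost ack followed by a retransmission leaves both sides in sync again, and the retransmitted data is dropped exactly once. The same rule also drops a fresh packet if the two toggles disagree for any other reason, which matches the observation that the command bytes were acknowledged on the bus but never reached the firmware.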

My computer has both USB2 and USB3 ports. The USB3 ports are managed by the linux xhci_hcd driver and the USB2 ports by the ehci-pci driver. When I first experienced the problem, I had the xum1541 plugged into a USB3 port. I later found that it works if I plug it into a USB2 port.

The reason is that apparently the xhci_hcd driver (or libusb when using this driver, I'm not sure) does not reset its data toggles when a "set configuration" message is issued, while the ehci-pci driver does. As the xum1541 *does* reset its data toggles when it receives a "set configuration" message, that makes the data toggles go out of sync when using the xhci_hcd driver. This is also the reason why my "fix" isn't really a fix as it only makes it work for the xhci_hcd driver and breaks it for the ehci (USB2) driver (because in that case the host *does* reset its data toggles while the xum1541 no longer does). I guess what you are experiencing with the lxc container is a similar issue. Maybe the container handles the USB ports in a different way or it is running a different linux version with different bugs :) (I guess you were using the same usb ports for your tests from within the container?).

Sorry that I didn't get back to you with this information earlier. I planned to do some more tests and then post to either the libusb or the linux-usb mailing list. I'm still not sure if it's a problem with linux or libusb (or maybe something else). I have a hard time believing that such a bug would go unnoticed in the linux kernel. That said, now that I know what to search for I found multiple reports of similar problems, but all without a solid resolution. I'm not even sure what the *correct* behaviour is. The docs for libusb_set_configuration() suggest that 'set configuration' should reset the data toggles, but reading the usb specification, I'm not so sure. It *does* state that setting the configuration should reset them, but it does that while talking about bringing a device from an unconfigured into a configured state, which isn't really the case when setting a device to a configuration it already is in.

But regardless of what the actual problem is, waiting for its fix won't help in the short term anyway, so these are the possible fixes I came up with:

1. Get rid of the libusb_set_configuration() call in xum1541_init(). This fixes the problem for me on all my usb ports. The device gets automatically configured when it's plugged in anyway, so the call is not needed. This might not be true for other configurations, though.
2. To work around that possible problem, it would be possible to only make the libusb_set_configuration() call if the device isn't already configured. This also worked for me, but unfortunately there are two problems: (a) libusb_get_configuration() isn't imported by dynlibusb.c and (b) it's not available in libusb0 at all. So it would be necessary to send a custom control message to check if the device is already configured.
3. So I propose a simpler solution: What also works for me is disconfiguring the device in xum1541_close() by calling usb.set_configuration() with a configuration of -1.

The last fix also worked for me with both USB port types on linux, but there is one caveat: I also did some checks on windows. I don't use windows a lot, so I didn't bother to get opencbm to compile there, but I wrote a small python script that simulates the "cbmctrl status" command with pyusb (that shows the same behaviour). Playing with it on Windows 10 (using libusb1 / Winusb) I found that trying to deconfigure the device returned an error. That might be a problem of using libusb via pyusb, but I'm not sure. If the error also happens when using libusb from C, then it's probably safe to just ignore the error. Windows seems to silently ignore libusb_set_configuration() calls otherwise, anyway.
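
Fixes 2 and 3 from the list above can be sketched as follows. This is not the opencbm code and not one of the patches mentioned in this thread: struct usb_ops, ensure_configuration(), unconfigure_on_close() and the fake device are all made up for illustration. In a libusb-1.0 build the two callbacks would wrap libusb_get_configuration() and libusb_set_configuration(); the sketch assumes, as the libusb documentation describes, that an unconfigured device reports configuration 0 and that a value of -1 requests the unconfigured state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical abstraction over the USB library, so the logic runs without hardware. */
struct usb_ops {
    int (*get_configuration)(void *dev, int *config);
    int (*set_configuration)(void *dev, int config);
};

/* Fix 2: send "set configuration" only if the device is not already in that configuration. */
int ensure_configuration(const struct usb_ops *ops, void *dev, int wanted)
{
    int current = 0;
    int rc;

    assert(ops != NULL);

    rc = ops->get_configuration(dev, &current);
    if (rc != 0)
        return rc;
    if (current == wanted)
        return 0;  /* already configured: leave the data toggles alone */
    return ops->set_configuration(dev, wanted);
}

/* Fix 3: unconfigure the device on close; errors are deliberately ignored. */
void unconfigure_on_close(const struct usb_ops *ops, void *dev)
{
    assert(ops != NULL);
    (void)ops->set_configuration(dev, -1);
}

/* In-memory stand-in for a device, used only to exercise the logic. */
struct fake_dev {
    int config;     /* 0 means unconfigured */
    int set_calls;  /* number of "set configuration" requests seen */
};

static int fake_get(void *dev, int *config)
{
    *config = ((struct fake_dev *)dev)->config;
    return 0;
}

static int fake_set(void *dev, int config)
{
    struct fake_dev *d = dev;

    d->set_calls++;
    d->config = (config == -1) ? 0 : config;
    return 0;
}

const struct usb_ops fake_ops = { fake_get, fake_set };
```

With fix 2 a second call for the same configuration sends nothing, so neither side has a reason to reset its toggles. With fix 3 the next open starts from the unconfigured state, which is the case where a reset on both sides is expected.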

If you give me a few days I'll prepare patches for each of the 3 possible fixes to show what worked for me.

Again, sorry for not reporting back earlier...

Martin

P.S. The reason why the problem shows with "cbmctrl status" but not e.g. "cbmctrl dir" is that it only happens when an odd number of messages is sent. Otherwise, the data toggles are already zero, so it doesn't matter if they are reset or not. "cbmctrl status" does exactly 5 writes (2 for "TALK", 1 to read the status and 2 for "UNTALK"), so it triggers the problem. OTOH the protocol for "cbmctrl dir" seems to inherently always use an even number of packets, so there's never a problem... (I didn't research that entirely, that's one of the things I still plan to do, but from what I found so far that seems to be the case).
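The parity argument can be checked with a toy model. The following Python sketch is a simplification, not a trace of real traffic: it assumes a single bulk OUT endpoint, assumes that the device acks a packet with an unexpected toggle but discards its data, and assumes that only the device resets its toggle on "set configuration". All names are made up for the example.

```python
class Endpoint:
    """One side's view of a bulk endpoint: just the expected data toggle."""

    def __init__(self):
        self.toggle = 0


def send(host, device, payload, received):
    """Send one bulk OUT packet; return True if the device accepted the data.

    On a toggle mismatch the device still acks but discards the payload as a
    presumed duplicate, so only the host's toggle advances.
    """
    accepted = host.toggle == device.toggle
    if accepted:
        received.append(payload)
        device.toggle ^= 1
    host.toggle ^= 1
    return accepted


def first_packet_survives(packets_in_first_run):
    """Run one command, reset only the device's toggle, send one more packet."""
    host, device, received = Endpoint(), Endpoint(), []
    for i in range(packets_in_first_run):
        send(host, device, f"run1-{i}", received)
    device.toggle = 0  # device resets on "set configuration", host does not
    return send(host, device, "run2-0", received)


if __name__ == "__main__":
    print("5 packets (like status):", first_packet_survives(5))  # odd count
    print("4 packets:", first_packet_survives(4))                # even count
```

In this model an odd packet count leaves both toggles at 1, so resetting only one side makes the first packet of the next run disappear, while an even count leaves both at 0 and the reset has no visible effect.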

Spiro Trikaliotis

May 12, 2020, 4:54:05 PM5/12/20
to ZoomFloppy Users
Hello Martin,

thank you for sharing your insights.


I must admit that I do not know enough on USB to tell you what should be
right, and what should be wrong.

There is one thing, though, that I stumbled upon

In xum1541/descriptor.c, we tell that the USB Version is 1.10
(VERSION_BCD(01.10) or VERSION_BCD(1, 1, 0) in LUFA-170418).

However, in the Makefile, we specify:

-DUSE_STATIC_OPTIONS="(USB_DEVICE_OPT_FULLSPEED | USB_OPT_REG_ENABLED | USB_OPT_AUTO_PLL)

So, we have FULLSPEED with USB 1.1 - which does not make sense. Can this
be a problem?


* On Tue, May 12, 2020 at 11:43:47AM -0700 Martin Thierer wrote:

> Turns out USB bulk transfers have this thing called data toggle. The toggle is
> either 0 or 1 and gets inverted with every ack that's either sent or
> received on an endpoint. When the receiving end receives a packet with a toggle
> value it doesn't expect, it just ignores the packet. The rationale behind this
> behaviour is a safeguard against a possible loss of an ack packet. If the
> receiver acks a packet but the ack doesn't make it to the sender, the sender
> will just send it again and the receiver will ignore the packet (which would
> otherwise be a duplicate).
>
> Of course this means that the data toggles of the sending and the receiving end
> have to be in sync. And that's where the problem starts...
>
> My computer has both USB2 and USB3 ports. The USB3 ports are managed by the
> linux xhci_hcd driver and the USB2 ports by the ehci-pci driver. When I first
> experienced the problem, I had the xum1541 port plugged into a USB3 port. I
> later found that it works if I plug it into a USB2 port.

That's interesting. There were some times where I suspected USB3 port to
be the problem. However, it seems, these were not always involved.

> The reason is that apparently the xhci_hcd driver (or libusb when using this
> driver, I'm not sure) does not reset its data toggles when a "set
> configuration" message is issued, while the ehci-pci driver does.

https://wiki.osdev.org/Universal_Serial_Bus#Data_Toggle_Synchronization
states:

Data toggle synchronization works differently depending on the type of transfer used:

* Control transfers initialize the endpoint's data toggle bits to 0 with a SETUP packet.
* Interrupt and Bulk endpoints initialize their data toggle bits to 0 upon any configuration event.
* Isochronous transfers do not perform a handshake and thus do not support data toggle synchronization.
* High-speed, high-bandwidth isochronous transfers do support data sequencing within a microframe.

The 2nd point seems to suggest that it should reset it, because "set
configuration" should be "any configuration event", shouldn't it?

> As the
> xum1541 *does* reset its data toggles when it receives a "set configuration"
> message, that makes the data toggles go out of sync when using the xhci_hcd
> driver. This is also the reason why my "fix" isn't really a fix as it only
> makes it work for the xhci_hcd driver and breaks it for the ehci (USB2) driver
> (because in that case the host *does* reset its data toggles while the xum1541
> no longer does). I guess what you are experiencing with the lxc container is a
> similar issue. Maybe the container handles the USB ports in a different way or
> it is running a different linux version with different bugs :) (I guess you
> were using the same usb ports for your tests from within the container?).

Yes, exactly the same. In fact, I setup an lxc container for the first
time in my life.


That's all so Greek to me... ;)

Nate Lawson

May 13, 2020, 7:00:39 PM5/13/20
to ZoomFloppy Users


> On Mar 29, 2020, at 1:18 PM, Martin Thierer <mthi...@gmail.com> wrote:
>
> I try to build a xum1541 adapter to interface my 1541-II floppy using an arduino pro micro clone with an atmega32u4.

I’m against the proliferation of new boards due to the maintenance and test overhead. We should be qualifying every firmware or protocol change against all supported boards and drives, which creates the need for B (boards) * D (drives) tests. Right now, no one does that qualifying and the test suite doesn’t exist, so we frequently have bugs that affect only one device and the author isn’t around to fix it.

It’s ok to create new form factors, where necessary, but the cost of supporting any random dev board just because one person happens to have it is too high.

-Nate

Spiro Trikaliotis

May 14, 2020, 11:45:07 AM5/14/20
to ZoomFloppy Users
Hello Nate,

* On Wed, May 13, 2020 at 04:00:33PM -0700 Nate Lawson wrote:
>
> > On Mar 29, 2020, at 1:18 PM, Martin Thierer <mthi...@gmail.com> wrote:
> >
> > I try to build a xum1541 adapter to interface my 1541-II floppy using an arduino pro micro clone with an atmega32u4.
>
> I’m against the proliferation of new boards due to the maintenance and test overhead.

The ZoomFloppy is the only supported board, even now. From my point of
view, this does not forbid adding other boards, too, as long as they
keep compiling.

The only thing: We could add a warning about being unsupported.

The PRO MICRO boards are already in the repository. In fact, my firmware
work was done on a PRO MICRO board, as it allows for easier debugging.
Once everything was up and running, I started testing the ZF.


However, to come back to the problem at hand: It is NOT restricted to
the PRO MICRO boards. There are reports on exactly the same problem for
the ZF. There are people who cannot use it at all, they get that error
message on every access to the ZF. There are people that get it only
when using two consecutive cbmctrl calls. And so on.

I had an exchange here, and let someone send me her ZF that did not work
on her machine at all. Here, it worked without any problem.


So, I am happy that Martin has done his analysis, and it sounds
reasonable to me, although I cannot tell for sure at the moment. I
think, we should not shoot the messenger because he uses an unsupported
device or setup, but try to understand his findings and look if we can
use it to improve the device, shouldn't we?

Martin Thierer

May 21, 2020, 9:29:27 AM5/21/20
to ZoomFloppy Users
Hi Spiro

I prepared three branches to show what I meant:

> 1. Get rid of the libusb_set_configuration() call in xum1541_init().

https://github.com/thierer/OpenCBM/tree/no_explicit_configuration

> 2. So to work around that possible problem, it would be possible to only make the libusb_set_configuration() call if the device isn't already configured.

https://github.com/thierer/OpenCBM/tree/check_if_device_already_configured

This is just a hacky demo that uses libusb_get_configuration() from libusb1 (so it doesn't work with libusb0) and only works on linux (as I only exported the call in the linux version of dynlibusb.c).

> 3. So I propose a simpler solution: What also works for me is disconfiguring the device in xum1541_close() by calling usb.set_configuration() with a configuration of -1.

https://github.com/thierer/OpenCBM/tree/deconfigure_device_on_close

As already stated in my previous post, I think that 3. is the cleanest solution, but all 3 work for me on my linux system in the sense that "cbmctrl status" doesn't lock up on multiple, consecutive invocations, regardless if the promicro board is connected to a port that's handled by the xhci_hcd or the ehci-pci driver.

> There is one thing, though, that I stumbled upon
> In xum1541/descriptor.c, we tell that the USB Version is 1.10
> (VERSION_BCD(01.10) or VERSION_BCD(1, 1, 0) in LUFA-170418).
> However, in the Makefile, we specify:
> -DUSE_STATIC_OPTIONS="(USB_DEVICE_OPT_FULLSPEED | USB_OPT_REG_ENABLED | USB_OPT_AUTO_PLL)
> So, we have FULLSPEED with USB 1.1 - which does not make sense. Can this
> be a problem?

Good catch. I don't know what the implications are, though. When I changed this to "VERSION_BCD(02.00)" or "VERSION_BCD(03.10)" it still didn't work without one of my patches.

> > The reason is that apparently the xhci_hcd driver (or libusb when using this
> > driver, I'm not sure) does not reset its data toggles when a "set
> > configuration" message is issued, while the ehci-pci driver does.
> https://wiki.osdev.org/Universal_Serial_Bus#Data_Toggle_Synchronization
> states:
> Data toggle synchronization works differently depending on the type of transfer used:
>  * Control transfers initialize the endpoint's data toggle bits to 0 with a SETUP packet.
>  * Interrupt and Bulk endpoints initialize their data toggle bits to 0 upon any configuration event.
>  * Isochronous transfers do not perform a handshake and thus do not support data toggle synchronization.
>  * High-speed, high-bandwidth isochronous transfers do support data sequencing within a microframe.
> The 2nd point seems to suggest that it should reset it, because "set
> configuration" should be "any configuration event", shouldn't it?

I also *think* it should, I'm just not sure as this isn't the standard and I find the standard not to be this explicit about it.

Martin

Martin Thierer

May 21, 2020, 9:35:50 AM5/21/20
to ZoomFloppy Users
> So, I am happy that Martin has done his analysis, and it sounds
> reasonable to me, although I cannot tell for sure at the moment. I
> think, we should not shoot the messenger because he uses an unsupported
> device or setup, but try to understand his findings and look if we can
> use it to improve the device, shouldn't we?

I didn't take Nate's comment as an offense. I'd be happy to take this discussion over to zyonee's github repository where the port for the pro micro board originated, but I also don't think the problem is specific to that board.

RETRO Innovations

May 21, 2020, 10:26:46 AM5/21/20
to zoomflop...@googlegroups.com
Given the low volume of posts on this list, the overarching nature of these issues, and the fact the discussion is already started, feel free to keep it here.

Jim

-- 
RETRO Innovations, Contemporary Gear for Classic Systems
www.go4retro.com
store.go4retro.com

Spiro Trikaliotis

May 22, 2020, 12:57:51 PM5/22/20
to ZoomFloppy Users
Hello Martin,

I am trying to read on the USB spec, but I did not get very far.

Interestingly, on the libusb 1 samples, I did not find ANY that uses
SetConfiguration. It seems that every single sample relies on the OS to
set the configuration of the device.


But, I just found the following: A call to Endpoint_ResetDataToggle() in
USB_ResetConfig() of xum1541.c.

Does it make sense at that place? Should we be messing with the data
toggle at all?



Interestingly, at first read, I did not understand your proposals.
Having read in the USB spec and some free resources, I came to the same
conclusions as you:

* On Thu, May 21, 2020 at 06:29:27AM -0700 Martin Thierer wrote:
> Hi Spiro
>
> I prepared three branches to show what I meant:
>
>
> 1. Get rid of the libusb_set_configuration() call in xum1541_init().
>
> https://github.com/thierer/OpenCBM/tree/no_explicit_configuration

That's what the samples of libusb1 seem to be doing.

> 2. So to work around that possible problem, it would be possible to only
> make the libusb_set_configuration() call if the device isn't already
> configured.
>
>  https://github.com/thierer/OpenCBM/tree/check_if_device_already_configured

That's what came to mind to me, too.

> 3. So I propose a simpler solution: What also works for me is
> disconfiguring the device in xum1541_close() by calling
> usb.set_configuration() with a configuration of -1.
>  
> https://github.com/thierer/OpenCBM/tree/deconfigure_device_on_close

This would be one possible way to do it. Another one might be to
disconfigure it immediately before configuring it, wouldn't it?


> > There is one thing, though, that I stumbled upon
> > In xum1541/descriptor.c, we tell that the USB Version is 1.10
> > (VERSION_BCD(01.10) or VERSION_BCD(1, 1, 0) in LUFA-170418).
> > However, in the Makefile, we specify:
> > -DUSE_STATIC_OPTIONS="(USB_DEVICE_OPT_FULLSPEED | USB_OPT_REG_ENABLED |
> > USB_OPT_AUTO_PLL)
> > So, we have FULLSPEED with USB 1.1 - which does not make sense. Can this
> > be a problem?
>
> Good catch. I don't know what the implications are, though. When I changed this
> to "VERSION_BCD(02.00)" or "VERSION_BCD(03.10)" it still didn't work without one
> of my patches.

Not a good catch, full speed is already available in USB 1.1. It seems I
was a little bit confused. ;)

Martin Thierer

May 22, 2020, 2:10:29 PM5/22/20
to ZoomFloppy Users
Hi Spiro,


> But, I just found the following: A call to Endpoint_ResetDataToggle() in
> USB_ResetConfig() of xum1541.c.
>
> Does it make sense at that place? Should we be messing with the data
> toggle at all?

 That depends. USB_ResetConfig() is called when a "set configuration" message is received so if it is correct to reset the data toggles in response to that message then it does make sense. If you remove that call you should get the same effect (not tested, but I'm pretty sure) as with my first shot at a "fix", that removed the call to USB_ResetConfig() for all but the first "set configuration" message: It works if the host does not reset its data toggles when it sends the message (like my computer with the xhci driver) but it stops working if the host does (like my computer with the ehci driver and probably what you experienced with the system in the lxc container).
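The four combinations described here can be laid out as a small truth table. The Python sketch below is a simplified model with made-up names, not firmware or driver code: it assumes both toggles flip once per acked packet and that each side either resets its toggle to 0 on "set configuration" or leaves it alone.

```python
from itertools import product


def toggles_after_set_configuration(packets, host_resets, device_resets):
    """Return (host_toggle, device_toggle) after `packets` acked transfers
    followed by one "set configuration" request."""
    toggle = packets % 2  # both sides flip once per acked packet
    host = 0 if host_resets else toggle
    device = 0 if device_resets else toggle
    return host, device


def in_sync(packets, host_resets, device_resets):
    """True if host and device agree on the next expected toggle."""
    host, device = toggles_after_set_configuration(
        packets, host_resets, device_resets
    )
    return host == device


if __name__ == "__main__":
    # 5 packets, as in the "cbmctrl status" case (odd count).
    for host_resets, device_resets in product((False, True), repeat=2):
        state = "in sync" if in_sync(5, host_resets, device_resets) else "OUT OF SYNC"
        print(f"host resets: {host_resets!s:5}  device resets: {device_resets!s:5}  -> {state}")
```

In this model, "host does not reset, device resets" corresponds to the behaviour seen with the xhci driver and the unmodified firmware, and "host resets, device does not" to the ehci driver with the toggle reset removed from the firmware. Only the two rows where both sides behave the same stay in sync.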

> > 3. So I propose a simpler solution: What also works for me is
> > disconfiguring the device in xum1541_close() by calling
> > usb.set_configuration() with a configuration of -1.
> >
> > https://github.com/thierer/OpenCBM/tree/deconfigure_device_on_close
>
> This would be one possible way to do it. Another one might be to
> disconfigure it immediately before configuring it, wouldn't it?

Probably yes. But I find it a cleaner solution to put it in the end, before the command exits anyway. That shouldn't do any harm. De-configuring at the start feels a lot more like a hack to me.

Have you tried my third variant on windows btw? I'm curious if it reports a "USB deconfig device" error like it did when I tried the same thing via pyusb on windows. (I don't think it would be a big deal if it does. I'd just get rid of the whole error check. It happens just before the program exits and there's nothing we could do anyway).
 
> > > There is one thing, though, that I stumbled upon
> > > In xum1541/descriptor.c, we tell that the USB Version is 1.10
> > > (VERSION_BCD(01.10) or VERSION_BCD(1, 1, 0) in LUFA-170418).
> > > However, in the Makefile, we specify:
> > > -DUSE_STATIC_OPTIONS="(USB_DEVICE_OPT_FULLSPEED | USB_OPT_REG_ENABLED |
> > > USB_OPT_AUTO_PLL)
> > > So, we have FULLSPEED with USB 1.1 - which does not make sense. Can this
> > > be a problem?
> >
> > Good catch. I don't know what the implications are, though. When I changed this
> > to "VERSION_BCD(02.00)" or "VERSION_BCD(03.10)" it still didn't work without one
> > of my patches.
>
> Not a good catch, full speed is already available in USB 1.1. It seems I
> was a little bit confused. ;)

Interesting. I also thought that full speed was introduced with usb 2.0.

Martin

Spiro Trikaliotis

Jun 11, 2020, 4:54:55 PM6/11/20
to ZoomFloppy Users
Hello Martin,

* On Thu, May 21, 2020 at 06:29:27AM -0700 Martin Thierer wrote:

> 3. So I propose a simpler solution: What also works for me is
> disconfiguring the device in xum1541_close() by calling
> usb.set_configuration() with a configuration of -1.
>
>  
> https://github.com/thierer/OpenCBM/tree/deconfigure_device_on_close
>
> As already stated in my previous post, I think that 3. is the cleanest
> solution, but all 3 work for me on my linux system in the sense that "cbmctrl
> status" doesn't lock up on multiple, consecutive invocations, regardless if the
> promicro board is connected to a port that's handled by the xhci_hcd or the
> ehci-pci driver.

I merged this approach. I am currently building a 0.4.99.100 for
Windows, so people can test it there if they like. For Linux, the
sources are there.

Martin Thierer

Jun 12, 2020, 5:59:27 AM6/12/20
to ZoomFloppy Users
Hi Spiro,


> > https://github.com/thierer/OpenCBM/tree/deconfigure_device_on_close
> >
> > As already stated in my previous post, I think that 3. is the cleanest
> > solution, but all 3 work for me on my linux system in the sense that "cbmctrl
> > status" doesn't lock up on multiple, consecutive invocations, regardless if the
> > promicro board is connected to a port that's handled by the xhci_hcd or the
> > ehci-pci driver.
>
> I merged this approach. I am currently building a 0.4.99.100 for
> Windows, so people can test it there if they like. For Linux, the
> sources are there.

Thanks! However that was only meant for discussion; had I known you would go ahead and commit it I would have written a proper commit message :)

Martin

Martin Thierer

Jun 13, 2020, 11:03:05 AM6/13/20
to ZoomFloppy Users
To get a better understanding for how configuration and de-configuration of usb devices should be handled, I wrote a small demo using WebUSB, that does the equivalent of "cbmctrl status" when a xum1541 device is attached:


I tested with chrome 83, but it should also work with MS Edge and Opera.

What I learned so far:
  • WebUSB supports an equivalent to libusb_set_configuration(), but, as with libusb on windows, at least in chrome it seems to be silently ignored when the device is already configured, even though the standard draft says to send a "set configuration" message without making that dependent on the configuration state. "Example 2" checks if the device is configured before selecting a configuration.
  • The demo stops working after the new opencbm version that de-configures the device is used. The error is "Unable to claim interface", but I think that's misleading. But anyway the de-configuration seems to mess up chrome enough to allow no more communication until the device is re-plugged.
  • But the most interesting discovery: After using WebUSB, cbmctrl still works, even though the WebUSB version doesn't de-configure the device! So something else must be happening that prevents the mismatch in the usb data toggles.
So I think de-configuring the device is ok as a quick fix, but it's not the ideal solution. If we find out why the command line tool works even after using the WebUSB version that sends the exact same messages (except the configuration/de-configuration), that will hopefully lead to a proper fix. Btw the behaviour is the same on both usb 2 and usb 3 ports on my machine.

Martin

Dan Gahlinger

Jun 13, 2020, 11:10:46 AM6/13/20
to ZoomFloppy Users
I don't know if it would be helpful, but I've been thinking it might help your efforts,

For windows, there's a piece of software called "Wireshark",
Wireshark now "ships" with USB port sniffer capture capabilities.

I can read a sniffer capture (data packet) but have never looked at a USB one,
But this would show you everything going in and out of the USB port at all protocol layers.

But I thought it may be useful in debugging such work,
Check out the project homepage: https://desowin.org/usbpcap/tour.html
And limitations: https://desowin.org/usbpcap/capture_limitations.html

Sorry if this is obvious,

Dan.


Martin Thierer

Jun 13, 2020, 11:23:05 AM6/13/20
to ZoomFloppy Users
Hi Dan,

thanks for the tip, I'm already using wireshark (it's how I found out
that chrome doesn't send the "set configuration" messages).

Unfortunately, for a few areas of this problem that isn't low level
enough. It reports messages on the driver level, but that's not
necessarily what's really happening on the bus. For that you have to
use a logic analyzer, which unfortunately gets quite tedious fast, as
a lot of messages are involved...

Martin

Spiro Trikaliotis

Jun 15, 2020, 11:53:37 AM6/15/20
to ZoomFloppy Users
Hello Martin,

* On Thu, May 21, 2020 at 06:29:27AM -0700 Martin Thierer wrote:
> Hi Spiro
>
> I prepared three branches to show what I meant:
>
>
> 1. Get rid of the libusb_set_configuration() call in xum1541_init().
[...]
> 2. So to work around that possible problem, it would be possible to only
> make the libusb_set_configuration() call if the device isn't already
> configured.
[...]
> 3. So I propose a simpler solution: What also works for me is
> disconfiguring the device in xum1541_close() by calling
> usb.set_configuration() with a configuration of -1.

I just had a look into the USB stack of the xu1541 device, how it
handles these things.

It is interesting:

1. usbtiny completely ignores SET_CONFIGURATION and always returns 1 on
GET_CONFIGURATION (xu1541/usbtiny/usb.c, line 278 for
GET_CONFIGURATION; you see that the value 9 for SET_CONFIGURATION is
not even handled in this case!)

2. AVRUSB (xu1541/xu1541/usbdrv/usbdrv.c; newer versions are called
V-USB) handles SET_CONFIGURATION (lines 397ff) and GET_CONFIGURATION
(lines 395ff), but it does NOT reset the data toggle!
It is reset with the SET_INTERFACE, though. To me, it does not look
plausible this way.



Please also note

https://stackoverflow.com/questions/17955017/please-clarify-when-a-usb-device-resets-or-changes-its-data-toggle

and especially

https://www.microchip.com/forums/m473721.aspx

which discusses exactly our problem!
However, the last link also says that the data toggle must be reset on
SET_CONFIGURATION - exactly the opposite of what the XU1541 is doing.



Then, I came across this one:
https://sourceforge.net/p/libusb/mailman/message/21975413/

> > Try set_configuration. The USB 2.0 spec requires the device to reset the
> > data toggle for any non-zero set_configuration request (section
> > 9.1.1.5).
>
> Set-Config is also rather heavy-handed. Furthermore, the USB spec also
> requires devices to reset the data toggle in response to Set-Interface
> or Clear-Halt (section 9.4.5) -- devices don't always follow the spec.

Thus, from my understanding, clear-Halt would also work to reset the data toggle.

The good news: xum1541 does make use of clear-halt (xum1541_clear_halt()).
The bad news: It uses it *after* using the endpoint IN in order to get
the XUM1541_INIT information!

So, if I am not mistaken,

1. moving the xum1541_clear_halt() before the request for XUM1541_INIT
could heal our problem, or

2. or, perhaps, doing an xum1541_clear_halt() instead of deconfiguring
the device might help, too.

The attached patch implements both approaches. Define
TEST_CLEAR_ENDPOINT_AT_BEGINNING for variant 1, or
TEST_CLEAR_ENDPOINT_AT_EXIT for variant 2.

Note that in variant 1, I did not move the xum1541_clear_halt(), but I
added another that is done unconditionally. This might be overkill here.
xum1541-clear_halt.diff.gz

Spiro Trikaliotis

Jun 15, 2020, 4:36:30 PM6/15/20
to ZoomFloppy Users
Hello again,

* On Mon, Jun 15, 2020 at 05:53:33PM +0200 I wrote:

> I just had a look into the USB stack of the xu1541 device, how it
> handles these things.

Even more weird: I just re-realized that
opencbm/lib/plugin/xum1541/xum1541.c has some hard-coded exit(-1) in it!

If I stop rpm1541, for example, the cbm_reset() fails and the command
ends because of this exit command! Thus, the driver is never closed (cf.
function reset() in opencbm/demo/rpm1541/rpm1541.c.) - the ZF flashes in
fast succession, and the next command fails. That's similar to what you
are experiencing, right?

If I replace these exit(-1) with return -1, then the problem goes away.

Could it be that we have a similar issue with cbmctrl, which is
prematurely ended because of the libusb_set_configuration() trying to
set configuration 0, resulting in the flashing of the devices some
people report?

Can it also be a part of the problem with the data toggle, because the
USB device is not correctly taken down / closed?

Nate Lawson

Jun 16, 2020, 3:13:14 AM6/16/20
to zoomflop...@googlegroups.com

On May 21, 2020, at 6:35 AM, Martin Thierer <mthi...@gmail.com> wrote:



I appreciate the work you and Spiro are doing to debug things. My problem is that without a test suite and full matrix of boards, operating systems, and floppy drives, any changes you make may destabilize some other configuration. 

I developed on Windows originally and test (periodically) on Mac. I don’t use Linux. I only have the ZoomFloppy board with 1541 and 1571. So I am concerned about new boards showing up and users expecting support when there’s not enough testing going on. 

It might be nice some day to build a drive simulator that behaved as a group of drives (including IEEE) in order to do regression testing. It could be software selectable so the full test suite could run unattended. 

-Nate 

Martin Thierer

Jun 16, 2020, 2:50:51 PM6/16/20
to ZoomFloppy Users
Hi Spiro.


> Even more weird: I just re-realized that
> opencbm/lib/plugin/xum1541/xum1541.c has some hard-coded exit(-1) in it!
>
> If I stop rpm1541, for example, the cbm_reset() fails and the command
> ends because of this exit command! Thus, the driver is never closed (cf.
> function reset() in opencbm/demo/rpm1541/rpm1541.c.) - the ZF flashes in
> fast succession, and the next command fails. That's similar to what you
> are experiencing, right?
>
> If I replace these exit(-1) with return -1, then the problem goes away.
>
> Could it be that we have a similar issue with cbmctrl, which is
> prematurely ended because of the libusb_set_configuration() trying to
> set configuration 0, resulting in the flashing of the devices some
> people report?
>
> Can it also be a part of the problem with the data toggle, because the
> USB device is not correctly taken down / closed?

No, I think these are different problems. When the device keeps blinking, that just means that it wasn't properly closed. That isn't necessarily a big problem and can have all kinds of reasons, like
  1. The host command was interrupted, so it never sent the close command.
  2. The communication stalls, so the close command is never sent (like the problem caused by "set configuration" / the data toggles).
  3. Sending the message to close the device failed, so it isn't actually closed (that is what happens to Ruben and the others that get the "USB request for XUM1541 close failed, continuing" error).
It's usually possible to recover from the state when the led is blinking, unless it's just the symptom of a bigger underlying problem. (You can see that in the debug log that Ruben posted on June 14th: On every invocation he gets the "previous command was interrupted" and "USB request for XUM1541 close failed" errors, but apart from that everything seems to work).

I agree that the error handling could (and should) be improved, but I also think at least the exit(-1)'s in lib/xum1541 aren't a big problem (there might be others elsewhere that I haven't seen yet), because they are in places where something else went wrong already.

I don't understand your example with rpm1541.c: From a quick glance (I never saw or used that program) it does seem to close the device before calling exit()?

And regarding the links you sent in your other post: I know most of these, but I don't think they give a clear picture and are not authoritative. For example the post on the libusb mailing list cites section 9.1.1.5 of the usb 2.0 standard as proof that a "set configuration" message should reset the data toggles. Except for me this is not what this section says (highlights by me):

Before a USB device’s function may be used, the device must be configured. From the device’s
perspective, configuration involves correctly processing a SetConfiguration() request with a non-zero
configuration value. Configuring a device or changing an alternate setting causes all of the status and
configuration values associated with endpoints in the affected interfaces to be set to their default values.
This includes setting the data toggle of any endpoint using data toggles to the value DATA0.

So yes, it says that the data toggles should be reset, but it says so in the context of bringing the device from an unconfigured into a configured state. But that's not what's happening with the set_configuration call in xum1541_init(). Here the device is already configured and we're sending a message requesting the device to select the one and only configuration it has and that it is already in. That isn't the same for me. It might still be true that it should, but my problem here is that people write a lot of stuff on the internet, that's not necessarily true, so I don't always trust random guys posting things...

After a quick look at the usbtiny code I don't consider that exactly a reference implementation, either. It seems to me that they implement the whole usb communication in software and cut a few corners to do that. The code doesn't seem to correctly implement the handling of the data toggles at all, it just takes whatever is on the bus. (Which means that in the case when the ack packet is lost and the data packet is resent, it would process a duplicate packet).

I haven't looked into the patches you propose in regard to clearing a halt condition yet. I did some experiments with stalling the connection, but only as a means to recover from a failed communication (that's what it's meant for). There were also a few things that I didn't understand and most importantly, clearing a halt didn't seem to reset the data toggles on my machine either. But that might be because I did something wrong stalling the connection (that's one of the areas that I didn't understand) because in contrast to the "set configuration" message the standard is very clear that clearing a halt *should* reset the data toggles. So yes, that might be a route to take, but my feeling is that it should work without it.

Martin

Spiro Trikaliotis

Jun 16, 2020, 4:33:25 PM6/16/20
to ZoomFloppy Users
Hello Martin,

* On Tue, Jun 16, 2020 at 11:50:50AM -0700 Martin Thierer wrote:

> I agree that the error handling could (and should) be improved, but I also
> think at least the exit(-1)'s in lib/xum1541 aren't a big problem (there might
> be others elsewhere that I haven't seen yet), because they are in places where
> something else went wrong already.

But here, trying to reset the configuration also fails and results in an
exit(-1). This is wrong and prevents the device from properly being
taken down.

> I don't understand your example with rpm1541.c: From a quick glance (I never
> saw or used that program) it does seem to close the device before calling exit
> ()?

But the library calls exit() because the taking down failed.

> And regarding the links you sent in your other post: I know most of these, but
> I don't think they give a clear picture and are not authoritative. For example

You are right, we already discussed this.


Also very interesting is the warning here:
https://www.kernel.org/doc/Documentation/driver-api/usb.rst:

        USBDEVFS_SETCONFIGURATION
            Issues the :c:func:`usb_set_configuration()` call for the
            device. The parameter is an integer holding the number of a
            configuration (bConfigurationValue from descriptor). File
            modification time is not updated by this request.

                **Warning**

                *Avoid using this call* until some usbcore bugs get fixed, since
                it does not fully synchronize device, interface, and driver (not
                just usbfs) state.


If I take this warning seriously, the approach (GetConfiguration to test
if we are already configured the right way, and SetConfiguration only if
not) starts looking better to me. Perhaps, this is the better way, and
everything else we are just doing is just too much work for no benefit!


So, this patch
https://github.com/thierer/OpenCBM/commit/88f2a705a6152fa7f92c8fd197085660b373cc16#diff-4daf10310be7ed8344d882a79444d1dc
might be the best one instead of variant 3.

The only change I would make: Instead of testing if config == 0 (line
564), I would test for config != 1.



But, if we take the current approach, then the following applies:

> But that's not what's happening with the set_configuration call in xum1541_init
> (). Here the device is already configured and we're sending a message
> requesting the device to select the one and only configuration it has and that
> it is already in. That isn't the same for me. It might still be true that it
> should, but my problem here is that people write a lot of stuff on the
> internet, that's not necessarily true, so I don't always trust random guys
> posting things...

Neither the Linux kernel nor the xum1541 checks whether we are coming
from the unconfigured state and are getting configured: The
SetConfiguration always results in clearing the data toggle!

Linux Kernel:
https://docs.huihoo.com/doxygen/linux/kernel/3.7/drivers_2usb_2core_2message_8c_source.html#l01706

You can see that it is not even interested in the previous
configuration, it always executes the same commands.


xum1541/LUFA/Drivers/USB/LowLevel/DevChapter9.c,
USB_Device_SetConfiguration() lines 146-204 does not check, either. It
always calls EVENT_USB_Device_ConfigurationChanged(), and this is in
xum1541/main.c, where there is no test either.


Interesting: LUFA has only one place (for our device) where it calls
USB_ResetDataToggle(). In
xum1541/LUFA/Drivers/USB/LowLevel/DevChapter9.c, when SetFeature is
issued with FEATURE_ENDPOINT_HALT. Thus, the call to
xum1541_clear_halt() should work to reset the data toggle on the device
side. We only have to make sure that it is also reset on the PC side.
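
To make the failure mode concrete, here is a toy model of the data toggle on one bulk OUT endpoint. It is only a sketch under assumptions: the `Endpoint` class and the `send()` function are made up for illustration and are not LUFA or libusb code, and the scenario assumes that the device resets its toggle while the PC side does not.

```python
class Endpoint:
    """One side's view of an endpoint's data toggle (0 = DATA0, 1 = DATA1)."""

    def __init__(self):
        self.toggle = 0

    def reset(self):
        self.toggle = 0


def send(host, device, payload, received):
    """Send one packet from host to device; return True if the data was kept.

    The device always ACKs. On a toggle mismatch it assumes the packet is a
    retransmission of something it already has and silently drops it.
    """
    pid = host.toggle
    host.toggle ^= 1  # the host saw an ACK, so it advances its toggle
    if pid != device.toggle:
        return False  # ACKed on the bus, but discarded by the device
    received.append(payload)
    device.toggle ^= 1
    return True


host, device, received = Endpoint(), Endpoint(), []
send(host, device, b"first", received)             # both sides now expect DATA1
device.reset()                                     # only the device side is reset
send(host, device, b"\x09\x13\x02\x00", received)  # ACKed, but dropped
send(host, device, b"\x48\x6f", received)          # accepted
```

In this model the four command bytes vanish and only the two data bytes arrive, which matches the symptom described at the start of the thread. If both sides are reset together, nothing is lost.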


> After a quick look at the usbtiny code I don't consider that exactly a
> reference implementation, neither. It seems to me that they implement the whole
> usb communication in software and cut a few corners to do that. The code
> doesn't seem to correctly implement the handling of the data toggles at all, it
> just takes whatever is on the bus. (Which means that in the case when the ack
> package is lost and resend it would process a duplicate package). 

Yes, USBTINY and AVRUSB are hacks from the time when no AVR8 chip had
USB hardware. I did not want to take them as a reference, I only
wanted to see what they are doing, as they do not seem to have the
same issue that we have.

> I haven't looked into the patches you propose in regard to clearing a halt
> condition yet. I did some experiments with stalling the connection, but only as
> a means to recover from a failed communication (that's what it's meant for).
> There were also a few things that I didn't understand and most importantly,
> clearing a halt didn't seem to reset the data toggles on my machine neither.
> But that might be because I did something wrong stalling the connection (that's
> one of the areas that I didn't understand) because in contrast to the "set
> configuration" message the standard is very clear that clearing a halt *should*
> reset the data toggles. So yes, that might be a route to take, but my feeling
> is that it should work without it.

My biggest problem here at the moment is that I do not have the problem,
so I can only guess and I cannot test.

Spiro Trikaliotis

Jun 16, 2020, 5:02:42 PM
to zoomflop...@googlegroups.com
Hello Nate,

* On Tue, Jun 16, 2020 at 12:13:08AM -0700 Nate Lawson wrote:
>
> I appreciate the work you and Spiro are doing to debug things. My problem is
> that without a test suite and full matrix of boards, operating systems, and
> floppy drives, any changes you make may destabilize some other configuration.

Well, what about my proposal (if it works - Martin would have to test):

1. Roll back the changes so far w.r.t. deconfiguration at the end of
using the xum1541.

2. Only change: Before setting the configuration, check with
getconfiguration if the xum1541 is already configured. If it is,
leave it alone.

From my understanding, that's exactly what is being done on Windows,
as it does not support setting any configuration other than 1. So, Linux
would adjust its behaviour to be more Windows-like.

I think this approach would be the least invasive: it only changes the
PC side, and it does not significantly change its behaviour.
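
A minimal sketch of step 2, assuming a device object that offers `get_configuration()` and `set_configuration()`. `FakeDevice` and `ensure_configuration()` are hypothetical names for illustration only; the real change would live in the C plugin and go through libusb.

```python
class FakeDevice:
    """Stand-in for a USB device handle; records the requests it receives."""

    def __init__(self, configuration):
        self.configuration = configuration
        self.requests = []

    def get_configuration(self):
        self.requests.append("GET_CONFIGURATION")
        return self.configuration

    def set_configuration(self, value):
        self.requests.append(f"SET_CONFIGURATION({value})")
        self.configuration = value


def ensure_configuration(dev, wanted=1):
    """Select configuration `wanted` only if the device is not already in it.

    Returns True if a SetConfiguration request was actually sent.
    """
    if dev.get_configuration() != wanted:
        dev.set_configuration(wanted)
        return True
    return False
```

With an already configured device, no SetConfiguration goes out at all, so the data toggles on both sides stay as they are.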

I know, we cannot test because we do not have a full suite. But, the PC
side can be changed rather easily (compared to the AVR side).

> I developed on Windows originally and test (periodically) on Mac. I don’t use
> Linux. I only have the ZoomFloppy board with 1541 and 1571. So I am concerned
> about new boards showing up and users expecting support when there’s not enough
> testing going on.

New boards showing up are not our problem at hand, the problem exists
also for the ZF.

Martin Thierer

Jun 17, 2020, 2:57:18 PM
to ZoomFloppy Users
Hi Spiro,

> > I don't understand your example with rpm1541.c: From a quick glance (I never
> > saw or used that program) it does seem to close the device before calling exit
> > ()?
>
> But the library calls exit() because the taking down failed.

so you refer to xum1541_control_msg() called from opencbm_plugin_reset() calling exit(-1)?

But that only happens because the libusb_control_transfer failed (nBytes < 0). And that means that something else is very wrong already. I agree that it's bad practice, I just don't think it makes anything worse here. You're right that the exit(-1) prevents cbm_driver_close() from being called from rpm1541.c, but that call would fail anyway, so it doesn't make a big difference.
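
For reference, the alternative to calling exit() inside the library could look roughly like this. It is only a sketch of the pattern: `TransferError`, `control_msg()` and `reset_and_close()` are hypothetical names, and `transfer` stands in for whatever performs the control transfer and returns a byte count (negative on error).

```python
class TransferError(Exception):
    """Raised instead of terminating the process when a control transfer fails."""


def control_msg(transfer, request):
    """Run one control transfer and report failure to the caller."""
    n_bytes = transfer(request)
    if n_bytes < 0:
        raise TransferError(f"control message {request} failed ({n_bytes})")
    return n_bytes


def reset_and_close(transfer, close):
    """Caller-side pattern: the close step runs even if the reset fails."""
    try:
        control_msg(transfer, "reset")
        return True
    except TransferError:
        return False
    finally:
        close()
```

Whether the close call then succeeds is a separate question, but the application at least gets the chance to try.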

> Also very interesting is the warning here:
> https://www.kernel.org/doc/Documentation/driver-api/usb.rst:
>
>         USBDEVFS_SETCONFIGURATION
>             Issues the :c:func:`usb_set_configuration()` call for the
>             device. The parameter is an integer holding the number of a
>             configuration (bConfigurationValue from descriptor). File
>             modification time is not updated by this request.
>
>                 **Warning**
>
>                 *Avoid using this call* until some usbcore bugs get fixed, since
>                 it does not fully synchronize device, interface, and driver (not
>                 just usbfs) state.

Thanks, this document is really interesting, I hadn't found it before.
 
> If I take this warning seriously, the approach (GetConfiguration to test
> if we are already configured the right way, and SetConfiguration only if
> not) starts looking better to me. Perhaps, this is the better way, and
> everything else we are just doing is just too much work for no benefit!
>
>
> So, this patch
> https://github.com/thierer/OpenCBM/commit/88f2a705a6152fa7f92c8fd197085660b373cc16#diff-4daf10310be7ed8344d882a79444d1dc
> might be the best one instead of variant 3.
>
> The only change I would do: Instead of testing if config == 0 (line
> 564), I would test for config != 1.

I tend to think that, too. But the reason why I leaned towards the de-configuration in the first place still stands: It is just a one-line change, while checking for the configuration
  1. means extending the interface of dynlibusb (don't know if that has any real consequences),
  2. is not possible with libusb0, which has no equivalent to libusb_get_configuration(),
  3. is just generally a bigger change with the chance to expose more bugs/intricacies of usb implementations. The recent problems with the control message calls failing imho show how fragile everything is.

Regarding the implementation details I don't think "== 0" or "!= 1" really matters, as 0 and 1 are the only valid values for the xum1541 firmware. I would however suggest taking the "Device already configured" message out, I only put that there for testing.

> But, if we take the current approach, then the following applies:
>
> > But that's not what's happening with the set_configuration call in xum1541_init
> > (). Here the device is already configured and we're sending a message
> > requesting the device to select the one and only configuration it has and that
> > it is already in. That isn't the same for me. It might still be true that it
> > should, but my problem here is that people write a lot of stuff on the
> > internet, that's not necessarily true, so I don't always trust random guys
> > posting things...
>
> neither the linux kernel nor the xum1541 have a look if we are comming
> from the unconfigured state and are getting configured: The
> SetConfiguration always results in clearing the data toggle!
>
> Linux Kernel:
> https://docs.huihoo.com/doxygen/linux/kernel/3.7/drivers_2usb_2core_2message_8c_source.html#l01706
>
> You can see that it is not even interested in the previous
> configuration, it always executes the same commands.

I looked at the linux kernel before, and I don't really understand what's going on. Plus, this is from the 3.7 kernel, which was released in 2012. Who knows what the code looks like today. The file has some interesting information, though. It does seem to reset the data toggles in the usb_set_interface() function. I noticed that the avrusb code you linked to did that, too. Maybe that would be a way to reliably reset the data toggles on linux. It refers to the same section 9.1.1.5 of the usb specs as the post on the libusb mailing list, which also only talks about "changing an alternate setting". I think that doesn't necessarily apply to selecting the same setting again, but I'll try and report back.

Martin

Martin Thierer

Jun 17, 2020, 3:26:58 PM
to ZoomFloppy Users
> I looked at the linux kernel before, and I don't really understand what's going on. Plus, this is from the 3.7 kernel, which was released in 2012. Who knows what the code looks like today. The file has some interesting information, though. It does seem to reset the data toggles in the usb_set_interface() function. I noticed that the avrusb code you linked to did that, too. Maybe that would be a way to reliably reset the data toggles on linux. It refers to the same section 9.1.1.5 of the usb specs as the post on the libusb mailing list, that also only says about "changing an alternate setting" which I think doesn't necessarily apply to selecting the same setting again, but I'll try and report back.

Well, what do you know, this works well on my linux machine on both usb 2 and usb 3 ports:

https://github.com/thierer/OpenCBM/tree/set_interface_no_deconfiguration

(This is only for demonstration, the code is missing necessary dynlibusb changes for both windows and libusb0).

I think I would now prefer this solution. It also requires changes to dynlibusb, but the function to set the interface alt setting seems to exists in the libusb0 api, too.
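
Roughly, the idea is the following. This is only a sketch: `FakeHandle` and `init_device()` are hypothetical stand-ins, the method name `set_interface_altsetting` follows pyusb, and interface 0 with alternate setting 0 is an assumption. The actual change is in the C plugin.

```python
class FakeHandle:
    """Stand-in for an opened USB device; records calls in order."""

    def __init__(self):
        self.calls = []

    def claim_interface(self, interface):
        self.calls.append(("claim_interface", interface))

    def set_interface_altsetting(self, interface, alternate_setting):
        self.calls.append(
            ("set_interface_altsetting", interface, alternate_setting)
        )


def init_device(handle, interface=0, alt_setting=0):
    """Claim the interface, then re-select its alternate setting.

    Assumption: re-selecting the (only) alternate setting makes both the
    host and the device reset the endpoint data toggles, so no
    SetConfiguration request is needed for that purpose.
    """
    handle.claim_interface(interface)
    handle.set_interface_altsetting(interface, alt_setting)
```

The interface has to be claimed before its alternate setting can be selected, which is why the order of the two calls matters.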

Martin

Spiro Trikaliotis

Jun 17, 2020, 5:00:36 PM
to ZoomFloppy Users
Hello Martin,

* On Wed, Jun 17, 2020 at 12:26:57PM -0700 Martin Thierer wrote:

> toggles on linux. It refers to the same section 9.1.1.5 of the usb specs as
> the post on the libusb mailing list, that also only says about "changing an
> alternate setting" which I think doesn't necessarily apply to selecting the
> same setting again, but I'll try and report back.

Note that according to the source, the Linux kernel does not send out
the set configuration if the configuration is not changed. However, it
does reset the configuration.

> Well, what do you know, this work well on my linux machine on both usb 2 and
> usb 3 ports:
>
>
> https://github.com/thierer/OpenCBM/tree/set_interface_no_deconfiguration

Why are you using libusb_set_interface_alt_setting? Why don't you
use the check_if_device_already_configured branch? I thought that one is
reliable, too?

> (This is only for demonstration, the code is missing necessary dynlibusb
> changes for both windows and libusb0).

Yep, I understand. :)

> I think I would now prefer this solution. It also requires changes to
> dynlibusb, but the function to set the interface alt setting seems to exists in
> the libusb0 api, too.

Ah, ok, this is the reason. :)

For me, this patch would be ok.

Do you want to create a PR?

Martin Thierer

Jun 19, 2020, 1:54:51 PM
to ZoomFloppy Users
Hi Spiro,
 
> Do you want to create a PR?

I can do that (tomorrow), but I won't be able to test the windows part.

Martin 

Spiro Trikaliotis

Jun 19, 2020, 2:38:52 PM
to ZoomFloppy Users
Hello Martin,

* On Fri, Jun 19, 2020 at 10:54:51AM -0700 Martin Thierer wrote:
 
>> Do you want to create a PR?
>
> I can do that (tomorrow), but I won't be able to test the windows part.

No problem, I can test it (in the VM, of course).

I wanted to ask because the last time I merged some change from you, you
told me that I should ask beforehand. ;)

If we are unsure about this, we can also make the change conditional and
let it in only on Linux. But for a start, I would like to include it in
all variants.

Martin Thierer

Jun 19, 2020, 2:51:37 PM
to ZoomFloppy Users
>>    Do you want to create a PR?
>
> I can do that (tomorrow), but I won't be able to test the windows part.

> No problem, I can test it (in the VM, of course).

So you only have a windows vm, not a real system? I have a dual boot windows 10 install, but I haven't got around to compiling opencbm for windows yet.

> I wanted to ask because the last time I merged some change from you, you
> told me that I should ask beforehand. ;)

Oh no, that wasn't meant as criticism at all! I'm sorry you took it that way. It was more that *I* felt bad because the commit message didn't satisfy the standard that I try to meet when committing to other people's repositories, because it was only meant as a showcase.

> If we are unsure about this, we can also make the change conditional and
> let it in only on Linux. But for a start, I would like to include it in
> all variants.

I think it should work on windows.

Martin 

Martin Thierer

Jun 20, 2020, 6:30:23 AM
to ZoomFloppy Users
Hi Spiro,

>> Do you want to create a PR?
>
> I can do that (tomorrow), but I won't be able to test the windows part.

this is what I came up with:


 (Please note that I rebased the changed branch on top of your current HEAD, so if you fetched it before, you probably have to do a force-pull now).

What I checked:
  • It compiles on linux on my machine with all of libusb 1.0.23, libusb-compat and the original libusb0.
  • It works on linux on all 3 machines I can test on. ("works" == "cbmctrl status" gives the correct output and can be invoked multiple times without unplugging, with the xum1541 adapter connected to either a usb2 or a usb3 port).
  • My rough pyusb equivalent to "cbmctrl status" still works on windows 10 using libusb1 after adding the set_interface_altsetting() call. (It also works without it, this was only to check for regressions).

What I did not check for lack of a build environment for windows:
  • I made what I consider the necessary changes to opencbm/libmisc/WINDOWS/dynlibusb.c, but couldn't check if it actually compiles and works. (I caught a stupid mistake just before posting this, so everything is possible...).
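
The "works" criterion above can be turned into a small repeatable check. This is a sketch, not the script actually used: `run_repeatedly()` is a hypothetical helper, and the timeout value is an arbitrary choice for detecting the hang.

```python
import subprocess


def run_repeatedly(cmd, times=3, timeout=10.0):
    """Run `cmd` several times in a row; return the index of the first failure.

    A run counts as failed if it exits non-zero or does not finish within
    `timeout` seconds (the hang described in this thread). Returns None if
    every run succeeds.
    """
    for i in range(times):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return i
        if result.returncode != 0:
            return i
    return None
```

It could be called as `run_repeatedly(["cbmctrl", "status", "8"])`, where the drive number 8 is an assumption. A return value of 1 would correspond to the original symptom: the first invocation works, the second one hangs.
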
Martin

frank128

Jan 5, 2022, 12:58:21 PM
to ZoomFloppy Users
"There was one user that had exact the same problem on an original ZF. I
gave him a modified firmware, with lufa-170418 and the patch by Martin,
and it worked for him. Let's call this firmware v08-PATCHED."

Hi Spiro,

During the holidays I had some time to test my CBM equipment. Lo and behold, the ZoomFloppy no longer works with your v08-PATCHED. It already hangs on the first command - only cbmctrl reset works. Nothing has changed on my hardware besides the Linux Mint update to version 19.3. But since I did not touch the LXC container for the CBM stuff (old Debian version), and the same symptoms occur there as in the host system, the new kernel (which the LXC container also uses) could be the only reason for it. What currently works for me is xum1541-ZOOMFLOPPY-v07-nate.hex with opencbm 0.4.99.99. All above this opencbm version .100 - .103 hangs on
[XUM1541] xum1541_wait_status checking for status BULK SUBMIT

more details sent as PM in the forum64.de Forum. 
 
Frank

Martin Thierer

Jan 5, 2022, 2:09:51 PM
to ZoomFloppy Users
> "There was one user that had exact the same problem on an original ZF. I
> gave him a modified firmware, with lufa-170418 and the patch by Martin,
> and it worked for him. Let's call this firmware v08-PATCHED."
>
> Hi Spiro,
>
> During the holidays I had some time to test my CBM equipment. Lo and behold, the ZoomFloppy no longer works with your v08-PATCHED. It hangs already on the first command - only cbmctrl reset works. Nothing has changed on my hardware beside Linux Mint update to version 19.3. But since I did not touch the LXC container for the CMB stuff (old Debian version), and the same symptoms occur there as in the host system, the new kernel (which the LXC container also uses) could be the only reason for it. What currently works for me is xum1541-ZOOMFLOPPY-v07-nate.hex with opencbm 0.4.99.99. All above this opencbm version .100 - .103 hangs on
> [XUM1541] xum1541_wait_status checking for status BULK SUBMIT

That's no surprise, because this patch was just a workaround for what turned out to be a bug in the linux xhci usb driver, which has since been fixed. With a fixed kernel the "patched" firmware is expected to show the same symptoms as an unpatched firmware before the fix.

I don't understand what you mean when you write "all above this ... hangs"? Are you talking about the firmware or the opencbm version? If the firmware: What revisions *did* you actually test? Probably not "all"? Did you compile the firmware or did you just test with hex files you already had? Imho any xum1541 firmware from the git master branch should work with an up to date linux kernel.

frank128

Jan 7, 2022, 12:20:09 PM
to ZoomFloppy Users
Hi Martin,

> I don't understand what you mean when you write "all above this ... hangs"?

I mean that all current opencbm versions from 0.4.99.100 to 0.4.99.103 show the same symptoms with the ZoomFloppy. With the xu1541, all opencbm versions from 0.4.99.99 to 0.4.99.103 work without issues (same hardware, same cables, same OS).
  
> Are you talking about the firmware or the opencbm version?

I tested all combinations between openCBM and the 4 xum1541 firmware versions below:
xum1541-ZOOMFLOPPY-v07-spiro
xum1541-ZOOMFLOPPY-v07-nate
xum1541-ZF-v08-EXPERIMENTAL-v2.zip (the patched version that I got from Spiro)
xum1541-ZOOMFLOPPY-v08.hex (delivered from git with the current openCBM)


> Did you compile the firmware or did you just test with hex files you already had?

The files that I got from Spiro in the forum64.de Forum.


> Imho any xum1541 firmware from the git master branch should work with an up to date linux kernel.

looks like there are open issues.

What works with BOTH ZoomFloppies that I own is the xum1541-ZOOMFLOPPY-v07-nate firmware and the opencbm version 0.4.99.99.


Frank

frank128

Jan 7, 2022, 2:19:01 PM
to ZoomFloppy Users
> What works with BOTH ZoomFloppies that I own is the xum1541-ZOOMFLOPPY-v07-nate firmware and the opencbm version 0.4.99.99.

Correction: I have not touched my new ZoomFloppy - it came with a v8 firmware.

[XUM1541] firmware version 8, library version 8

this virgin xum1541 works with opencbm version 0.4.99.99, but not with opencbm version 0.4.99.103

Martin Thierer

Jan 9, 2022, 9:29:17 AM
to ZoomFloppy Users
> > I don't understand what you mean when you write "all above this ... hangs"?
> I mean all current opencbm versions starting from 0.4.99.100 till 0.4.99.103 showing the same Symptomes with the zoom floppy.
[...]

> I tested all combinations between openCBM and the 4 xum1541 firmware versions below:
> xum1541-ZOOMFLOPPY-v07-spiro
> xum1541-ZOOMFLOPPY-v07-nate
> xum1541-ZF-v08-EXPERIMENTAL-v2.zip (the patched version that I got from Spiro)
> xum1541-ZOOMFLOPPY-v08.hex (delivered from git with the current openCBM)

So you tested all of the mentioned opencbm versions (the tagged versions from git?) with these firmwares (so "all combinations" would imply 20 tests)?
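
To spell out where the 20 comes from: five tagged opencbm versions (.99 to .103) times the four firmware files listed above. A small sketch that enumerates such a matrix, with the version range taken from this thread and the variable names chosen only for illustration:

```python
from itertools import product

# Tagged opencbm versions 0.4.99.99 to 0.4.99.103
opencbm_versions = [f"0.4.99.{n}" for n in range(99, 104)]

# Firmware files as named in the quoted message
firmwares = [
    "xum1541-ZOOMFLOPPY-v07-spiro",
    "xum1541-ZOOMFLOPPY-v07-nate",
    "xum1541-ZF-v08-EXPERIMENTAL-v2",
    "xum1541-ZOOMFLOPPY-v08",
]

# Every (opencbm version, firmware) pair that "all combinations" would cover
matrix = list(product(opencbm_versions, firmwares))
```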


> correction: I have not touched my new ZoomFloppy - they came with a v8 firmware.
[...]

> this virgin xum1541 works with opencbm version 0.4.99.99, but not with opencbm version 0.4.99.103

The xum1541 plugin from the original, tagged 0.4.99.99 would not work with a v08 firmware (it would complain about "xum1541 firmware version too high", because it's expecting a v07 firmware).
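
In other words, the check behaves roughly like the sketch below. The error text is the one quoted above; the function name, its signature and the exact comparison are assumptions for illustration, not the plugin's real code.

```python
def check_version(firmware_version, library_version):
    """Return an error string, or None if the firmware is acceptable.

    Assumption: the plugin rejects any firmware newer than the version it
    was built for. How older firmware is handled is left out of this sketch.
    """
    if firmware_version > library_version:
        return "xum1541 firmware version too high"
    return None
```

So a plugin built for v07 talking to a v08 firmware would stop with that message, unless the check has been removed or changed.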

But, ignoring that for a moment, what you're saying is your opencbm version 0.4.99.99 works with the v08 firmware on your stock ZF, but *not* with the v08 from git master?

I find that hard to believe. There haven't been any changes to the firmware between the first v08 and current git HEAD that I could imagine to cause this problem. (The "experimental" version you got from Spiro is an exception).

That's not true for the plugin, which between .100 and current HEAD contains various experiments to fix the original "stops working after first command" problem. These all worked for me on linux at the time and only caused problems on Windows and/or MacOS, though. Now, there could have been changes on the linux side, but, as the changes are distinct between the versions, I very much doubt they would suddenly *all* start to cause problems.


> > Imho any xum1541 firmware from the git master branch should work with an up to date linux kernel.
>
> looks like there are open issues.

I'm pretty sure they are in your local setup, not in the xum1541 plugin or the firmware.

frank128

Jan 9, 2022, 10:03:44 AM
to ZoomFloppy Users
> The xum1541 plugin from the original, tagged 0.4.99.99 would not work with a v08 firmware (it would complain about "xum1541 firmware version too high", because it's expecting a v07 firmware).
 
For this reason I of course removed the xum1541_check_version() call in xum1541.c - otherwise I would not have been able to test.

> But, ignoring that for a moment, what you're saying is your opencbm version 0.4.99.99 works with the v08 firmware on your stock ZF, but *not* with the v08 from git master?

I don't know which specific firmware 8 version is on my new ZoomFloppy, which I bought on eBay at the end of December 2021.

# XUM1541_DEBUG=99 cbmctrl reset
...

[XUM1541] firmware version 8, library version 8
...

I want to leave this ZoomFloppy untouched from a firmware perspective so that I can use this device as a reference.

> I'm pretty sure they are in your local setup, neither in the xum1541 plugin nor the firmware.

From my point of view I can live with the OpenCBM version 0.4.99.99. It may be that there is a specific problem with my setup. The funny thing is that with the XU1541 there are no problems with different OpenCBM versions on my system.

RETRO Innovations

Jan 9, 2022, 2:56:27 PM
to zoomflop...@googlegroups.com
On 1/9/2022 9:03 AM, 'frank128' via ZoomFloppy Users wrote:

> I don't know what specific firmware 8 version is on my new ZoonFloppy, which I bought on eBay at the end of December 2021.

Hmmm, this seems odd.  As far as I know, I'm the only source for ZoomFloppy units, and I do not offer products for sale on eBay.  I wonder what device you have?  Was it new?  Is there a link or a pic?

There was a v8 beta at one point that had issues, so perhaps that was what was installed initially.

Not sure, just trying to nail down the details.

Jim

Nate Lawson

Jan 9, 2022, 5:59:10 PM
to ZoomFloppy Users
The only firmware bug I know of was in a v8 in opencbm git. It was fixed in 2019, unfortunately while retaining the v8 version so there’s no way to tell them apart. The bug caused nibread and nibwrite to fail in 1571 SRQ mode, but other features worked fine. So it doesn’t sound relevant to these problems.

-Nate

frank128

Jan 9, 2022, 9:24:50 PM
to ZoomFloppy Users
I bought it from eBay user "retrontech" - and that user had already sold a lot of ZoomFloppies. And they are expensive, almost 150 EUR.
zoom-floppy.jpg

RETRO Innovations

Jan 9, 2022, 11:26:54 PM
to zoomflop...@googlegroups.com
On 1/9/2022 8:14 PM, 'frank128' via ZoomFloppy Users wrote:
> > As far as I know, I'm the only source for ZoomFloppy units, and I do
> not offer products for sale on eBay.
>
> I bought it from eBay user "retrontech"

I found the listing.  Wow, kinda pricey!

https://www.ebay.com/itm/165232818726?ViewItem=&item=165232818726

Thanks for the link.  Evidently, someone else is selling them as well.

Jim



frank128

Jan 15, 2022, 8:49:49 PM
to ZoomFloppy Users
The forum64.de user "GI-Joe" was able to reproduce the issue with the same symptoms. The only difference is that this user has a Pro-Micro based ZoomFloppy clone.

https://www.forum64.de/index.php?thread/122335-zoomfloppy-unter-linux-mit-opencbm-version-0-4-99-103/


frank128

Jan 28, 2022, 4:44:26 PM
to ZoomFloppy Users
Correction

frank128 wrote:
> I mean all current opencbm versions starting from 0.4.99.100 till 0.4.99.103 showing the same Symptomes with the zoom floppy.  xu1541 all opencbm versions from  0.4.99.99 till 0.4.99.103 works without issues (same hardware, same cables, same OS) .

I have again compiled and tested all tagged OpenCBM versions (https://github.com/OpenCBM/OpenCBM/tags) between version 0.4.99.99 and 0.4.99.103. The problem started with the switch from version 0.4.99.102 to 0.4.99.103.
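
With only five tags, testing every version is easy enough. For a longer list, the same search can be done as a bisection. The sketch below is illustrative only: `first_bad()` and the `works` callback are hypothetical, and in practice `works` would build and test the given version.

```python
def first_bad(versions, works):
    """Return the first version for which works(version) is False.

    Assumes `versions` is ordered oldest to newest, the first entry works,
    the last one fails, and the breakage is not intermittent.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] works, versions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Tagged versions 0.4.99.99 to 0.4.99.103, oldest first
tags = [f"0.4.99.{n}" for n in range(99, 104)]
```

The same idea applies one level down, to the commits between the last good and the first bad tag.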

Spiro Trikaliotis

Jan 29, 2022, 4:09:54 PM
to zoomflop...@googlegroups.com
Hello,

just for the records:

* On Fri, Jan 28, 2022 at 01:44:26PM -0800 'frank128' via ZoomFloppy Users wrote:

> frank128 wrote:
[...]
> I have again compiled and tested all tagged OpenCBM versions (https://
> github.com/OpenCBM/OpenCBM/tags) between version 0.4.99.99 and 0.4.99.103. The
> problem started with the switch from version 0.4.99.102 to 0.4.99.103.

With this, I was able to find the culprit, it is commit #c6babdf983.

Regards,
Spiro

--
Spiro R. Trikaliotis
https://spiro.trikaliotis.net/