HL2 Random Loss of Connection


Mike Lewis

Dec 6, 2021, 7:45:56 PM
to philip.j.s...@gmail.com, Hermes-Lite

Moving this to a new thread, I will add some info.

 

I have had many connection losses over several months now and have been trying to capture the cause with no luck.  This is related to my effort to track down the cause of excessive but usually random sequence errors in piHPSDR on a stable, controlled network.  That is another topic.

 

For me this happens on SparkSDR, Quisk, piHPSDR and Thetis running on a PC or RPi4B.  It happens on 2 different networks and (rarely) on a direct link.  I have changed to smart GigE switches and found no errors reported there.

 

For the connection loss I have determined this is a loss of UDP layer connection.   There is zero Rx packet loss at the CPU ends or the switch.  Cables can be old or new CAT7, no difference.  It can be minutes or days between connection loss events.

 

When the connection drops, there are no app standard log file errors and no Linux or Windows network statistics I have found that point to any problem.  Not even UDP RX packet/buffer errors change on the lost connection moment.

 

Any other app can take control of the HL2 at this point, or if there is a restart button like in the DL1YCF build of piHPSDR, the app picks up where it left off.  In fact we (DL1YCF and I) have found some interesting interplay in this scenario where the original app can take back the connection from the 2nd app (any SDR app) since a busy condition from the HL2 in this state does not seem to be sent, asked for or received.

 

Since I have not observed this connection loss on 1 of my Pi controllers and the newest HL2 that I can recall, I plan to do a long term test to verify this is true or not by pairing them up and then swapping them.  I have tried this before with no conclusions because as soon as I think I see the problem follow a piece of gear/app, it stops and/or shows up elsewhere.  I just changed my QTH for a few months and need to finish getting things installed and stable to begin such tests again.  

 

I have not found any measurement yet to truly know a successful test other than wait several weeks, very difficult to do since I am working on app feature modification and testing frequently.  Need a way to capture the transaction that causes the HL2 to drop the connection.  I assume it is the HL2 side drop since 5 or more separate SDR apps on multiple CPU hosts Linux and Windows, and multiple networks, have the same problem with that particular HL2 (so far).  That is where the new 2nd HL2 may help sort things out.  But then why is this never observed on my 2nd Pi controller (I think) with same HL2?  Tough one.

 

 

Mike

K7MDL

CN88sf and EL87sm

 

From: herme...@googlegroups.com <herme...@googlegroups.com> On Behalf Of philip.j.s...@gmail.com
Sent: Monday, December 6, 2021 15:05
To: Hermes-Lite <herme...@googlegroups.com>
Subject: Re: Future SDR Project Survey

 

One minor point -- my HL-2 does not need to be power-cycled to recover, I press and unpress the soft button in SparkSDR. I still haven't managed to capture what goes wrong as it happens infrequently and I don't have a good way to capture all the traffic between those two boxes. 

 

Philip

On Monday, December 6, 2021 at 12:29:52 PM UTC-5 softerh...@gmail.com wrote:

Hi Philip,

 

Yes, that is helpful information. Since this is now on the list, I've posted the rest of this thread below.

 

I also run with a long wire and currently "hermes-lite" has more spots than n1dq on pskreporter. ;) But I think that has more to do with the depth of decode done by SparkSDR versus the depth of the decode possible on the Zynq's ARM A9 cores. I ran some experiments last year with Pavel's FT8 decoder recompiled for a standard PC. I had to change settings in the code related to decode depth to get more spots. There are more details in this github issue.

 

Sorry to hear that your HL2 still drops ethernet connection. I remember you submitted some patches to try and fix this. Unfortunately I can't replicate the problem. My HL2 on my home network with DHCP and the one at a remote location with direct connection are both rock solid and up for weeks. I haven't heard of any recent reports similar to yours. Do you have suggestions for how I might replicate your problem? Do you want to send your HL2 to me for possible fix or exchange?

 

Regarding a soft processor, I am a big fan of the work by enjoy-digital with the vex risc-v processor. I would probably use that setup for cpu/ethernet/etc in a larger FPGA. This will run Linux, but you need about 32MB or more to do that reasonably. That requires the FPGA to have external DRAM which requires significant pins and PCB area. I might be able to squeeze in a hyperram which can be used for linux. Bare metal code is more likely. At my day job we use a soft RISC-V for DRAM controller training. It should be able to do DHCP, ICMP and other similar tasks like a MCU would.

 

73,

 

Steve

kf7o

 

Steve Haynal

Dec 6, 2021, 11:58:40 PM
to Hermes-Lite
Hi Mike and Philip,

Maybe we can narrow down the problem. One common reason for disconnect is the watchdog timer timing out. The watchdog timer can be disabled in SparkSDR, Quisk and for any software using hermeslite.py. Since you are using SparkSDR, try disabling the watchdog timer as shown in the picture below.

Another possibility is that something happened during DHCP renewal. Are you using DHCP or fixed IP? Please try a fixed IP on your subnet and see if the problem goes away. See this wiki page for details:
Which version of gateware are you using?

Can you ping the computer at the expected address after the event? Can you access the computer with hermeslite.py after the event?

Mike, the last build which used the KSZ9021 had 3 or 4 units with a wrong resistor value. The problem can manifest as you describe. Is your ethernet phy a KSZ9021 or KSZ9031?

73,

Steve
kf7o

sparksdr2.png

si...@sdr-radio.com

Dec 7, 2021, 2:02:45 AM
to Hermes-Lite

Mike,

 

Maybe you have an application / security software / worse which is port scanning. This will attempt to open every port it can find and just take it from there. A good router / switch with anti-virus can stop and log this.

 

It could also be a bad switch.

 

I have DOS attacks at least once a month, a good TP-Link switch / router takes care of this.

 

Simon Brown, G4ELI

https://www.sdr-radio.com

"Christoph v. Wüllen"

Dec 7, 2021, 3:19:43 AM
to Steve Haynal, herme...@googlegroups.com


> On 07.12.2021 at 05:58, Steve Haynal <softerh...@gmail.com> wrote:
>
> Hi Mike and Philip,
>
> Maybe we can narrow down the problem. One common reason for disconnect is the watchdog timer timing out. The watchdog timer can be disabled in SparkSDR, Quisk and for any software using hermeslite.py. Since you are using SparkSDR, try disabling the watchdog timer as shown in the picture below.
>

Is this setting ADDR=0x39 bits 27:24 to 1001b? Since Mike is using piHPSDR I could
provide a checkbox that lets him disable the WDT. I think all
other bits of ADDR=0x39 should be zero by default.

Yours Christoph DL1YCF.
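
For anyone wanting to check the bit pattern, here is a minimal Python sketch of the field asked about above. It assumes a 32-bit data word for ADDR=0x39 with all other bits zero. The helper names are hypothetical and nothing here is taken from piHPSDR or hermeslite.py. How the word is actually delivered to the HL2 is not shown.

```python
WDT_ADDR = 0x39             # register address named in this thread
WDT_FIELD_SHIFT = 24        # the field occupies bits 27:24
WDT_FIELD_MASK = 0xF << WDT_FIELD_SHIFT
WDT_DISABLE_CODE = 0b1001   # value quoted in this thread


def with_watchdog_disabled(word: int = 0) -> int:
    """Return the 32-bit data word with bits 27:24 set to 1001b (hypothetical helper)."""
    cleared = word & ~WDT_FIELD_MASK & 0xFFFFFFFF
    return cleared | (WDT_DISABLE_CODE << WDT_FIELD_SHIFT)


def watchdog_field(word: int) -> int:
    """Extract bits 27:24 from a 32-bit data word."""
    return (word & WDT_FIELD_MASK) >> WDT_FIELD_SHIFT


if __name__ == "__main__":
    print(f"ADDR=0x{WDT_ADDR:02x} data=0x{with_watchdog_disabled():08x}")
```

With all other bits zero the data word comes out as 0x09000000. Only the four-bit field is touched, so any other bits already set in the word are preserved.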

Mike Lewis

Dec 7, 2021, 6:43:07 AM
to Steve Haynal, Hermes-Lite

I have the KSZ9031 chip.  Gateware version 72, ID 6.

 

I have been using the default settings in Quisk and others for the watchdog so far, which appears to be enabled.  I will set them to off when I use them next.  I am primarily running piHPSDR on 2 controllers with Pi4B and 7” touchscreens as SparkSDR and Quisk do not support the encoders and switch hardware on these units.   Over the last week since my return to this QTH I have seen only a few disconnects, all within a 1 hour time span.

 

At both homes I use DHCP reservations, and the networks are set up similarly, with the same newer TP-Link smart GigE switches for local connections in the shack and the same router models.  I have swapped cables and GigE switches, bypassed the switches, and used VLANs, all with the same problems.   Since I can see many of these events within hours or a day (at other times it can go many days), DHCP lease expirations would not be the issue: the leases are 1 week or longer, and DHCP is not in the picture for the direct and VLAN test conditions.

 

I did have an IP address set in the older HL2’s EEPROM to match the DHCP reservation at my previous QTH.  I was reminded of that when I moved them to the current QTH network and it showed up after discovery being on a subnet 😊.   I use a different subnet at each QTH to allow for easier VPN config.  There is no VPN active during any of this. 

 

After a connection loss the HL2 and controllers are all active, you can ping them, and restart a connection within the app.

 

As mentioned, I ran them for a time direct connected, and more recently on smart switches with VLANs isolating each controller-HL2 pairing.  This would eliminate outside forces such as port scanning based intrusions.  My router is set to block most inbound ports and well-known ones are changed and port forwarded to specific endpoints. No ports are forwarded to the Pi controllers or HL2s so in theory they and the HL2s should not see any outside influences.  Since the disconnects are proven to happen in isolated networks, the problem lies inside the 2 endpoints.

 

The unpredictable nature of these does make testing difficult.  After a connection loss the HL2 LEDs revert to their normal pattern - address acquired and ready for a connection.  Perhaps a test build of the gateware could rapid flash a set of LEDs when a WD timeout occurs, and stay that way until power is cycled and/or a new connection takes place.  We can know when improvement attempts are positive or negative, and rule out if the WD is or is not being activated.

 

What is the WD timeout period?  Is it stored in EEPROM?   If one app like Quisk sets the WD timeout to OFF, and another app does not specifically change it, will the setting of OFF stick through each new connection and each power cycle?  A real example might be using Quisk to set the WD then start piHPSDR (assuming it does not change that state today). What state will the WD value be in on the HL2?

 

  • Mike

 


20211123_210227.jpg

si...@sdr-radio.com

Dec 7, 2021, 7:05:12 AM
to Hermes-Lite

Ah,

 

Another project on another continent has a problem with loss of connection when the firmware’s DHCP lease is renewed. Maybe use a static address and see if this solves the problem?

 

Simon Brown, G4ELI

https://www.sdr-radio.com

 

Mike Lewis

Dec 7, 2021, 7:10:50 AM
to si...@sdr-radio.com, Hermes-Lite

That would not account for dropped connections multiple times a day or within an hour. I did have a static IP assigned to match the reservation address, which was useful for operating with a direct connection and in the VLAN setup.  The problem still occurred.

 


si...@sdr-radio.com

Dec 7, 2021, 7:14:30 AM
to Mike Lewis, Hermes-Lite

Temperature?

 

There’s a very good reason why I stick to software; when that breaks I can blame the users.

 

Simon Brown, G4ELI

https://www.sdr-radio.com

 


Ward Cunningham

Dec 7, 2021, 10:11:00 PM
to Hermes-Lite
Steve,
Thank you for pointing this out. It has been my experience that SparkSDR stops decoding FT8 after a few hours. I disabled the Watchdog timer as you suggest and have seen 24 hours of uninterrupted decoding.  If it runs for a week I will report back here, maybe with some new visualizations.
Best regards -- Ward K9OX

Radio details:

Firmware version 71
Firmware patch 3
Board ID 5
Receivers 4

SparkSDR details:

Version 2.0.7.4
Avalonia Version 0.10.2.0

Message has been deleted

Mike Lewis

Dec 7, 2021, 11:45:50 PM
to Steve Haynal, Hermes-Lite

Yes I have different MAC addresses set.  This was a problem long before the 2nd HL2 arrived, and it happens on direct connect and VLAN, so the HL2 + SDR app machine are isolated from everything.  I have been listening to the radio several times when it suddenly stops.  I cannot tell if it is after 13 seconds, since I only know it has happened when I cannot hear audio anymore (or see the spectrum display go blank). There are no log entries or packet errors to timestamp anything; the apps still run happily but with no spectrum display or audio.

 

At the moment it is not happening, just a few times at this QTH the other day.  It comes and goes.  I will keep the WD setting in mind once it starts dropping again.  I am running these 24/7. 

 

 

From: herme...@googlegroups.com <herme...@googlegroups.com> On Behalf Of Steve Haynal
Sent: Tuesday, December 7, 2021 20:07
To: Hermes-Lite <herme...@googlegroups.com>

Subject: Re: HL2 Random Loss of Connection

 

Hi Group,

 

Yes, setting ADDR=0x39 bits 27:24 to 1001b will disable the watchdog. You can also disable it when you start the radio from software as described here:

 

Yes, the value is sticky after software stops the radio. You can disable the watchdog with one software, disconnect, and then connect with other software and the watchdog will still be disabled. A complete power cycle of the HL2 does reset the value to the default of watchdog on.

 

The watchdog counts the packets sent to the PC versus packets received from the PC. If the number of missed but expected received packets from the PC reaches 4096, the HL2 automatically disconnects and turns transmit off. This is currently at almost 13 seconds, and is independent of the number and bandwidth of receivers in use. This means that something must have blocked PC->HL2 packets for 13 seconds, which is very unlikely in my opinion and indicates other problems with your network, computer or software.  The intent of the watchdog is to prevent runaway transmit and keep the radio accessible if software crashes. I never disable it.

 

Mike, I may be asking a question here that is obvious to you, but have you set different ethernet MAC addresses for your two units? All HL2s clone the same ethernet MAC, and different ethernet MAC addresses must be used for multiple HL2s on the same network. Do you see the problem if only one HL2 is on your network at a time?

 

73,

 

Steve

kf7o

 

 

 

 


Steve Haynal

Dec 8, 2021, 1:13:43 AM
to Hermes-Lite
Hi Mike, Philip and Group,

I took a quick look at the RTL tonight and there is another condition dating back to code from openhpsdr when the HL2 will disconnect: if it receives a destination unreachable message.

I suspect at times, maybe even very briefly, the host computer can't be reached and the HL2 receives one of these messages and disconnects. Which gateware variant are you using that exhibits the issue most often? I can build a test gateware image with this disabled and we can see if the problem is fixed.

73,

Steve
kf7o
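
For anyone inspecting a capture for the message Steve describes, here is a minimal Python sketch that classifies an ICMP message from its first two bytes. It assumes the input starts at the ICMP header (the IP header already stripped). The function name is hypothetical and this is not how the gateware does it. Type 3 is "destination unreachable"; code 3 (port unreachable) is normally produced by the destination host itself when no socket is listening on the UDP port, so it may be worth looking for even on a network without routers.

```python
ICMP_DEST_UNREACHABLE = 3  # ICMP type 3 per RFC 792

UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
}


def describe_unreachable(icmp_bytes: bytes):
    """Return a description for a destination-unreachable message, else None.

    icmp_bytes is assumed to begin at the ICMP header:
    byte 0 = type, byte 1 = code, bytes 2-3 = checksum, bytes 4-7 = unused.
    """
    if len(icmp_bytes) < 8:
        return None  # too short to be a complete ICMP header
    icmp_type, code = icmp_bytes[0], icmp_bytes[1]
    if icmp_type != ICMP_DEST_UNREACHABLE:
        return None
    return UNREACHABLE_CODES.get(code, f"code {code}")


if __name__ == "__main__":
    sample = bytes([3, 3, 0, 0, 0, 0, 0, 0])
    print(describe_unreachable(sample))
```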

Mike Lewis

Dec 8, 2021, 1:27:53 AM
to Steve Haynal, Hermes-Lite

Looks like 72.  I see the newest HL2 is at 73.  I should probably upgrade the older one.  I have yet to see the newest one lose connection, but it has relatively little run time so it is too early to know for sure.

 

My older HL2 (built this summer)

 

 

The new HL2:

"Christoph v. Wüllen"

Dec 8, 2021, 4:32:28 AM
to Steve Haynal, herme...@googlegroups.com

>
> The watchdog counts the packets sent to the PC versus packets received from the PC. If the number of missed but expected received packets from the PC reaches 4096, the HL2 automatically disconnects and turns transmit off.


Does this only happen during TX?

I think the number of missed packets since system start is not a good measure;
it makes a great difference whether this limit is reached during one evening in the
shack, or during weeks of unattended man-made-noise-survey operation in the
field.



"Christoph v. Wüllen"

Dec 8, 2021, 4:33:25 AM
to Ward Cunningham, herme...@googlegroups.com
Interesting observation. It makes me consider adding "disable watchdog for HL2"
to piHPSDR.

Steve Haynal

Dec 8, 2021, 11:27:45 AM
to Hermes-Lite
Hi Christoph,

The watchdog is not from system start, but from the last packet received from the host. Every time a packet is received from the host the watchdog timer is reset. So there must be >12 seconds of time without receiving any packets from the host for the watchdog to stop the HL2.

The watchdog runs during RX too as there were cases in the early days when software would crash on a system but the HL2 would continue to send packets as the system was still up. There was no easy way to recover from this situation, especially since we limit stopping the HL2 to the MAC which started the HL2.

73,

Steve
kf7o 
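
The behaviour described in this thread can be pictured with a small toy model in Python. This is only an illustration under stated assumptions, not the gateware logic: it assumes the counter goes up by one for every packet the HL2 sends to the host, is cleared by any packet from the host, and stops the radio at the limit of 4096 mentioned earlier. The class and method names are hypothetical.

```python
class WatchdogModel:
    """Toy model of the watchdog as described in this thread (not the RTL)."""

    def __init__(self, limit: int = 4096):
        self.limit = limit      # missed-packet limit quoted in the thread
        self.missed = 0         # packets sent since the last packet from the host
        self.running = True     # False once the watchdog has stopped the radio

    def packet_sent_to_host(self) -> None:
        """Call once for each packet the radio sends to the host."""
        if not self.running:
            return
        self.missed += 1
        if self.missed >= self.limit:
            self.running = False  # radio stops and transmit is turned off

    def packet_received_from_host(self) -> None:
        """Any packet from the host clears the missed-packet count."""
        self.missed = 0


if __name__ == "__main__":
    wd = WatchdogModel()
    for _ in range(4095):
        wd.packet_sent_to_host()
    wd.packet_received_from_host()   # one packet arrives just in time
    for _ in range(4095):
        wd.packet_sent_to_host()
    print("still running:", wd.running)
```

In this model a single packet from the host is enough to keep the radio running, which matches the point that only a long, complete silence from the host should trip the watchdog.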

Ward Cunningham

Dec 8, 2021, 11:46:15 AM
to Hermes-Lite
What keeps the watchdog happy when skimming ft8 that runs on a 15 second interval?

Aside: I left my traffic rate skimmer running overnight only to find that Apple had remotely rebooted my computer with a system upgrade. 

Steve Haynal

Dec 8, 2021, 11:57:52 AM
to Hermes-Lite
Hi Ward,

Software sends packets to the HL2 even when not transmitting. Historically these are to send back the audio data which can then be played through a headphone jack on the radio. This data is at a constant rate of 48kHz. Each packet from the host contains 126 samples. So the HL2 should receive a packet from the host every 126 * (1/48000) = ~3.4ms. A timeout of ~12 seconds means over 3000 of these packets were missed, a good indication that the host may be unresponsive. The only exceptions to this rule I know of are CW skimmer and the hermeslite.py based receivers. The watchdog disable was added to support those.

73,

Steve
kf7o

"Christoph v. Wüllen"

Dec 8, 2021, 1:02:06 PM
to Steve Haynal, herme...@googlegroups.com
OK, then 12 secs are even on the long side, if it comes to PA protection.

Normally SDR software sends "empty" TXIQ/audio samples to the radio while
RXing (in case there is an audio codec in the radio)

(Note 126/48000 is 2.6 msec)

Note further that you need these packets normally because they contain
C&C data such as frequency changes or whatever.

It appears strange to me that it has been reported here
that disabling WDT makes a difference.
For piHPSDR at least, it does not seem necessary to add a WDT disable
button, since it should never trigger.
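
The packet timing can be checked with a few lines of Python. This sketch uses only the figures quoted in this thread (126 samples per packet, 48 kHz, a timeout of roughly 12 seconds); the variable names are just for illustration.

```python
SAMPLE_RATE_HZ = 48_000     # audio sample rate quoted in the thread
SAMPLES_PER_PACKET = 126    # samples per host packet quoted in the thread
TIMEOUT_S = 12.0            # approximate watchdog timeout quoted in the thread

# Time between consecutive packets from the host
packet_interval_s = SAMPLES_PER_PACKET / SAMPLE_RATE_HZ

# Number of host packets that would normally arrive within the timeout
packets_in_timeout = TIMEOUT_S / packet_interval_s

print(f"packet interval: {packet_interval_s * 1e3:.3f} ms")             # 2.625 ms
print(f"packets expected in {TIMEOUT_S:.0f} s: {packets_in_timeout:.0f}")  # 4571
```

Either way, a timeout of about 12 seconds corresponds to several thousand consecutive missing packets from the host.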

Mike Lewis

Dec 8, 2021, 1:05:24 PM
to "Christoph v. Wüllen", Steve Haynal, herme...@googlegroups.com
Perhaps making one of the HL2 LEDs rapid flash when the WD triggers will help identify if it ever triggers.


DL1SIG

Dec 8, 2021, 4:10:49 PM
to Steve Haynal, Hermes-Lite
Steve Haynal:
> I took a quick look at the RTL tonight and there is another condition
> dating back to code from openhpsdr when the HL2 will disconnect: if it
> receives a destination unreachable message.
> http://www.networksorcery.com/enp/protocol/icmp/msg3.htm
>
> I suspect at times, maybe even very briefly, the host computer can't be
> reached and the HL2 receives one of these messages and disconnects. Which
> gateware variant are you using that exhibits the issue most often? I can
> build a test gateware image with this disabled and we can see if the
> problem is fixed.

But such an ICMP packet would only be generated if there is a router in
between, no? Otherwise there's nothing that detects the unreachable host
and generates the packet.

On Linux the kernel generates such packets internally for example when
ARP fails, but as far as I know the HL2 doesn't do this.

73, Simon DL1SIG

Steve Haynal

Dec 9, 2021, 10:56:20 PM
to Hermes-Lite
Hi Simon and Group,

Good point. This may not be the issue for Mike as there are no routers. Mike reported to me that his problem seems to have gone away with a gateware update. Philip may still have this problem and I think he has a setup with routers.

73,

Steve
kf7o

philip.j.s...@gmail.com

Dec 12, 2021, 8:21:26 PM
to Hermes-Lite
I don't have a router in the path, but there are a bunch of other systems on the same network. I've disabled the watchdog and am going to see how long it runs for. If it can manage a week, then I'm going to re-enable the watchdog and see if it can make another week. 

Philip

Steve Haynal

Dec 13, 2021, 12:38:40 AM
to Hermes-Lite
Hi Philip,

Okay. I will hold off on the experimental release that ignores the unreachable message until we hear if disabling the watchdog solves your problem.

73,

Steve
kf7o

philip.j.s...@gmail.com

Jan 8, 2022, 1:52:12 PM
to Hermes-Lite
It still isn't totally reliable and I don't understand why. I do have the watchdog enabled and I had Wireshark capturing all packets from the Windows box to the HL2 that were not part of the regular chatter. All I would see would be ARPs and then it suddenly goes silent. There was (I think) a DHCP renewal shortly after that, but I'm not sure which is cause and which is effect. I'm beginning to wonder if the power to the HL2 is clean.... 

Philip
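
One way to pin down the moment traffic goes silent is to export packet timestamps from a capture and scan them for long gaps. The Python sketch below assumes the timestamps are already available as a list of seconds (for example epoch times); the function name, the input format and the 1 second default threshold are assumptions for illustration, not Wireshark features.

```python
def find_gaps(timestamps, threshold_s=1.0):
    """Return (start, end, duration) for each silence longer than threshold_s.

    timestamps: iterable of packet arrival times in seconds (any order).
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each packet time with the one that follows it
    for earlier, later in zip(ordered, ordered[1:]):
        duration = later - earlier
        if duration > threshold_s:
            gaps.append((earlier, later, duration))
    return gaps


if __name__ == "__main__":
    # Steady traffic, then a 13 second silence, then traffic again
    times = [0.0, 0.5, 1.0, 14.0, 14.5]
    for start, end, duration in find_gaps(times):
        print(f"silence from {start} to {end} ({duration} s)")
```

The start time of each gap can then be compared against other events, such as ARP or DHCP traffic in the same capture, to help sort out cause and effect.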

Mike Lewis

Jan 8, 2022, 2:35:27 PM
to philip.j.s...@gmail.com, Hermes-Lite
I have had only 2 disconnects since I updated the gateware around Dec 9th. I have 2 HL2s paired to 2 piHPSDR controllers. Only 1 of them has the disconnects. When I return to my FL QTH in a month it will be interesting to see if the problem increases again, suggesting environmental causes.


Steve Haynal

Jan 10, 2022, 12:26:18 AM
to Hermes-Lite
Hi Philip,

Didn't you send me private e-mail saying it was much more stable for you with the watchdog disabled? Are you just trying to track down the reason with the watchdog enabled now?

73,

Steve
kf7o

Ward Cunningham

Jan 10, 2022, 2:43:28 PM
to Hermes-Lite
I turned off the watchdog timer and found it more reliable for two days, then it returned to its intermittent freezing in SparkSDR. A few on-off cycles of SparkSDR will bring it back to life without losing the websocket connections already in place. Note: this is the "normal" failure. It will occasionally lock up to the point that killing and restarting SparkSDR is required. I don't know if this is a problem with SparkSDR on Mac. I have no easy way to look upstream.

From this chart you can get an idea of how long traffic flows and how quickly I notice failure and "resuscitate" the connection.
Screen Shot 2022-01-10 at 11.26.34 AM.png
You can try this yourself by substituting your own domain, maybe localhost, into this url.
http://ft8.ward.asia.wiki.org/assets/pages/spark-decodes/bands.html?domain=nr.local

Steve Haynal

Jan 11, 2022, 1:12:27 AM
to Hermes-Lite
Hi Ward,

Thanks for the graph. It would be good to get to the bottom of this. I can say that I run SparkSDR on a Linux box with 10 receivers for weeks at a time. I do not remember 1 disconnect in over a year. My biggest problem is my brother in law flipping a circuit breaker and turning everything off. Search for hermes-lite on https://pskreporter.info/pskmap.html

73,

Steve
kf7o

Ward Cunningham

Jan 12, 2022, 12:29:46 PM
to Hermes-Lite
I have tried to correlate my disconnects with network traffic reported by my Unifi network console. This hasn't been illuminating as Unifi is more a corporate tool than an engineering tool. I do see EP6 count go up on SparkSDR but don't have a time series to which I might correlate. If SparkSDR reported that with other payloads I would plot it.
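
If the EP6 count can be noted down at known times, even by hand, it can be turned into a simple time series of packet rates. The Python sketch below assumes a list of (time in seconds, counter value) samples; the function name and the sample format are hypothetical, and nothing here relies on a SparkSDR interface.

```python
def rates_from_counter(samples):
    """Convert (time_s, count) samples into (time_s, packets_per_second) points."""
    points = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        points.append((t1, (c1 - c0) / dt))
    return points


if __name__ == "__main__":
    # The counter stops increasing after t = 10 s, so the rate drops to zero
    samples = [(0, 0), (10, 3810), (20, 3810)]
    for t, rate in rates_from_counter(samples):
        print(f"t={t} s rate={rate} packets/s")
```

A rate that falls to zero marks the interval in which the stream stopped, which gives a timestamp to line up against the network console.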

My son, who got me into HL-2, has suggested inserting a unix box with two network cards next to the HL-2 to isolate the radio from my sprawling and in some places aging LAN. I countered by suggesting he write me a fully instrumented Go-language program as a bridge between the HL-2 and my client software. Should we do either, we will report results here.