HLv2 Beta3 unreliable starting

Alan Hopper

Aug 29, 2017, 7:02:36 AM
to Hermes-Lite
Hi Beta testers,
I find it can take a few goes of stopping and starting to get receive working. So far I've only been using Spark, so it could be a Spark-only issue; I just wondered if anyone else has seen this.
73 Alan M0NNB

in3otd

Aug 29, 2017, 3:01:35 PM
to Hermes-Lite
Hello Alan,
I've used Spark SDR just a few times, but with the H-Lv2b3 I also often have to start/stop/exit/relaunch to get it working. I saw similar behavior with the v2b2. I don't know if it's a Windows issue; I did not try to investigate. IIRC PowerSDR also did not start reliably, but it's a long time since I last used it. Quisk always works fine with the same HW in Linux.

73 de Claudio, IN3OTD / DK1CG

Steve Haynal

Aug 29, 2017, 11:55:41 PM
to Hermes-Lite
Hi Alan,

I see an occasional startup issue, maybe <10% of the time. I often have issues when I run other software such as Quisk and then try to run Spark SDR without rebooting the HL2. I think other software may leave the HL2 in an unexpected state for Spark SDR. The same software restarts over and over again without issue. I also have problems connecting over wireless even though my wireless network should be fast enough. 

When you see the problem, is the HL2 pingable on your network, or is nothing responding? I will try to keep track of what happens when I start software. I just started Spark SDR without issue on my HL2 beta2.

73,

Steve
KF7O

Alan Hopper

Aug 30, 2017, 6:01:52 AM
to Hermes-Lite
Claudio, Steve,

Thanks for the reports. My problem seems more severe: I get a response to a discovery 99% of the time, but it can take many goes to get it to start. In the fault condition it appears to be sending data and can be pinged.

My current theory is that Spark is a bit more sensitive to a possible problem in the firmware. The spec is a little unclear about which port the radio should send its receive packets to. I believe it should be the source port of the start packet and not that of the discovery packet. Using the discovery packet's port prevents running multiple radios/software on the same network. Spark uses separate sockets for discovery and normal operation; for the HL1 and Apache Orion this seems to work fine here.
I caught the HL2b3 in the fault state a few times and it was either sending packets to where the discovery request came from or, I think, to a corrupt port number (need to confirm this). Other software may use the same socket for both operations, so might be less sensitive. I remember a similar issue with the early CVA9 firmware.

Once it is running I can stop and start without a problem, but if I send another discovery it will sometimes fail on the next start.

73 Alan M0NNB
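
The port behaviour argued for above can be sketched as a small model. This is only an illustration: `MockRadio` is a hypothetical class, the client addresses are made up, and the packet layouts are assumed from the openHPSDR protocol 1 (Metis) description rather than taken from the HL2 firmware.

```python
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (ip, udp_port)

# Packet layouts assumed from the openHPSDR protocol 1 (Metis) description.
DISCOVERY = bytes([0xEF, 0xFE, 0x02]) + bytes(60)    # 63-byte discovery request
START = bytes([0xEF, 0xFE, 0x04, 0x01]) + bytes(60)  # 64 bytes, start IQ stream
STOP = bytes([0xEF, 0xFE, 0x04, 0x00]) + bytes(60)   # 64 bytes, stop


class MockRadio:
    """Hypothetical model of the desired behaviour, not the HL2 firmware."""

    def __init__(self) -> None:
        self.stream_dest: Optional[Addr] = None  # where receive packets would go

    def handle(self, payload: bytes, src: Addr) -> Optional[Addr]:
        """Process one request and return the address of any immediate reply."""
        if payload[:3] == b"\xef\xfe\x02":
            # Answer discovery to whoever asked, without touching stream_dest.
            return src
        if payload[:3] == b"\xef\xfe\x04":
            # Latch the stream destination from the start packet only.
            self.stream_dest = src if payload[3] & 0x01 else None
        return None


radio = MockRadio()
discovery_socket = ("192.168.1.10", 50001)  # made-up client addresses
run_socket = ("192.168.1.10", 50002)

radio.handle(DISCOVERY, discovery_socket)
radio.handle(START, run_socket)
print(radio.stream_dest)  # ('192.168.1.10', 50002)
```

With this rule a second program can send its own discovery from another port without redirecting a stream that is already running.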

Takashi K

Aug 30, 2017, 4:55:26 PM
to Hermes-Lite
Hi Steve,

There is also a problem where the combination of piHPSDR and the HL (both v1 and v2) does not start reliably.
The Red Pitaya also had this problem, but its firmware has been fixed.

I guess it's different from the problem with Spark SDR, but I hope this issue can be fixed, too.


73, Taka  ji1udd

Steve Haynal

Aug 31, 2017, 11:31:30 PM
to Hermes-Lite
Hi Taka and Alan,

The HL2 is using firmware similar to the CVA9, but I don't think there were any changes after I forked. I just checked and all HL1 changes appear to be in the HL2 repository. I added this thread as a github issue for the next time I work on the firmware.

73,

Steve
KF7O

Jack Generaux

Sep 8, 2017, 9:14:52 PM
to Hermes-Lite
I don't know if this is related. I have been having startup problems with Quisk on my desktop computer; Spark seems to be less of a problem. Also, sometimes when trying to ping the HL2, it would take multiple tries before finding the device. I have compiled the HL2 with a static IP so I can use it in the field. I have two Gigabit ports on the computer and, on a whim, I tried the other port and it appears to have eliminated or at least greatly reduced the problem. Quisk now seems to connect reliably. The original port was an Atheros port and the one I changed to is an Intel port. FWIW it may be a driver issue, at least in my situation.

73,
Jack (W0FNQ)

Steve Haynal

Sep 9, 2017, 12:34:29 AM
to Hermes-Lite
Hi Jack,

I saw similar behavior when I was using my HL2beta3 during the eclipse. I had a static IP programmed and a direct connection to a USB ethernet adapter on my laptop. This was with Windows. Quisk would not reliably connect. I had two USB ethernet adapters (1 USB2 and 1 USB3) and the USB3 adapter was much worse. The USB3 adapter was inexpensive and may have a poor Windows driver. It works well under Linux though. I suspect that some drivers are adding more latency than Quisk knows how to deal with. There are some parameters in Quisk to tweak. Maybe Jim has some advice. 

73,

Steve
KF7O

Alan Hopper

Oct 13, 2017, 5:54:38 AM
to Hermes-Lite
Hi List,
I've noticed that when the HL2 is in the bad state, where it appears to be sending data to somewhere other than the PC running Spark, my home wifi collapses (with shouts from the family!). The HL is on a wired network, so I wonder if it is sending all its data direct to the wifi router (which is the DHCP provider) or somewhere at random on the wifi network?

73 Alan M0NNB

Steve Haynal

Oct 14, 2017, 10:47:50 PM
to Hermes-Lite
Hi Alan,

We can add a watchdog timer to prevent sending to some address without appropriate return packets from that address. I will have to try with my wifi to see if I can get it to send packets to some bogus address. I have seen this occasionally.

73,

Steve
KF7O
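
The watchdog idea above could look roughly like the following model. It is a sketch only: `StreamWatchdog`, its method names and the one-second timeout are all hypothetical, and the real thing would live in the firmware RTL rather than in Python.

```python
from typing import Optional


class StreamWatchdog:
    """Hypothetical watchdog: stop streaming if the destination goes quiet."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_heard: Optional[float] = None  # None means not streaming

    def start(self, now: float) -> None:
        """Arm the watchdog when a start command is accepted."""
        self.last_heard = now

    def packet_from_host(self, now: float) -> None:
        """Call for every valid packet received from the stream destination."""
        if self.last_heard is not None:
            self.last_heard = now

    def should_stream(self, now: float) -> bool:
        """True while the destination has been heard from recently."""
        if self.last_heard is None:
            return False
        if now - self.last_heard > self.timeout_s:
            self.last_heard = None  # disarm until the next start command
            return False
        return True


wd = StreamWatchdog(timeout_s=1.0)  # timeout value is an assumption
wd.start(now=0.0)
wd.packet_from_host(now=0.9)
print(wd.should_stream(now=1.5))  # True, host heard 0.6 s ago
print(wd.should_stream(now=2.5))  # False, host silent for 1.6 s
```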

Alan Hopper

Oct 29, 2017, 8:39:58 AM
to Hermes-Lite
Hi List,
I managed to capture the HL2 in a condition where on start it appears to send data but the radio software shows nothing; sending stop causes the LEDs to change but data is still being sent. The UDP packets were being sent to the correct IP and port and were the correct length, but full of zeros. They were also arriving at a rate of approx 65158 per second vs the expected 380 (for 48k, 1 rx). It feels like it can get stuck in a state where it just sends as fast as it can.
73 Alan M0NNB
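
As a rough check of the expected rate quoted above, the arithmetic can be written out. The frame layout used here (two 512-byte frames per 1032-byte packet, 8 bytes of sync and control per frame, 24-bit I and Q per receiver plus a 16-bit mic sample) is assumed from the openHPSDR protocol 1 description; `expected_packet_rate` is a name made up for this sketch.

```python
def expected_packet_rate(sample_rate_hz: int, receivers: int) -> float:
    """Receive packets per second for an openHPSDR protocol 1 stream."""
    frame_payload = 512 - 8               # 3 sync bytes + 5 control bytes per frame
    bytes_per_sample = 6 * receivers + 2  # 24-bit I and Q per receiver, 16-bit mic
    samples_per_frame = frame_payload // bytes_per_sample
    samples_per_packet = 2 * samples_per_frame  # two frames per 1032-byte packet
    return sample_rate_hz / samples_per_packet


rate = expected_packet_rate(48_000, 1)
print(f"{rate:.1f} packets/s")                           # 381.0 packets/s
print(f"{65158 / rate:.0f}x the expected rate")          # 171x the expected rate
print(f"{65158 * 1032 * 8 / 1e6:.0f} Mbit/s of payload")  # 538 Mbit/s of payload
```

Under these assumptions the observed rate is about 171 times the normal one and carries over half a gigabit per second of UDP payload, which fits the impression that the radio is sending as fast as the link allows.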

Alan Hopper

Oct 30, 2017, 8:50:52 AM
to Hermes-Lite
Hi list, 
I've played a little more with this. There seem to be two fault states: one sending continuous empty 1032-byte UDP packets (no HPSDR header or sync number) and one sending nothing. Once in these states, powering down seems necessary even though the run LED will follow stop/start commands. It only seems to go into these states when the run command is sent; once started I've never seen it fall into them. I've attached an ethernet.v file that so far has worked for me. It was a shot in the dark, so I don't know if it is a real solution or just lucky timing; it tries to sync the run command to the tx_clock.

Startup can also appear to fail if you try to connect too soon after power-up (before DHCP); retrying works here.

73 Alan M0NNB
ethernet.v
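
For anyone trying to spot the first fault state in a capture, a payload check along these lines may help. It is a minimal sketch: `classify_payload` is a hypothetical helper, and the header and sync byte values and offsets are assumed from the openHPSDR protocol 1 description.

```python
METIS_HEADER = b"\xef\xfe\x01"  # assumed protocol 1 data-packet marker
SYNC = b"\x7f\x7f\x7f"          # assumed sync at the start of each 512-byte frame


def classify_payload(payload: bytes) -> str:
    """Label one captured UDP payload sent by the radio."""
    if len(payload) != 1032:
        return "unexpected length"
    if not any(payload):
        return "all zeros"  # the empty-packet fault state described above
    if payload[:3] != METIS_HEADER:
        return "bad header"
    if payload[8:11] != SYNC or payload[520:523] != SYNC:
        return "missing sync"
    return "ok"


# A well-formed packet: 4-byte header, 4-byte sequence number, two 512-byte frames.
good = METIS_HEADER + b"\x06" + (0).to_bytes(4, "big") + (SYNC + bytes(509)) * 2
print(classify_payload(good))         # ok
print(classify_payload(bytes(1032)))  # all zeros
```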

Takashi K

Oct 30, 2017, 5:39:17 PM
to Hermes-Lite
Hi Alan,

Thank you for your investigation.
I tried to use your RTL with my 100M firmware. Unfortunately the PC SDR software could not find the HL2, and there was no ping response.
I have not checked the details yet. I will check later.

73, Taka  ji1udd

Takashi K

Oct 31, 2017, 8:58:32 AM
to Hermes-Lite
Hi Alan,

Sorry, my modified ethernet.v for 100M operation was not correct when I compiled it last time.
I have fixed it and recompiled the firmware.
I ran a power ON/OFF test ten times with piHPSDR.
Now the HLv2 did not hang up. Improved! But five of those times, it took about ten seconds for the signal and sound to come out.
And there is still no signal or sound when using Spark SDR.
The Ethernet activity LED was blinking at that time.
I changed the single latch to a double latch for the tx_clock sync and added a double latch for the 76.8MHz domain, but nothing changed.
I have attached my revised firmware.
Does Spark SDR work well at your house?

73, Taka  ji1udd
CTRX_HLv2b3_NR2_DHCP_171031.jic

Alan Hopper

Oct 31, 2017, 10:25:26 AM
to Hermes-Lite
Hi Taka,
Since adding the mod to the gigabit code, Spark has not failed to start in 250+ start/stop commands and 50+ power-ups, but the problem has always been intermittent so I could just be lucky. The only failures I have had are when I connect before DHCP completes (approx 6 secs after power-up) while Spark is still listing the radio from a previous discovery; a retry always works here. When it fails for you, does it need powering off or does a software stop/start work?

I tried your firmware and it has started every time so far; going back to the old firmware still fails sometimes.

I suspect I fooled myself about the number of failures initially by retrying too soon after powering down/up after real failures. 

Does Spark ever work for you?  

73 Alan M0NNB
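
One way client software might avoid the connect-before-DHCP failure mentioned above is to repeat discovery instead of trusting an earlier result. The sketch below assumes a caller-supplied `discover` function that returns the radio's address or `None`; the function name, attempt count and delay are all hypothetical and not taken from Spark or any other program.

```python
import time
from typing import Callable, Optional


def discover_with_retry(discover: Callable[[], Optional[str]],
                        attempts: int = 5,
                        delay_s: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call discover() until it returns a radio address or attempts run out."""
    for attempt in range(attempts):
        address = discover()
        if address is not None:
            return address
        if attempt < attempts - 1:
            sleep(delay_s)  # give the radio time to finish DHCP
    return None


# Stand-in for a real discovery call: the radio answers on the third try.
responses = iter([None, None, "192.168.1.50"])
address = discover_with_retry(lambda: next(responses), sleep=lambda s: None)
print(address)  # 192.168.1.50
```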

Takashi K

Nov 1, 2017, 7:21:08 AM
to Hermes-Lite
Hi Alan,

I have now confirmed that Spark SDR works well.
The cause was my wrong AGC setting in Spark. I was not aware of it because Spark works well with the HLv1 and the same setting.
Thanks for your nice countermeasure.

73, Taka  ji1udd

Rick Koch

Nov 5, 2017, 2:31:24 PM
to Hermes-Lite
Alan, I was having issues w/ piHPSDR (probably same as Taka) using a version1 HL CVA9. It would
more times than not start up incorrectly giving this on the console ": process_ozy_input_buffer: did not find sync: restarting".

So I tried recompiling the HL1 rtl with your ethernet.v changes (merged to HL1) and I'm happy to report that the issue is no
longer presenting itself. Although as you pointed out, if I don't wait long enough for the DHCP to complete
it does occur.

Thanks for providing a fix to this issue not only on HL2 but also HL1 !

-Rick / N1GP

Steve Haynal

Nov 5, 2017, 11:45:21 PM
to Hermes-Lite
Hi Alan, Taka, Rick and Group,

I haven't been following this thread closely, but it sounds like you have some good fixes for ethernet. Would someone be willing to isolate these fixes and send me standard unix patch files for both the HL1 and HL2 branches? That way I won't have to figure out exactly what the changes are. I will apply them to both branches.

73,

Steve
KF7O

Rick Koch

Nov 6, 2017, 11:12:28 AM
to Hermes-Lite
Alan's changes have improved my situation but have not fixed it completely.
Before the changes I rarely could complete a startup to piHPSDR (1.1.3) and I had 
to power sequence the HL1 each try  (as it was still sending packets) until a proper
startup occurred.

Now with the changes in the attached patch (deduced from Alan's ethernet.v file) it always
fails the first time trying to connect to piHPSDR, but now I don't have to power sequence
the HL1 as it's in a listen state, and ALWAYS now on the second attempt it succeeds as
well as every time afterwards. Something about its power-on state piHPSDR doesn't like.
eth.patch

Steve Haynal

Nov 6, 2017, 11:52:29 PM
to Hermes-Lite
Thanks Rick!

73,

Steve
KF7O

Alan Hopper

Nov 7, 2017, 2:52:15 AM
to Hermes-Lite
Hi Rick,
When you get the failure on first start, does the run light come on and does there appear to be Ethernet traffic? I still have a suspicion that there is another failure mode that causes data to be sent to the wrong place. I've not been able to recreate it recently on the HL2 but can on the Orion with similar firmware (protocol 2).

Steve, my fix was a bit of 'high level debugging', i.e. a guess, and I've not tracked it down to an exact low-level cause, so treat it with suspicion.

73 Alan M0NNB

Rick Koch

Nov 7, 2017, 7:40:53 AM
to Hermes-Lite
Hi Alan,

The run (or Ethernet TX active) LED is LED6 (starting from LED1: ADC Positive Clip) correct? That LED does come on, both in the initial start
or subsequent starts when it works. There is ethernet traffic. I've attached a zipped up tcpdump pcap of an initial start where it didn't work. I
captured based on the mac, "tcpdump -i eth0 ether host 00:1c:c0:a2:22:dd -w dump.pcap"
dump.zip

Alan Hopper

Nov 7, 2017, 8:28:08 AM
to Hermes-Lite
Hi Rick,
From the dump file it looks like piHPSDR is repeatedly sending a start directly followed by a stop. The radio appears to start (as it sometimes sends a packet) but is stopped instantly. I suspect this is a piHPSDR bug.
73 Alan M0NNB
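
The pattern described above could be picked out of a capture with something like the following. It is a sketch that works on already-extracted `(timestamp, payload)` pairs for host-to-radio packets; the helper names and the 10 ms window are hypothetical, and the start/stop layout is assumed from the openHPSDR protocol 1 description.

```python
from typing import List, Tuple

START = bytes([0xEF, 0xFE, 0x04, 0x01]) + bytes(60)  # assumed start command
STOP = bytes([0xEF, 0xFE, 0x04, 0x00]) + bytes(60)   # assumed stop command


def control_command(payload: bytes) -> str:
    """Return 'start', 'stop' or 'other' for one host-to-radio payload."""
    if len(payload) >= 4 and payload[:3] == b"\xef\xfe\x04":
        return "start" if payload[3] & 0x03 else "stop"
    return "other"


def starts_cancelled(packets: List[Tuple[float, bytes]],
                     window_s: float = 0.01) -> int:
    """Count start commands followed by a stop within window_s seconds."""
    commands = [(t, control_command(p)) for t, p in packets]
    commands = [(t, c) for t, c in commands if c != "other"]
    count = 0
    for (t0, c0), (t1, c1) in zip(commands, commands[1:]):
        if c0 == "start" and c1 == "stop" and t1 - t0 <= window_s:
            count += 1
    return count


# Made-up capture: two starts cancelled almost at once, then one that sticks.
capture = [(0.0, START), (0.001, STOP), (1.0, START), (1.002, STOP), (2.0, START)]
print(starts_cancelled(capture))  # 2
```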

Rick Koch

Nov 7, 2017, 6:09:18 PM
to Hermes-Lite
Alan,

Thanks for looking at that pcap dump. I checked John's piHPSDR GitHub site and lo and behold he has a
new branch 1.2. I tried it and the behavior is very much different. Mostly good in that it connects and starts
right away. Still some bugs to be worked out. But I think you're right, startup problems are most likely on
the piHPSDR side. Will update when I know more.

Takashi K

Nov 18, 2017, 9:09:13 PM
to Hermes-Lite
Hi Steve,

The description in hermeslite.v for generating C122_rst from rst is:
cdc_sync #(1) reset_C122 (.siga(rst), .rstb(rst), .clkb(clock_76p8_mhz), .sigb(C122_rst));
I think C122_rst is always 0, because rstb is asserted while siga (rst) is 1.

I changed it to the following, and use C122_rst for the 76.8MHz domain.
cdc_sync_rst reset_C122 (.rsta(rst), .clkb(clock_76p8_mhz), .rstb(C122_rst));
But so far I cannot find any difference between before and after the modification. The piHPSDR issue is more serious for me.

73, Taka  ji1udd
cdc_sync_rst.v

Steve Haynal

Nov 27, 2017, 1:52:09 AM
to Hermes-Lite
Hi Taka, Rick and Alan,

I've reviewed these changes. Adding a CDC synchronizer or register on the run signal really messes up the timing. The run signal is stable so should be fine without a CDC sync. I haven't had any issues with starts of Quisk using my branch of the HL2 RTL. Can you provide the exact details of the failures you see?

* What software?
* What firmware version?
* What Hermes-Lite platform?
* What sequence of steps to trigger the problem?

73,

Steve
KF7O

Alan Hopper

Nov 27, 2017, 3:14:57 AM
to Hermes-Lite
Steve,

The issue I see is that, after powering up, the HLv2 Beta3 sometimes goes into one of two fault states when sent a run command:
1. The radio sends receiver packets at a very high rate to the correct place but full of zeros (including the sync number and the rest of the HPSDR header).
2. The radio sends nothing but still appears to be in run mode.
Once in either state the lights will change on a stop, but proper operation requires powering down and up.

Software will typically show a frozen empty waterfall.

I've seen it with Quisk and with old and new versions of Spark (the unreleased new version has completely rewritten UDP code).

I've only seen it go into the fault state on a run command i.e. never once it is running.  

Firmware 20170904 mainly but also seen with previous versions.

The fault is intermittent and can sometimes take many stop/starts to recreate. My feeling is it happens more often after power-up, but it will also happen on later stop/starts.

I can well believe my fix is just a timing fluke, but so far it has worked without fail so might provide a clue. For my education, if the run command is already stable on the tx clock, why would adding a reg mess up timing? (I don't have a good handle on timing and have yet to find a good beginner's guide.)

I did spend some time playing with timing in Spark around the run command and turning off the wideband packet, but it had no effect.

I do remember sometimes having to reset the cva9 but never investigated.

73 Alan M0NNB

Takashi K

Nov 27, 2017, 4:58:09 PM
to Hermes-Lite
Hi Steve,

> What software?
PowerSDR v3.3.9 : rarely
piHPSDR v1.1.3 : very often. v1.1.3 may have a bug, but I have never heard of this issue with other radios (Red Pitaya was fixed).

>What firmware version?
Latest official v1 and v2.

My understanding is,

in network.v :
always @(posedge tx_clock)
 if (!run) begin
  run_destination_port <= udp_destination_port_sync;
  run_destination_ip <= udp_destination_ip_sync;
  run_destination_mac <= udp_destination_mac_sync;
 end

Since udp_destination_port_sync, udp_destination_ip_sync and udp_destination_mac_sync are parallel signals,
I think that run_destination_port, run_destination_ip and run_destination_mac can capture wrong values when the run signal changes at a tx_clock edge (metastability).

73, Takashi  ji1udd
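
Why capturing a multi-bit value at the wrong moment could produce a corrupt port number can be shown with a toy model. It assumes, as a simplification, that each bit that differs between the old and new value independently ends up as either its old or its new state; `possible_samples` and the port numbers are made up for this sketch, and it says nothing about what the HL2 hardware actually does.

```python
def possible_samples(old: int, new: int, width: int = 16) -> set:
    """Values a register could capture if each changing bit resolves independently."""
    changing = [bit for bit in range(width) if (old ^ new) >> bit & 1]
    results = set()
    for mask in range(1 << len(changing)):
        value = old
        for i, bit in enumerate(changing):
            if mask >> i & 1:
                value ^= 1 << bit  # this bit already shows its new value
        results.add(value)
    return results


# Made-up example: destination port changing from 50001 to 50002 (two bits differ).
print(sorted(possible_samples(50001, 50002)))  # [50000, 50001, 50002, 50003]
```

In this model two of the four possible results are neither the old nor the new port, which is the kind of wrong destination the text above worries about.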

Steve Haynal

Dec 1, 2017, 11:53:58 PM
to Hermes-Lite
Hi Alan,

Normally adding a pipeline register will not cause a problem. But that particular area of the RTL has multicycle paths inherited from the original RTL. There are timing constraints in the .sdc file to ignore paths in that area that are multicycle. Adding a register breaks these constraints. So many timing violations show up, some real and others false. The constraints have to be updated to identify and allow real multicycle paths so that the tool can optimize correct paths.

Since this area is sensitive, it is a good clue that there may be problems here...

73,

Steve
KF7O