HL2 stops responding to commands


Alan Hopper

Jan 21, 2018, 2:17:24 AM
to Hermes-Lite
Hi List,
my HL2b3 got into a state where it was still receiving but not responding to commands such as gain changes. Stopping and starting worked (or appeared to) but did not fix it; powering off and on did. I've only seen this once in weeks of uptime. It could be my software, and I've put some logging in to check. I just wondered if anyone else had seen the same thing. I did see this a few times a long time ago with the HL1.
73 Alan M0NNB

Alan Hopper

May 10, 2018, 2:45:32 PM
to Hermes-Lite
Hi Group,
I've been seeing this happen often whilst testing the Linux version of Spark. My Linux machine is an old laptop with only a Wi-Fi connection, so I wonder whether the issue is triggered by the more erratic delivery of packets over the network, i.e. whether some buffer/FIFO underrun or overrun condition is causing it. It would be good to know if anyone else has seen this. The symptoms are the radio still sending valid data but ignoring commands such as tuning, gain change and sample rate change.
73 Alan M0NNB

Steve Haynal

May 11, 2018, 12:59:47 AM
to Hermes-Lite
Hi Alan,

I've actually rewritten and simplified all the upstream and downstream queueing and control. I found and fixed some things that I imagine could lead to the problems you are seeing. Also, as with any major changes, I may have introduced some new issues, but I have been using this beta firmware on my HL2 without issues so far. I am close to making a new release, maybe this weekend. Jim is waiting and ready to add back the VNA functionality. Now that you have a better way to make the HL2 fail, I'd like to test for and fix if necessary these problems in this latest firmware. Github is up to date and you can try it now, although there are still a few loose ends that I want to take care of before releasing binary bitfiles.

73,

Steve
KF7O

Alan Hopper

May 11, 2018, 10:49:58 AM
to Hermes-Lite
Hi Steve,
This is looking good. I built the current code for beta 3 and so far it has behaved without fault. There is a bug in my current Linux code that shows the same symptoms (accompanied by an increase in memory usage) but goes away if you restart the software. So far I've not had to power cycle the radio, which I had to do before.

73 Alan M0NNB

Roger Jamieson ZL1AMI

Jul 17, 2018, 12:31:59 AM
to Hermes-Lite
Hi Steve,

I am also having trouble with the gain control for the LNA failing to respond. It stays at minimum gain. It was intermittent but now it is permanently not working. I am using the latest Quisk, Linux Mint, and hl2b5.jic. Is there anything I should check or change? I have tried hl2b5up but still have the problem. I have also tried older versions of Quisk.
The hl2 goes really well when the LNA is working so I am looking forward to getting it on the air.
Thanks for all the work that goes into a project like this.
73 Roger ZL1AMI

Steve Haynal

Jul 17, 2018, 2:28:36 AM
to Hermes-Lite
Hi Roger,

The LNA is set over the SPI bus between the FPGA and the AD9866. See AD9866 pins SDIO, SCLK, SEN* and RESET*. Check those pins on the FPGA and AD9866 with a microscope or magnifying glass to make sure they are still soldered well.

Does it help if you apply slight pressure with your finger to the FPGA and/or AD9866? Apply the pressure before powering up as on power up the SPI bus is initialized. Keep the pressure applied during the test. This can help determine if it is an assembly issue or other.

Does only the LNA stop working or does the TX gain setting also stop working? They both go over SPI. If only the LNA has stopped working, then it can indicate a problem with the AD9866.

You can stick with the hl2b5up firmware. Firmware or software should not make a difference for this problem.

73,

Steve
KF7O

Steve Haynal

Jul 26, 2018, 12:23:18 AM
to Hermes-Lite
Hi Roger,

Any luck with this? I'm wondering if the connection to your antenna was broken. A way to test whether there is an LNA problem or an antenna disconnect is to check whether the noise floor changes when you adjust the LNA. In Quisk you should see the noise floor change when the LNA is changed if the LNA is still working. Other software will compensate, so you won't see the noise floor change.

73,

Steve
KF7O

Roger Jamieson ZL1AMI

Jul 26, 2018, 4:11:10 AM
to Hermes-Lite
Hi Steve,

Apologies for not responding sooner but I have been out of town and busy with my work.
I have tried to find a bad connection, but even with a microscope it is difficult to see. I have tried resoldering, but nothing seems to fix it. When I change the RF gain slider the noise floor remains at about -130 dB. In the past, when it came to life, the noise floor would change. It is still working but with very low gain, as strong signals still come through. I have also tried SparkSDR, and although the gain appears to change when changing the RF gain setting, the signals are still weak. A question: if I do a continuity check on PCB tracks using a multimeter, am I likely to damage an IC? I am not sure whether the checker voltage is too high.
The problem slowly got worse over time, and now I cannot get the RF gain to change at all. It used to be that when the LNA got hot enough it would go, but since I soldered the base of the IC to the PCB heatsink it has remained at minimum gain.
Any help would be appreciated

Thanks and 73

Roger


Graeme Jury

Jul 26, 2018, 4:21:00 AM
to herme...@googlegroups.com
Hi Roger,

I hope that you can get some help for this, but I suspect that the LNA may not be healthy. The fact that it worked originally, albeit intermittently, and gradually got worse makes the LNA suspect. You could try measuring the voltage at the multimeter's probes while it is set to the range you are going to use for continuity measurements, as some meters are powered by a single 1.5 volt battery but others use a 9 volt 216. Best to let the FPGA experts answer that one. If the worst comes to the worst and you have to swap out the LNA, I do have a new one sourced from a reliable supplier here that you can use.

73, Graeme zl2apv
--
You received this message because you are subscribed to the Google Groups "Hermes-Lite" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hermes-lite...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Steve Haynal

Jul 29, 2018, 9:41:14 PM
to Hermes-Lite
Hi Roger and Graeme,

I've never seen problems with testing continuity with a typical DVM. Do you have hot air for soldering? Hot air can help reseat the AD9866. There have been previous cases with the Hermes-Lite 1 when the LNA did not work and reheating the AD9866 with hot air fixed the issue. If all else fails, you can ship it back to me for repair.

The LNA is integrated with the AD9866. Graeme, do you have an extra AD9866 to use for replacement?

73,

Steve
KF7O


Roger Jamieson ZL1AMI

Jul 30, 2018, 6:54:56 PM
to Hermes-Lite
Hi Steve,

I checked my DVM, and on ohms without the "beeper" on the voltage is about 1.1 V. With the beeper on this goes up to 7.5 V. I have checked the tracks for continuity and they seem OK. I had wondered if a via had perhaps failed. I will resolder the AD9866 and see what happens. I will also check the signals on the SPI lines with the scope. In its present state the transmitter also has no output. The amplifier draws current but there is no RF power. If I have no success I will arrange to ship it back for repair. I am not confident I would get the IC off the PCB without damaging the tracks, as I do not have the facilities to work with these types of ICs. Graeme ZL2APV may be able to assist, as he does have a spare AD9866, so I will check before I send anything to you.

73, Roger



Roger Jamieson ZL1AMI

Aug 1, 2018, 5:54:00 PM
to Hermes-Lite
Hi Steve,

I have checked all the tracks and soldered joints and they appear to be OK. I put the oscilloscope onto SDIO (pin 20) and SCLK (pin 22) and observed from startup. After power-up both lines simultaneously go from 0 V to +3 V for 180 ms and then back to 0 V. There is a brief burst of activity 27 ms after they return to 0 V, then nothing more. The SEN line also shows a burst of activity at the same time as SDIO and SCLK. At startup RESET goes to 3 V and back to 0 V at the same time as SDIO and SCLK, then goes to 3 V again 12 ms after going to 0 V. No activity is seen on any line following the initial burst, which lasts approximately 150 µs. No activity is seen following the startup of Quisk, and changing the RF gain does not produce any activity either.
I decided to reload the FPGA and the programmer reported 100% success. I then decided to try a "verify" command, which gets to 6% and then reports a failure (I had not previously done a verify). I then decided to do an erase, which reported success. I then did a blank check, which reported a failure. A download of the .jic file goes slowly to about 16% and then reports 100% success. A verify at this point still reports a failure at about 6%.
I think your original thought that the AD9866 was not getting set up correctly is absolutely right, as it appears that the FPGA is not doing what it should. Any suggestions on where to go from here?
I did have trouble with the original USB Blaster which would not work properly and after borrowing one from Graeme ZL2APV I was able to download successfully. Is there something in the Altera programming software I need to set up? I have looked at the setup and there does not seem to be anything obvious.
Any help gratefully received.

Regards

Roger


Steve Haynal

Aug 2, 2018, 12:52:27 AM
to Hermes-Lite
Hi Roger,

I suspect your problems programming and verifying are due to a noisy connection from the USB blaster to the HL2. The ribbon cable must be short, and the connection to the HL2 must be solid, otherwise you may see programming errors. Can you post a picture of your setup?

Your RESET, SDIO and SCLK activity sound okay to me, except I'd expect you to see activity on SDIO and SCLK when the RF LNA gain is changed and when the TxDAC gain is changed. Do you see any activity for TxDAC changes? After the burst of activity, is SEN left high or low?

Are you no longer able to run Quisk after programming? Being able to run Quisk is a fairly good sign that the FPGA's EEPROM was programmed. When you run Quisk, what code version and ID are reported at the very top of the main window?

Can you post some high resolution photos of the AD9866 and FPGA area, front and back?

73,

Steve
KF7O

Alan Hopper

Jan 20, 2019, 4:50:22 PM
to Hermes-Lite
Hi Group,
I've been doing a lot of testing recently in trying to get a new version of Spark ready. I have seen the HL2 stop a few times, generally after many hours; a software stop and start always recovers it. This could be a software issue on my side, but logging indicates I'm still sending packets to the radio at the point it stops, so it should not be triggering the watchdog. I just wondered if anyone sees the same thing?
73 Alan M0NNB

Josh Logan

Jan 21, 2019, 1:07:20 AM
to Alan Hopper, Hermes-Lite
I have seen it stop responding after multiple hours as well.  I usually need to do a power cycle on the HL2 to get it working again.



Steve Haynal

Jan 21, 2019, 1:19:17 AM
to Hermes-Lite
Hi Alan and Josh,

Are you connecting at 100 Mb/s or 1000 Mb/s? Which firmware are you using? Which HL2: beta2, beta3, beta5+?

I run my HL2 for days without issue. Usually this is with Quisk, 1000 Mb/s and beta3 or beta5.

I will turn on my beta2 and see if it shows any issues. 

Let me know your exact setup and I will try to duplicate the failure.

73,

Steve
kf7o

Alan Hopper

Jan 21, 2019, 1:30:02 AM
to Hermes-Lite
Steve,
this is with a beta3 with the latest stable firmware (I believe) and 1000 Mb/s. I'll recheck the firmware. As I said, it could well be my software; it is just very slow to test and debug as it does not happen often.
Josh,
what software are you running?
73 Alan M0NNB

Alan Hopper

Jan 21, 2019, 2:26:40 AM
to Hermes-Lite
Steve,
if I understand the RTL correctly, it looks like the effective watchdog sensitivity varies with the sample rate and number of receivers (as the ratio of sent to received packets changes). So with a high sample rate and 3 receivers, quite a short network blip could trigger it. Possibly clutching at straws here :)
73 Alan M0NNB

Josh Logan

Jan 21, 2019, 11:32:42 AM
to Alan Hopper, Hermes-Lite

I have only seen this failure with SparkSDR. I am usually running 3 receivers on different bands with a 192k or 384k sample rate.

Hardware is beta7, firmware is beta3.  Connected at 1000base-t.

Josh



Steve Haynal

Jan 21, 2019, 11:16:22 PM
to Hermes-Lite
Hi Alan,

I took this into account when rewriting the control, which is already in the current firmware. The watchdog counter is only incremented when a bandscope dump is or would be sent upstream. This is sent at a constant rate and independent of the number of receivers. The control now makes sure of this. The watchdog counter is cleared every time a downstream packet is received, which is also a constant rate. Currently, once 255 bandscope dumps have or would have been sent upstream without any downstream packets received, the watchdog will disconnect the HL2.
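The counter behaviour described above can be illustrated with a small host-side model. This is only a sketch in Python, not the RTL: the class and function names are hypothetical, and the only facts taken from the description are that the counter increments once per bandscope dump slot, is cleared by any downstream packet, and disconnects the HL2 at 255.

```python
class WatchdogModel:
    """Toy model of the HL2 watchdog as described in this thread (not the RTL)."""

    def __init__(self, limit=255):
        self.limit = limit        # 255 for the 8-bit counter described above
        self.count = 0
        self.tripped = False

    def bandscope_slot(self):
        """Called once per bandscope dump that is, or would be, sent upstream."""
        if self.tripped:
            return
        self.count += 1
        if self.count >= self.limit:
            self.tripped = True   # the HL2 would disconnect at this point

    def downstream_packet(self):
        """Called for every packet received from the host; clears the counter."""
        if not self.tripped:
            self.count = 0


def survives(gap_slots, limit=255):
    """Return True if a silence lasting gap_slots slots does not trip the watchdog."""
    wd = WatchdogModel(limit)
    for _ in range(gap_slots):
        wd.bandscope_slot()
    wd.downstream_packet()
    return not wd.tripped
```

In this model a host silence of 254 slots is tolerated and one of 255 is not, regardless of how many receivers are running, which is the point being made about the rate being constant.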

Does your disconnect happen roughly every 12 hours? I'm wondering if there is an interaction with DHCP renewal and some DHCP servers.

You can turn off or extend the watchdog timer as described below to see if it is the culprit.

73,

Steve
kf7o



 In the file dsopenhpsdr1.v change the line:

logic   [ 7:0]  watchdog_cnt = 8'h00;

to 

logic   [ 9:0]  watchdog_cnt = 10'h00;

There will be some additional warnings which you can ignore for now, but that will require 1024 ack misses before the HL2 watchdog activates. I will make this (or whatever you find that works) the default in the next firmware release.

If you want to disable the watchdog, change the line:

watchdog_cnt <= watchdog_cnt + 8'h01;

to

watchdog_cnt <= watchdog_cnt + 8'h00;

Alan Hopper

Jan 22, 2019, 1:31:10 AM
to Hermes-Lite
Steve,
thanks for the explanation, I should have looked at the code harder. It happens more often than every 12 hours. I've added more logging and have a few more things to try.
73 Alan M0NNB

Alan Hopper

Jan 22, 2019, 7:20:39 AM
to Hermes-Lite
I just compared the logged times of failure with the Windows event log, and they all coincide with either a failed Windows update or the computer briefly going to sleep (I found a bug in my code that I believed was keeping the PC awake), so hopefully there is no issue. Josh, I hope to have a new version out soon and I'll leave the logging in so we can hopefully find your issue.
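For anyone wanting to repeat this kind of comparison, a minimal Python sketch follows. It assumes the failure times and the system events have already been exported as timestamps; the function name, the 60-second tolerance and the data layout are all hypothetical and not taken from Spark or from the Windows event log format.

```python
from datetime import datetime, timedelta


def correlate(failures, events, tolerance=timedelta(seconds=60)):
    """Pair each failure time with the system events that fall within tolerance.

    failures: list of datetime objects (when the radio stopped)
    events:   list of (datetime, description) tuples from the system log
    """
    matches = {}
    for f in failures:
        matches[f] = [name for (t, name) in events if abs(t - f) <= tolerance]
    return matches
```

A failure with an empty list of matches is the interesting case, since it is not explained by a sleep or update event.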
73 Alan M0NNB

Alan Hopper

Jan 23, 2019, 8:56:35 AM
to Hermes-Lite
Hi Group,
I measured the watchdog timeout as approx. 670 ms (firmware https://github.com/softerhardware/Hermes-Lite2/tree/master/firmware/bitfiles/stable/20180603). This is obviously fine when everything is working smoothly, but I wonder if it is a bit tight for things like long-term skimming, where short network interruptions or PC hiccups can be expected.
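One way to judge whether 670 ms is too tight is to log the time of every packet sent to the radio and look for gaps longer than the timeout. The Python sketch below assumes such a log exists as a list of send times in seconds; the function name and the log format are hypothetical.

```python
def long_gaps(send_times, timeout=0.67):
    """Return (start_time, gap) pairs where consecutive sends are more than timeout seconds apart."""
    return [(a, b - a) for a, b in zip(send_times, send_times[1:]) if b - a > timeout]
```

Any entry returned here marks a point where the host went quiet for long enough that the watchdog could have fired.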
73 Alan M0NNB 

Joe

Jan 23, 2019, 10:51:29 AM
to Hermes-Lite
Alan,

I use the HL2 for the low-frequency bands, 2200M and 630M. I frequently leave it running for WSPR, and I use the HL2 on both RX and TX. In order to get it restarted I must shut down the program (PowerSDR) and cycle the power to the HL2. Keep in mind that sometimes it will run for days, and at other times it fails the first night when I'm asleep. Data looks good on the lights of the HL2 and the switch. At first I thought I had a bad connection somewhere on the board, but everything looks OK.
I'm not sure whether the PC has trouble connecting to the internet or not, but when it fails I'm not around anyway. I don't know the cause, and it's not a showstopper, just a nuisance.

73 Joe  wa9cgz 

Steve Haynal

Jan 24, 2019, 1:01:02 AM
to Hermes-Lite
Hi Alan and Joe,

Your measurement of 670 ms agrees with my calculation. We can lengthen this in the next firmware release. 10 bits would give us 2.68 seconds. Do you think that is enough?
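The scaling behind those two figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes the timeout is simply the counter range multiplied by one fixed tick period, and it ignores the small difference between tripping at 255 and at 256; the function name is hypothetical.

```python
MEASURED_8BIT_TIMEOUT_S = 0.670   # the measured value quoted in this thread


def scaled_timeout(bits, base_bits=8, base_timeout=MEASURED_8BIT_TIMEOUT_S):
    """Estimate the timeout for a wider counter, assuming one fixed tick period."""
    return base_timeout * (2 ** bits) / (2 ** base_bits)


tick_ms = MEASURED_8BIT_TIMEOUT_S / 2 ** 8 * 1000
print(f"implied tick period: {tick_ms:.2f} ms")          # 2.62 ms
print(f"10-bit timeout: {scaled_timeout(10):.2f} s")     # 2.68 s
```

Going from 8 to 10 bits multiplies the counter range by four, which is where 2.68 seconds comes from.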

Joe, I didn't realize you have a HL2. Which firmware are you running?

73,

Steve
kf7o

Alan Hopper

Jan 24, 2019, 2:21:41 AM
to Hermes-Lite
Hi Steve and Joe
it is good to know my measurement agrees with theory.  2.68 seconds sounds better, I'll leave the logging running over the next few weeks and try and record typical network dropouts.  I might add an auto restart feature to Spark for really long dropouts (like a router rebooting).
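An auto restart of the kind mentioned above could look roughly like the following Python sketch. It is not SparkSDR code: `start_radio` and `is_receiving` are hypothetical callables that the host application would supply, and the retry count and delays are arbitrary assumptions.

```python
import time


def run_with_restart(start_radio, is_receiving, max_attempts=5,
                     base_delay=1.0, sleep=time.sleep):
    """Try to (re)start the radio, backing off between attempts.

    start_radio and is_receiving are hypothetical callables supplied by the
    host application. Returns the number of attempts used, or None if every
    attempt failed.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        start_radio()
        if is_receiving():
            return attempt
        sleep(delay)                  # wait before retrying, e.g. while a router reboots
        delay = min(delay * 2, 30.0)  # double the wait each time, capped at 30 s
    return None
```

Doubling the delay keeps the host from hammering the network during a long outage while still recovering quickly from a short one.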
I have occasionally had to power the radio off and on recently, I think this has mainly been when the radio has been left powered but not running for a day or so, I shall pay more attention and see if there is a pattern.
73 Alan M0NNB 

Alan Hopper

Feb 2, 2019, 5:54:39 AM
to Hermes-Lite
Hi Group,
In a week of fairly constant use I've twice had my HL2 b3 freeze and get into a state where the power needs cycling. On one occasion I noticed it coincided with my router indicating a fault. In the fault state it still responds to discovery but won't start.
73 Alan M0NNB

Steve Haynal

Feb 3, 2019, 1:46:41 AM
to Hermes-Lite
Hi Alan,

For another measurement point, I started my beta 2 on January 20 skimming 40M with Quisk, and it has run without issue for the past 13 days. How many "hops" are in your setup? I just have one switch between the PC and the HL2, although there is substantial other traffic through the switch. I have changed the RTL on github to increase the watchdog timeout to 2.68 seconds as discussed previously. The code on github now builds again and contains this change. I am hoping to make a beta release with this change in the next few days.

73,

Steve
kf7o

Alan Hopper

Feb 3, 2019, 2:25:31 AM
to Hermes-Lite
Hi Steve,
my network is simple (but busy) with just a BT supplied switch/adsl modem/wifi device and wire between computer-switch-hl2.  I've just given the HL2 a better psu just in case something was happening there.  
73 Alan M0NNB

Steve Haynal

Feb 4, 2019, 12:59:46 AM
to Hermes-Lite
Hi Alan,

Okay. I have just started SparkSDR with 3 receivers skimming FT8 and WSPR using the newest firmware. In the past with SparkSDR, I've noticed that after a few days of operation, the FT8 spots list no longer updates. It is stuck at midnight UTC. It is as if it has reached a limit and doesn't progress forward even though the receivers are still running.

73,

Steve
kf7o

Alan Hopper

Feb 4, 2019, 1:09:04 AM
to Hermes-Lite
Hi Steve,
that's interesting, is there anything in the logs?  I suspect that is a different issue and one I have hopefully fixed in the imminent next release.
73 Alan M0NNB 

Steve Haynal

Feb 9, 2019, 1:33:19 PM
to Hermes-Lite
Hi Alan,

After ~4 days SparkSDR is in the state where the radios and waterfalls are running, but FT8 decode appears stalled at midnight. There are no recent reports to PSK Reporter. WSPR is still running and reporting. In the logs I see a message like the one below maybe once or twice an hour, even recently while FT8 spot updates are stalled. There is nothing else unusual in the logs. I will try your new version.

Did the added watchdog delay solve your problems with loss of connection? I still have not seen this issue. I was running with 4 receivers, 96 kHz. Are you running at a higher bandwidth?

73, 

Steve
kf7o
2/9/2019 3:06:06 AM v1.0 beta 9
ft8 parse 1105451559   10  -1.0   1394.   0   KL7NC KC0RF 73                        FT8
String was not recognized as a valid DateTime.
mscorlib
System.DateTime ParseExact(System.String, System.String, System.Globalization.DateTimeFormatInfo, System.Globalization.DateTimeStyles)
   at System.DateTimeParse.ParseExact(String s, String format, DateTimeFormatInfo dtfi, DateTimeStyles style)
   at radio.JTSpot.Parse(String s, String stationLocator, String mycall, Int32 tunedFrequency)
   at radio.FT8ModDemod.decode(String filename)






I reached the point after ~

Alan Hopper

Feb 9, 2019, 1:50:15 PM
to Hermes-Lite
Steve,
thanks for testing that. I think I had a number of issues. Sleep and Windows Update were triggering the watchdog; I have not had that happen for a while now, and when it did happen the radio did not need power cycling. I have had one instance recently where the radio froze and did need power cycling; this was exactly at the time I saw the big blue light on my router go orange.

Hopefully SparkSDR 2 will fix the PSK Reporter issue. I try to pick a different sample rate each time I start, as I've been caught out in the past just testing at one rate.

73 Alan M0NNB

Steve Haynal

Feb 9, 2019, 1:55:20 PM
to Hermes-Lite
Hi Alan,

I just started your latest version. I like the way it looks! One issue I noticed with this version and the last is that if my first receiver is on 40M and I then add receivers at higher frequencies, the filters do not switch to the highest frequency. I have to start by adding 17M and then add lower frequencies. Also, I like the add to current or add to new receiver feature, but when I tried to go past 4 receivers, using add to new receiver for WSPR on one of the bands I already had another receiver on, the program crashed. I have virtual receivers not disabled, so enabled. Maybe these are all operator problems...

73,

Steve
kf7o

Alan Hopper

Feb 9, 2019, 2:12:22 PM
to Hermes-Lite
Steve,
this is a bit of an unplanned release triggered by your 4 receiver code so is in flux. The 'virtual receivers disabled' mode is only hours old and not tested beyond the number of firmware receivers, switching this option probably only works once per startup and probably only at the point one receiver is running. So probably best to set at startup if needed and restart the program to cancel.

The filters should work, have you selected one of the smart options in the radio settings filter dialog?

I'll do some more testing tomorrow.

73 Alan M0NNB

Steve Haynal

Feb 12, 2019, 11:32:24 PM
to Hermes-Lite
Hi Alan,

Yes, the filters are working as they should. I thought that when switching from 17M to 80M it cut out 20M and below, as I heard the filter switch, but in fact it was just switching to the 20/30M filter. I switched to the smart filter option too. Nice!

I haven't gotten back around to the new NCO, but will take a look at the rounding you pointed to. In the past I did simulate with that step rounded and truncated and noticed no difference so went with the simpler truncation.

73,

Steve
kf7o

Alan Hopper

Apr 4, 2019, 5:36:06 AM
to Hermes-Lite
I had my hl2 get into a state where it responds to discovery commands but won't start. I left it in this state and added some debugging to the sw, I get a 'host unreachable' 10051 error from the windows socket functions.  I do tend to leave my hl2 on all the time and often not running (as it is in a shed at the bottom of the garden). Powering down and up fixed it. Next time it happens is there anything useful to check? I forgot to see if the router could see it.
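For checking the "responds to discovery but won't start" state from outside the main application, a small standalone probe can help. The Python sketch below is based on the openHPSDR (Metis) protocol 1 discovery exchange as generally documented: a 63-byte request of EF FE 02 followed by zeros, sent to UDP port 1024, with a reply carrying a status byte, the MAC address, a code version and a board ID. Treat the field offsets as assumptions to verify against the HL2 protocol documentation; the function names are hypothetical.

```python
import socket

DISCOVERY_PORT = 1024  # UDP port used by openHPSDR protocol 1 (assumed here)


def build_discovery():
    """Build a 63-byte discovery request: EF FE 02 followed by 60 zero bytes."""
    return bytes([0xEF, 0xFE, 0x02]) + bytes(60)


def parse_reply(data):
    """Decode the start of a discovery reply; returns None if it does not look like one."""
    if len(data) < 11 or data[0:2] != b"\xef\xfe" or data[2] not in (0x02, 0x03):
        return None
    return {
        "running": data[2] == 0x03,   # 0x03 is taken to mean "already streaming"
        "mac": ":".join(f"{b:02x}" for b in data[3:9]),
        "code_version": data[9],
        "board_id": data[10],
    }


def discover(timeout=1.0, address="255.255.255.255"):
    """Broadcast one discovery request and collect replies until the timeout expires."""
    found = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.settimeout(timeout)
        s.sendto(build_discovery(), (address, DISCOVERY_PORT))
        try:
            while True:
                data, addr = s.recvfrom(1500)
                info = parse_reply(data)
                if info:
                    found.append((addr[0], info))
        except socket.timeout:
            pass
    return found
```

If `discover()` lists the radio with `running` set while no host believes it is connected, that would be worth recording alongside the LED state before power cycling.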
73 Alan M0NNB

Steve Haynal

Apr 7, 2019, 12:53:57 AM
to Hermes-Lite
Hi Alan,

I have still not seen this problem, and it sounds like your unit was up longer this time. Next time, it would be good to know what state the LEDs were in, whether you can ping the unit, what the router still sees, and whether there is any traffic (ARP) to or from the unit.

73,

Steve
kf7o

Alan Hopper

Apr 7, 2019, 3:52:34 AM
to Hermes-Lite
Steve,
yep, these issues are now very rare for me; next time it happens I'll resist the temptation to repower and check these things.
73 Alan M0NNB