Performance and Headless.... Questions...

wemana...@gmail.com

May 8, 2022, 10:08:58 PM

to sdrtrunk

So... I am playing with the newer 050b3... and watching the various posts on various topics specifically the SIMD Vector stuff, and headless..

I already do "headless" on my main setups.. It basically makes a dummy X server that the program runs in and I can connect to it and check on it, and then disconnect... even play audio on things, till some recent lib change or program change broke it.. was quite handy to get that audio for some quickie checks... I don't need local audio, 99.9999999% of my audio is piped out to IceCast.

But this new "headless" mode intrigues me, especially if it's running on a Pi4, and even more so since I do not need the waterfall or spectrum 99.9999999999999999% of the time. The CC data on events/messages is nice to review at times... but anyway...

Looking at the Issues on GitHub, I am not seeing any detail on how this works, how to turn it on/off, etc... Is there something that outlines this headless mode?

Next would be the performance improvements vis-à-vis the SIMD vector stuff...

I did this on one active box, and 2 of my test setups... OK.. Meh??? I am not sure I notice much difference in any of this????

So WITHOUT A JAVA PROFILER.... how is this vector stuff improving things? I am not really noticing any difference in top/htop while it is running.... I'm looking for ways to view this quantitatively, to say yes or no, with these it's improving...

I freely admit that I have setups which are not optimized for SDRT, they were designed with OP25 in mind to replace Pi's running OP25.. although I moved one to production for a very very very heavy PI system using a Hydra SDR and works quite nicely, even before this... but I am not sure how I can measure these vector changes to go yeah they help, no they really don't do much.. I like to be able to put finger on things and yes its good, no its not really a help here...

Maybe these vector things become more important for the NEW MEGA SDR box I am building to replace 3 boxes and Pi's, and utilize the NBFM possibly, although that may not be possible as I have shared users on one channel and the one bozos use P25 on the main channel of interest, so with the lack of PL decode options, that may remove NBFM use for **me**... THANKS FOR THE SQUELCH! It is a blessing for other things.. maybe we could have a live streaming mode for IceCast for cough NOAA WX cough, put that at about a 1 billion on the wish list...my Pi can handle that, but it would be nice...

Not really sure that putting $200+ into a Pi4 and AirSpy R2 is a good investment if the Pi4 can't keep up with the 10 MHz setting on it.. and the 2 setups where I might envision that would require 10 MHz via an AirSpy or a HackRF One.. and I'm not sure a Pi4 could keep up with a heavy load PI, medium PII and a light PII even headless.... I am still intrigued to play with SDRT on a Pi4..

So the tl;dr of this is:

Info on using this "headless mode?"

What way is there to measure these SIMD Vector enhancements v. non use????

sdrtrunk

May 9, 2022, 5:58:27 AM

to sdrtrunk

Info on using this "headless mode?"

Headless mode normally means you're running the software on a machine that doesn't have a monitor/keyboard/mouse. You can also force headless mode by adding this command line switch when running the application:

-Djava.awt.headless=true
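
To confirm whether the switch actually took effect, a small check like the one below can help. This is only a sketch and not part of sdrtrunk; the class name HeadlessCheck is made up for illustration. It relies on the standard java.awt.GraphicsEnvironment.isHeadless() call, which reports true when the JVM was started with the switch above (and, on Linux, typically also when no display is configured).

```java
final class HeadlessCheck {
    /** Returns true when the JVM considers itself headless (no display, keyboard, or mouse). */
    static boolean isHeadless() {
        return java.awt.GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        // The raw property is null unless it was set explicitly, e.g. with -Djava.awt.headless=true.
        System.out.println("java.awt.headless property: " + System.getProperty("java.awt.headless"));
        // The effective value also reflects the JVM's own platform default.
        System.out.println("GraphicsEnvironment.isHeadless(): " + isHeadless());
    }
}
```

On Java 11 or newer this can be run directly from source, for example with "java -Djava.awt.headless=true HeadlessCheck.java".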

What way is there to measure these SIMD Vector enhancements v. non use????

In the user preferences, there is a SIMD on/off switch. Keep in mind that these vector-capable workloads account for a smaller percentage of the overall workload of running the sdrtrunk application.

It would be a challenge to make a comparison between SIMD enabled and disabled because you might not see the same workloads across each run of the application. The calibration tests each SIMD-capable software component in isolation to determine which SIMD or non-SIMD variant to use for a given machine. You can see the scores for each component when you run the calibrations. Each score represents an average of the number of operations that it completed in a fixed timeframe. Each calibration goes through a warm-up test and then runs the actual test.
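
The warm-up-then-measure idea described above can be sketched roughly as follows. This is not sdrtrunk's calibration code; the class name, method names, buffer size, and durations are all assumptions chosen for illustration. It counts how many times a task completes in a fixed time window, after first running a warm-up window whose result is discarded so the JIT compiler has a chance to optimize the code.

```java
final class CalibrationSketch {
    /** Runs the task repeatedly for the given duration and returns how many runs completed. */
    static long countOperations(Runnable task, long durationMillis) {
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long count = 0;
        while (System.nanoTime() < deadline) {
            task.run();
            count++;
        }
        return count;
    }

    /** Warm-up pass (result discarded) followed by the measured pass. */
    static long score(Runnable task, long warmUpMillis, long testMillis) {
        countOperations(task, warmUpMillis);
        return countOperations(task, testMillis);
    }

    public static void main(String[] args) {
        float[] a = new float[2048];
        float[] b = new float[2048];
        float[] out = new float[2048];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 0.5f;
            b[i] = 1.25f;
        }

        // Hypothetical workload: element-wise multiply of two float buffers.
        Runnable scalarMultiply = () -> {
            for (int i = 0; i < out.length; i++) {
                out[i] = a[i] * b[i];
            }
        };

        System.out.println("scalar multiply score: " + score(scalarMultiply, 500, 1000));
    }
}
```

Two variants of the same operation can be compared by scoring each one this way on the same machine; a higher score means more operations finished in the same window.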

During development I ran these calibration tests on several different machines and the results were surprising. Many of the newest machines support 8 lanes of 32-bit floating point operations, meaning you can do 8 floating point calculations in the same CPU cycle count as a single floating point calculation. A CPU that supports 8-lane operations will also support 4-lane and 2-lane operations. Some Intel CPUs support 16-lane (AVX-512) operations. For some tasks the non-SIMD variant performs best. On some machines with 8-lane-capable CPUs, the 4-lane SIMD variant performs better.

The SIMD optimizations are competing against the Java HotSpot compiler. At runtime, Java is able to 'auto-vectorize' chunks of code when it detects that the code is structured in a way that is compatible with vectorization. So, in the cases I mentioned where the non-SIMD variants were faster, Java may already be vectorizing the code, and doing it more efficiently than I can express in code. Moreover, because of this auto-vectorization capability, sdrtrunk has been using some SIMD vector operations all along.
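
As a rough illustration of what "structured in a way that is compatible with vectorization" can look like, here is a sketch in plain Java. It is not taken from sdrtrunk, and the class and method names are made up. The first loop is a simple counted loop where each output element depends only on the inputs at the same index, which is the kind of shape HotSpot can often auto-vectorize. The second loop carries a dependency from one iteration to the next, which generally prevents it. Whether a given JVM and CPU actually vectorize either loop is not guaranteed.

```java
final class AutoVectorizableLoop {
    /** Independent per-element work: a typical candidate for auto-vectorization. */
    static void multiply(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] * b[i];
        }
    }

    /** Each element depends on the previous result, so iterations cannot simply run side by side. */
    static void runningSum(float[] a, float[] out) {
        if (a.length == 0) {
            return;
        }
        out[0] = a[0];
        for (int i = 1; i < a.length; i++) {
            out[i] = out[i - 1] + a[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {2f, 2f, 2f, 2f};
        float[] product = new float[4];
        float[] sums = new float[4];
        multiply(a, b, product);
        runningSum(a, sums);
        System.out.println(java.util.Arrays.toString(product));
        System.out.println(java.util.Arrays.toString(sums));
    }
}
```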

The best speed-ups that I saw during my tests were 7.5x for 8 lane SIMD operations.

cheers,

Denny

wemana...@gmail.com

May 10, 2022, 11:14:59 AM

to sdrtrunk

On Monday, May 9, 2022 at 5:58:27 AM UTC-4 sdrtrunk wrote:

Info on using this "headless mode?"

Headless mode normally means you're running the software on a machine that doesn't have a monitor/keyboard/mouse

hehhehee... yeah... 99.999999% of my stuff is headless... the units in " 'my' DC" all basically share a head via KVM's of some sort....

I mostly "X in" to them via SSH and XDMCP, or SSH with X forwarding, and some have some SSH Linked KRFB/KRDC setups and another setup that the SDRT setup uses to headless X it. The only thing that has a head 24/7/365 is my "command desk."

This browser is run on a box nowhere near me. ....over a VPN, and SSH to some DC in +++++NO CARRIER

You can also force headless mode by adding this command line switch when running the application
-Djava.awt.headless=true

I gave this a whirl
"./sdr-trunk -Djava.awt.headless=true"

, no joy... so I edited the sdr-trunk script and put it into the JAVA OPTS... so it does do headless..... I guess this mode of headless is not *** for me ***. If you autostart something etc., it does work... *** I am *** just not comfortable with no way to check up on it... I check up on the SDRTs running for production use.. and at times to see what's going on with stuff I don't monitor... like CC rotation on Harris, and a CC failure causing a rotation on the main zone. I keep the next in rotation active for it, but on the Harris, due to other issues, that's not an option, and going forward, once the full migration occurs, it will need wide bandwidth SDRs, aka AirSpy R2, to deal with it via SDRT..... The reason I don't use the auto rotation is twofold. One, for the one that I'll likely need it on in the future, I need channels 1-5 on an Airspy, while 6-9 can be on an SDR V3 for now. With the ability to use 10 MHz settings on an Airspy this goes away, but that won't happen till that box is replaced.. shortly (I am on borrowed time with that system anyway... longer story and way way tangential to this!) .. and two is me and my mental view on how I want this done, at least for now.....
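
A likely reason the first attempt did nothing while the JAVA OPTS edit worked: a -D switch only becomes a system property when it is handed to the JVM itself, ahead of the main class. If a launcher script forwards its own arguments to the application (an assumption about the sdr-trunk script, not something confirmed in this thread), the same text just arrives as an ordinary program argument. The sketch below shows the difference; the class and method names are made up for illustration.

```java
final class PropertyVsArgument {
    /** True if the key was set as a JVM system property, e.g. java -Dkey=value MainClass. */
    static boolean setAsSystemProperty(String key) {
        return System.getProperty(key) != null;
    }

    /** True if "-Dkey=..." only showed up as a plain program argument, e.g. java MainClass -Dkey=value. */
    static boolean passedAsProgramArgument(String[] args, String key) {
        String prefix = "-D" + key + "=";
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String key = "java.awt.headless";
        System.out.println("set as JVM system property: " + setAsSystemProperty(key));
        System.out.println("passed as program argument: " + passedAsProgramArgument(args, key));
    }
}
```

Running "java -Djava.awt.headless=true PropertyVsArgument.java" and then "java PropertyVsArgument.java -Djava.awt.headless=true" shows the two cases side by side.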

I do see some reduction in resource use (HTOP/TOP)...v. the GUI active... but again, some of that GUI data is of use, and again, I like to check up on things... I guess I was expecting more of an output like OP25 does to the terminal in this mode, and clearly I am wrong on that idea...

So *** FOR ME ** I guess this is not the path for my needs... again, *** my needs.... *** interesting, though...

What way is there to measure these SIMD Vector enhancements v. non use????

In the user preferences, there is a SIMD on/off switch. Keep in mind that these vector-capable workloads account for a smaller percentage of the overall workload of running the sdrtrunk application.

I am going to preface this as a discussion for understanding this SIMD stuff vs. the non-SIMD version... this is not a critique of the program etc... it's more a discussion of how this benefits SDRT, and when, for my own edification... I like to see numbers/data I can crunch on to compare things...

I like to see data I can compare and contrast.. and I am just not seeing the huge differences like the 7.5-8x you listed above..but then again without a java profiler, would I even be able to????? And or some of the benefits that this was to offer... shrug... I guess *** MY ** EXPECTATIONS ARE OUT OF LINE with what happens/ed in this... based on your post,.... especially in re that I am not using some Ryzen ThreadRipper setup. IOW I read what I wanted to out of these improvements...

So for discussion's sake.... I am just curious: if these SIMD "optimizations" are being fought against the Java compiler... then ummm... well... to what end does this get any added benefit v. your time etc. on them v. using that for bug squashing, enhancements etc.? And for the end user as well? It took about 8 minutes or so to run those 18 optimizations on the one low end box that is active in production for this.. About 5 mins on the Ryzen and FX8300 test/play boxes... Coupled with the minimal use possibilities? ie: "vector-capable workloads account for a smaller percentage of the overall workload of running the sdrtrunk application." I know I have pursued coding something in a way to go, wait, I did all this to just get this... yes, nothing even remotely on this level... which I wish I could, so I could come up with a PL/DPL decoder on the NBFM... but that's another discussion...

But anyway.. this is what I see for some test setups, and playing around setups.. The main production units, one has SIMD setups and the other has not been changed to 050b3 yet... that may happen over the weekend if I get some stuff I want to do with that box...

AMD A10-4655
WARN i.g.d.v.VectorUtilities - CPU supports maximum SIMD instructions of Species[float, 4, S_128_BIT]

AMD FX(tm)-8300 Eight-Core Processor
CPU supports maximum SIMD instructions of Species[float, 4, S_128_BIT]

Ryzen 5 3500
CPU supports maximum SIMD instructions of Species[float, 8, S_256_BIT]

Does that read as 4 lanes of 128 bit or ??? And 8 lanes of 256 bit as you discussed above... ??? Curious... So what do these do to help in re SDRT???? Especially in things like the USB thread???? Polyphase thread likely???? Heterodyne thread??? Other decoding process/threads for the CC, Grants???? IMBE and or AMBE decoding???? GUI improvements ie: drawing the data on the GUI especially in the message tab....for CC or grants etc....??? I saw names of stuff in there when it ran, and it may as well have been the teacher in Charlie Brown talking.. wahahahwhahahaahahahaahaha ....... WHOOOOOOOOOOOOSHHHHHHHHHHHHHHH!

Like I said, **** I think my expectations are way off base here ***** v. what is being used here by me, and v. development... which is clearly much much higher up the chain, than I will ever invest in, ever.

Additionally I am curious to understand what these get SDRT v. not using them and when using them on x CPU v. not using them may be a wash, especially if you are not using some Ryzen ThreadRipper, which I am thinking is where these come into play more ?????? V. the 5600g or a Pi4 SoC??? does the Broadcom SoC even do these on ARM????? or ARM at all???? Or would that be an option, like the new crapple ARM stuff????? Probably not???? I don't dig into CPU/SoC details like that much... wanna discuss antenna details.. :) ;) I can bore you to death.... :)

Again this is discussion for understanding, not a critique...

I LOVE SDRT.. it allows me to do things NOW that would take an arsenal of Pi's to do and radios be it SDR or actual scanner at $600/each... and couplers etc... SO THANK YOU...

sdrtrunk

May 16, 2022, 3:56:31 AM

to sdrtrunk

Your CPUs show 128 bit and 256 bit SIMD support. These bits are divided into lanes for calculations based on the bit width of the values involved. For example, if you're performing byte addition, each byte is 8 bits wide and a 128-bit SIMD operation would support 16 parallel byte calculations (8 bits * 16 lanes = 128 bits SIMD). sdrtrunk mostly uses floating point math and floats are 32 bits wide. Thus, a 128-bit SIMD operation would support 4 lanes of 32 bits each (128 total), and a 256-bit SIMD operation would support 8 lanes of 32 bits each (256 total).
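
The lane arithmetic comes down to dividing the SIMD register width by the width of one value. A minimal sketch (the class and method names are made up for illustration):

```java
final class SimdLanes {
    /** Number of values of the given bit width that fit in one SIMD register. */
    static int lanes(int vectorBits, int elementBits) {
        if (elementBits <= 0 || vectorBits % elementBits != 0) {
            throw new IllegalArgumentException("vector width must be a multiple of element width");
        }
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        System.out.println(lanes(128, Byte.SIZE));   // 16 byte lanes in 128 bits
        System.out.println(lanes(128, Float.SIZE));  // 4 float lanes in 128 bits
        System.out.println(lanes(256, Float.SIZE));  // 8 float lanes in 256 bits
        System.out.println(lanes(512, Float.SIZE));  // 16 float lanes in 512 bits (AVX-512)
    }
}
```

This matches the log lines earlier in the thread: Species[float, 4, S_128_BIT] is 4 float lanes in a 128-bit register, and Species[float, 8, S_256_BIT] is 8 float lanes in a 256-bit register.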

The speedups from using SIMD don't always translate into an overall speedup or efficiency increase, because each SIMD optimization focuses on such a small percentage of the overall workload. Likewise, I've only scratched the surface of what can be done across the sdrtrunk codebase. You've highlighted lots of other areas in the sdrtrunk codebase that could also benefit from these optimizations.

So, why invest the effort?

Every little bit helps and I'll continue adding more SIMD optimizations as I have time. And, learning. For me, sdrtrunk has always been about teaching myself DSP and continuing to improve my Java skills.

cheers,

Denny
